
Text and Data Mining (TDM)

Context

  • Publishers set different requirements for text mining. A list of publishers that support text mining is available on the Publisher resources page.

  • Metadata standards still vary, which makes it difficult to extract data from some sources.

  • Not all publishers have a clearly developed policy (or process) for text/data mining of their resources.

  • Copyright provisions may not adequately cover your intended use of the resource, and it may not be easy to establish absolute access rights with the publisher.

Data storage


Data and text mining often involves working with and storing large data sets. Researchers who perform text and data mining are required to store all research data securely, regardless of format. Security is not the only consideration when planning storage; it is recommended that you also consider:

  • Data formats

  • Expected volume of data

  • Privacy and/or security requirements

  • Access requirements

  • Period of time storage will be required

  • Backing up important data and original data sources

For more information see:

Ethical issues


Access: An increasing number of publishers are allowing data and text mining of their licensed resources by members of subscribing institutions, and of available open access material. Generally this access is governed by existing usage terms and conditions and existing copyright provisions. Some publishers will require you to use tools they provide to mine their content, or will conduct the process for you. In this way they can manage the quantity of data being accessed and the impact on their servers. Downloading large amounts of data can trigger automatic lockouts and prevent other users from accessing resources. In some instances the publisher may apply a fee for additional usage that falls outside our existing agreement.
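Where a publisher permits you to download content yourself, spacing out requests is one way to reduce the load on their servers and the risk of a lockout. The following is a minimal sketch only, not a publisher-endorsed method: the function name fetch_politely, its parameters and the 5-second default interval are all hypothetical choices made for illustration. Always check the publisher's own terms for any stated rate limits before mining.

```python
import time
from typing import Callable, Iterable, List, Optional


def fetch_politely(
    urls: Iterable[str],
    fetch: Callable[[str], bytes],
    min_interval: float = 5.0,  # assumed value; use the publisher's stated limit if one exists
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> List[bytes]:
    """Fetch each URL in turn, leaving at least min_interval seconds
    between the start of one request and the start of the next."""
    results: List[bytes] = []
    last_start: Optional[float] = None
    for url in urls:
        if last_start is not None:
            # Wait out whatever remains of the interval since the last request began.
            wait = min_interval - (clock() - last_start)
            if wait > 0:
                sleep(wait)
        last_start = clock()
        results.append(fetch(url))
    return results
```

The fetch argument is whatever function you use to download a single document, for example a call to an interface the publisher provides. Where the publisher requires you to use their own mining tools, use those instead of a script like this.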

Privacy: When accessing research data made available by other organisations it is important that mining activities do not inadvertently disclose confidential information or breach the privacy of research subjects. Although the primary responsibility for the ethical collection and storage of, and access to, research data sits with the research owner, it may be possible to filter data in ways that reveal confidential or identifying details. This is why some data owners require researchers to apply to use their data, or license its use via a formal agreement or Creative Commons licence. Researchers need to ensure that they abide by the terms of use of any data they access.

If you are unsure whether your intended use of text/data mining may constitute a breach of the university's licence agreement with a specific publisher, the Library can either advise you or contact the publisher on your behalf.

Copyright

  • Depending on how the mining is conducted (for example, whether the material is copied, reformatted or digitised without permission), it could be considered a copyright infringement. Data mining relies heavily on 'copy-reliant' technologies, which must make copies of the data in order to analyse it. Currently the Copyright Act 1968 makes no specific exemption for text or data mining.

  • Limited text mining might be covered by the fair dealing exceptions; however, if an entire dataset needed to be copied, this would clearly exceed a 'reasonable portion' of the work.

  • While copyright does not apply to raw data or factual information, it does cover the arrangement of data within a database or the 'expression' of data, e.g. presentation in a table.

For further information on how Australian copyright law may apply to your research please visit: