Subject and Research Guides: Managing Research Data @MQ: End of Project / Project Milestones

Introduction

Publication of data outputs (dissemination) during the project, and archiving of data in appropriate data repositories at the end of the project, must be planned, documented and based on the principle of

‘As open as possible, as closed as necessary’ (EU Horizon 2020).

Data Dissemination

In the absence of justifiable reasons (such as respect for cultural sensitivity or unmanageable risks regarding sensitive or highly sensitive information), researchers should lodge datasets publicly within a data repository, without mediation, under an open license.

If datasets cannot be exposed without mediation, they should be lodged in a repository that supports mediated access, with clear instructions describing the access conditions. Where there are valid reasons to mediate or restrict access to data, or to apply a more restrictive licence, these reasons must be stated explicitly.

Other digital research objects, such as code, presentation slide decks and descriptions of procedures, can also be archived and linked to the relevant datasets, increasing their value. Code, for example, can be developed in GitHub and then archived in Zenodo using Zenodo's automated GitHub integration. OSF can be used as a single ‘portal’ or ‘landing page’ that connects various project outputs (data, bibliography, code, reports, presentations, etc.). OSF can also hold a project description, a wiki or other information and serve as a website for a project or dataset, improving its findability and accessibility.

Benefits of Data Dissemination

Research data often has value beyond the original project. By publishing your data and making it available to others, you can significantly increase the impact of your research.

Positive outcomes of data publication include:

Improving the body of knowledge in your discipline - all research builds on earlier understanding; by contributing to the growth of that understanding, you help your whole discipline or field of research advance.
Reliability - publishing research data is a key factor in improving reproducibility, and reproducible research is more reliable.
Citations - making the data underlying a publication available has been shown to be associated with higher citation counts for the article.
Professional connections - when other researchers can access, understand and reuse your data, and can contact you about your research, your professional networks will grow and expand.

Open Data Recommendations & Mandates

Funders increasingly require data to be openly published. Making the data, or at least the metadata, available makes research findings far more findable and reusable by other researchers, maximising their value and impact.

The Australian Research Council encourages researchers to deposit data arising from research projects in publicly accessible Data Repositories.

Journals and funders are adopting data sharing guidelines as a requirement for publication. “The TOP Guidelines were created by journals, funders, and societies to align scientific ideals with practices”. If you want to be able to publish in this growing list of journals in the future, you will need to be prepared to share your data.

To address concerns about research reproducibility and integrity, some journals and major publishers also require, or strongly recommend, that the data supporting published articles be made publicly available.

Data publication requirements can sometimes be waived if the data is sensitive or confidential, or cannot be anonymised, but researchers are still expected to make their data ‘As open as possible, as closed as necessary’ (EU Horizon 2020).

This is the approach being adopted at Macquarie University. It was originally explored and advocated in an Open Research Data Pilot with the European Commission's Horizon 2020 programme.

Data Repositories

A Data Repository is a tool that allows you to share, preserve, and disseminate your research data. Domain-specific data repositories cater to particular research disciplines. Data should be submitted to a domain-specific repository where one is available. To locate a domain-specific repository for your discipline, please see the Registry of Research Data Repositories and/or consult a Data Steward.

If data is sensitive or highly sensitive, the repository must be able to meet the requirements above under ‘Data dissemination and access restrictions’.
Australian social science data should be submitted to the Australian Data Archive.
If your discipline routinely submits data to a domain-general repository like Dryad, you may do so.

If no domain-specific repository is available, data should be placed in the Macquarie University Research Data Repository with appropriate access restrictions.

Duplicate copies (e.g. backup copies) of data can be disposed of once the data is archived or a data output is lodged in a domain-specific or institutional research data repository, provided that doing so complies with the relevant legal and regulatory requirements and any research agreements pertaining to that data.

Data Sovereignty and Data Dissemination or Archiving

As is the case during earlier stages of the data lifecycle, any sensitive or highly sensitive data subject to mediated access that contains personal information must be stored in Australia unless explicit consent for ‘overseas disclosure’ has been obtained; such consent should only be sought in rare circumstances (consult the Digitally Enabled Research and/or Ethics teams for guidance).

Persistent Identifiers & Data Citation

One benefit of publishing your data on Macquarie University research data infrastructure is that the data will be given a DOI (Digital Object Identifier). Persistent identifiers such as DOIs identify a particular resource unambiguously: if people refer to an item by the same DOI, they are referring to exactly the same resource. A DOI link is created by adding https://doi.org/ in front of the DOI. If the digital location of the resource changes, the DOI record can be updated to point to the new location, so the DOI and its link keep working.
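Because a DOI link is just the resolver address followed by the DOI, it is easy to generate consistently, for example when preparing a data availability statement. The sketch below is a hypothetical helper, not a Macquarie University tool: it strips prefixes people commonly paste along with a DOI, applies only a rough sanity check on the DOI's shape, and uses the placeholder DOI `10.1234/abcd.5678`, which does not refer to a real dataset.

```python
import re
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"

# Prefixes often pasted along with a DOI; stripped so the link is not doubled.
_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)

# Rough sanity check only: a modern DOI starts with "10.", a numeric
# registrant code, then "/" and a suffix. Not a full DOI syntax validator.
_DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi: str) -> str:
    """Return a resolvable https://doi.org/ link for a (possibly prefixed) DOI."""
    doi = doi.strip()
    for prefix in _KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    if not _DOI_PATTERN.match(doi):
        raise ValueError(f"does not look like a DOI: {doi!r}")
    # Percent-encode characters with special meaning in URLs (e.g. '#', '?'),
    # keeping '/' so the prefix/suffix separator stays readable.
    return DOI_RESOLVER + quote(doi, safe="/")
```

For example, `doi_to_url("doi:10.1234/abcd.5678")` returns `"https://doi.org/10.1234/abcd.5678"`. Normalising to the https://doi.org/ form is a design choice: it keeps links consistent however the DOI was originally written down.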

Another increasingly common identifier is the IGSN (International Geo Sample Number). IGSNs provide unique identifiers for physical samples and are currently used in mineral sampling and processing.

Persistent identifiers make your data easier to cite.

Data citation refers to the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to other scholarly resources. While data has often been shared in the past, it was seldom cited in the same way as journal articles or other publications. This culture is, however, rapidly changing.

Many archives or repositories will give guidance on how to cite data. There is even an online data citation formatter provided by DataCite for any data that has a DOI.
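Formatted citations can also be retrieved programmatically. As a hedged illustration, the sketch below uses DOI content negotiation, which doi.org supports for DataCite (and Crossref) DOIs: the request asks for `text/x-bibliography` in a named CSL style such as `apa`. This is a related service rather than the DataCite web formatter itself; the function names are hypothetical and `10.1234/abcd.5678` is a placeholder DOI.

```python
import urllib.request

DOI_RESOLVER = "https://doi.org/"


def citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Build (but do not send) a content-negotiation request asking doi.org
    for a formatted citation of `doi` in the given CSL style."""
    return urllib.request.Request(
        DOI_RESOLVER + doi,
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


def fetch_citation(doi: str, style: str = "apa", timeout: float = 10.0) -> str:
    """Send the request (needs network access). doi.org redirects to the
    registration agency's service, and urllib follows the redirect."""
    with urllib.request.urlopen(citation_request(doi, style), timeout=timeout) as resp:
        return resp.read().decode("utf-8").strip()
```

For a real DOI, `print(fetch_citation(doi))` should print a one-line reference in the requested style, provided the network and the resolver service are available. Building the request separately from sending it keeps the formatting step easy to inspect and test offline.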

Did you know?

Many journal publishers now encourage or require citation of research data.
There is a global network of discipline and institutional data repositories where research data collections are described with a preformatted citation statement provided.
Only cited data can be counted and tracked (in a similar manner to journal articles) to measure impact.
Data citation information may soon be incorporated into practices for research evaluation and reward.

For more information on data citation, see the ANDS guide to Data Citation.

Data Access and Use

To disseminate data safely, access restrictions can be applied to data that has been classified as sensitive or highly sensitive.

Prior to publishing a data output, the selected data repository should be checked for the ability to impose the required access restrictions applicable to the sensitivity of the data.


A variety of terms are used to describe the different levels of access that may be applied to a data output. These include open, mediated (which may take the form of restricted or specialised access), embargoed and closed. Different institutions or data repositories may apply slightly different terms or definitions.

Access Level	Description	Applicable Data
Open	A data output available for anyone to access, use and share, without restrictive licences, terms of use, copyright, or other restrictions. Open Access data outputs must be subject to an attribution/share-alike license such as a Creative Commons licence 4.0. Whenever possible, data should be Open Access.	This type of access is typically applied to data which does not contain any sensitive or highly sensitive information.
Mediated (or restricted)	A data output only available to others with the prior approval of a designated Data Custodian (or another authorised person). A restrictive licence and/or restrictive terms of use are usually applied to the data to limit its redistribution and govern its reuse. Depositors may place other conditions on the use of Mediated Access data. A potential user may be able to discover a dataset held in a repository through exposed metadata, but then be required to obtain the permission of a designated custodian before accessing that data. In some cases, part of the data may be available through Open Access, with the rest reserved for Mediated Access (for example, environmental or heritage data may be distributed openly but stripped of location data, which can only be obtained by bona fide researchers applying through a Mediated Access process). Mediated access can be managed in one of two ways: (a) ‘specialised access’, approved by a designated custodian on the research team, who must be available to respond to access requests in a timely manner; or (b) ‘restricted access’, where decisions are delegated to an organisational Data Steward with clear instructions about access and use. Access and use conditions and a procedure for acquiring the data must be properly described within the metadata and implemented by the repository. This data must still be available for scrutiny if research results are challenged or if a breach of the Code is alleged. Metadata should still be exposed, except in rare cases where the metadata itself reveals sensitive information. Good practice dictates that sensitive or highly sensitive data should be deposited in a repository supporting Mediated Access, alongside metadata describing the dataset and how to access it. For data archiving and dissemination purposes, storing data locally and making it available only on direct request from the researcher is no longer standard practice. Mediated Access via a repository should be used when Open Access to data is not appropriate.	Often applied to data containing sensitive information, data that cannot be effectively de-identified, data which may be re-identified now or in the future, and data for which the original participants' consent prohibits Open Access.
Embargoed	Metadata is published, but the complete data output is only accessible after a nominated period of time has passed. At the end of the embargo period, data will become available by either open or mediated access.	Often applied to data which accompanies a research publication, or to commercially sensitive data (e.g. a pending patent).
Closed	A description of the data and certain metadata is published but the data is inaccessible and there is no process in place to allow others to apply for access to it.	Should only be utilised under exceptional circumstances.
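The ‘Applicable Data’ column above can be read as a rough decision rule. The sketch below is a hypothetical illustration only, not a Macquarie University procedure: the `Dataset` fields (`reidentification_risk`, `consent_permits_open`, `embargo_required`, `closed_approved`) are invented for this example, and the actual access level should be decided with a Data Steward.

```python
from dataclasses import dataclass
from enum import Enum


class Sensitivity(Enum):
    NOT_SENSITIVE = "not sensitive"
    SENSITIVE = "sensitive"
    HIGHLY_SENSITIVE = "highly sensitive"


@dataclass
class Dataset:
    # All fields are hypothetical, for illustration only.
    sensitivity: Sensitivity
    reidentification_risk: bool = False  # could participants be re-identified now or later?
    consent_permits_open: bool = True    # does participant consent allow Open Access?
    embargo_required: bool = False       # e.g. pending publication or patent
    closed_approved: bool = False        # exceptional circumstances justify Closed access


def suggest_access_level(ds: Dataset) -> str:
    """Suggest an access level following the 'Applicable Data' column of the table."""
    if ds.closed_approved:
        # Closed: only under exceptional circumstances.
        return "Closed"
    open_ok = (
        ds.sensitivity is Sensitivity.NOT_SENSITIVE
        and not ds.reidentification_risk
        and ds.consent_permits_open
    )
    level = "Open" if open_ok else "Mediated"
    if ds.embargo_required:
        # At the end of an embargo, data becomes available by open or mediated access.
        return f"Embargoed, then {level}"
    return level
```

For instance, `suggest_access_level(Dataset(Sensitivity.SENSITIVE))` returns `"Mediated"`, while a non-sensitive dataset whose participants' consent prohibits open release is also routed to Mediated Access. Open is the default whenever none of the restricting conditions applies, mirroring the table's ‘whenever possible, data should be Open Access’.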

Embargoes

An embargo is a request by a researcher to delay the publication of their dataset until a specified time. Embargoes (or embargo periods) are most commonly used when researchers want to publish datasets but cannot yet do so because of data sensitivity, impending publication plans, or industry or funder agreements. An embargo lets researchers secure the benefits of publishing their data even while external factors temporarily prevent its release.

Additionally, researchers may choose to make the metadata for a dataset available immediately but provide access to the actual dataset only after the embargo period. Because an embargo restricts the accessibility of the data, researchers will need to provide a justification for applying one.

Permissions and Licensing

To make it easy for your published dataset to be reused appropriately, it is critical to tell other researchers clearly what they can and cannot do with it. These permissions are usually set out by applying a license or terms and conditions of use to the published dataset.

Applying a standard, robust, well-defined license to a dataset means that anyone reusing that dataset will be confident in what they have permission to do and they can act with legal certainty. At Macquarie, open data is generally licensed with a permissive Creative Commons license (e.g., CC-BY 4.0 International). Sensitive data requiring mediated access is instead governed by a Macquarie University Data Licence Terms and Conditions of Use.
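Repositories usually record the chosen licence in the dataset's metadata so that it is machine-readable. As one hedged example, the DataCite metadata schema has a `rightsList` property whose entries can carry a licence name, URI and SPDX identifier; the sketch below builds such entries. The helper name `rights_list`, the optional `terms_url` parameter, and the use of a free-text entry for the Macquarie terms are assumptions for illustration; check your repository's own deposit form or metadata schema.

```python
from typing import Dict, List, Optional

# DataCite-style rights entry for CC BY 4.0, using its SPDX identifier.
CC_BY_4: Dict[str, str] = {
    "rights": "Creative Commons Attribution 4.0 International",
    "rightsUri": "https://creativecommons.org/licenses/by/4.0/legalcode",
    "rightsIdentifier": "cc-by-4.0",
    "rightsIdentifierScheme": "SPDX",
    "schemeUri": "https://spdx.org/licenses/",
}


def rights_list(open_access: bool, terms_url: Optional[str] = None) -> List[Dict[str, str]]:
    """Hypothetical helper: return rightsList entries for an open (CC BY 4.0)
    or a mediated (Macquarie terms of use) dataset."""
    if open_access:
        return [dict(CC_BY_4)]  # copy so callers cannot mutate the shared template
    # Institution-specific terms have no SPDX identifier, so only a name
    # (and optionally a URL to the terms) is recorded.
    entry = {"rights": "Macquarie University Data Licence Terms and Conditions of Use"}
    if terms_url:
        entry["rightsUri"] = terms_url
    return [entry]
```

Serialising `{"rightsList": rights_list(open_access=True)}` with `json.dumps` gives a fragment that can be compared against what a repository expects. Using a standard identifier for open data, rather than free text, is what lets tools recognise the licence automatically.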