Publication of data outputs (dissemination) throughout the project, or end-of-project archiving of data in appropriate data repositories, must be planned, documented and based on the principle of:
‘As open as possible, as closed as necessary’
In the absence of justifiable reasons (such as respect for cultural sensitivity or unmanageable risks regarding sensitive or highly sensitive information), researchers should lodge datasets publicly within a data repository, without mediation, under an open license.
If datasets cannot be exposed without mediation, they should be lodged in a repository that supports mediated access, with clear instructions describing the access conditions. Where there are valid reasons for mediating or restricting access to data, or for applying a more restrictive license, these must be articulated explicitly.
Other digital research objects, such as code, presentation slide decks and descriptions of procedures, can also be archived and linked to the relevant datasets, increasing their value. Code, for example, can be developed in GitHub and then archived in Zenodo using Zenodo's automated GitHub integration. OSF can be used as a single ‘portal’ or ‘landing page’ that connects the various project outputs (data, bibliography, code, reports, presentations, etc.). OSF can also hold a project description, a wiki or other information and serve as a website for a project or dataset, improving its findability and accessibility.
All research data has value beyond the original project. By publishing your data and making it available to others, you can improve the impact of your research significantly.
Positive outcomes of data publication include:
Funders are increasingly requiring data to be openly published. Making the data or metadata available makes the research findings far more findable and reusable by other researchers, maximising their value and impact.
The Australian Research Council encourages researchers to deposit data arising from research projects in publicly accessible data repositories.
Journals and funders are adopting data sharing guidelines as a requirement for publication. “The TOP Guidelines were created by journals, funders, and societies to align scientific ideals with practices”. The list of journals adopting these guidelines is growing, so if you want to publish in them in the future, you will need to be prepared to share your data.
To address concerns about research reproducibility and integrity, some journals and major publishers also require or strongly recommend that the data supporting published articles be made publicly available.
Data publication requirements can sometimes be waived if the data is sensitive or confidential, or cannot be anonymised, but researchers are still expected to make their data ‘As open as possible, as closed as necessary’ (EU Horizon 2020).
This is the approach being adopted at Macquarie University. It was originally explored and advocated in the Open Research Data Pilot of the European Commission's Horizon 2020 programme.
A Data Repository is a tool that allows you to share, preserve, and disseminate your research data. Domain-specific data repositories cater to particular research disciplines. Data should be submitted to a domain-specific repository where one is available. To locate a domain-specific repository for your discipline, please see the Registry of Research Data Repositories and/or consult a Data Steward.
If a domain-specific repository is unavailable, data should be placed in the Macquarie University Research Data Repository with appropriate access restrictions.
Duplicate copies (e.g. backup copies) of data can be disposed of once the data is archived or a data output is lodged in a domain-specific or institutional research data repository, provided doing so complies with the relevant legal and regulatory requirements and any research agreements pertaining to that data.
As is the case during earlier stages of the data lifecycle, any sensitive or highly sensitive data subject to mediated access that contains personal information must be stored in Australia unless explicit consent for ‘overseas disclosure’ has been obtained; such consent should only be sought in rare circumstances (consult the Digitally Enabled Research and/or Ethics teams for guidance).
One of the benefits of publishing your data on a Macquarie University research data infrastructure is that the data will be given a DOI (Digital Object Identifier). Persistent identifiers such as DOIs identify a particular resource and avoid ambiguity: if people refer to an item by the same DOI, you can be certain they are referring to exactly the same resource. A DOI can be turned into a link by adding the https://doi.org/ address before it. If the digital location of the resource changes, the DOI can be updated to point to the new location, so existing links keep working.
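Building such a link is simple string handling. The minimal Python sketch below assumes the input is a single DOI, possibly written with a `doi:` prefix or already as a resolver URL. The helper name `doi_to_url` and the sample DOI `10.1234/abcd.5678` are hypothetical, and the check only confirms the basic shape of a DOI (a prefix starting with `10.`, a slash, then a suffix) rather than fully validating it.

```python
RESOLVER = "https://doi.org/"

# Forms in which a DOI is often written; stripped so only the bare DOI remains
_KNOWN_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def doi_to_url(doi: str) -> str:
    """Return a resolvable https://doi.org/ link for a DOI (hypothetical helper)."""
    doi = doi.strip()
    for prefix in _KNOWN_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Light structural check only: "10.<registrant>/<suffix>"
    prefix_part, sep, suffix = doi.partition("/")
    if not prefix_part.startswith("10.") or len(prefix_part) == 3 or not sep or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return RESOLVER + doi


print(doi_to_url("10.1234/abcd.5678"))      # https://doi.org/10.1234/abcd.5678
print(doi_to_url("doi:10.1234/abcd.5678"))  # https://doi.org/10.1234/abcd.5678
```

Normalising the input first means a DOI copied from a citation, a landing page or a metadata record produces the same link.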
Another increasingly common identifier is the IGSN (International Geo Sample Number). IGSNs provide unique identifiers for physical samples and are currently used in mineral sampling and processing.
Persistent identifiers also make your data easier to cite.
Data citation refers to the practice of providing a reference to data in the same way as researchers routinely provide a bibliographic reference to other scholarly resources. While data has often been shared in the past, it was seldom cited in the same way as journal articles or other publications. This culture is, however, rapidly changing.
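As a rough illustration of citing data like any other scholarly resource, the Python sketch below assembles a citation string from basic dataset metadata. The element order (creators, year, title, version, publisher, DOI link) loosely follows the pattern DataCite recommends, but the `Dataset` fields, the `format_citation` name and all example values are hypothetical; where a repository or journal specifies its own citation format, that format should take precedence.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Dataset:
    creators: list[str]
    year: int
    title: str
    publisher: str
    doi: str
    version: str | None = None  # omitted from the citation when not set


def format_citation(ds: Dataset) -> str:
    """Creators (Year). Title. Version X. Publisher. https://doi.org/DOI"""
    parts = [f"{'; '.join(ds.creators)} ({ds.year}).", f"{ds.title}."]
    if ds.version:
        parts.append(f"Version {ds.version}.")
    parts.append(f"{ds.publisher}.")
    parts.append(f"https://doi.org/{ds.doi}")
    return " ".join(parts)


example = Dataset(
    creators=["Citizen, J.", "Smith, A."],
    year=2023,
    title="Hypothetical survey dataset",
    publisher="Macquarie University",
    doi="10.1234/abcd.5678",
    version="1.0",
)
print(format_citation(example))
# Citizen, J.; Smith, A. (2023). Hypothetical survey dataset. Version 1.0. Macquarie University. https://doi.org/10.1234/abcd.5678
```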
Many archives or repositories will give guidance on how to cite data. There is even an online data citation formatter provided by DataCite for any data that has a DOI.
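A formatted citation can also be requested programmatically through DOI content negotiation, where the https://doi.org/ resolver is asked for the `text/x-bibliography` media type and a citation style. The Python sketch below assumes the resolver behaves as the Crosscite content-negotiation documentation describes. The function names are hypothetical, the sample DOI `10.1234/abcd.5678` is made up, and only the request construction is exercised here because sending it requires network access.

```python
import urllib.request


def build_citation_request(doi: str, style: str = "apa") -> urllib.request.Request:
    """Prepare a content-negotiation request asking for a formatted citation of a DOI."""
    return urllib.request.Request(
        "https://doi.org/" + doi,
        headers={"Accept": f"text/x-bibliography; style={style}"},
    )


def fetch_citation(doi: str, style: str = "apa", timeout: float = 10.0) -> str:
    """Send the request (needs network access); urllib follows the resolver's redirect."""
    with urllib.request.urlopen(build_citation_request(doi, style), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


req = build_citation_request("10.1234/abcd.5678")
print(req.full_url)              # https://doi.org/10.1234/abcd.5678
print(req.get_header("Accept"))  # text/x-bibliography; style=apa
```

With network access, calling `fetch_citation` on a real DOI should return a single citation string in the requested style.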
For more information on data citation, see the ANDS guide to Data Citation.
To allow data to be disseminated safely, access restrictions can be applied to data that has been classified as either sensitive or highly sensitive.
Prior to publishing a data output, the selected data repository should be checked to confirm that it can impose the access restrictions required for the sensitivity of the data.
A variety of terms are used to describe the different levels of access that may be applied to a data output. These include open, mediated (which may also be called restricted or specialised), embargoed and closed. Different institutions or data repositories may apply slightly different terms or definitions.
Access Level | Description | Applicable Data |
---|---|---|
Open | | This type of access is typically applied to data which does not contain any sensitive or highly sensitive information. |
Mediated (or restricted) | (a) ‘specialised access’ approved by a designated custodian on the research team, who must be available to respond to access requests in a timely manner; (b) ‘restricted access’ where decisions are delegated to an organisational Data Steward with clear instructions about access and use. Access and use conditions and a procedure for acquiring data must be properly described within the metadata and implemented by the repository. | Often applied to data containing sensitive information. |
Embargoed | | Often applied to data which accompanies a research publication, or to commercially sensitive data (e.g. a pending patent). |
Closed | | Should only be utilised under exceptional circumstances. |
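The repository check described above can be sketched as a small lookup: intersect the access levels acceptable for the data with the levels the repository can actually enforce. In this Python sketch, the mapping from sensitivity classification to acceptable access levels is an assumption read loosely from the table (open or embargoed for non-sensitive data, mediated for sensitive and highly sensitive data, with closed access left to case-by-case decisions). The repository capability sets are hypothetical and would need to be confirmed with each repository.

```python
from __future__ import annotations

from collections.abc import Iterable
from enum import Enum


class Access(Enum):
    OPEN = "open"
    MEDIATED = "mediated"
    EMBARGOED = "embargoed"
    CLOSED = "closed"


class Sensitivity(Enum):
    NON_SENSITIVE = "non-sensitive"
    SENSITIVE = "sensitive"
    HIGHLY_SENSITIVE = "highly sensitive"


# ASSUMPTION: acceptable access levels per classification. Closed access is
# left out because it is reserved for exceptional circumstances.
ACCEPTABLE = {
    Sensitivity.NON_SENSITIVE: {Access.OPEN, Access.EMBARGOED},
    Sensitivity.SENSITIVE: {Access.MEDIATED},
    Sensitivity.HIGHLY_SENSITIVE: {Access.MEDIATED},
}


def usable_levels(sensitivity: Sensitivity, repository_supports: Iterable[Access]) -> set[Access]:
    """Access levels that suit the data and that the repository can enforce."""
    return ACCEPTABLE[sensitivity] & set(repository_supports)


def check_repository(sensitivity: Sensitivity, repository_supports: Iterable[Access]) -> set[Access]:
    """Raise if the repository cannot impose any acceptable access level."""
    levels = usable_levels(sensitivity, repository_supports)
    if not levels:
        raise ValueError(f"repository cannot enforce suitable access for {sensitivity.value} data")
    return levels


# Hypothetical repository capabilities
open_only_repo = {Access.OPEN}
mediating_repo = {Access.OPEN, Access.MEDIATED, Access.EMBARGOED}

print(sorted(level.value for level in usable_levels(Sensitivity.SENSITIVE, open_only_repo)))  # []
print(sorted(level.value for level in usable_levels(Sensitivity.SENSITIVE, mediating_repo)))  # ['mediated']
```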
An embargo is a request by a researcher to delay the publication of their dataset until a specified time. Embargoes (or embargo periods) are most commonly used when researchers want to publish datasets but currently cannot, for reasons such as data sensitivity, impending publication plans or industry/funder agreements. An embargo lets researchers still receive the benefits of publishing data when external factors temporarily prevent its publication.
Additionally, researchers may choose to make the metadata for a dataset available immediately but provide access to the actual dataset only after the embargo period. Because an embargo restricts the accessibility of the data, researchers will need to provide a justification for applying one.
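The split between immediately visible metadata and delayed data access can be sketched as a date comparison. This Python sketch only illustrates the logic, since a real repository enforces embargoes itself; the `DatasetRecord` fields, the `visibility` function and the dates are hypothetical. Here the data becomes available on the embargo end date itself.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass
class DatasetRecord:
    title: str
    embargo_until: date | None = None  # None means no embargo
    metadata_public_during_embargo: bool = True


def visibility(record: DatasetRecord, today: date) -> dict[str, bool]:
    """Report whether the metadata and the data files are visible on a given day."""
    under_embargo = record.embargo_until is not None and today < record.embargo_until
    return {
        "metadata": (not under_embargo) or record.metadata_public_during_embargo,
        "data": not under_embargo,
    }


rec = DatasetRecord("Hypothetical interview transcripts", embargo_until=date(2025, 7, 1))
print(visibility(rec, date(2025, 1, 15)))  # {'metadata': True, 'data': False}
print(visibility(rec, date(2025, 7, 1)))   # {'metadata': True, 'data': True}
```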
To make it easy for your published dataset to be reused appropriately, it is critical to let other researchers know clearly what they can and cannot do with the dataset. These permissions are usually set out by applying a license or terms and conditions of use to the published dataset.
Applying a standard, robust, well-defined license to a dataset means that anyone reusing that dataset will be confident in what they have permission to do and they can act with legal certainty. At Macquarie, open data is generally licensed with a permissive Creative Commons license (e.g., CC-BY 4.0 International). Sensitive data requiring mediated access is instead governed by a Macquarie University Data Licence Terms and Conditions of Use.
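The general rule above can be sketched as a lookup that records the governing license in a dataset's metadata according to its access level. `CC-BY-4.0` is the SPDX identifier for Creative Commons Attribution 4.0 International. The metadata keys and the `licence_for` name are hypothetical, and other access levels are deliberately left to a case-by-case decision because the rule is only a general one.

```python
# SPDX identifier for Creative Commons Attribution 4.0 International
OPEN_LICENCE = "CC-BY-4.0"
MEDIATED_TERMS = "Macquarie University Data Licence Terms and Conditions of Use"


def licence_for(access_level: str) -> str:
    """Pick the license or terms that govern reuse for an access level (hypothetical rule)."""
    if access_level == "open":
        return OPEN_LICENCE
    if access_level == "mediated":
        return MEDIATED_TERMS
    raise ValueError(f"no default license rule for access level {access_level!r}; decide case by case")


metadata = {"title": "Hypothetical survey dataset", "access": "open"}
metadata["licence"] = licence_for(metadata["access"])
print(metadata["licence"])  # CC-BY-4.0
```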
Information from Springer Nature on the benefits for researchers from data publication:
Additional information on data licensing:
Very useful information from the ARDC on licensing research datasets and related issues:
Information on the CC BY license, recommended for publishing datasets:
ANDS FAQ for research data licensing and copyright: