The Australian Data Archive is committed to providing open access to Australian and international research data for research and education purposes. This open access commitment, however, is balanced against our obligations to the original participants in these research studies. Data provided to the Australian Data Archive were collected from research participants under the research ethics requirements that applied to the depositor who produced the data. These requirements place obligations on both the researchers and ADA to ensure the data are used appropriately for secondary purposes.
BioMed Central (with Chemistry Central and SpringerOpen) has published 285,042 peer-reviewed research articles, all of which are covered by its open access license agreement. This license allows free distribution and re-use of the full-text article, including the highly structured XML version. As a result, BioMed Central's open access corpus is well suited for use by text mining researchers.
CORE aggregates open access research papers from repositories around the world. The collection can be searched online, and the data CORE aggregates from repositories can also be accessed in two ways: through the CORE API or by downloading the data to your computer. CORE also offers software, the CORE Publisher Connector, which provides access to Gold and Hybrid Gold Open Access articles aggregated from the non-standard systems of major publishers.
Researchers can use Crossref to harvest full-text documents from participating publishers, regardless of the publisher's business model (e.g. open access or subscription). Crossref provides step-by-step instructions.
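As a rough illustration of the harvesting step, the Python sketch below reads a work record from the public Crossref REST API (`https://api.crossref.org/works/{DOI}`) and keeps only the full-text links a publisher has flagged with `intended-application` set to `text-mining`. The function names `fetch_work` and `text_mining_links` are hypothetical helpers written for this example, and the sample record is invented. Whether a listed link can actually be downloaded still depends on the publisher's terms and your institution's access, so treat this as a sketch under those assumptions rather than Crossref's official procedure.

```python
import json
import urllib.request

# Public Crossref REST API endpoint for a single work record.
CROSSREF_WORKS = "https://api.crossref.org/works/"


def fetch_work(doi: str) -> dict:
    """Fetch one Crossref work record as parsed JSON (makes a network call).

    The DOI is appended to the endpoint as-is; hypothetical helper.
    """
    with urllib.request.urlopen(CROSSREF_WORKS + doi, timeout=30) as resp:
        return json.load(resp)


def text_mining_links(work: dict) -> list[dict]:
    """Return the full-text links a publisher flagged for text mining.

    Crossref lists full-text links under message.link; each entry may carry
    an 'intended-application' value such as 'text-mining'.
    """
    links = work.get("message", {}).get("link", [])
    return [
        {"url": link["URL"], "content_type": link.get("content-type", "unspecified")}
        for link in links
        if link.get("intended-application") == "text-mining"
    ]


# Invented sample record (offline), shaped like a Crossref /works response.
sample = {
    "message": {
        "link": [
            {
                "URL": "https://example.org/article.xml",
                "content-type": "application/xml",
                "intended-application": "text-mining",
            },
            {
                "URL": "https://example.org/article",
                "content-type": "text/html",
                "intended-application": "similarity-checking",
            },
        ]
    }
}

print(text_mining_links(sample))
```

Separating the network call from the filtering keeps the parsing logic testable offline; in practice you would pass the result of `fetch_work(doi)` to `text_mining_links` and then request each URL subject to the publisher's licence.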
Government agencies, scientific publishers, research institutions and even individual researchers maintain thousands of open-data repositories around the world, containing millions of datasets. Google's Dataset Search lets users find datasets stored across thousands of these repositories on the Web, making the datasets easier to discover and access.
HathiTrust makes the texts of public domain works in its corpus available for research purposes. The works fall into two categories: non-Google-digitized volumes, which are freely available, and Google-digitized volumes, which are available through an agreement with Google. Within each category there is a distinction between public domain works available only in the US versus public domain works available anywhere in the world.
As well as providing datasets, HathiTrust supports computational analysis of all the digitised works on its platform through HTRC Analytics.
For further detail, see HathiTrust's Non-Consumptive Use Research Policy.
Date range: 15th century to 17th century
Geographical focus: Britain, US
Contains digital facsimile page images of virtually every work printed in England, Ireland, Scotland, Wales and British North America, and of works in English printed elsewhere, from 1473 to 1700.