Subject and Research Guides: Linguistics: Corpus Linguistics

Corpora

Sketch Engine
A tool to explore how language works. Its algorithms analyse authentic texts of billions of words (text corpora) to identify instantly what is typical in language and what is rare, unusual or emerging usage. It is also designed for text analysis or text mining applications. This tool is used by linguists, lexicographers, translators, students and teachers. It contains ready-to-use corpora in 90+ languages, each up to 60 billion words in size, to provide a truly representative sample of language.
English Corpora
Corpora at this site were created by Mark Davies, Professor of Linguistics at Brigham Young University. The corpora have many different uses, including: finding out how native speakers actually speak and write; finding the frequency of words, phrases, and collocates; looking at language variation and change, e.g. historical, dialects, and genres; gaining insight into culture, e.g. what is said about different concepts over time and in different countries; designing authentic language teaching materials and resources.
British National Corpus
The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a wide cross-section of British English from the later part of the 20th century.
Corpus of Contemporary American English
"The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English, and the only large and balanced corpus of American English. COCA is probably the most widely-used corpus of English, and it is related to many other corpora of English that we have created, which offer unparalleled insight into variation in English".
More Corpora via Multisearch
Google Ngram
Google Ngram is a search engine that charts word frequencies from Google Books (mostly) and thereby allows for the examination of cultural change as it is reflected in books.


Source: Younes, N., & Reips, U.-D. (2019). Guideline for improving the reliability of Google Ngram studies: Evidence from religious terms. PLOS ONE, 14(3), e0213554. https://doi.org/10.1371/journal.pone.0213554
Tools for Corpus Linguistics
List of online tools used in corpus compilation and analysis. Many are free or open source. "Compiled by Kristin Berberich, Ingo Kleiber, and many amazing anonymous contributors." c2020

Corpus Linguistics at Macquarie

Corpus Linguistics
Corpus Linguistics website for the Department of Linguistics