A tool to explore how language works. Its algorithms analyse authentic texts of billions of words (text corpora) to identify instantly what is typical in language and what is rare, unusual or emerging usage. It is also designed for text analysis and text mining applications. The tool is used by linguists, lexicographers, translators, students and teachers. It contains ready-to-use corpora in 90+ languages, each up to 60 billion words in size, to provide a truly representative sample of language.
Corpora at this site were created by Mark Davies, Professor of Linguistics at Brigham Young University. The corpora have many different uses, including: finding out how native speakers actually speak and write; finding the frequency of words, phrases, and collocates; looking at language variation and change, e.g. across historical periods, dialects, and genres; gaining insight into culture, e.g. what is said about different concepts over time and in different countries; and designing authentic language teaching materials and resources.
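To make "frequency of words" and "collocates" concrete, here is a minimal Python sketch run on a tiny invented sample. It is not how this site computes its results: the function names, the sample sentence, and the simple fixed-window definition of a collocate are all assumptions made for illustration, and real corpus tools typically add lemmatisation, part-of-speech tagging, and statistical association scores.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def collocates(tokens, node, window=2):
    """Count words found within `window` tokens of each occurrence of `node`.

    This is a raw co-occurrence count (an assumption for this sketch),
    not a statistical association measure.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok != node:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:
                counts[tokens[j]] += 1
    return counts


# Invented sample text, standing in for a real corpus.
sample = (
    "Strong tea is nice. She drinks strong tea and strong coffee, "
    "but never weak tea."
)

tokens = tokenize(sample)
freq = Counter(tokens)                      # word frequencies
tea_collocates = collocates(tokens, "tea", window=1)

print(freq.most_common(3))
print(tea_collocates.most_common(3))
```

On a real corpus the same idea applies at scale: the frequency list shows which words are common or rare, and the collocate counts show which words tend to appear near a chosen word.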
The British National Corpus (BNC) is a 100-million-word collection of samples of written and spoken language from a wide range of sources, designed to represent a broad cross-section of British English from the later part of the 20th century.
"The Corpus of Contemporary American English (COCA) is the largest freely-available corpus of English, and the only large and balanced corpus of American English. COCA is probably the most widely-used corpus of English, and it is related to many other corpora of English that we have created, which offer unparalleled insight into variation in English".
Source: Younes, N., & Reips, U.-D. (2019). Guideline for improving the reliability of Google Ngram studies: Evidence from religious terms. PLOS ONE, 14(3), e0213554. https://doi.org/10.1371/journal.pone.0213554