corpus.byu.edu

corpora, size, queries = better resources, more insight



Created by Mark Davies, BYU.

These are the most widely used online corpora, with more than 200,000 distinct researchers, teachers, and students using them each month.
 

English                                         # words      language/dialect  time period  compare
NEW! Wikipedia Corpus (with virtual corpora)    1.9 billion  English           -2014        Info
Global Web-Based English (GloWbE)               1.9 billion  20 countries      2012-13
Corpus of Contemporary American English (COCA)  450 million  American          1990-2012    * * * * *
Corpus of Historical American English (COHA)    400 million  American          1810-2009    * *
TIME Magazine Corpus                            100 million  American          1923-2006
Corpus of American Soap Operas                  100 million  American          2001-2012    *
British National Corpus (BYU-BNC)*              100 million  British           1980s-1993   * *
Strathy Corpus (Canada)                         50 million   Canadian          1970s-2000s

Other languages
Corpus del Español (see also...)                100 million  Spanish           1200s-1900s  *
Corpus do Português (see also...)               45 million   Portuguese        1300s-1900s

N-grams
Google Books: American English                  155 billion  American          1500s-2000s  *
Google Books: British English                   34 billion   British           1500s-2000s
Google Books: One Million Books                 89 billion   Am/Br             1500s-2000s
Google Books: Spanish                           45 billion   Spanish           1500s-2000s

* BYU-BNC is our architecture and interface to the BNC, which is distributed by IT Services (formerly OUCS) at Oxford University on behalf of the BNC Consortium.