Abstract
The complete Corpus of Contemporary American English (COCA) in word, lemma, and part-of-speech format. The linguistic data originates from spoken word, fiction, magazines, newspapers, academic writing, movie and television subtitles, blogs, and web pages. The data is provided in vertical format, which makes it possible to import into a database. Within each file, texts are separated by a line containing ## and the textID.
This TAR file includes 8 zipped folders, each containing between 30 and 34 .txt files of data.
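As a rough illustration of how the layout described above might be read for a database import, the following Python sketch walks the TAR file, opens each zipped folder, and parses the .txt files. It is a sketch under several assumptions that this record does not confirm: that the columns are tab-separated and ordered word, lemma, part of speech; that the textID follows the ## marker on the same line; and that the files are UTF-8. The names Token, parse_vertical, and iter_tokens are hypothetical helpers introduced only for this example.

```python
import io
import tarfile
import zipfile
from typing import Iterable, Iterator, NamedTuple


class Token(NamedTuple):
    text_id: str
    word: str
    lemma: str
    pos: str


def parse_vertical(lines: Iterable[str]) -> Iterator[Token]:
    """Yield one Token per data line, tagged with the most recent textID."""
    text_id = ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue
        if line.startswith("##"):
            # Assumption: the textID follows the "##" marker on the same line.
            text_id = line[2:].strip()
            continue
        # Assumption: tab-separated columns in the order word, lemma, PoS.
        fields = line.split("\t")
        if len(fields) < 3:
            continue  # skip lines that do not fit the assumed layout
        word, lemma, pos = fields[:3]
        yield Token(text_id, word, lemma, pos)


def iter_tokens(tar_path: str, encoding: str = "utf-8") -> Iterator[Token]:
    """Walk the TAR file, open each zipped folder, and parse every .txt file."""
    with tarfile.open(tar_path) as tar:
        for member in tar:
            if not (member.isfile() and member.name.lower().endswith(".zip")):
                continue
            fileobj = tar.extractfile(member)
            if fileobj is None:
                continue
            # Simplification: each zip archive is read fully into memory.
            archive = io.BytesIO(fileobj.read())
            with zipfile.ZipFile(archive) as zf:
                for name in zf.namelist():
                    if not name.lower().endswith(".txt"):
                        continue
                    with zf.open(name) as fh:
                        text = io.TextIOWrapper(fh, encoding=encoding, errors="replace")
                        yield from parse_vertical(text)
```

Each yielded Token could then be passed to a bulk insert (for example, sqlite3's executemany) with text_id as a grouping column. If the actual files use a different delimiter, column order, or encoding, the split and the encoding argument would need to be adjusted.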
File Format
.tar
File Size (MB)
5511.1
Creation Date
2-22-2020
Deposit Date
7-11-2024
Recommended Citation
Davies, Mark. (2008-) The Corpus of Contemporary American English (COCA). Available online at https://www.english-corpora.org/coca/.
License Restrictions
Corpora data is subject to access and use restrictions, including:
- Data cannot be distributed outside Gonzaga
- Access limited to restricted login or password
- Data cannot be used to create software or products for sale or consumption
- Data is for research use; substantial portions (50,000 words or more) cannot be made available to undergraduates
- Any publications or products based on the data should reference the source of the data (see Recommended Citation)
Comments
This file may take a long time to download due to its size (5.38 GB).