Abstract
The complete Corpus del Español in linear text format, with texts from 21 Spanish-speaking countries. In this format, each text occupies a single line: a textID followed by the entire text. Words are not annotated for part of speech or lemma. Contracted words like can't are separated into two parts (ca n't), and punctuation is separated from words (eye level . As her). The TAR archive contains one zipped file of texts for each country.
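As a rough illustration of the one-text-per-line layout described above, the following Python sketch splits each line into a textID and a list of tokens. It assumes the textID is the first whitespace-delimited field on the line, which this record does not confirm; the function name parse_linear_lines and the ID 12345 are hypothetical, and the sample tokens are taken from the examples in the abstract.

```python
from typing import Iterable, Iterator, List, Tuple


def parse_linear_lines(lines: Iterable[str]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (text_id, tokens) for each non-empty line of linear text."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Assumption: the textID is the first whitespace-delimited field.
        parts = line.split(None, 1)
        text_id = parts[0]
        body = parts[1] if len(parts) > 1 else ""
        # Words and punctuation are already space-separated in this format,
        # so a plain whitespace split recovers the tokens.
        yield text_id, body.split()


# Hypothetical sample line: the ID 12345 is made up for illustration.
sample = ["12345 eye level . As her"]
for text_id, tokens in parse_linear_lines(sample):
    print(text_id, len(tokens), tokens)
```

Because the tokens arrive pre-separated, no tokenizer is needed; if the real files mark the textID differently, only the split on the first field would need to change.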
File Format
.tar
File Size (MB)
3959.9
Creation Date
11-17-2016
Deposit Date
6-20-2024
Recommended Citation
Davies, Mark. (2016-) Corpus del Español: Web/Dialects. Available online at http://www.corpusdelespanol.org/web-dial/.
License Restrictions
Corpora data is subject to access and use restrictions, including:
- Data cannot be distributed outside Gonzaga
- Access limited to restricted login or password
- Data cannot be used to create software or products for sale or consumption
- Data is for research use, and substantial portions (50,000 words or more) cannot be made available to undergraduates
- Any publications or products based on the data should reference the source of the data (see Citation Information)
Comments
Due to the large size of this file (3.9 GB), it may take a long time to download.