Corpus of News on the Web

NOW 2014-04 April

Mark Davies

Abstract

Corpus of News on the Web data for April 2014.

The TAR folder contains linguistic data in three formats:

Database: This format allows the most robust searches, including powerful JOINs across the corpus, lexicon, and source tables, but it requires knowledge of SQL. See Full-text corpus data for more information on how to use the database format.
Linear Text: This format provides a textID for each text, and then the entire text on the same line. In this format, words are not annotated for part of speech or lemma. In addition, contracted words like can't are separated into two parts (ca n't) and punctuation is separated from words (eye level . As her).
Word, Lemma, Part of Speech: Texts are separated by a line with ## and the textID.
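
As a rough illustration of how the two text formats described above might be read, here is a minimal Python sketch. It rests on assumptions the record does not state: that in the Linear Text format the textID is the first whitespace-delimited token on each line, and that in the Word, Lemma, Part of Speech format each row holds three tab-separated columns in that order. The function names parse_linear_line and iter_wlp_texts are hypothetical; check the actual files before relying on any of this.

```python
from typing import Iterable, Iterator, List, Tuple

WlpRow = Tuple[str, str, str]  # (word, lemma, part of speech)


def parse_linear_line(line: str) -> Tuple[str, List[str]]:
    """Split one Linear Text line into (textID, tokens).

    Assumption: the textID is the first whitespace-delimited token.
    Tokens are already separated in the data (e.g. "ca n't", "level ."),
    so splitting on whitespace is enough.
    """
    parts = line.strip().split(None, 1)
    if not parts:
        raise ValueError("empty line")
    text_id = parts[0]
    body = parts[1] if len(parts) > 1 else ""
    return text_id, body.split()


def iter_wlp_texts(lines: Iterable[str]) -> Iterator[Tuple[str, List[WlpRow]]]:
    """Group Word/Lemma/PoS rows by text.

    From the description: a line with ## and the textID separates texts.
    Assumption: every other row is tab-separated word, lemma, PoS.
    """
    text_id = None
    rows: List[WlpRow] = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("##"):
            # A new separator closes the previous text, if there was one.
            if text_id is not None:
                yield text_id, rows
            text_id = line[2:].strip()
            rows = []
        elif line.strip() and text_id is not None:
            cols = line.split("\t")
            if len(cols) >= 3:
                rows.append((cols[0], cols[1], cols[2]))
    if text_id is not None:
        yield text_id, rows
```

Both functions work on plain strings, so they can be fed lines from a file opened after extracting the TAR archive, without loading a whole file into memory.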

File Format

.tar

File Size (MB)

504.11

Creation Date

4-1-2014

Deposit Date

July 2024

Recommended Citation

Davies, Mark. (2016-) Corpus of News on the Web (NOW). Available online at https://www.english-corpora.org/now/.

License Restrictions

Corpora data is subject to access and use restrictions, including:

Data cannot be distributed outside Gonzaga
Access limited to restricted login or password
Data cannot be used to create software or products for sale or consumption
Data is for research, and substantial portions (50,000 words or more) cannot be made available to undergraduates
Any publications or products based on the data should reference the source of the data (see Citation Information)

See the full limitations at Restrictions on use of the corpora.
