Streaming access to the Google ngram data.

Project description

The Google Books Ngram Viewer dataset is a freely available resource, released under a Creative Commons Attribution 3.0 Unported License, that provides ngram counts over books scanned by Google.

The dataset is so large that storing it locally is almost impossible. Sometimes, however, you only need an aggregate over the dataset, for example to build a co-occurrence matrix.

This package provides an iterator over the dataset as stored at Google. It decompresses the data on the fly and gives you access to the underlying records.
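To illustrate what decompressing on the fly can look like, here is a minimal sketch that turns a stream of gzip-compressed byte chunks into text lines without holding the whole file in memory. It is not taken from the package: the function name iter_decompressed_lines is hypothetical, the chunks are simulated rather than downloaded, and the sketch assumes a single-member gzip stream.

```python
import gzip
import zlib


def iter_decompressed_lines(chunks):
    """Yield text lines from an iterable of gzip-compressed byte chunks."""
    # wbits = 16 + MAX_WBITS tells zlib to expect a gzip header and trailer.
    decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
    pending = b""
    for chunk in chunks:
        pending += decompressor.decompress(chunk)
        # Emit every complete line; keep the unfinished tail for later.
        *lines, pending = pending.split(b"\n")
        for line in lines:
            yield line.decode("utf-8")
    # Pick up anything still buffered once the input is exhausted.
    pending += decompressor.flush()
    if pending:
        yield pending.decode("utf-8")


# Simulate a download arriving in small pieces (no network access needed).
payload = gzip.compress(b"first line\nsecond line\nthird line\n")
chunks = [payload[i:i + 7] for i in range(0, len(payload), 7)]
lines = list(iter_decompressed_lines(chunks))
```

Because the function is a generator, only the current chunk and one partial line are kept in memory at a time.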

Example use

>>> from google_ngram_downloader import readline_google_store
>>>
>>> fname, url, records = next(readline_google_store(ngram_len=5))
>>> fname
'googlebooks-eng-all-5gram-20120701-0.gz'
>>> url
'http://storage.googleapis.com/books/ngrams/books/googlebooks-eng-all-5gram-20120701-0.gz'
>>> next(records)
Record(ngram=u'0 " A most useful', year=1860, match_count=1, volume_count=1)
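Records like the one above can be folded into an aggregate such as co-occurrence counts. The following is a rough sketch, not the package's own cooccurrence implementation: the Record tuple is a local stand-in with the same field names as the output above, cooccurrence_counts is a hypothetical helper, and treating the middle word of each ngram as the target is an assumption made for illustration.

```python
from collections import Counter, namedtuple

# Stand-in for the Record tuples shown above (same field names); defined
# here only so the sketch runs without the package or network access.
Record = namedtuple("Record", "ngram year match_count volume_count")


def cooccurrence_counts(records):
    """Count (target, context) pairs, taking the middle word as the target."""
    counts = Counter()
    for record in records:
        words = record.ngram.split()
        if not words:
            continue
        middle = len(words) // 2
        target = words[middle]
        for position, context in enumerate(words):
            if position != middle:
                # Weight each pair by how often the ngram occurred.
                counts[target, context] += record.match_count
    return counts


sample = [
    Record("the quick brown fox jumps", 1990, 3, 2),
    Record("a quick brown dog sleeps", 1991, 2, 1),
]
counts = cooccurrence_counts(sample)
```

Since the function consumes any iterable of records, it could in principle be fed the records iterator directly, so only the running counts are held in memory.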

Installation

pip install google-ngram-downloader

The command line tool

The package also provides a simple command line tool, google-ngram-downloader, for downloading the ngrams.

Changes

Version 3.0

  • download, readline and cooccurrence subcommands.

  • readline_google_store transforms lines to Record in several processes.
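As a rough idea of what transforming lines to Record in several processes can involve, the sketch below parses lines with a multiprocessing pool. It is an illustration rather than the package's code: parse_line and parse_lines are hypothetical names, Record is a local stand-in, and the tab-separated layout (ngram, year, match_count, volume_count) is an assumption about the raw files.

```python
from collections import namedtuple
from multiprocessing import Pool

# Local stand-in with the field names seen in the example output.
Record = namedtuple("Record", "ngram year match_count volume_count")


def parse_line(line):
    """Turn one tab-separated line into a Record (assumed file layout)."""
    ngram, year, match_count, volume_count = line.rstrip("\n").split("\t")
    return Record(ngram, int(year), int(match_count), int(volume_count))


def parse_lines(lines, processes=1, chunksize=1000):
    """Parse lines lazily, optionally spreading the work over processes."""
    if processes == 1:
        # Serial path: no worker processes are started.
        yield from map(parse_line, lines)
        return
    with Pool(processes) as pool:
        # imap keeps the input order and hands lines to workers in chunks.
        yield from pool.imap(parse_line, lines, chunksize)


# Serial demonstration on a single hand-written line.
sample = ['0 " A most useful\t1860\t1\t1\n']
records = list(parse_lines(sample))
```

When passing processes greater than 1, the call should sit under an `if __name__ == "__main__":` guard so that worker processes can import the module safely on platforms that spawn rather than fork.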

Download files

Source Distribution

google-ngram-downloader-3.0.tar.gz (8.8 kB)

File hashes

Hashes for google-ngram-downloader-3.0.tar.gz
Algorithm Hash digest
SHA256 59d06b5a456feb31da05a391b608d38c46fa9f29e6689f05fbdc7ef42ce81c65
MD5 8dab4d647ad7cfd1c936a99d986d5a4a
BLAKE2b-256 6d353d65ae2aded17a5f8b0609327f1fe203b2a67f3f451b2e80b37b23cc6cbe
