Static Hash-Based Lookup for Google Ngram Frequencies

gngram-lookup

Requires Python 3.9+.

Word frequency and part-of-speech tags from 500 years of books. O(1) lookup. 5 million words.
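
To illustrate what a static, hash-based lookup can look like, here is a minimal sketch. It is not the package's actual storage layout: the function names (`bucket_key`, `build_buckets`, `lookup`), the choice of SHA-256, and the two-character bucket prefix are all assumptions made for illustration. The idea is that a word's hash selects a fixed bucket, so a query touches one small table instead of scanning the whole vocabulary.

```python
import hashlib
from typing import Dict, Iterable, Optional, Tuple


def bucket_key(word: str, prefix_len: int = 2) -> str:
    """Map a word to a bucket name: the first hex characters of its SHA-256 digest."""
    digest = hashlib.sha256(word.encode("utf-8")).hexdigest()
    return digest[:prefix_len]


def build_buckets(rows: Iterable[Tuple[str, int]]) -> Dict[str, Dict[str, int]]:
    """Group (word, count) rows into per-bucket dictionaries (hypothetical layout)."""
    buckets: Dict[str, Dict[str, int]] = {}
    for word, count in rows:
        buckets.setdefault(bucket_key(word), {})[word] = count
    return buckets


def lookup(buckets: Dict[str, Dict[str, int]], word: str) -> Optional[int]:
    """Hash the word, open only its bucket, and return the count or None."""
    return buckets.get(bucket_key(word), {}).get(word)


# The single row reuses the sum_tf value shown for 'computer' in the Python section.
demo_buckets = build_buckets([("computer", 892451)])
found = lookup(demo_buckets, "computer")
not_found = lookup(demo_buckets, "xyznotaword")
```

Both the bucket selection and the dictionary access are constant-time on average, which is the property the O(1) claim refers to.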

Install

pip install gngram-lookup
python -m gngram_lookup.download_data       # frequency data, ~110 MB
python -m gngram_lookup.download_pos_data   # POS tag data, separate download

Python

import gngram_lookup as ng

ng.exists('computer')       # True
ng.exists('xyznotaword')    # False

ng.frequency('computer')
# {'peak_tf': 2000, 'peak_df': 2000, 'sum_tf': 892451, 'sum_df': 312876}

ng.batch_frequency(['the', 'algorithm', 'xyznotaword'])
# {'the': {...}, 'algorithm': {...}, 'xyznotaword': None}

ng.word_score('the')          # 1  (most common)
ng.word_score('computer')     # 18
ng.word_score('rucksack')     # 58
ng.word_score('xyznotaword')  # None

ng.pos('fast')                           # ['ADJ', 'ADV', 'VERB']
ng.pos('corn', min_tf=100000)            # ['ADJ', 'NOUN']
ng.pos_freq('corn')                      # {'NOUN': 11722803, 'ADJ': 1433642, ...}
ng.has_pos('sing', ng.PosTag.VERB)       # True
ng.has_pos('sing', ng.PosTag.VERB, min_tf=1000)  # True
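
Because `word_score` returns `None` for unknown words, code that sorts by score has to filter those out first. The sketch below shows one way to do that. `rank_by_score` is a hypothetical helper, not part of the package, and it takes the scoring function as an argument so that it runs without the downloaded data. It assumes, based on the comments above, that a lower score means a more common word.

```python
from typing import Callable, Iterable, List, Optional


def rank_by_score(
    words: Iterable[str],
    score_fn: Callable[[str], Optional[int]],
) -> List[str]:
    """Return known words ordered from most to least common, dropping unknown ones."""
    scored = [(score_fn(word), word) for word in words]
    known = [(score, word) for score, word in scored if score is not None]
    return [word for _, word in sorted(known)]


# Stand-in scores copied from the examples above, so the sketch runs offline.
# With the package and its data installed, pass ng.word_score instead.
demo_scores = {"the": 1, "computer": 18, "rucksack": 58}
ranked = rank_by_score(
    ["rucksack", "xyznotaword", "the", "computer"],
    demo_scores.get,
)
print(ranked)  # ['the', 'computer', 'rucksack']
```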

CLI

exists computer       # True, exit 0
exists xyznotaword    # False, exit 1

freq computer
# peak_tf_decade: 2000
# peak_df_decade: 2000
# sum_tf: 892451
# sum_df: 312876

score computer        # 18
pos fast              # ADJ ADV VERB
pos-freq corn         # ADJ: 1,433,642 / NOUN: 11,722,803 / VERB: 85,411
has-pos sing VERB     # True, exit 0
has-pos fast NOUN     # False, exit 1
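
The exit codes let scripts branch on the result without parsing output. A minimal sketch of reading them from Python follows. `command_succeeds` is a hypothetical helper, and the demo uses a stand-in command so that it runs without the package installed. It assumes the CLI commands are on `PATH` under the names shown above.

```python
import subprocess
import sys
from typing import Sequence


def command_succeeds(argv: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return result.returncode == 0


# With the package installed, the call would look like:
#     command_succeeds(["exists", "computer"])
# Stand-in commands that exit with 0 and 1, mirroring the documented behaviour:
ok = command_succeeds([sys.executable, "-c", "import sys; sys.exit(0)"])
missing = command_succeeds([sys.executable, "-c", "import sys; sys.exit(1)"])
```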

Attribution

Data derived from the Google Books Ngram dataset.

License

Proprietary. See LICENSE.
