Static Hash-Based Lookup for Google Ngram Frequencies

gngram-lookup

Requires Python 3.9 or later.

Word frequencies and part-of-speech tags for 5 million words, drawn from 500 years of books, with O(1) lookup.

Install

pip install gngram-lookup
python -m gngram_lookup.download_data       # frequency data, ~110 MB
python -m gngram_lookup.download_pos_data   # POS tag data, separate download

Python

import gngram_lookup as ng

ng.exists('computer')       # True
ng.exists('xyznotaword')    # False

ng.frequency('computer')
# {'peak_tf': 2000, 'peak_df': 2000, 'sum_tf': 892451, 'sum_df': 312876}
# peak_tf and peak_df are decades (the CLI labels them peak_tf_decade and peak_df_decade)

ng.batch_frequency(['the', 'algorithm', 'xyznotaword'])
# {'the': {...}, 'algorithm': {...}, 'xyznotaword': None}

ng.pos('fast')                           # ['ADJ', 'ADV', 'VERB']
ng.pos('corn', min_tf=100000)            # ['ADJ', 'NOUN']
ng.pos_freq('corn')                      # {'NOUN': 11722803, 'ADJ': 1433642, ...}
ng.has_pos('sing', ng.PosTag.VERB)       # True
ng.has_pos('sing', ng.PosTag.VERB, min_tf=1000)  # True
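
The dictionaries returned above are plain data, so they can be post-processed without further lookups. The sketch below is illustrative only: `rank_by_sum_tf` and `tags_at_least` are hypothetical helpers, not part of gngram-lookup, and the sample frequency counts for 'the' and 'algorithm' are made up. The `tags_at_least` helper assumes that `min_tf` keeps only tags whose count meets the threshold, which is consistent with the 'corn' example but is not confirmed by the library.

```python
from typing import List, Mapping, Optional


def rank_by_sum_tf(freqs: Mapping[str, Optional[Mapping[str, int]]]) -> List[str]:
    """Order known words from most to least frequent by sum_tf.

    Words mapped to None (not found) are dropped.
    """
    known = {word: f for word, f in freqs.items() if f is not None}
    return sorted(known, key=lambda word: known[word]["sum_tf"], reverse=True)


def tags_at_least(pos_counts: Mapping[str, int], min_tf: int) -> List[str]:
    """Keep tags whose count is at least min_tf, sorted alphabetically."""
    return sorted(tag for tag, count in pos_counts.items() if count >= min_tf)


# Hand-written sample shaped like the batch_frequency() output.
# The numbers for 'the' and 'algorithm' are invented for illustration.
sample_freqs = {
    "algorithm": {"peak_tf": 2000, "peak_df": 2000, "sum_tf": 5000, "sum_df": 1200},
    "the": {"peak_tf": 2000, "peak_df": 2000, "sum_tf": 9000000, "sum_df": 400000},
    "xyznotaword": None,
}

# Counts for 'corn' as shown in the ng-pos-freq output.
corn_counts = {"NOUN": 11722803, "ADJ": 1433642, "VERB": 85411}

ranked = rank_by_sum_tf(sample_freqs)
frequent_tags = tags_at_least(corn_counts, 100000)
print(ranked)
print(frequent_tags)
```

With the data installed, the same helpers could be fed live results, for example `rank_by_sum_tf(ng.batch_frequency(words))` or `tags_at_least(ng.pos_freq('corn'), 100000)`.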

CLI

ng-exists computer    # True, exit 0
ng-exists xyznotaword # False, exit 1

ng-freq computer
# peak_tf_decade: 2000
# peak_df_decade: 2000
# sum_tf: 892451
# sum_df: 312876

ng-pos fast           # ADJ ADV VERB
ng-pos-freq corn      # ADJ: 1,433,642 / NOUN: 11,722,803 / VERB: 85,411
ng-has-pos sing VERB  # True, exit 0
ng-has-pos fast NOUN  # False, exit 1
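
Because the commands report their result through the exit code, they can be driven from a script. The following is a minimal sketch, assuming `ng-exists` is on the PATH and exits 0 for a known word and 1 otherwise, as shown above. `cli_exists` and its `run` parameter are hypothetical names introduced here so the wrapper can be exercised without the CLI installed.

```python
import subprocess
from typing import Callable


def cli_exists(
    word: str,
    run: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> bool:
    """Return True when `ng-exists WORD` exits with status 0."""
    result = run(["ng-exists", word], capture_output=True, text=True)
    return result.returncode == 0


def fake_run(cmd, **kwargs):
    """Stand-in for subprocess.run that mimics the documented exit codes."""
    code = 0 if cmd[1] == "computer" else 1
    return subprocess.CompletedProcess(cmd, code, stdout="", stderr="")


print(cli_exists("computer", run=fake_run))
print(cli_exists("xyznotaword", run=fake_run))
```

Inside Python, calling `ng.exists()` directly avoids the subprocess overhead; the wrapper is mainly useful when only the CLI is available.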

Attribution

Data derived from the Google Books Ngram dataset.

License

Proprietary. See LICENSE.
