Static Hash-Based Lookup for Google Ngram Frequencies

Project description

gngram-lookup

Word frequency and part-of-speech tags from 500 years of books. O(1) lookup. 5 million words.

Install

pip install gngram-lookup
python -m gngram_lookup.download_data       # frequency data, ~110 MB
python -m gngram_lookup.download_pos_data   # POS tag data, separate download

Python

import gngram_lookup as ng

ng.exists('computer')       # True
ng.exists('xyznotaword')    # False

ng.frequency('computer')
# {'peak_tf': 2000, 'peak_df': 2000, 'sum_tf': 892451, 'sum_df': 312876}
# peak_tf / peak_df are decades (the CLI labels them peak_tf_decade / peak_df_decade)

ng.batch_frequency(['the', 'algorithm', 'xyznotaword'])
# {'the': {...}, 'algorithm': {...}, 'xyznotaword': None}

ng.pos('fast')                           # ['ADJ', 'ADV', 'VERB']
ng.pos('corn', min_tf=100000)            # ['ADJ', 'NOUN']
ng.pos_freq('corn')                      # {'NOUN': 11722803, 'ADJ': 1433642, ...}
ng.has_pos('sing', ng.PosTag.VERB)       # True
ng.has_pos('sing', ng.PosTag.VERB, min_tf=1000)  # True
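
The calls above can be combined into small helpers. The following is a minimal sketch, not part of the package: rank_known_words and dominant_pos are hypothetical names, and the lookup functions are passed in as parameters so the helpers do not depend on the downloaded data. It assumes only the return shapes shown above (a dict with a 'sum_tf' key, or None, per word from batch_frequency; a tag-to-count dict from pos_freq). What pos_freq returns for an unknown word is not documented here, so the sketch treats both None and an empty dict as "no data".

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

FreqRecord = Optional[Dict[str, int]]


def rank_known_words(
    words: Iterable[str],
    batch_lookup: Callable[[List[str]], Dict[str, FreqRecord]],
) -> Tuple[List[str], List[str]]:
    """Split words into (known, unknown); known is sorted by sum_tf, highest first.

    batch_lookup is expected to behave like ng.batch_frequency:
    it maps each word to a frequency dict, or to None if the word is absent.
    """
    results = batch_lookup(list(words))
    known = [w for w, rec in results.items() if rec is not None]
    unknown = [w for w, rec in results.items() if rec is None]
    known.sort(key=lambda w: results[w]["sum_tf"], reverse=True)
    return known, unknown


def dominant_pos(
    word: str,
    pos_freq_lookup: Callable[[str], Optional[Dict[str, int]]],
) -> Optional[str]:
    """Return the tag with the highest count, or None when there is no data.

    pos_freq_lookup is expected to behave like ng.pos_freq.
    Treating None / {} as "no data" is an assumption of this sketch.
    """
    counts = pos_freq_lookup(word)
    if not counts:
        return None
    return max(counts, key=counts.get)


# Usage with the real package (requires the downloaded data files):
#   import gngram_lookup as ng
#   known, unknown = rank_known_words(['the', 'algorithm', 'xyznotaword'], ng.batch_frequency)
#   dominant_pos('corn', ng.pos_freq)
```

Passing the lookup in as an argument also makes the helpers easy to test with a stub dictionary instead of the full dataset.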

CLI

ng-exists computer    # True, exit 0
ng-exists xyznotaword # False, exit 1

ng-freq computer
# peak_tf_decade: 2000
# peak_df_decade: 2000
# sum_tf: 892451
# sum_df: 312876

ng-pos fast           # ADJ ADV VERB
ng-pos-freq corn      # ADJ: 1,433,642 / NOUN: 11,722,803 / VERB: 85,411
ng-has-pos sing VERB  # True, exit 0
ng-has-pos fast NOUN  # False, exit 1
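
Because the commands above signal their result through the exit code (0 for True, 1 for False), they can be driven from a script without parsing output. A minimal sketch using the standard library's subprocess module follows; command_succeeded and filter_known_words are hypothetical helper names, and the default checker assumes ng-exists is on the PATH and takes the word as its only argument, as in the examples above.

```python
import subprocess
from typing import Iterable, List, Sequence


def command_succeeded(argv: Sequence[str]) -> bool:
    """Run a command and report whether it exited with status 0."""
    completed = subprocess.run(list(argv), capture_output=True, text=True)
    return completed.returncode == 0


def filter_known_words(
    words: Iterable[str],
    checker: Sequence[str] = ("ng-exists",),
) -> List[str]:
    """Keep only the words for which `checker <word>` exits with status 0.

    The default checker is the ng-exists command shown above; any command
    that follows the same exit-code convention can be substituted.
    """
    return [w for w in words if command_succeeded([*checker, w])]


# Usage (requires the package and its data to be installed):
#   filter_known_words(['computer', 'xyznotaword'])
```

Spawning one process per word is slow for large lists; for bulk checks the Python API (ng.exists or ng.batch_frequency) is likely the better fit.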

Attribution

Data derived from the Google Books Ngram dataset.

License

Proprietary. See LICENSE.
