Frequency vocabulary for NLP purposes

Project description

nlpvocab

Token frequency counter that can be saved to and loaded from files in Pickle and TSV formats.

Provides two command-line scripts for building word and character frequency vocabularies.

Usage from Python

from nlpvocab import Vocabulary

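text = 'token1 token2 token1 token2 token3'

# Count how often each whitespace-separated token occurs
vocab = Vocabulary()
vocab.update(text.split())

# Write the counts to a TSV file with a header row
vocab.save('vocab.tsv', format=Vocabulary.FORMAT_TSV_WITH_HEADERS)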
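
To make the counting and the TSV round trip concrete, here is a minimal sketch built only on the standard library. It is not nlpvocab's implementation: `save_tsv`, `load_tsv`, and the `token`/`frequency` header names are hypothetical, and the file layout actually produced by `Vocabulary.FORMAT_TSV_WITH_HEADERS` may differ.

```python
import csv
import os
import tempfile
from collections import Counter


def save_tsv(counts, path):
    """Write token/frequency pairs to a TSV file, most frequent first."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["token", "frequency"])  # hypothetical header names
        writer.writerows(counts.most_common())


def load_tsv(path):
    """Read a file written by save_tsv back into a Counter."""
    with open(path, newline="", encoding="utf-8") as handle:
        reader = csv.reader(handle, delimiter="\t")
        next(reader)  # skip the header row
        return Counter({token: int(freq) for token, freq in reader})


text = "token1 token2 token1 token2 token3"
counts = Counter(text.split())

# Round trip through a temporary file so the sketch leaves nothing behind
path = os.path.join(tempfile.mkdtemp(), "vocab.tsv")
save_tsv(counts, path)
restored = load_tsv(path)
```

After the round trip, `restored` should hold the same counts as `counts`, which is the property a save/load pair is expected to preserve.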

Usage from command line

nlpvocab-chars /a/b/d/e dir_chars_vocab.tsv
nlpvocab-words /a/b/d/e dir_words_vocab.tsv
nlpvocab-chars /a/b.txt file_chars_vocab.tsv
nlpvocab-words /a/b.txt file_words_vocab.tsv
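
The two scripts differ in what they count. The sketch below illustrates that distinction under stated assumptions: words are split on whitespace and every non-whitespace character is counted. The tokenization rules actually used by `nlpvocab-words` and `nlpvocab-chars` are not described here and may differ, and `count_words` and `count_chars` are hypothetical helpers.

```python
from collections import Counter


def count_words(lines):
    """Count whitespace-separated words across an iterable of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts


def count_chars(lines):
    """Count individual characters, ignoring whitespace (an assumption)."""
    counts = Counter()
    for line in lines:
        counts.update(ch for ch in line if not ch.isspace())
    return counts


lines = ["to be or not", "to be"]
word_counts = count_words(lines)
char_counts = count_chars(lines)
```

The same input therefore yields two different vocabularies: one keyed by words such as `to` and `be`, the other keyed by single characters such as `t` and `o`.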

Download files

Source Distribution

nlpvocab-1.2.0.tar.gz (5.3 kB)

File metadata

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.7

File hashes

Hashes for nlpvocab-1.2.0.tar.gz

Algorithm    Hash digest
SHA256       e5f8b2384d22e7e4eab431fd2e1991f58105fdc9b71a0ea9872c4b751cf73482
MD5          3e1e4156d526e422348983b580ca7a9e
BLAKE2b-256  c229c9a1c1e3a5fb45c4342e8ff26a1389e817fe2fd56b5218d7e506c792b0be
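
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using the standard `hashlib` module; `sha256_of` and `matches_published_hash` are hypothetical helper names, and the path passed in is assumed to point at a local copy of nlpvocab-1.2.0.tar.gz.

```python
import hashlib

# SHA256 digest published for nlpvocab-1.2.0.tar.gz
EXPECTED_SHA256 = "e5f8b2384d22e7e4eab431fd2e1991f58105fdc9b71a0ea9872c4b751cf73482"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path):
    """True if the file at path has the published SHA256 digest."""
    return sha256_of(path) == EXPECTED_SHA256
```

A call such as `matches_published_hash('nlpvocab-1.2.0.tar.gz')` returning False would suggest that the download is corrupted or is not the file described here.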
