
Frequency vocabulary for NLP purposes

Project description

nlpvocab

Token frequency counter with save/load support in Pickle and TSV formats.
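To show what "frequency counter with save/load" means, here is a minimal sketch built on the standard library's `collections.Counter`. The class `FrequencyVocab`, its methods `save_tsv`/`load_tsv`/`save_pickle`/`load_pickle`, and the `token`/`frequency` header names are all hypothetical. This is not nlpvocab's implementation or its on-disk TSV layout.

```python
import pickle
from collections import Counter


class FrequencyVocab(Counter):
    """Hypothetical token counter with TSV and Pickle save/load (not nlpvocab's code)."""

    def save_tsv(self, path):
        # Header row, then one "token<TAB>count" row per token, most frequent first.
        # Assumes tokens contain no tab or newline characters.
        with open(path, 'w', encoding='utf-8') as f:
            f.write('token\tfrequency\n')
            for token, count in self.most_common():
                f.write(f'{token}\t{count}\n')

    @classmethod
    def load_tsv(cls, path):
        vocab = cls()
        with open(path, encoding='utf-8') as f:
            next(f)  # skip the header row
            for line in f:
                token, count = line.rstrip('\n').split('\t')
                vocab[token] = int(count)
        return vocab

    def save_pickle(self, path):
        # Pickle a plain dict so the file can be unpickled without this class
        with open(path, 'wb') as f:
            pickle.dump(dict(self), f)

    @classmethod
    def load_pickle(cls, path):
        with open(path, 'rb') as f:
            return cls(pickle.load(f))
```

TSV keeps the vocabulary human-readable. Pickle round-trips the mapping without any text parsing.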

Provides two command-line scripts for building word and character frequency vocabularies.

Usage from Python

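from nlpvocab import Vocabulary

text = 'token1 token2 token1 token2 token3'

# Count every whitespace-separated token in the text
vocab = Vocabulary()
vocab.update(text.split())

# Save the counts as a TSV file with a header row
vocab.save('vocab.tsv', format=Vocabulary.FORMAT_TSV_WITH_HEADERS)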
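For the sample text above, each token's frequency is its number of occurrences: token1 and token2 appear twice, and token3 appears once. The snippet below reproduces those counts with the standard library's `collections.Counter`. It is only a reference computation and makes no claim about how `Vocabulary` stores counts internally.

```python
from collections import Counter

text = 'token1 token2 token1 token2 token3'

# Count each whitespace-separated token, as in the example above
counts = Counter(text.split())
print(counts.most_common())
# [('token1', 2), ('token2', 2), ('token3', 1)]
```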

Usage from the command line

nlpvocab-chars /a/b/d/e dir_chars_vocab.tsv
nlpvocab-words /a/b/d/e dir_words_vocab.tsv
nlpvocab-chars /a/b.txt file_chars_vocab.tsv
nlpvocab-words /a/b.txt file_words_vocab.tsv
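Judging by the examples, each script takes an input path (a directory or a single file) and an output TSV path. The sketch below shows that behaviour in plain Python under several assumptions: a directory is scanned recursively, words are whitespace-separated, and characters are counted one by one. The function names `iter_text_files`, `build_vocab`, and `write_tsv` are hypothetical, and none of this is taken from the actual nlpvocab scripts.

```python
import os
from collections import Counter


def iter_text_files(path):
    """Yield `path` itself, or every file under it (recursively) if it is a directory."""
    if os.path.isdir(path):
        for root, _dirs, files in os.walk(path):
            for name in files:
                yield os.path.join(root, name)
    else:
        yield path


def build_vocab(path, mode):
    """Count words (whitespace-separated) or characters across all input files."""
    if mode not in ('words', 'chars'):
        raise ValueError(f"mode must be 'words' or 'chars', got {mode!r}")
    counts = Counter()
    for file_path in iter_text_files(path):
        with open(file_path, encoding='utf-8') as f:
            for line in f:
                if mode == 'words':
                    counts.update(line.split())
                else:
                    # Every character except the line break counts as one token
                    counts.update(line.rstrip('\n'))
    return counts


def write_tsv(counts, out_path):
    """Write a header row, then one "token<TAB>count" row per token, most frequent first."""
    with open(out_path, 'w', encoding='utf-8') as f:
        f.write('token\tfrequency\n')
        for token, count in counts.most_common():
            f.write(f'{token}\t{count}\n')
```

Under these assumptions, `write_tsv(build_vocab('/a/b/d/e', 'chars'), 'dir_chars_vocab.tsv')` mirrors the first command above. A tab character in the input would break this simple two-column layout.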

Download files


Source Distribution

nlpvocab-1.1.5.tar.gz (5.0 kB)

File details

Details for the file nlpvocab-1.1.5.tar.gz.

File metadata

  • Download URL: nlpvocab-1.1.5.tar.gz
  • Upload date:
  • Size: 5.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.23.4 CPython/3.6.5

File hashes

Hashes for nlpvocab-1.1.5.tar.gz
  • SHA256: 66d38d4bf79a21b97059a54bdd5b16240ea8d29c6074595305729515ea93cf4b
  • MD5: 090203dcd2721c85eb5833d7e7a8db55
  • BLAKE2b-256: 3602ee51786e8ff8362b002e7c417f174259b49fe2f63006c8353532523ffdce

