Frequency vocabulary for NLP purposes

Project description

nlpvocab

A token frequency counter that can save and load vocabularies in Pickle and TSV formats.

Provides two command-line scripts for building word and character frequency vocabularies.

Usage from Python

from nlpvocab import Vocabulary

text = 'token1 token2 token1 token2 token3'

# Count how often each token occurs
vocab = Vocabulary()
vocab.update(text.split())

# Save the counts as TSV, using the format with headers
vocab.save('vocab.tsv', format=Vocabulary.FORMAT_TSV_WITH_HEADERS)
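
To illustrate what a token frequency vocabulary holds, the sketch below counts the same tokens with the standard library's `collections.Counter` and writes them as tab-separated rows. It is a stand-in for illustration only: the helper name `write_tsv`, the `token` and `frequency` header names, and the column order are assumptions, not nlpvocab's actual file layout.

```python
import csv
import io
from collections import Counter


def write_tsv(counts, stream):
    """Write token counts as tab-separated rows, most frequent first.

    The header names are hypothetical, not taken from nlpvocab.
    """
    writer = csv.writer(stream, delimiter='\t', lineterminator='\n')
    writer.writerow(['token', 'frequency'])
    for token, freq in counts.most_common():
        writer.writerow([token, freq])


text = 'token1 token2 token1 token2 token3'
counts = Counter(text.split())

# Write to an in-memory buffer so the sketch leaves no files behind
buffer = io.StringIO()
write_tsv(counts, buffer)
tsv_text = buffer.getvalue()
```

Writing to a stream rather than a path keeps the sketch easy to test; passing an open file object instead of the buffer would produce a file on disk.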

Usage from command line

# Build vocabularies from a directory
nlpvocab-chars /a/b/d/e dir_chars_vocab.tsv
nlpvocab-words /a/b/d/e dir_words_vocab.tsv

# Build vocabularies from a single file
nlpvocab-chars /a/b.txt file_chars_vocab.tsv
nlpvocab-words /a/b.txt file_words_vocab.tsv
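
The difference between a word vocabulary and a character vocabulary can be sketched in plain Python. This is not the code behind `nlpvocab-words` or `nlpvocab-chars`: the function names are hypothetical, and both the whitespace splitting and the decision to skip whitespace characters are simplifying assumptions.

```python
from collections import Counter


def count_words(text):
    """Count whitespace-separated words (a simplifying assumption)."""
    return Counter(text.split())


def count_chars(text):
    """Count every non-whitespace character (also an assumption)."""
    return Counter(ch for ch in text if not ch.isspace())


sample = 'to be or not to be'
word_counts = count_words(sample)   # keys are whole words such as 'to'
char_counts = count_chars(sample)   # keys are single characters such as 'o'
```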

Download files

Source Distribution

nlpvocab-1.1.6.tar.gz (5.2 kB)

Uploaded Source

File details

Details for the file nlpvocab-1.1.6.tar.gz.

File metadata

  • Download URL: nlpvocab-1.1.6.tar.gz
  • Upload date:
  • Size: 5.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.7

File hashes

Hashes for nlpvocab-1.1.6.tar.gz
Algorithm Hash digest
SHA256 3d1e76058e6673eec69be810823ff18a3254167f5d5986ef76cec7dd57be6fbb
MD5 95ed841061f03ae0de0dc98180380f2b
BLAKE2b-256 bba85bd6f0a3472539960ba68cf2455787af676337e136eeba8cc74baa75ad32
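
A downloaded archive can be checked against the SHA256 digest listed above with the standard library's `hashlib`. This is a minimal sketch: the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the archive is assumed to already be on local disk.

```python
import hashlib

# SHA256 digest published for nlpvocab-1.1.6.tar.gz
EXPECTED = '3d1e76058e6673eec69be810823ff18a3254167f5d5986ef76cec7dd57be6fbb'


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED):
    """Compare a file's digest with the expected one, ignoring case."""
    return sha256_of_file(path) == expected.lower()
```

Calling `matches_expected('nlpvocab-1.1.6.tar.gz')` on the downloaded file should return `True` only if its contents match the published digest.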
