Frequency vocabulary for NLP purposes
nlpvocab
A token frequency counter that can be saved to and loaded from disk in Pickle and TSV formats.
Provides two command-line scripts for building word and character frequency vocabularies.
Usage from Python
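from nlpvocab import Vocabulary
text = 'token1 token2 token1 token2 token3'
# Create an empty vocabulary and count the whitespace-separated tokens
vocab = Vocabulary()
vocab.update(text.split())
# Save the counts as TSV (the format constant's name indicates a header row)
vocab.save('vocab.tsv', format=Vocabulary.FORMAT_TSV_WITH_HEADERS)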
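To make the idea behind the snippet above concrete, here is a minimal standard-library sketch of a frequency vocabulary with a TSV round trip. It does not use nlpvocab and is not its implementation: the function names (`count_tokens`, `save_tsv`, `load_tsv`), the header column names, and the sort order are all assumptions chosen for illustration, and the file layout that nlpvocab actually writes may differ.

```python
import csv
from collections import Counter


def count_tokens(tokens):
    """Return a Counter mapping each token to its number of occurrences."""
    return Counter(tokens)


def save_tsv(counts, path, with_header=True):
    """Write token/frequency pairs to a TSV file, most frequent first."""
    with open(path, "w", encoding="utf-8", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        if with_header:
            # Hypothetical column names; nlpvocab's real header may differ.
            writer.writerow(["token", "frequency"])
        # Sort by descending frequency, then by token for a stable order.
        for token, freq in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
            writer.writerow([token, freq])


def load_tsv(path, with_header=True):
    """Read a TSV file written by save_tsv back into a Counter."""
    counts = Counter()
    with open(path, encoding="utf-8", newline="") as handle:
        reader = csv.reader(handle, delimiter="\t")
        if with_header:
            next(reader, None)  # skip the header row
        for token, freq in reader:
            counts[token] = int(freq)
    return counts


if __name__ == "__main__":
    vocab = count_tokens("token1 token2 token1 token2 token3".split())
    save_tsv(vocab, "vocab_sketch.tsv")
    print(load_tsv("vocab_sketch.tsv").most_common())
```

A TSV file like this stays readable in a text editor or spreadsheet, while Pickle is a Python-only binary format.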
Usage from command line
nlpvocab-chars /a/b/d/e dir_chars_vocab.tsv
nlpvocab-words /a/b/d/e dir_words_vocab.tsv
nlpvocab-chars /a/b.txt file_chars_vocab.tsv
nlpvocab-words /a/b.txt file_words_vocab.tsv
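The commands above accept either a directory or a single file as input. The following sketch shows one way such character and word counts could be computed with the standard library alone. It is an illustration under stated assumptions, not the code behind `nlpvocab-chars` or `nlpvocab-words`: the helper names are hypothetical, and the choices to recurse into subdirectories, read files as UTF-8, split words on whitespace, and count whitespace characters are guesses that the real scripts may not share.

```python
from collections import Counter
from pathlib import Path


def iter_texts(path):
    """Yield the text of a single file, or of every file under a directory."""
    root = Path(path)
    if root.is_file():
        files = [root]
    else:
        # Assumption: walk the directory recursively, in a stable order.
        files = sorted(p for p in root.rglob("*") if p.is_file())
    for file in files:
        yield file.read_text(encoding="utf-8")


def char_vocab(path):
    """Count every character, including whitespace, in the input."""
    counts = Counter()
    for text in iter_texts(path):
        counts.update(text)  # iterating a string yields its characters
    return counts


def word_vocab(path):
    """Count whitespace-separated words in the input."""
    counts = Counter()
    for text in iter_texts(path):
        counts.update(text.split())  # assumption: whitespace tokenization
    return counts
```

Calling `word_vocab('/a/b.txt')` or `char_vocab('/a/b/d/e')` mirrors the file and directory invocations shown above and returns a `Counter` that can then be written out in whatever format is needed.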