Frequency vocabulary for NLP purposes
Project description
nlpvocab
Token frequency counter that can be saved to and loaded from Pickle and TSV files.
Provides two command-line scripts for building word and character frequency vocabularies.
Usage from Python
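from nlpvocab import Vocabulary
text = 'token1 token2 token1 token2 token3'
vocab = Vocabulary()
# Count every whitespace-separated token in the text
vocab.update(text.split())
# Write the counts to a TSV file using the headers variant of the TSV format
vocab.save('vocab.tsv', format=Vocabulary.FORMAT_TSV_WITH_HEADERS)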
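To make the idea behind the snippet above concrete, here is a minimal standard-library sketch of a token frequency count with a TSV round trip. It does not use nlpvocab and is not its implementation: the helper names `save_tsv` and `load_tsv`, the `token`/`frequency` header row, and the most-frequent-first ordering are all assumptions made for illustration.

```python
import os
import tempfile
from collections import Counter


def save_tsv(counts, path):
    """Write token/frequency pairs as TSV with a header row (assumed layout)."""
    with open(path, "w", encoding="utf-8") as f:
        f.write("token\tfrequency\n")
        # most_common() yields pairs ordered from most to least frequent
        for token, freq in counts.most_common():
            f.write(f"{token}\t{freq}\n")


def load_tsv(path):
    """Read the TSV written by save_tsv back into a Counter."""
    counts = Counter()
    with open(path, encoding="utf-8") as f:
        next(f, None)  # skip the header row
        for line in f:
            token, freq = line.rstrip("\n").split("\t")
            counts[token] = int(freq)
    return counts


text = "token1 token2 token1 token2 token3"
counts = Counter(text.split())

# Round trip through a temporary file so the sketch leaves nothing behind
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "vocab.tsv")
    save_tsv(counts, path)
    restored = load_tsv(path)
```

After the round trip, `restored` holds the same counts as `counts`. Tokens produced by `str.split()` never contain tabs or newlines, which is why this sketch can write them without escaping.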
Usage from command line
nlpvocab-chars /a/b/d/e dir_chars_vocab.tsv
nlpvocab-words /a/b/d/e dir_words_vocab.tsv
nlpvocab-chars /a/b.txt file_chars_vocab.tsv
nlpvocab-words /a/b.txt file_words_vocab.tsv
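The commands above take either a directory or a single file as input. A rough standard-library sketch of that kind of counting is shown below. It is not the scripts' actual code: the helpers `iter_files` and `count_path`, the recursive directory walk, whitespace tokenization for words, UTF-8 decoding, and the exclusion of newline characters are all assumptions.

```python
import os
import tempfile
from collections import Counter


def iter_files(path):
    """Yield the path itself if it is a file, else every file below it."""
    if os.path.isfile(path):
        yield path
        return
    for root, _dirs, files in os.walk(path):
        for name in sorted(files):
            yield os.path.join(root, name)


def count_path(path, unit="words"):
    """Count words (whitespace-separated) or characters under a path."""
    counts = Counter()
    for file_path in iter_files(path):
        with open(file_path, encoding="utf-8", errors="ignore") as f:
            for line in f:
                line = line.rstrip("\n")  # assumption: newlines are not counted
                # Updating with a list counts tokens; with a str, characters
                counts.update(line.split() if unit == "words" else line)
    return counts


# Small demonstration on a temporary directory tree
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, "sub"))
    sample = os.path.join(tmp, "sub", "x.txt")
    with open(sample, "w", encoding="utf-8") as f:
        f.write("a b a\n")
    dir_words = count_path(tmp, "words")      # directory input
    file_chars = count_path(sample, "chars")  # single-file input
```

In character mode this sketch counts spaces as characters too; a real tool may normalize or filter them differently.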
Download files
Source Distribution
nlpvocab-1.1.5.tar.gz (5.0 kB)
File metadata
- Download URL: nlpvocab-1.1.5.tar.gz
- Upload date:
- Size: 5.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/39.1.0 requests-toolbelt/0.8.0 tqdm/4.23.4 CPython/3.6.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 66d38d4bf79a21b97059a54bdd5b16240ea8d29c6074595305729515ea93cf4b
MD5 | 090203dcd2721c85eb5833d7e7a8db55
BLAKE2b-256 | 3602ee51786e8ff8362b002e7c417f174259b49fe2f63006c8353532523ffdce