Frequency vocabulary for NLP purposes
Project description
nlpvocab
Token frequency counter with save/load support for the Pickle and TSV formats.
Provides two command-line scripts for building word and character frequency vocabularies.
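The description says the counter can be saved and loaded as Pickle or TSV. The sketch below illustrates that idea with the standard library's `collections.Counter`. It does not use nlpvocab. The TSV layout (an optional `token`/`frequency` header followed by tab-separated rows, most frequent first) and the helper names `save_tsv` and `load_tsv` are hypothetical, not the package's actual file format or API.

```python
import collections
import io
import pickle


def save_tsv(counts, stream, with_header=True):
    # Hypothetical layout: optional header row, then one "token<TAB>frequency"
    # row per entry, most frequent first. Tokens containing newlines are not escaped.
    if with_header:
        stream.write('token\tfrequency\n')
    for token, freq in counts.most_common():
        stream.write(f'{token}\t{freq}\n')


def load_tsv(stream, with_header=True):
    counts = collections.Counter()
    lines = iter(stream)
    if with_header:
        next(lines, None)  # skip the header row
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue
        # Split on the last tab so the frequency is always the final field.
        token, freq = line.rsplit('\t', 1)
        counts[token] = int(freq)
    return counts


counts = collections.Counter('token1 token2 token1 token2 token3'.split())

# TSV round trip through an in-memory text stream
buffer = io.StringIO()
save_tsv(counts, buffer)
buffer.seek(0)
assert load_tsv(buffer) == counts

# Pickle round trip: the Counter object is serialized as-is
assert pickle.loads(pickle.dumps(counts)) == counts
```

TSV is human-readable and easy to inspect or diff. Pickle preserves the Python object directly but should only be loaded from trusted sources.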
Usage from Python
from nlpvocab import Vocabulary
text = 'token1 token2 token1 token2 token3'
vocab = Vocabulary()
# Count each whitespace-separated token
vocab.update(text.split())
# Save the frequencies as TSV with headers
vocab.save('vocab.tsv', format=Vocabulary.FORMAT_TSV_WITH_HEADERS)
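For the example text above, the expected frequencies are token1: 2, token2: 2, token3: 1. The snippet below reproduces that count with `collections.Counter` as a stand-in. It assumes, though this description does not confirm it, that `Vocabulary.update` accumulates counts the same way `Counter.update` does.

```python
from collections import Counter

text = 'token1 token2 token1 token2 token3'
counts = Counter()
counts.update(text.split())  # same call shape as vocab.update(...) above

print(counts.most_common())
# [('token1', 2), ('token2', 2), ('token3', 1)]

# A second update() adds to the existing frequencies rather than replacing them
counts.update(['token3', 'token4'])
print(counts['token3'], counts['token4'])
# 2 1
```

Under this assumption, a vocabulary can be built incrementally from many texts by calling `update` once per text.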
Usage from command line
nlpvocab-chars /a/b/d/e dir_chars_vocab.tsv
nlpvocab-words /a/b/d/e dir_words_vocab.tsv
nlpvocab-chars /a/b.txt file_chars_vocab.tsv
nlpvocab-words /a/b.txt file_words_vocab.tsv
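Each command takes a source (a directory or a single file) and an output TSV path. The sketch below is only a rough illustration of what such a script might do. The recursive directory walk, UTF-8 decoding, whitespace tokenization for words, per-character counting for characters, and the output layout are all assumptions. `iter_files`, `build_vocab` and `write_tsv` are hypothetical helpers, not functions exported by nlpvocab.

```python
import collections
import os


def iter_files(path):
    # Assumed behavior: a single file is used as-is; a directory is walked recursively.
    if os.path.isfile(path):
        yield path
        return
    for root, _dirs, names in os.walk(path):
        for name in sorted(names):
            yield os.path.join(root, name)


def build_vocab(path, mode='words'):
    # 'words': whitespace-separated tokens; 'chars': individual characters.
    if mode not in ('words', 'chars'):
        raise ValueError(f'unknown mode: {mode!r}')
    counts = collections.Counter()
    for file_path in iter_files(path):
        with open(file_path, encoding='utf-8') as f:
            for line in f:
                if mode == 'words':
                    counts.update(line.split())
                else:
                    # Trailing newlines are dropped so they are not counted as characters.
                    counts.update(line.rstrip('\n'))
    return counts


def write_tsv(counts, out_path):
    # One "token<TAB>frequency" row per entry, most frequent first (assumed layout).
    with open(out_path, 'w', encoding='utf-8') as f:
        for token, freq in counts.most_common():
            f.write(f'{token}\t{freq}\n')
```

Under these assumptions, `write_tsv(build_vocab('/a/b.txt', 'words'), 'file_words_vocab.tsv')` roughly mirrors the last command above.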
Download files
Source Distribution
nlpvocab-1.1.6.tar.gz (5.2 kB)
File details
Details for the file nlpvocab-1.1.6.tar.gz.
File metadata
- Download URL: nlpvocab-1.1.6.tar.gz
- Upload date:
- Size: 5.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.7.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 3d1e76058e6673eec69be810823ff18a3254167f5d5986ef76cec7dd57be6fbb
MD5 | 95ed841061f03ae0de0dc98180380f2b
BLAKE2b-256 | bba85bd6f0a3472539960ba68cf2455787af676337e136eeba8cc74baa75ad32
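The published digests can be used to check a downloaded archive. Below is a minimal sketch using Python's `hashlib`. It assumes the archive was saved locally as `nlpvocab-1.1.6.tar.gz`. `sha256_of` and `verify_sha256` are illustrative helpers.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    # Read in fixed-size chunks so large archives are not loaded into memory at once.
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected):
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected.lower()


# SHA256 digest listed above for nlpvocab-1.1.6.tar.gz
EXPECTED_SHA256 = '3d1e76058e6673eec69be810823ff18a3254167f5d5986ef76cec7dd57be6fbb'
```

Calling `verify_sha256('nlpvocab-1.1.6.tar.gz', EXPECTED_SHA256)` should return `True` for an archive that matches the listed SHA256 digest.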