Vocabulary management for NLP in Python.
Project description
Vocab is a Python package that provides vocabulary objects for natural language processing.
Installation
pip install vocab

Or install directly from GitHub:

pip install git+https://github.com/vzhong/vocab.git
Usage
>>> from vocab import Vocab, UnkVocab
>>> v = Vocab()
>>> v.word2index('hello', train=True)
0
>>> v.word2index(['hello', 'world'], train=True)
[0, 1]
>>> v.index2word([1, 0])
['world', 'hello']
>>> v.index2word(1)
'world'
>>> small = v.prune_by_count(2)
>>> small.to_dict()
{'counts': {'hello': 2}, 'index2word': ['hello']}
>>> u = UnkVocab()
>>> u.word2index(['hello', 'world'], train=True)
[1, 2]
>>> u.word2index('hello friend !'.split())
[1, 0, 0]
>>> u.index2word(0)
'<unk>'
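To make the behavior in the session above easier to follow, here is a minimal, self-contained sketch of a vocabulary object that reproduces the same outputs. It is not the package's actual implementation: the class name `TinyVocab`, its `unk` constructor argument, and its internal attributes are hypothetical and chosen only for illustration. The sketch assumes that `train=True` both adds unseen words and increments their counts, that pruning keeps words whose count is at least the cutoff, and that an unknown-word vocabulary reserves index 0 for `'<unk>'`.

```python
from collections import Counter


class TinyVocab:
    """Hypothetical stand-in that mimics the session above; not the package's code."""

    def __init__(self, unk=None):
        self.counts = Counter()   # word -> number of times seen with train=True
        self.words = []           # index -> word
        self.indices = {}         # word -> index
        self.unk = unk
        if unk is not None:
            # Assumption: the unknown token always occupies index 0.
            self._add(unk)

    def _add(self, word):
        self.indices[word] = len(self.words)
        self.words.append(word)

    def word2index(self, word, train=False):
        # Accept a single word or a list of words.
        if isinstance(word, (list, tuple)):
            return [self.word2index(w, train=train) for w in word]
        if train:
            self.counts[word] += 1
            if word not in self.indices:
                self._add(word)
        if word in self.indices:
            return self.indices[word]
        if self.unk is None:
            raise KeyError(word)
        # Unseen word outside training: fall back to the unknown token.
        return self.indices[self.unk]

    def index2word(self, index):
        # Accept a single index or a list of indices.
        if isinstance(index, (list, tuple)):
            return [self.index2word(i) for i in index]
        return self.words[index]

    def prune_by_count(self, cutoff):
        # Return a new vocabulary holding only words seen at least `cutoff` times.
        pruned = TinyVocab(unk=self.unk)
        for word in self.words:
            if word != self.unk and self.counts[word] >= cutoff:
                pruned._add(word)
                pruned.counts[word] = self.counts[word]
        return pruned

    def to_dict(self):
        return {'counts': dict(self.counts), 'index2word': list(self.words)}


v = TinyVocab()
v.word2index('hello', train=True)
v.word2index(['hello', 'world'], train=True)
small = v.prune_by_count(2)

u = TinyVocab(unk='<unk>')
u.word2index(['hello', 'world'], train=True)
```

Without `train=True`, a lookup never changes the vocabulary, which is why a word that was not seen during training maps to the unknown token in the second half of the session.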
Download files
Filename | Size | File type | Python version |
---|---|---|---|
vocab-0.0.5-py3-none-any.whl | 7.6 kB | Wheel | py3 |
vocab-0.0.5.tar.gz | 6.9 kB | Source | None |