Vocabulary management for NLP in Python.
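Vocab is a Python package that provides vocabulary objects for natural language processing. A vocabulary maps words to integer indices and back, and keeps a count of how often each word was seen during training.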
Installation
pip install vocab
Or install directly from the GitHub repository:
pip install git+https://github.com/vzhong/vocab.git
Usage
>>> from vocab import Vocab, UnkVocab
>>> v = Vocab()
>>> v.word2index('hello', train=True)
0
>>> v.word2index(['hello', 'world'], train=True)
[0, 1]
>>> v.index2word([1, 0])
['world', 'hello']
>>> v.index2word(1)
'world'
>>> small = v.prune_by_count(2)
>>> small.to_dict()
{'counts': {'hello': 2}, 'index2word': ['hello']}
>>> u = UnkVocab()
>>> u.word2index(['hello', 'world'], train=True)
[1, 2]
>>> u.word2index('hello friend !'.split())
[1, 0, 0]
>>> u.index2word(0)
'<unk>'
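To make the behaviour in the session above easier to follow, here is a minimal pure-Python sketch of how such a vocabulary could work. It is an illustration only, not the package's actual implementation: the class name `SimpleVocab`, its `unk` constructor argument, and the private helpers are hypothetical, and the choice to raise `KeyError` for unknown words when no unknown token is configured is an assumption. The sketch mirrors three things visible in the usage example: `train=True` adds unseen words and increments their counts, lookups without training fall back to index 0 when an unknown token exists, and pruning keeps only words seen at least a given number of times.

```python
from collections import Counter


class SimpleVocab:
    """Hypothetical stand-in for illustration; not the vocab package's code."""

    def __init__(self, unk=None):
        self.counts = Counter()   # word -> number of times seen with train=True
        self._index2word = []     # index -> word
        self._word2index = {}     # word -> index
        self.unk = unk            # optional unknown-word token (assumption)
        if unk is not None:
            self._add(unk)        # the unknown token takes index 0

    def _add(self, word):
        # Assign the next free index to a word the first time it is seen.
        if word not in self._word2index:
            self._word2index[word] = len(self._index2word)
            self._index2word.append(word)
        return self._word2index[word]

    def word2index(self, word, train=False):
        # Accept either a single word or a list/tuple of words.
        if isinstance(word, (list, tuple)):
            return [self.word2index(w, train=train) for w in word]
        if train:
            self.counts[word] += 1
            return self._add(word)
        if word in self._word2index:
            return self._word2index[word]
        if self.unk is None:
            raise KeyError(word)  # assumed behaviour without an unknown token
        return self._word2index[self.unk]

    def index2word(self, index):
        # Accept either a single index or a list/tuple of indices.
        if isinstance(index, (list, tuple)):
            return [self._index2word[i] for i in index]
        return self._index2word[index]

    def prune_by_count(self, cutoff):
        # Build a new vocabulary holding only words seen at least `cutoff` times.
        pruned = SimpleVocab(unk=self.unk)
        for word in self._index2word:
            if word != self.unk and self.counts[word] >= cutoff:
                pruned._add(word)
                pruned.counts[word] = self.counts[word]
        return pruned


v = SimpleVocab()
v.word2index('hello', train=True)
v.word2index(['hello', 'world'], train=True)
small = v.prune_by_count(2)   # 'world' was seen once, so only 'hello' survives

u = SimpleVocab(unk='<unk>')
u.word2index(['hello', 'world'], train=True)
unseen = u.word2index('hello friend !'.split())  # unseen words map to index 0
```

Note that in this sketch pruning returns a new object and reassigns indices from zero, so indices from the original vocabulary should not be reused with the pruned one.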