Vocabulary management for NLP in Python.
Project description
Vocab is a Python package that provides vocabulary objects for natural language processing.
Installation
pip install vocab
Or install directly from the GitHub repository:
pip install git+https://github.com/vzhong/vocab.git
Usage
>>> from vocab import Vocab, UnkVocab
>>> v = Vocab()
>>> v.word2index('hello', train=True)
0
>>> v.word2index(['hello', 'world'], train=True)
[0, 1]
>>> v.index2word([1, 0])
['world', 'hello']
>>> v.index2word(1)
'world'
>>> small = v.prune_by_count(2)
>>> small.to_dict()
{'counts': {'hello': 2}, 'index2word': ['hello']}
>>> u = UnkVocab()
>>> u.word2index(['hello', 'world'], train=True)
[1, 2]
>>> u.word2index('hello friend !'.split())
[1, 0, 0]
>>> u.index2word(0)
'<unk>'
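The session above can be read as a word-to-index table plus a per-word counter. The following is a minimal sketch of that idea and is not the package's actual implementation: the class name `TinyVocab`, its attribute names, and the `unk` constructor argument are all hypothetical. It assumes that `train=True` both adds unseen words and increments their counts, and that pruning keeps words whose count is at least the cutoff, which is what the example output suggests. Raising `KeyError` for an unknown word when no unknown token is configured is also an assumption.

```python
from collections import Counter


class TinyVocab:
    """Hypothetical stand-in that mimics the behaviour shown in the usage session."""

    def __init__(self, unk=None):
        self.counts = Counter()      # how often each word was seen with train=True
        self.words = []              # index -> word
        self.indices = {}            # word -> index
        self.unk = unk               # optional unknown-word token (assumed design)
        if unk is not None:
            self._add(unk)           # the unknown token takes index 0

    def _add(self, word):
        self.indices[word] = len(self.words)
        self.words.append(word)

    def word2index(self, word, train=False):
        # Accept a single word or a list of words, as in the usage session.
        if isinstance(word, (list, tuple)):
            return [self.word2index(w, train=train) for w in word]
        if train:
            self.counts[word] += 1
            if word not in self.indices:
                self._add(word)
        if word in self.indices:
            return self.indices[word]
        if self.unk is None:
            raise KeyError(word)     # assumed behaviour without an unknown token
        return self.indices[self.unk]

    def index2word(self, index):
        if isinstance(index, (list, tuple)):
            return [self.index2word(i) for i in index]
        return self.words[index]

    def prune_by_count(self, cutoff):
        # Build a new vocabulary holding only words seen at least `cutoff` times.
        pruned = TinyVocab(unk=self.unk)
        for word in self.words:
            if word != self.unk and self.counts[word] >= cutoff:
                pruned._add(word)
                pruned.counts[word] = self.counts[word]
        return pruned


v = TinyVocab()
v.word2index('hello', train=True)
v.word2index(['hello', 'world'], train=True)
small = v.prune_by_count(2)

u = TinyVocab(unk='<unk>')
u.word2index(['hello', 'world'], train=True)
```

With this sketch, `small.words` holds only `'hello'`, and `u.word2index('hello friend !'.split())` maps the two unseen words to the unknown token's index, matching the outputs in the session above.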
Download files
Source Distribution
vocab-0.0.5.tar.gz (6.9 kB)
Built Distribution
vocab-0.0.5-py3-none-any.whl (7.6 kB)