Vocabulary management for NLP in Python.
Project description
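Vocab is a Python package that provides vocabulary objects for natural language processing: mappings between words and integer indices, with word counts and optional handling of unknown words.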
Installation
Install the released package from PyPI:
pip install vocab
Or install directly from the GitHub repository:
pip install git+https://github.com/vzhong/vocab.git
Usage
>>> from vocab import Vocab, UnkVocab
>>> v = Vocab()
>>> v.word2index('hello', train=True)
0
>>> v.word2index(['hello', 'world'], train=True)
[0, 1]
>>> v.index2word([1, 0])
['world', 'hello']
>>> v.index2word(1)
'world'
>>> small = v.prune_by_count(2)
>>> small.to_dict()
{'counts': {'hello': 2}, 'index2word': ['hello']}
>>> u = UnkVocab()
>>> u.word2index(['hello', 'world'], train=True)
[1, 2]
>>> u.word2index('hello friend !'.split())
[1, 0, 0]
>>> u.index2word(0)
'<unk>'
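To make the behaviour in the session above easier to follow, here is a minimal sketch of a vocabulary object that reproduces the same outputs. It is not the package's implementation: the class name ToyVocab, its unk parameter, and its internal attributes are hypothetical, and the rule that prune_by_count keeps words whose count is at least the cutoff is an assumption inferred from the example, where 'hello' (seen twice) survives a cutoff of 2.

```python
from collections import Counter


class ToyVocab:
    """Hypothetical stand-in for illustration; not the vocab package's code."""

    def __init__(self, unk=None):
        self._words = []          # index -> word
        self._index = {}          # word -> index
        self.counts = Counter()   # word -> number of times seen with train=True
        self.unk = unk
        if unk is not None:
            self._add(unk)        # reserve index 0 for the unknown token

    def _add(self, word):
        self._index[word] = len(self._words)
        self._words.append(word)

    def word2index(self, word, train=False):
        # Lists are handled element by element.
        if isinstance(word, (list, tuple)):
            return [self.word2index(w, train=train) for w in word]
        if train:
            # Training counts the word and adds it if it is new.
            self.counts[word] += 1
            if word not in self._index:
                self._add(word)
        if word in self._index:
            return self._index[word]
        if self.unk is not None:
            # Unseen words fall back to the unknown token's index.
            return self._index[self.unk]
        raise KeyError(word)

    def index2word(self, index):
        if isinstance(index, (list, tuple)):
            return [self.index2word(i) for i in index]
        return self._words[index]

    def prune_by_count(self, cutoff):
        # Assumption: keep words seen at least `cutoff` times.
        pruned = ToyVocab(unk=self.unk)
        for word in self._words:
            if word != self.unk and self.counts[word] >= cutoff:
                pruned._add(word)
                pruned.counts[word] = self.counts[word]
        return pruned


v = ToyVocab()
v.word2index('hello', train=True)
v.word2index(['hello', 'world'], train=True)
small = v.prune_by_count(2)
print(small.index2word(0), dict(small.counts))

u = ToyVocab(unk='<unk>')
u.word2index(['hello', 'world'], train=True)
print(u.word2index('hello friend !'.split()))
```

Without train=True a lookup never changes the vocabulary, which is why 'friend' and '!' map to the unknown index 0 in the second example rather than receiving new indices.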