NLP toolkit, including tokenization, sequence tagging, etc.
naivenlp
A naive toolkit for NLP.
Tokenizers
A tokenizer splits text into tokens. It can also convert tokens to ids and ids back to tokens.
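As a rough illustration of the token/id round trip, here is a minimal sketch. `ToyVocabTokenizer`, its method names, and the `[UNK]` convention are hypothetical and chosen only for this example; they are not naivenlp's actual API.

```python
class ToyVocabTokenizer:
    """Hypothetical whitespace tokenizer backed by a fixed vocabulary.

    This is an illustration only, not a class provided by naivenlp.
    """

    def __init__(self, vocab, unk_token="[UNK]"):
        # Map each token to its index in the vocabulary list, and back.
        self.token_to_id = {tok: i for i, tok in enumerate(vocab)}
        self.id_to_token = dict(enumerate(vocab))
        self.unk_token = unk_token

    def tokenize(self, text):
        # Lowercase and split on whitespace; real tokenizers do much more.
        return text.lower().split()

    def encode(self, tokens):
        # Tokens missing from the vocabulary map to the unknown-token id.
        unk_id = self.token_to_id[self.unk_token]
        return [self.token_to_id.get(t, unk_id) for t in tokens]

    def decode(self, ids):
        return [self.id_to_token[i] for i in ids]


tokenizer = ToyVocabTokenizer(["[UNK]", "hello", "world"])
tokens = tokenizer.tokenize("Hello brave world")
ids = tokenizer.encode(tokens)
print(tokens, ids, tokenizer.decode(ids))
```

Note that the round trip is lossy for out-of-vocabulary tokens: "brave" is encoded as the unknown id and decodes back to `[UNK]`.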
Here are some vocab-based tokenizers, which means these tokenizers need a vocabulary.
- VocabBasedTokenizer, the base class for vocab-based tokenizers.
- JiebaTokenizer, a wrapper for the original fxsjy/jieba.
- BasicTokenizer and WordpieceTokenizer, from google-research/bert.
- LanguageModelTokenizer, a tokenizer for language models, for example Transformer and BERT.
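To show why a WordPiece-style tokenizer needs a vocabulary, here is a minimal sketch of greedy longest-match-first subword splitting, the general technique used by the WordPiece tokenizer in google-research/bert. The function name `wordpiece_tokenize`, its parameters, and the tiny vocabulary are assumptions made for this example; this is not naivenlp's code, and it omits details such as a maximum word length.

```python
def wordpiece_tokenize(word, vocab, unk_token="[UNK]", prefix="##"):
    """Split one word into subword pieces by greedy longest-match-first.

    Hypothetical helper for illustration; not part of naivenlp.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        match = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Pieces that continue a word carry the "##" prefix.
                candidate = prefix + candidate
            if candidate in vocab:
                match = candidate
                break
            end -= 1
        if match is None:
            # No piece matched at this position: the whole word is unknown.
            return [unk_token]
        pieces.append(match)
        start = end
    return pieces


vocab = {"[UNK]", "un", "##aff", "##able"}
print(wordpiece_tokenize("unaffable", vocab))
print(wordpiece_tokenize("xyz", vocab))
```

With this toy vocabulary, "unaffable" splits into three known pieces, while "xyz" has no matching piece and falls back to the unknown token.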
Download files
Source Distribution: naivenlp-0.0.3.tar.gz (14.5 kB)
Built Distribution: naivenlp-0.0.3-py3-none-any.whl (22.7 kB)