
Simple tool to predict text classes with various models.


# TextClassify

## Model

* fastText char
* fastText word
* CNN char embedding
* CNN word embedding
* CNN char & word embedding
* CNN + BiGRU + char & word embedding

## Segment Model

* pyltp
* jieba

## Embedding

* fastText (CBOW / skip-gram)
* gensim

Embeddings can be character-level or word-level.
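The difference between character and word input can be illustrated with a minimal sketch. The helper names `char_tokens` and `word_tokens` are hypothetical and not part of this package, and whitespace splitting only stands in for a real segmenter such as jieba or pyltp.

```python
def char_tokens(text):
    """Character-level tokens: every non-whitespace character is one token."""
    return [ch for ch in text if not ch.isspace()]


def word_tokens(text, segment=str.split):
    """Word-level tokens produced by a segmenter.

    Whitespace splitting is only a stand-in (assumption); Chinese text
    needs a real segmenter.
    """
    return list(segment(text))


sample = "text classify"
print(char_tokens(sample))  # one token per character
print(word_tokens(sample))  # one token per word
```

With jieba installed, `jieba.lcut` could be passed as `segment` to get word tokens for Chinese text.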

## Usage

```python
from text_classify import TextClassify

# Build a classifier with the default parameters
t = TextClassify()
text = ''  # the text to classify
logits = t.predict(text, precision='16')

# Mapping from class index to label
t.index2label

# Top-k labels for the text
t.get_top_label(text, k=5, precision='16')
```

## Parameters

### `TextClassify`

* model: 'fasttext' (default), 'cnn', 'mcnn', 'mgcnn'
* cut: True, False (default)
* cut_model: 'pyltp' (default), 'jieba'
* pyltp_model: '/data_hdd/ltp_data/cws.model'
* fasttext_char_model: '/data_hdd/embedding/fasttext/zhihu_char_model.bin'
* fasttext_word_model: '/data_hdd/embedding/fasttext/zhihu_word_model.bin'
* cnn_char_model: '/home/keming/GitHub/custom_recom/cnn_char_fulltext_best.pth'
* cnn_word_model: '/home/keming/GitHub/custom_recom/cnn_word_fulltext_best.pth'
* mcnn_model: '/home/keming/GitHub/custom_recom/mcnn_fulltext_best.pth'
* mgcnn_model: '/home/keming/GitHub/custom_recom/mgcnn_fulltext_best.pth'
* char_embedding_model: '/data_hdd/embedding/wiki_char_256.model'
* word_embedding_model: '/data_hdd/embedding/wiki_word_256.model'
* words_index: '/data_hdd/zhihu/topic/words.csv'
* chars_index: '/data_hdd/zhihu/topic/chars.csv'
* labels_index: '/data_hdd/zhihu/topic/topics.csv'
* delete_char: '/data_hdd/zhihu/del_chars.txt'
* num_class: 384
* embedding_dim: 256
* num_filter: 128
* char_sentence_length: 256
* word_sentence_length: 128
* char_vocab_size: 12592
* word_vocab_size: 727811
* filter_size_1: [2, 3, 4, 5]
* filter_size_2: [2, 3, 4]
* rnn_num_unit: 128
* rnn_num_layer: 2
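Most of the default paths above point to locations on the author's machine, so they will likely need to be overridden. The sketch below checks a set of overrides for missing files before a classifier is built. The helper `missing_files`, the `PATH_PARAMS` tuple, and the placeholder paths are illustrative assumptions, not part of the package.

```python
import os

# Parameters from the list above whose values are file paths.
PATH_PARAMS = (
    "pyltp_model", "fasttext_char_model", "fasttext_word_model",
    "cnn_char_model", "cnn_word_model", "mcnn_model", "mgcnn_model",
    "char_embedding_model", "word_embedding_model",
    "words_index", "chars_index", "labels_index", "delete_char",
)

# Hypothetical overrides: the keys come from the parameter list,
# the paths are placeholders for wherever the files live locally.
overrides = {
    "model": "cnn",
    "cut": True,
    "cut_model": "jieba",
    "cnn_char_model": "models/cnn_char_fulltext_best.pth",
    "chars_index": "data/chars.csv",
    "labels_index": "data/topics.csv",
}


def missing_files(params):
    """Return the names of path parameters whose file does not exist."""
    return [
        name for name in PATH_PARAMS
        if name in params and not os.path.isfile(params[name])
    ]


print(missing_files(overrides))
```

If the returned list is empty, the overrides could then be passed as `TextClassify(**overrides)`, assuming the constructor accepts these names as keyword arguments.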

### `TextClassify.predict`

* text
* precision: '16' (default), '32', '64'
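The README does not say what `precision` controls. Assuming the values `'16'`, `'32'`, and `'64'` are floating-point bit widths of the returned scores, the following sketch shows how much detail each width keeps. The helper `round_to_precision` is hypothetical and uses only the standard library.

```python
import struct

# Assumption: the precision strings name IEEE 754 bit widths.
FORMATS = {"16": "e", "32": "f", "64": "d"}


def round_to_precision(value, precision="16"):
    """Round-trip a Python float through the given bit width."""
    fmt = FORMATS[precision]
    return struct.unpack(fmt, struct.pack(fmt, value))[0]


for p in ("16", "32", "64"):
    print(p, round_to_precision(0.1, p))
```

Lower widths save memory at the cost of rounding error, which rarely changes the ranking of labels unless two scores are very close.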

### `TextClassify.get_top_label`

* text
* k: 5 (default), number of labels to return
* precision: '16' (default), '32', '64'
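A top-k lookup of this kind can be sketched in plain Python. This is only an illustration of the idea, not the package's implementation: `top_k_labels` is a hypothetical helper, the scores are toy data, and `index2label` is assumed to map a class index to a label string.

```python
def top_k_labels(scores, index2label, k=5):
    """Return the k (label, score) pairs with the highest scores, best first."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return [(index2label[i], scores[i]) for i in ranked[:k]]


# Toy data, not real model output.
scores = [0.1, 2.3, -0.7, 1.5]
index2label = {0: "sports", 1: "music", 2: "travel", 3: "food"}

print(top_k_labels(scores, index2label, k=2))
# [('music', 2.3), ('food', 1.5)]
```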

## Download files

No source distribution is available for this release.

Built distribution: `text_classify-0.0.8-py2.py3-none-any.whl` (10.1 kB, Python 2 and Python 3)

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `317d292c27e1eb1aaae0879b5aee8b0030ebb2431135b7c38773d2c883f4c767` |
| MD5 | `832d9297cbc00384a1632c19e9fc2122` |
| BLAKE2b-256 | `4b14eb2f2ce36770ef53730eb0cc1abfe8932babeeff293b067be9d5469a8ead` |