Short and Long Text Classifier using clustering-based enrichment
saltclass
saltclass (Short and Long Text Classifier) is a Python module for text classification, distributed under the MIT license. The project was started in 2018 at the Department of Methodology & Statistics, Utrecht University.
Installation
To install via pip:
$ pip install saltclass
$ pip install --upgrade saltclass
Sample Usage
>>> from saltclass import SALT
>>> train_X = [[10, 0, 0], [0, 20, 0], [4, 13, 5]]
>>> train_y = [0, 1, 1]
>>> vocab = ['statistics', 'medicine', 'crime']
>>> object_from_df = SALT(train_X, train_y, vocabulary=vocab)
>>> object_from_file = SALT.data_from_dir(train_dir='D:/train/', language='en')
>>> object_from_df.enrich()
>>> object_from_df.train(classifier='svm')
>>> object_from_df.print_info()
>>> prediction = object_from_df.predict(data_file='second_test.txt')
>>> print(object_from_df.vocabulary)
>>> print(object_from_df.newdata)
>>> print([k for (k, v) in object_from_df.vocabulary.items() if object_from_df.newdata[0][v] != 0])
>>> print(prediction)
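The sample above passes train_X as a list of rows of integers alongside a three-word vocabulary. This suggests, though the project description does not say so explicitly, that each row holds per-document term counts with columns in the same order as vocab. Under that assumption, the following minimal sketch shows one way such a matrix could be prepared from raw text using only the standard library. The helper count_matrix and the example documents are hypothetical and are not part of saltclass.

```python
from collections import Counter


def count_matrix(docs, vocab):
    """Return one row of term counts per document, columns ordered as in vocab.

    Hypothetical helper for illustration; not part of the saltclass API.
    Tokenization here is a plain lowercase whitespace split.
    """
    rows = []
    for doc in docs:
        counts = Counter(doc.lower().split())
        # Counter returns 0 for terms that do not occur in the document.
        rows.append([counts[term] for term in vocab])
    return rows


vocab = ['statistics', 'medicine', 'crime']
docs = [
    "statistics statistics medicine",
    "medicine crime crime",
]
train_X = count_matrix(docs, vocab)
print(train_X)  # [[2, 1, 0], [0, 1, 2]]
```

If the assumption holds, a matrix built this way, together with a matching label list, would have the same shape as the train_X and train_y shown in the sample usage.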
Dependencies
saltclass requires:
Python (>= 3.5)
NumPy (>= 1.11.0)
SciPy (>= 0.17.0)
LDA
Scikit-learn (>= 0.20.0)
Matplotlib (>= 3.0)
Tqdm
Language_check