

lda_classification

Quickly train an LDA (Latent Dirichlet Allocation) topic model with a scikit-learn compatible wrapper around gensim's LDA model.

  • Preprocess Your Documents
  • Train an LDA
  • Evaluate Your LDA Model
  • Extract Document Vectors
  • Select the Most Informative Features
  • Classify Your Documents

All in a few lines of code, with components that follow scikit-learn's transformer API (fit / transform / fit_transform).
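For readers new to that API: a scikit-learn transformer is any object whose fit(X, y=None) returns self and whose transform(X) returns the transformed data; fit_transform chains the two. The stdlib-only sketch below shows that contract on a hypothetical SimpleCleaner class. It is not the package's SpacyCleaner (which uses spaCy); it only lowercases text and keeps alphabetic tokens.

```python
import re
from typing import Iterable, List, Optional


class SimpleCleaner:
    """Hypothetical stand-in illustrating the scikit-learn transformer contract.

    Not the package's SpacyCleaner: no spaCy, no lemmatization, no stop words.
    """

    def __init__(self, min_len: int = 2):
        self.min_len = min_len  # tokens shorter than this are dropped

    def fit(self, X: Iterable[str], y: Optional[list] = None) -> "SimpleCleaner":
        # Stateless cleaner: nothing to learn, but fit must still return self
        return self

    def transform(self, X: Iterable[str]) -> List[List[str]]:
        # Turn each raw document into a list of lowercase alphabetic tokens
        return [
            [tok for tok in re.findall(r"[a-z]+", doc.lower()) if len(tok) >= self.min_len]
            for doc in X
        ]

    def fit_transform(self, X: Iterable[str], y: Optional[list] = None) -> List[List[str]]:
        # Convenience method: fit, then transform the same data
        return self.fit(X, y).transform(X)


print(SimpleCleaner().fit_transform(["Topic models, e.g. LDA!"]))
```

In real code you would usually inherit from sklearn.base.BaseEstimator and TransformerMixin instead, which supply get_params/set_params and fit_transform so the object can be cloned and used inside a sklearn.pipeline.Pipeline.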


Installation:

To install from PyPI, run:

pip install lda_classification

To install from source:

git clone https://github.com/FeryET/lda_classification.git
cd lda_classification/
python setup.py install

Requirements:

gensim == 3.8.0
matplotlib == 3.1.2
numpy == 1.19.1
setuptools ~= 49.6.0
spacy == 2.3.1
tqdm == 4.48.2
scikit-learn ~= 0.23.1
tomotopy ~= 0.9.1

Optional:

To automate feature selection, also install xgboost; the XGBoostFeatureSelector utility class depends on it.

xgboost == 1.1.1 (Optional)

Example:

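from lda_classification.model import GensimLDAVectorizer
from lda_classification.preprocess import SpacyCleaner
from lda_classification.utils import XGBoostFeatureSelector  # requires xgboost

# docs = your raw documents (a list of strings), fetched from your dataset
# y = your labels, encoded as integers (e.g. with sklearn's LabelEncoder)
docs = SpacyCleaner().transform(docs)  # preprocess the documents with spaCy
# 200 is presumably the number of LDA topics
X = GensimLDAVectorizer(200, return_dense=False).fit_transform(docs)
X_transform = XGBoostFeatureSelector().fit_transform(X, y)  # keep the most informative features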
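gensim's LdaModel.get_document_topics returns each document as a sparse list of (topic_id, probability) pairs, omitting topics below a probability threshold. Assuming return_dense=False produces vectors in that same sparse convention (not confirmed here), the sketch below expands them into fixed-length dense rows, one column per topic, which is the shape a classifier or feature selector expects. The helper names to_dense and to_dense_matrix are hypothetical.

```python
from typing import List, Sequence, Tuple


def to_dense(sparse_doc: Sequence[Tuple[int, float]], num_topics: int) -> List[float]:
    """Expand [(topic_id, prob), ...] into a list of length num_topics."""
    dense = [0.0] * num_topics  # topics absent from the sparse list get probability 0
    for topic_id, prob in sparse_doc:
        if not 0 <= topic_id < num_topics:
            raise IndexError(f"topic id {topic_id} out of range for {num_topics} topics")
        dense[topic_id] = float(prob)
    return dense


def to_dense_matrix(sparse_docs: Sequence[Sequence[Tuple[int, float]]], num_topics: int) -> List[List[float]]:
    """One dense row per document: an (n_documents x num_topics) matrix as nested lists."""
    return [to_dense(doc, num_topics) for doc in sparse_docs]


print(to_dense([(0, 0.7), (3, 0.2)], 5))
```

Dense rows are simple to inspect but cost num_topics values per document; with many topics (such as the 200 in the example), a sparse matrix keeps memory proportional to the non-zero entries instead.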

The package also provides a DataReader class and a BaseData abstract class to automate reading data files from disk: subclass BaseData, implement its abstract methods, and pass it to DataReader to fetch your dataset.
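The subclass-and-plug-in pattern described above can be sketched with Python's abc module. The classes below (DataSource, FolderPerLabel, SimpleReader) and the load method are hypothetical illustrations, not the package's BaseData or DataReader, whose abstract methods may differ. Here the concrete source assumes one subfolder per label, each holding .txt documents.

```python
from abc import ABC, abstractmethod
from pathlib import Path
from typing import List, Tuple


class DataSource(ABC):
    """Hypothetical abstract base: subclasses decide where documents and labels come from."""

    @abstractmethod
    def load(self) -> List[Tuple[str, str]]:
        """Return (document_text, label) pairs."""


class FolderPerLabel(DataSource):
    """Reads root/<label>/*.txt, using each subfolder name as the label."""

    def __init__(self, root: str):
        self.root = Path(root)

    def load(self) -> List[Tuple[str, str]]:
        pairs = []
        # Sort for a deterministic order across file systems
        for label_dir in sorted(p for p in self.root.iterdir() if p.is_dir()):
            for path in sorted(label_dir.glob("*.txt")):
                pairs.append((path.read_text(encoding="utf-8"), label_dir.name))
        return pairs


class SimpleReader:
    """Hypothetical reader: accepts any DataSource and splits it into (docs, labels)."""

    def __init__(self, source: DataSource):
        self.source = source

    def read(self) -> Tuple[List[str], List[str]]:
        pairs = self.source.load()
        docs = [text for text, _ in pairs]
        labels = [label for _, label in pairs]
        return docs, labels
```

Instantiating DataSource directly raises TypeError because load is abstract, so every concrete source is forced to implement it; the reader never needs to know how the files are laid out.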



Download files

Files for lda-classification, version 0.0.25:

  • lda_classification-0.0.25-py3-none-any.whl (12.3 kB): wheel, Python py3
  • lda_classification-0.0.25.tar.gz (8.1 kB): source distribution
