lda_classification

Instantly train an LDA model with a scikit-learn compatible wrapper around gensim's LDA model.

  • Preprocess Your Documents
  • Train an LDA
  • Evaluate Your LDA Model
  • Extract Document Vectors
  • Select the Most Informative Features
  • Classify Your Documents

All in a few lines of code, completely compatible with sklearn's Transformer API.
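
For readers unfamiliar with the Transformer convention mentioned above, here is a minimal sketch of the fit / transform / fit_transform pattern in plain Python. The two classes (WhitespaceTokenizer, VocabularyCounter) are hypothetical toys written for illustration only; they are not part of lda_classification and do not reflect how its classes are implemented.

```python
class WhitespaceTokenizer:
    """Toy stateless transformer: fit() learns nothing and returns self."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # Lowercase each document and split it on whitespace.
        return [doc.lower().split() for doc in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


class VocabularyCounter:
    """Toy stateful transformer: fit() learns a vocabulary, transform() counts it."""

    def fit(self, X, y=None):
        # Learned state is stored in an attribute with a trailing underscore.
        self.vocabulary_ = sorted({token for doc in X for token in doc})
        return self

    def transform(self, X):
        # One count per vocabulary entry, in vocabulary order.
        return [[doc.count(token) for token in self.vocabulary_] for doc in X]

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


docs = ["Topic models find topics", "Models classify documents"]
tokens = WhitespaceTokenizer().fit_transform(docs)
counter = VocabularyCounter()
X = counter.fit_transform(tokens)
```

Because every step exposes the same three methods, the output of one step can be passed directly to the next, which is the same chaining the example further down relies on.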


Installation:

To install from PyPI:

pip install lda_classification

To install from source:

git clone https://github.com/FeryET/lda_classification.git
cd lda_classification/
python setup.py install

Requirements:

gensim == 3.8.0
matplotlib == 3.1.2
numpy == 1.19.1
setuptools ~= 49.6.0
spacy == 2.3.1
tqdm == 4.48.2
scikit-learn ~= 0.23.1
tomotopy ~= 0.9.1

Optional:

To automate feature selection with this package, also install xgboost, which the feature-selection utility class depends on.

xgboost == 1.1.1 (Optional)
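
Since most of these requirements are pinned to exact versions, it can help to compare them with what is installed before running the example. The following is a rough sketch using only the standard library (Python 3.8+). The helper names (parse_pin, installed_version, report) are made up for illustration, and the script only lists versions side by side; it does not evaluate the `~=` compatibility rule.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Pins taken from the Requirements list (required packages only).
REQUIREMENTS = [
    "gensim == 3.8.0",
    "matplotlib == 3.1.2",
    "numpy == 1.19.1",
    "setuptools ~= 49.6.0",
    "spacy == 2.3.1",
    "tqdm == 4.48.2",
    "scikit-learn ~= 0.23.1",
    "tomotopy ~= 0.9.1",
]

PIN_PATTERN = re.compile(
    r"\s*([A-Za-z0-9_.-]+)\s*(==|~=)\s*([0-9][0-9A-Za-z.]*)\s*"
)


def parse_pin(line):
    """Split 'name == 1.2.3' or 'name~=1.2.3' into (name, operator, version)."""
    match = PIN_PATTERN.fullmatch(line)
    if match is None:
        raise ValueError(f"unrecognised requirement: {line!r}")
    return match.groups()


def installed_version(name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def report(requirements=REQUIREMENTS):
    """Return (name, operator, wanted, installed) for each requirement."""
    rows = []
    for line in requirements:
        name, operator, wanted = parse_pin(line)
        rows.append((name, operator, wanted, installed_version(name)))
    return rows


if __name__ == "__main__":
    for name, operator, wanted, found in report():
        print(f"{name}: wanted {operator} {wanted}, installed {found or 'missing'}")
```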

Example:

from lda_classification.model import GensimLDAVectorizer
from lda_classification.preprocess import SpacyCleaner
from lda_classification.utils import XGBoostFeatureSelector

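# docs, labels = FETCH YOUR DATASET
# y = ENCODED_LABELS

# Preprocess the raw documents.
docs = SpacyCleaner().transform(docs)
# Train the LDA model and extract one vector per document.
X = GensimLDAVectorizer(200, return_dense=False).fit_transform(docs)
# Keep only the most informative features (requires xgboost).
X_transform = XGBoostFeatureSelector().fit_transform(X, y)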
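
The example leaves `y = ENCODED_LABELS` as a placeholder. One common way to fill it in is to map each distinct label to an integer. The sketch below does this in plain Python; `encode_labels` and the sample labels are hypothetical and not part of this package. scikit-learn's `LabelEncoder` produces the same sorted-class mapping and is the more usual choice in practice.

```python
def encode_labels(labels):
    """Map each distinct label to an integer index, in sorted label order."""
    classes = sorted(set(labels))
    index = {label: i for i, label in enumerate(classes)}
    return [index[label] for label in labels], classes


# Made-up labels, standing in for the labels of your own dataset.
labels = ["sports", "politics", "sports", "tech"]
y, classes = encode_labels(labels)

# classes[i] recovers the original label for an encoded value i.
decoded = [classes[i] for i in y]
```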

The package also provides a DataReader class and a BaseData class to automate reading your data files from disk. Extend BaseData, implement its abstract methods in the subclass, and feed it to DataReader to simplify fetching your dataset.
