lda_classification

Instantly train an LDA model with a scikit-learn compatible wrapper around gensim's LDA model.

  • Preprocess Your Documents
  • Train an LDA
  • Evaluate Your LDA Model
  • Extract Document Vectors
  • Select the Most Informative Features
  • Classify Your Documents

All in a few lines of code, completely compatible with sklearn's Transformer API.
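For readers new to the convention, scikit-learn's Transformer API comes down to three methods: fit, transform, and fit_transform. The block below is a minimal, dependency-free sketch of that contract. `BagOfWordsVectorizer` is a hypothetical toy class written for illustration only; it is not part of lda_classification or scikit-learn.

```python
class BagOfWordsVectorizer:
    """Toy transformer (hypothetical) that follows the fit/transform convention."""

    def fit(self, X, y=None):
        # Learn state from the training data: here, a sorted vocabulary index.
        vocab = sorted({token for doc in X for token in doc})
        self.vocabulary_ = {token: i for i, token in enumerate(vocab)}
        return self  # returning self allows method chaining

    def transform(self, X):
        # Apply the learned state: turn each tokenized document into counts.
        rows = []
        for doc in X:
            row = [0] * len(self.vocabulary_)
            for token in doc:
                index = self.vocabulary_.get(token)
                if index is not None:  # ignore tokens unseen during fit
                    row[index] += 1
            rows.append(row)
        return rows

    def fit_transform(self, X, y=None):
        # By convention, equivalent to fit(X, y) followed by transform(X).
        return self.fit(X, y).transform(X)


docs = [["lda", "topic", "topic"], ["model", "lda"]]
vectors = BagOfWordsVectorizer().fit_transform(docs)
print(vectors)  # [[1, 0, 2], [1, 1, 0]]
```

Any object exposing these methods with these semantics can generally be chained with other transformers, which is what the compatibility claim above refers to.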


Installation:

To install from PyPI, run:

pip install lda_classification

To install from source:

git clone https://github.com/FeryET/lda_classification.git
cd lda_classification/
python setup.py install

Requirements:

gensim == 3.8.0
matplotlib == 3.1.2
numpy == 1.19.1
setuptools~=49.6.0
spacy == 2.3.1
tqdm == 4.48.2
scikit-learn~=0.23.1
tomotopy~=0.9.1

Optional:

To automate feature selection with this package, you can also install xgboost, which the XGBoostFeatureSelector utility class relies on.

xgboost == 1.1.1 (Optional)
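Because most of these requirements are pinned to exact versions, it can help to see what is actually installed before debugging an import error. The following is a rough sketch, assuming Python 3.8 or newer (for `importlib.metadata`). The helper name `installed_versions` is made up for this example, and the code only reports versions side by side; it does not evaluate the `==` or `~=` specifiers.

```python
from importlib import metadata

# Specifiers copied from the requirements list above; xgboost is optional.
REQUIREMENTS = {
    "gensim": "== 3.8.0",
    "matplotlib": "== 3.1.2",
    "numpy": "== 1.19.1",
    "setuptools": "~= 49.6.0",
    "spacy": "== 2.3.1",
    "tqdm": "== 4.48.2",
    "scikit-learn": "~= 0.23.1",
    "tomotopy": "~= 0.9.1",
    "xgboost": "== 1.1.1",
}


def installed_versions(requirements):
    """Map each package to (pinned specifier, installed version or None)."""
    report = {}
    for name, specifier in requirements.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed in this environment
        report[name] = (specifier, installed)
    return report


for name, (specifier, installed) in installed_versions(REQUIREMENTS).items():
    print(f"{name:<14} pinned {specifier:<10} installed {installed or 'missing'}")
```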

Example:

from lda_classification.model import GensimLDAVectorizer
from lda_classification.preprocess import SpacyCleaner
from lda_classification.utils import XGBoostFeatureSelector

# docs, labels = FETCH YOUR DATASET
# y = ENCODED_LABELS

# Preprocess the raw documents.
docs = SpacyCleaner().transform(docs)
# Train the LDA model and extract the document vectors.
X = GensimLDAVectorizer(200, return_dense=False).fit_transform(docs)
# Select the most informative features using the encoded labels.
X_transform = XGBoostFeatureSelector().fit_transform(X, y)

The package also provides a DataReader class and a BaseData class to automate reading your data files from disk. Extend BaseData, implement its abstract methods in the subclass, and pass the subclass to DataReader to simplify fetching your dataset.
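The subclass-and-feed pattern described here can be sketched with Python's `abc` module. Everything in the block below is hypothetical: `SketchBaseData`, `SketchReader`, and the method names `records` and `parse` are invented for illustration and are not the package's actual BaseData or DataReader interface, so check the package source for the real abstract methods. The example also keeps its records in memory rather than on disk so that it runs on its own.

```python
from abc import ABC, abstractmethod


class SketchBaseData(ABC):
    """Hypothetical stand-in for a BaseData-style abstract class."""

    @abstractmethod
    def records(self):
        """Return an iterable of raw records."""

    @abstractmethod
    def parse(self, record):
        """Turn one raw record into a (text, label) pair."""


class SketchReader:
    """Hypothetical stand-in for a DataReader-style consumer."""

    def __init__(self, data):
        self.data = data

    def read(self):
        # Collect parsed texts and labels into two parallel lists.
        docs, labels = [], []
        for record in self.data.records():
            text, label = self.data.parse(record)
            docs.append(text)
            labels.append(label)
        return docs, labels


class InMemoryData(SketchBaseData):
    """Concrete subclass: tab-separated 'label<TAB>text' lines held in memory."""

    def records(self):
        return ["sports\tThe match ended in a draw", "tech\tNew GPU released"]

    def parse(self, record):
        label, text = record.split("\t", 1)
        return text, label


docs, labels = SketchReader(InMemoryData()).read()
print(docs)    # ['The match ended in a draw', 'New GPU released']
print(labels)  # ['sports', 'tech']
```

Keeping the file-format details in the subclass means the reader never has to change when a new dataset layout is added.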
