
Text classification AutoML

Project description

Lazy Text Predict

Usage

You can currently upload data with single-label categories (i.e. the models can be trained to distinguish between, say, happy, jealous, or sad text, but cannot assign multiple labels such as both happy and excited to one entry). Your data should be supplied as Python lists or pandas Series via the Xdata and Ydata arguments. Alternatively, you can pass CSV or XLSX files to the appropriate options.
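For example, a minimal single-label dataset might look like this (the texts and labels below are made up for illustration):

```python
# Hypothetical single-label dataset: parallel lists of texts and labels.
Xdata = [
    "I got the job, best day ever!",
    "My friend's new car makes me so envious.",
    "I miss how things used to be.",
]
Ydata = ["happy", "jealous", "sad"]  # exactly one label per text entry

assert len(Xdata) == len(Ydata)  # the lists must be the same length
```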

An extensive example notebook is available on Google Colab, or see below:

from lazy_text_predict import basic_classification

trial = basic_classification.LTP(Xdata=X, Ydata=Y, csv=None, xlsx=None, x_col='X', y_col='Y', models='all')
# Xdata is a list of text entries, and Ydata is a list of corresponding labels.
# csv and xlsx give options to load data from those file formats (you can pass the file or the file's location).
# x_col and y_col are strings naming the text and label columns in your csv or xlsx file, respectively.
# models can be 'transformers' (transformer-based), 'count-vectorizer' (count-vectorizer-based), or 'all'.

trial.run(training_epochs=5)
# This trains the models specified above on the data you loaded.
# Here you can specify the number of training epochs.
# Fewer training epochs will give poorer performance, but will run faster for debugging.

trial.print_metrics_table()
# This will return the performance of the models that have been trained:
                    Model            loss        accuracy              f1       precision          recall
        bert-base-uncased         0.80771         0.69004         0.68058          0.6783         0.69004
           albert-base-v2          0.8885         0.62252          0.6372           0.714         0.62252
             roberta-base         0.99342           0.533         0.56416         0.68716           0.533
               linear_SVM         0.36482         0.63518         0.30077         0.47439         0.30927
multinomial_naive_bayesian         0.31697         0.68303         0.35983           0.443         0.37341


trial.predict(text)
# Here text is some custom, user-specified string that your trained classifiers can classify.
# This returns the class index, based on the order in which the label appears in your input labels.
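As a sketch of that indexing, assuming classes are numbered by first appearance in Ydata (an illustration of the idea, not a guarantee of the library's internals):

```python
# Map a predicted class index back to its label, assuming classes
# are indexed by first appearance in Ydata.
Ydata = ["happy", "jealous", "sad", "happy"]
classes = list(dict.fromkeys(Ydata))  # dedupe, preserving first-appearance order
predicted_index = 2
print(classes[predicted_index])  # sad
```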

This will train and test each of the models, show you their performance (loss, f1 score, training time, computing resources required, etc.), and let you classify your own text.

The models are currently hard-coded, i.e. you can only choose between transformer and count-vectorizer models, but watch this space!

Once you have determined which model is best for your application you can do a more in-depth training on the model of your choice. This can be done by calling a new instance of the LTP class and running a focused training:

focused_trial = basic_classification.LTP(test_frac=0.05, train_frac=0.45)
focused_trial.run(focused=True, focused_model='bert-base-uncased', training_epochs=5)
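The test_frac and train_frac arguments control what fraction of your data is held out for testing versus used for training. A rough sketch of fraction-based splitting, as an illustration only (the library's actual implementation may differ):

```python
import random

def split_by_fraction(items, train_frac, test_frac, seed=0):
    # Illustrative fraction-based split; shuffle, then take the first
    # train_frac of the data for training and the next test_frac for testing.
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]

train, test = split_by_fraction(range(100), train_frac=0.45, test_frac=0.05)
print(len(train), len(test))  # 45 5
```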

We have added several example .ipynb notebooks to show how the library may be used.

Installation

Install the package from PyPI on the command line:

pip install lazy-text-predict

About

Do you want to automatically tag your blog posts? Identify scientific terms in a document? Try to identify the author of a new novel? These are all text classification problems, but may require different levels of complexity in their execution. You don't want to use a deep neural network when a decision tree could suffice, or vice-versa!

How do you choose the best option from so many seemingly similar choices?

This tool lets you quickly choose between different natural language processing tools for your classification problem.

We load your text for classification into several of the most useful tools we have identified, train each tool on a small portion of the data, and then show you which would be best for your problem.

The models we use at the moment come from the transformers library.

The results show you metrics like accuracy, f1 score, precision, and recall in a simple table-like output. You can then go ahead and choose whichever model works best for you.
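For reference, the f1 score is the harmonic mean of precision and recall. A quick sketch with made-up numbers (not taken from the table above, whose f1 values are averaged per class):

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall: penalizes imbalance between the two.
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.75, 0.60), 4))  # 0.6667
```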

This tool is built on top of the PyTorch framework and the transformers library. The inspiration for this tool came from the great Lazy Predict package, which you should check out if you are interested.

System Requirements

Unfortunately, this tool requires a fair bit of computing power. If you do not have a GPU that the tool can use, you will struggle to run it.

A good test is to try to install the package; if you can, there is a chance you can run it!

A practical alternative is to run this all in Google Colab Pro or similar platforms that give you access to the resources you need (although these might not be free!).

