
Text classification AutoML

Project description

Lazy Text Predict

Usage

You can currently upload data with single-label categories (i.e. the models can be trained to distinguish between, say, happy, jealous, or sad text, but cannot assign multiple labels such as both happy and excited to one entry). Your data should be supplied as Python lists or pandas Series via the Xdata and Ydata arguments. Alternatively, you can pass CSV or XLSX files to the appropriate options.
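For example, a minimal single-label dataset might look like this (the texts and labels below are made up for illustration):

```python
# Hypothetical single-label dataset: parallel lists of texts and labels.
Xdata = [
    "I got the job, best day ever!",
    "My friend's new car makes me so envious.",
    "I miss how things used to be.",
]
Ydata = ["happy", "jealous", "sad"]  # exactly one label per text entry

assert len(Xdata) == len(Ydata)  # the lists must be the same length
```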

An extensive example notebook is available on Google Colab, or see below:

from lazy_text_predict import basic_classification

trial = basic_classification.LTP(Xdata=X, Ydata=Y, csv=None, xlsx=None, x_col='X', y_col='Y', models='all')
# Xdata is a list of text entries, and Ydata is a list of corresponding labels.
# csv and xlsx give options to load data from those file formats (you can pass the file or the file's location).
# x_col and y_col are strings naming the text and label columns in your csv or xlsx file, respectively.
# models can be 'transformers' (transformer-based), 'count-vectorizer' (count-vectorizer-based), or 'all'.

trial.run(training_epochs=5)
# This trains the models specified above on the data you loaded.
# Here you can specify the number of training epochs.
# Fewer training epochs will give poorer performance, but will run faster for debugging.

trial.print_metrics_table()
# This will return the performance of the models that have been trained:
                    Model            loss        accuracy              f1       precision          recall
        bert-base-uncased         0.80771         0.69004         0.68058          0.6783         0.69004
           albert-base-v2          0.8885         0.62252          0.6372           0.714         0.62252
             roberta-base         0.99342           0.533         0.56416         0.68716           0.533
               linear_SVM         0.36482         0.63518         0.30077         0.47439         0.30927
multinomial_naive_bayesian         0.31697         0.68303         0.35983           0.443         0.37341


trial.predict(text)
# Here text is some custom, user-specified string that your trained classifiers can classify.
# This returns the class index, based on the order in which the label appears in your input labels.
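As a sketch of that indexing, assuming classes are numbered by first appearance in Ydata (an illustration of the idea, not a guarantee of the library's internals):

```python
# Map a predicted class index back to its label, assuming classes
# are indexed by first appearance in Ydata.
Ydata = ["happy", "jealous", "sad", "happy"]
classes = list(dict.fromkeys(Ydata))  # dedupe, preserving first-appearance order
predicted_index = 2
print(classes[predicted_index])  # sad
```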

This will train and test each of the models, show you their performance (loss, f1 score, training time, computing resources required, etc.), and let you classify your own text.

The models are currently hard-coded, i.e. you can only choose between transformer and count-vectorizer models, but watch this space!

Once you have determined which model is best for your application you can do a more in-depth training on the model of your choice. This can be done by calling a new instance of the LTP class and running a focused training:

focused_trial = basic_classification.LTP(test_frac=0.05, train_frac=0.45)
focused_trial.run(focused=True, focused_model='bert-base-uncased', training_epochs=5)
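The test_frac and train_frac arguments control what fraction of your data is held out for testing versus used for training. A rough sketch of fraction-based splitting, as an illustration only (the library's actual implementation may differ):

```python
import random

def split_by_fraction(items, train_frac, test_frac, seed=0):
    # Illustrative fraction-based split; shuffle, then take the first
    # train_frac of the data for training and the next test_frac for testing.
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_test = int(len(shuffled) * test_frac)
    return shuffled[:n_train], shuffled[n_train:n_train + n_test]

train, test = split_by_fraction(range(100), train_frac=0.45, test_frac=0.05)
print(len(train), len(test))  # 45 5
```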

We have added several example .ipynb notebooks to show how the library may be used.

Installation

Install the package from PyPI on the command line:

pip install lazy-text-predict

About

Do you want to automatically tag your blog posts? Identify scientific terms in a document? Try to identify the author of a new novel? These are all text classification problems, but may require different levels of complexity in their execution. You don't want to use a deep neural network when a decision tree could suffice, or vice-versa!

How do you choose the best option from so many seemingly similar choices?

This tool lets you quickly choose between different natural language processing tools for your classification problem.

We load your text for classification into several of the most useful tools we have identified, train each tool on a small portion of the data, and then show you which would be best for your problem.

The models we use at the moment come from the transformers library.

The results show you metrics like accuracy, f1 score, precision, and recall in a simple table-like output. You can then go ahead and choose whichever model works best for you.
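For reference, the f1 score is the harmonic mean of precision and recall. A quick sketch with made-up numbers (not taken from the table above, whose f1 values are averaged per class):

```python
def f1(precision, recall):
    # Harmonic mean of precision and recall: penalizes imbalance between the two.
    return 2 * precision * recall / (precision + recall)

print(round(f1(0.75, 0.60), 4))  # 0.6667
```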

This tool is built on top of the PyTorch framework and the transformers library. The inspiration for this tool came from the great Lazy Predict package, which you should check out if you are interested.

System Requirements

Unfortunately, this tool requires a fair bit of computing power. If you do not have a GPU that the tool can use, you will struggle to run it.

A good test is to try to install the package; if you can, there is a chance you can run it!

A practical alternative is to run this all in Google Colab Pro or similar platforms that give you access to the resources you need (although these might not be free!).

