

A collection of supervised learning models based on shallow neural network approaches (e.g., word2vec and fastText) with some additional exclusive features. Written in Python and fully compatible with Scikit-learn.


Getting Started

Install the latest version:

pip install cython
pip install shallowlearn

Import the models from shallowlearn.models; they implement the standard Scikit-learn methods for supervised learning, e.g., fit(X, y) and predict(X).

The data is raw text: each sample is a list of tokens (the words of a document), and each target value in y is either a single label or, for a multi-label training set, a list of labels associated with the corresponding sample.
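To make the expected input concrete, here is a small sketch of how X and y might be built from raw documents. The lowercase whitespace tokenizer `tokenize`, the documents and the labels are hypothetical and chosen only for illustration; any tokenizer that returns a list of strings per document fits the format described above.

```python
# Sketch of the input format: X is a list of token lists, y holds one label
# per sample (or a list of labels per sample for multi-label data).

def tokenize(document):
    """Hypothetical tokenizer (assumption): lowercase, split on whitespace."""
    return document.lower().split()

documents = [
    "The match ended in a draw",
    "New GPU drivers improve performance",
    "The striker scored twice",
]

# Samples: one list of tokens per document.
X = [tokenize(doc) for doc in documents]

# Single-label training set: one label per sample.
y = ["sport", "tech", "sport"]

# Multi-label training set: each target is a list of labels.
y_multi = [["sport"], ["tech", "hardware"], ["sport"]]

# A shallowlearn model is then used through the Scikit-learn interface,
# e.g. (not executed here; constructor arguments omitted):
#     from shallowlearn.models import GensimFastText
#     clf = GensimFastText()
#     clf.fit(X, y)
#     predictions = clf.predict(X)
```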

Models

shallowlearn.models.GensimFastText

A supervised learning model based on the fastText algorithm [1]. Most of the code is taken from Gensim and rewritten, so the model benefits from Gensim's optimizations (e.g., Cython) and support.
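The core idea of supervised fastText can be sketched in a few lines: a document is represented by the average of its word vectors, and a linear classifier with a softmax output is trained jointly with those vectors by stochastic gradient descent. `TinyFastText` below is a hypothetical, simplified illustration in plain Python, not ShallowLearn's or Gensim's code. It uses a full softmax instead of hierarchical softmax, ignores word n-grams and keeps the learning rate constant; the hyperparameter defaults are arbitrary.

```python
import math
import random


class TinyFastText:
    """Hypothetical fastText-style classifier (simplified sketch)."""

    def __init__(self, dim=10, lr=0.5, epochs=100, seed=0):
        self.dim, self.lr, self.epochs = dim, lr, epochs
        self.rng = random.Random(seed)

    def _hidden(self, tokens):
        # Document vector = average of the vectors of known words.
        ids = [self.vocab[t] for t in tokens if t in self.vocab]
        h = [0.0] * self.dim
        for i in ids:
            for j in range(self.dim):
                h[j] += self.emb[i][j] / len(ids)
        return ids, h

    def _probs(self, h):
        # Linear scores followed by a numerically stable softmax.
        scores = [sum(w * x for w, x in zip(row, h)) for row in self.out]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        return [e / total for e in exps]

    def fit(self, X, y):
        self.classes = sorted(set(y))
        words = sorted({t for doc in X for t in doc})
        self.vocab = {t: i for i, t in enumerate(words)}
        # Small random word vectors; output weights start at zero.
        bound = 1.0 / self.dim
        self.emb = [[self.rng.uniform(-bound, bound) for _ in range(self.dim)]
                    for _ in words]
        self.out = [[0.0] * self.dim for _ in self.classes]
        data = list(zip(X, y))
        for _ in range(self.epochs):
            self.rng.shuffle(data)
            for tokens, label in data:
                ids, h = self._hidden(tokens)
                if not ids:
                    continue  # nothing to learn from an empty document
                probs = self._probs(h)
                target = self.classes.index(label)
                # Cross-entropy gradient w.r.t. the scores: p - one_hot(target).
                grad = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
                # Gradient w.r.t. the document vector, using the current weights.
                grad_h = [sum(g * row[j] for g, row in zip(grad, self.out))
                          for j in range(self.dim)]
                for g, row in zip(grad, self.out):
                    for j in range(self.dim):
                        row[j] -= self.lr * g * h[j]
                # Each word contributed 1/len(ids) of the average, so it
                # receives 1/len(ids) of the gradient.
                for i in ids:
                    for j in range(self.dim):
                        self.emb[i][j] -= self.lr * grad_h[j] / len(ids)
        return self

    def predict(self, X):
        preds = []
        for tokens in X:
            _, h = self._hidden(tokens)
            probs = self._probs(h)
            preds.append(self.classes[probs.index(max(probs))])
        return preds


X_train = [["ball", "goal", "match"], ["goal", "team"],
           ["cpu", "gpu", "code"], ["code", "bug"]]
y_train = ["sport", "sport", "tech", "tech"]
clf = TinyFastText().fit(X_train, y_train)
print(clf.predict([["gpu", "bug"]]))
```

Averaging word vectors keeps the model linear in the number of tokens, which is the main source of fastText's speed.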

shallowlearn.models.FastText

TODO: the supervised fastText algorithm, as implemented in https://github.com/salestock/fastText.py

shallowlearn.models.DeepInverseRegression

TODO: Based on https://radimrehurek.com/gensim/models/word2vec.html#gensim.models.word2vec.Word2Vec.score
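The method referenced above is classification by inversion: fit one language model per class, then assign a document to the class c that maximizes log p(doc | c) + log p(c), i.e., Bayes' rule. In Gensim, Word2Vec.score returns per-sentence log probabilities and only works with hierarchical softmax (hs=1, negative=0). To keep the sketch self-contained, the hypothetical `InverseRegressionSketch` swaps the per-class word2vec models for Laplace-smoothed unigram models; that substitution is an assumption made for illustration only.

```python
import math
from collections import Counter


class InverseRegressionSketch:
    """Hypothetical classifier by inversion (Bayes' rule over class models).

    Assumption: each per-class language model is a Laplace-smoothed unigram
    model, standing in for a per-class word2vec model scored with
    Word2Vec.score.
    """

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.vocab = {t for doc in X for t in doc}
        counts = Counter(y)
        # log p(c): class priors from label frequencies.
        self.log_prior = {c: math.log(counts[c] / len(y)) for c in self.classes}
        # One "language model" (word counts) per class.
        self.word_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(X, y):
            self.word_counts[label].update(tokens)
        return self

    def _log_likelihood(self, tokens, c):
        # log p(doc | c) under the smoothed unigram model of class c.
        wc = self.word_counts[c]
        total = sum(wc.values())
        v = len(self.vocab) + 1  # one extra slot for unseen words
        return sum(math.log((wc[t] + 1) / (total + v)) for t in tokens)

    def predict(self, X):
        # argmax_c log p(doc | c) + log p(c)
        return [max(self.classes,
                    key=lambda c: self._log_likelihood(tokens, c) + self.log_prior[c])
                for tokens in X]


X_train = [["ball", "goal", "match"], ["goal", "team"],
           ["cpu", "gpu", "code"], ["code", "bug"]]
y_train = ["sport", "sport", "tech", "tech"]
clf = InverseRegressionSketch().fit(X_train, y_train)
print(clf.predict([["goal", "team"], ["code"]]))  # → ['sport', 'tech']
```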

Exclusive Features

TODO

Benchmarks

The script scripts/document_classification_20newsgroups.py is based on the Scikit-learn example of the same name, in which several text classifiers are compared on a reference dataset (20 Newsgroups); we added our models to the comparison. Although the results are still preliminary, our models perform comparably to the other approaches and are the fastest.

Results are as of release 0.0.2, with the chi2_select option set to 80%. For the “classic” classifiers, the reported times include tf-idf vectorization; the evaluation measure is macro-averaged F1.
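Macro F1 computes precision, recall and F1 separately for each class and then takes their unweighted mean, so small classes weigh as much as large ones. A minimal pure-Python sketch follows (the name `macro_f1` is ours); it should agree with scikit-learn's `f1_score(y_true, y_pred, average="macro")`, which likewise scores an undefined precision or recall as 0 by default.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores (macro-averaged F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Undefined ratios (0/0) are scored as 0.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


print(macro_f1(["a", "a", "b", "b"], ["a", "b", "b", "b"]))
```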

Figure: Text classifiers comparison

References

