
A simple, modular active learning library for text classification.



Active Learning for Text Classification in Python.



Small-Text provides state-of-the-art Active Learning for Text Classification. Its components are abstracted behind generic interfaces, so you can easily mix and match classifiers and query strategies to build active learning experiments or applications.

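What is Active Learning? Active Learning lets you label training data efficiently when only a small amount of labeled data is available: the model selects the examples it expects to learn the most from, and only those are passed to a human annotator.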
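To make the idea concrete, here is a minimal sketch of a pool-based active learning loop with uncertainty sampling. It is written in plain Python and does not use the small-text API; the toy Naive Bayes model and all function names (`train`, `prob_positive`, `query_most_uncertain`, `active_learning`) are assumptions made for illustration. The true labels stored in the pool stand in for a human annotator.

```python
import math
from collections import Counter


def train(examples):
    """Fit per-class word counts (a tiny Naive Bayes model) on (text, label) pairs."""
    counts = {0: Counter(), 1: Counter()}
    for text, label in examples:
        counts[label].update(text.split())
    return counts


def prob_positive(counts, text):
    """Estimate P(label=1 | text) with add-one smoothing and a uniform class prior."""
    vocab = set(counts[0]) | set(counts[1])
    total0 = sum(counts[0].values()) + len(vocab) + 1
    total1 = sum(counts[1].values()) + len(vocab) + 1
    log_odds = 0.0
    for word in text.split():
        p1 = (counts[1][word] + 1) / total1
        p0 = (counts[0][word] + 1) / total0
        log_odds += math.log(p1 / p0)
    return 1 / (1 + math.exp(-log_odds))


def query_most_uncertain(counts, pool):
    """Uncertainty sampling: index of the pool item whose prediction is closest to 0.5."""
    return min(range(len(pool)),
               key=lambda i: abs(prob_positive(counts, pool[i][0]) - 0.5))


def active_learning(labeled, pool, num_queries):
    """Repeatedly retrain, query the most uncertain item, and add its label."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(num_queries):
        model = train(labeled)
        idx = query_most_uncertain(model, pool)
        labeled.append(pool.pop(idx))  # the stored label plays the annotator
    return train(labeled), labeled, pool


# Hypothetical toy data: two labeled seed examples and a small unlabeled pool.
seed = [("great fun movie", 1), ("boring dull movie", 0)]
pool = [("great fun", 1), ("dull boring", 0), ("movie", 1)]
model, labeled, remaining = active_learning(seed, pool, num_queries=1)
print("queried:", labeled[-1])
```

In this toy run the word "movie" appears equally often in both classes, so it is the example the model is least sure about and the one queried first.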

Features

  • Provides unified interfaces for Active Learning so that you can easily mix and match query strategies with classifiers provided by sklearn, PyTorch, or transformers.
  • Supports GPU-based PyTorch models and integrates transformers so that you can use state-of-the-art Text Classification models for Active Learning.
  • A GPU is required only for some models. For CPU-only use, a slim installation avoids unnecessary dependencies.
  • Re-implements multiple scientifically evaluated components: Query Strategies, Initialization Strategies, and Stopping Criteria.
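The mix-and-match idea behind unified interfaces can be sketched as follows. This is an illustration only, not small-text's actual API: the class names (`QueryStrategy`, `RandomSampling`, `LeastConfidence`, `AccuracyPlateau`) and the helper `select_batch` are hypothetical, and the stopping rule is a simple stand-in for the scientifically evaluated criteria the library re-implements.

```python
import random
from abc import ABC, abstractmethod


class QueryStrategy(ABC):
    """Hypothetical interface: pick which unlabeled examples to label next."""

    @abstractmethod
    def query(self, confidences, n):
        """Return n indices into confidences (one confidence per pool item)."""


class RandomSampling(QueryStrategy):
    """Baseline that ignores the classifier and samples uniformly."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)

    def query(self, confidences, n):
        return sorted(self.rng.sample(range(len(confidences)), n))


class LeastConfidence(QueryStrategy):
    """Selects the items whose top-class confidence is lowest."""

    def query(self, confidences, n):
        ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
        return sorted(ranked[:n])


class AccuracyPlateau:
    """Hypothetical stopping criterion: stop once accuracy stops improving."""

    def __init__(self, patience=2, min_delta=0.01):
        self.patience = patience
        self.min_delta = min_delta

    def stop(self, accuracy_history):
        if len(accuracy_history) <= self.patience:
            return False
        baseline = accuracy_history[-(self.patience + 1)]
        best_recent = max(accuracy_history[-self.patience:])
        return best_recent - baseline < self.min_delta


def select_batch(strategy, confidences, n):
    """Depends only on the QueryStrategy interface, so strategies are swappable."""
    return strategy.query(confidences, n)


# Made-up confidences for a pool of four items.
confidences = [0.90, 0.55, 0.70, 0.51]
for strategy in (RandomSampling(seed=0), LeastConfidence()):
    print(type(strategy).__name__, select_batch(strategy, confidences, 2))
```

Because `select_batch` only relies on the `query` method, swapping one strategy for another requires no change to the surrounding loop, which is the property the feature list describes.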

News

  • 🎉 Beta Release (v1.0.0b1) - February 22, 2022
    • New features: Multi-label classification and stopping criteria are now supported.
    • Added/revised large parts of the documentation.

Installation

Small-text can be easily installed via pip:

pip install small-text

For a full installation, include the transformers extra requirement:

pip install small-text[transformers]

Requires Python 3.7 or newer. GPU use requires CUDA 10.1 or newer. See the documentation for more details on the installation.
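A quick way to check these requirements from Python is sketched below. The helper `check_environment` is hypothetical and not part of small-text; it assumes the import names `small_text`, `torch`, and `transformers`, and it only tests whether the packages can be found, without importing them.

```python
import importlib.util
import sys


def check_environment(min_python=(3, 7), optional=("torch", "transformers")):
    """Report whether the interpreter is new enough and which packages are installed."""
    report = {"python_ok": sys.version_info >= min_python}
    # Assumed import names; find_spec returns None when a package is missing.
    for name in ("small_text",) + tuple(optional):
        report[name] = importlib.util.find_spec(name) is not None
    return report


report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

If `torch` is reported as installed, `torch.cuda.is_available()` can then be used to check whether a CUDA device is usable.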

Quick Start

For a quick start, see the provided examples for binary classification, PyTorch multi-class classification, and transformer-based multi-class classification, or check out the notebooks.

Notebooks

#  Notebook
1  Intro: Active Learning for Text Classification with Small-Text
2  Using Stopping Criteria for Active Learning

Documentation

Read the latest documentation (currently a work in progress) here.

Alternatives

modAL, ALiPy, libact

Contribution

Contributions are welcome. Details can be found in CONTRIBUTING.md.

Acknowledgments

This software was created by Christopher Schröder (@chschroeder) at Leipzig University's NLP group, which is part of the Webis research network. The encompassing project was funded by the Development Bank of Saxony (SAB) under project number 100335729.

Citation

A preprint which introduces small-text is available here:
Small-text: Active Learning for Text Classification in Python.

@misc{schroeder2021smalltext,
    title={Small-Text: Active Learning for Text Classification in Python}, 
    author={Christopher Schröder and Lydia Müller and Andreas Niekler and Martin Potthast},
    year={2021},
    eprint={2107.10314},
    archivePrefix={arXiv},
    primaryClass={cs.LG}
}

License

MIT License

