
Active Learning for Text Classification in Python.


Small-Text provides state-of-the-art Active Learning for Text Classification. Several pre-implemented query strategies, initialization strategies, and stopping criteria are provided, which can be easily mixed and matched to build active learning experiments or applications.

What is Active Learning?

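Active learning lets you label training data efficiently for supervised learning when you have little or no labeled data: a model iteratively selects the unlabeled examples it expects to be most informative, and only those are passed to a human annotator.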

Learning curve example for the TREC-6 dataset.
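The loop behind this idea can be sketched in a few lines of plain Python. The sketch below is only an illustration under stated assumptions and does not use small-text's API: the tiny naive Bayes model, the least-confidence query strategy, and all function names (`train`, `predict_proba`, `least_confidence_query`, `active_learning_loop`) are hypothetical helpers chosen for this example.

```python
import math
from collections import Counter


def train(labeled):
    """Fit a tiny multinomial naive Bayes model on (text, label) pairs."""
    word_counts = {}        # label -> Counter of tokens seen with that label
    doc_counts = Counter()  # label -> number of labeled documents
    vocab = set()
    for text, label in labeled:
        tokens = text.lower().split()
        word_counts.setdefault(label, Counter()).update(tokens)
        doc_counts[label] += 1
        vocab.update(tokens)
    return word_counts, doc_counts, vocab


def predict_proba(model, text):
    """Return a dict mapping each known label to its predicted probability."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    log_scores = {}
    for label in doc_counts:
        total_tokens = sum(word_counts[label].values())
        score = math.log(doc_counts[label] / total_docs)
        for token in text.lower().split():
            # Add-one smoothing so unseen tokens do not zero out a class.
            score += math.log((word_counts[label][token] + 1) / (total_tokens + len(vocab) + 1))
        log_scores[label] = score
    # Normalize the log scores into probabilities (softmax).
    top = max(log_scores.values())
    exp_scores = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(exp_scores.values())
    return {label: value / norm for label, value in exp_scores.items()}


def least_confidence_query(model, texts, num_samples):
    """Return positions of the texts whose top predicted probability is lowest."""
    confidence = [(max(predict_proba(model, text).values()), i) for i, text in enumerate(texts)]
    return [i for _, i in sorted(confidence)[:num_samples]]


def active_learning_loop(pool, oracle, initial_indices, num_queries, batch_size):
    """Pool-based active learning; needs at least one initially labeled example."""
    labeled = {i: oracle(pool[i]) for i in initial_indices}
    for _ in range(num_queries):
        model = train([(pool[i], label) for i, label in labeled.items()])
        unlabeled = [i for i in range(len(pool)) if i not in labeled]
        if not unlabeled:
            break
        picked = least_confidence_query(model, [pool[i] for i in unlabeled], batch_size)
        for position in picked:
            index = unlabeled[position]
            labeled[index] = oracle(pool[index])  # the "human" annotation step
    return labeled


if __name__ == "__main__":
    texts = ["great movie", "awful movie", "great fun", "awful mess", "fun plot", "mess plot"]
    annotate = lambda text: "pos" if ("great" in text or "fun" in text) else "neg"
    print(active_learning_loop(texts, annotate, initial_indices=[0, 1], num_queries=2, batch_size=1))
```

In a real application the oracle is a human annotator and the classifier is a proper text classification model; the structure of the loop stays the same.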

Active Learning in Practice

Active Learning for Text Classification has been applied across diverse fields, including biomedical research, social science, information science, computer science, and political communication.

For previous active learning applications that used small-text specifically, see the showcase section.

Features

  • Provides unified interfaces for Active Learning, allowing you to easily mix and match query strategies with classifiers provided by sklearn, PyTorch, or transformers.
  • Supports GPU-based PyTorch models and integrates transformers so that you can use state-of-the-art Text Classification models for Active Learning.
  • GPU is supported but not required. CPU-only use cases require only a lightweight installation with minimal dependencies.
  • Multiple scientifically evaluated components are pre-implemented and ready to use (Query Strategies, Initialization Strategies, and Stopping Criteria).
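
To make the component types above more concrete, here is a rough sketch of an initialization strategy and a stopping criterion in plain Python. Both are simplified illustrations written for this example, not small-text's implementations: `balanced_initialization` and `PredictionStabilityStopping` are hypothetical names, and the stopping rule (stop once the share of changed predictions between two consecutive iterations drops to a threshold) is just one common idea.

```python
import random
from collections import defaultdict


def balanced_initialization(labels, num_samples, seed=0):
    """Pick roughly num_samples indices with an equal share per class.

    Hypothetical helper: assumes the true labels are known, as in a
    simulated experiment, and that labels are sortable.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    per_class = num_samples // len(by_class)
    chosen = []
    for label in sorted(by_class):
        candidates = by_class[label]
        chosen.extend(rng.sample(candidates, min(per_class, len(candidates))))
    return sorted(chosen)


class PredictionStabilityStopping:
    """Signal a stop once predictions barely change between iterations."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold  # tolerated fraction of changed predictions
        self._previous = None

    def stop(self, predictions):
        predictions = list(predictions)
        previous, self._previous = self._previous, predictions
        # The first call, an empty input, or a size change cannot be compared.
        if previous is None or not predictions or len(previous) != len(predictions):
            return False
        changed = sum(old != new for old, new in zip(previous, predictions))
        return changed / len(predictions) <= self.threshold


if __name__ == "__main__":
    start = balanced_initialization(["a", "a", "a", "b", "b", "b"], num_samples=4)
    criterion = PredictionStabilityStopping(threshold=0.1)
    for predictions in ([0, 1, 1, 0], [0, 1, 0, 0], [0, 1, 0, 0]):
        print(start, criterion.stop(predictions))
```

Because each component only depends on indices, labels, or predictions, it can be swapped independently of the classifier, which is the kind of mixing and matching the feature list refers to.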

News

Version 2.0.0 dev4 (v2.0.0.dev4) - May 23rd, 2026

  • This is the development release with the most changes so far. Consider it an alpha release: interfaces are not yet guaranteed to be stable, but it is otherwise ready to use.
  • Version 2.0.0 offers refined interfaces, new query strategies, improved classifiers, and new functionality such as vector indices. See the changelog for a full list of changes.

Community Survey - March 8th, 2026

Version 1.4.1 (v1.4.1) - August 18th, 2024

  • Bugfix release.

Paper published at EACL 2023 🎉

For a complete list of changes, see the changelog.


Installation

Small-Text can be easily installed via pip:

pip install small-text

The command results in a slim installation with only the necessary dependencies. For a full installation via pip, you just need to include the transformers extra requirement:

pip install small-text[transformers]

The library requires Python 3.10 or newer. For using the GPU, CUDA 10.1 or newer is required. More information regarding the installation can be found in the documentation.
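
If you want to verify these requirements from a script, a minimal sketch might look as follows. It uses only the standard library; the helper names `python_is_supported` and `installed_version` are made up for this example, and the distribution name `small-text` is taken from the pip command above.

```python
import sys
from importlib import metadata

MIN_PYTHON = (3, 10)  # minimum Python version stated above


def python_is_supported(version_info=sys.version_info):
    """Return True if the given interpreter version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(distribution="small-text"):
    """Return the installed version string, or None if it is not installed."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python supported:", python_is_supported())
    print("small-text version:", installed_version())
```

The check does not cover CUDA; whether a GPU is usable depends on your PyTorch installation.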

Quick Start

For a quick start, see the provided examples for binary classification, PyTorch multi-class classification, and transformer-based multi-class classification, or check out the notebooks.

Showcase

A full list of showcases can be found in the docs.

Would you like to share your use case? Whether it is a paper, an experiment, a practical application, a thesis, a dataset, or something else, let us know and we will add you to the showcase section or even here.

Documentation

Read the latest documentation here.


Scope of Features

Extension of Table 1 in the EACL 2023 paper.
                     Active Learning
Name                 Query Strategies   Stopping Criteria
small-text v1.3.0    14                 5
small-text v2.0.0    19                 5

We use the numbers only to show the tremendous progress that small-text has made over time. There are many features and improvements that are not reflected in these numbers.

Alternatives

modAL, ALiPy, libact, ALToolbox


Contribution

Contributions are welcome. Details can be found in CONTRIBUTING.md.

Acknowledgments

This software was created by Christopher Schröder (@chschroeder) at Leipzig University's NLP group, which is part of the Webis research network. The encompassing project was funded by the Development Bank of Saxony (SAB) under project number 100335729.

Citation

Small-Text was introduced in detail in the EACL 2023 system demonstration paper "Small-Text: Active Learning for Text Classification in Python", which can be cited as follows:

@inproceedings{schroeder2023small-text,
    title = "Small-Text: Active Learning for Text Classification in Python",
    author = {Schr{\"o}der, Christopher  and  M{\"u}ller, Lydia  and  Niekler, Andreas  and  Potthast, Martin},
    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.eacl-demo.11",
    pages = "84--95"
}

License

MIT License
