

Project description


small-text logo

Active Learning for Text Classification in Python.


Installation | Quick Start | Contribution | Changelog | Docs

Small-Text provides state-of-the-art Active Learning for Text Classification. It offers several pre-implemented Query Strategies, Initialization Strategies, and Stopping Criteria, which can be easily mixed and matched to build active learning experiments or applications.

Features

  • Provides unified interfaces for Active Learning, so that you can easily mix and match query strategies with classifiers provided by sklearn, PyTorch, or transformers (see the sketch below).
  • Supports GPU-based PyTorch models and integrates transformers, so that you can use state-of-the-art text classification models for Active Learning.
  • GPU is supported but not required. For CPU-only use cases, a lightweight installation requires only a minimal set of dependencies.
  • Multiple scientifically evaluated components are pre-implemented and ready to use (Query Strategies, Initialization Strategies, and Stopping Criteria).
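
As a rough sketch of this mix-and-match idea (not the official example): the names PoolBasedActiveLearner, RandomSampling, SklearnClassifierFactory, and SklearnDataset follow the documentation, but exact signatures may differ between versions.

# Sketch only: names follow the small-text docs, signatures may vary by version.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

from small_text import (
    PoolBasedActiveLearner,
    RandomSampling,
    SklearnClassifierFactory,
    SklearnDataset,
)

# A tiny pool of documents represented as tf-idf features.
texts = ["good film", "bad film", "great plot", "boring plot"]
dataset = SklearnDataset(TfidfVectorizer().fit_transform(texts), np.array([1, 0, 1, 0]))

# Any scikit-learn estimator can back the classifier factory ...
clf_factory = SklearnClassifierFactory(LinearSVC(), num_classes=2)

# ... and it can be combined with any query strategy.
active_learner = PoolBasedActiveLearner(clf_factory, RandomSampling(), dataset)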

What is Active Learning?

Active Learning allows you to efficiently label training data for supervised learning in a scenario where you have little to no labeled data.

Learning curve example for the TREC-6 dataset.


News

For a complete list of changes, see the change log.


Installation

Small-Text can be easily installed via pip (or conda):

pip install small-text

The command results in a slim installation with only the necessary dependencies. For a full installation via pip, you just need to include the transformers extra requirement:

pip install small-text[transformers]

For conda, which lacks the extra requirements feature, a full installation can be achieved as follows:

conda install -c conda-forge "pytorch>=1.6.0" "torchtext>=0.7.0" transformers small-text

The library requires Python 3.7 or newer. To use the GPU, CUDA 10.1 or newer is required. More information regarding the installation can be found in the documentation.
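
With the transformers extra (or the conda packages above) installed, transformer models plug into the same interfaces as the sklearn-based classifiers. A minimal sketch, assuming the TransformerModelArguments and TransformerBasedClassificationFactory names from the documentation (arguments may vary by version):

# Sketch only: requires the transformers integration; names follow the docs.
from small_text import (
    TransformerBasedClassificationFactory,
    TransformerModelArguments,
)

# Any Hugging Face model identifier can be used here.
model_args = TransformerModelArguments('distilroberta-base')

# The factory produces GPU-capable classifiers on demand; 'device' is optional.
clf_factory = TransformerBasedClassificationFactory(
    model_args, num_classes=6, kwargs={'device': 'cuda'}
)

# clf_factory can then be paired with any query strategy in a
# PoolBasedActiveLearner, exactly as in the sklearn sketch above.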

Quick Start

For a quick start, see the provided examples for binary classification, PyTorch multi-class classification, and transformer-based multi-class classification, or check out the notebooks.
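
Condensed, the loop used in those examples looks roughly like the following self-contained toy run with the sklearn integration; the true labels stand in for a human annotator, and the API names follow the documentation but may differ slightly between versions.

# Sketch only: a simulated annotator labels whatever the learner queries.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

from small_text import (
    PoolBasedActiveLearner,
    RandomSampling,
    SklearnClassifierFactory,
    SklearnDataset,
)

# Toy pool; in practice this would be a large, mostly unlabeled corpus.
texts = ["good film", "bad film", "great plot", "boring plot"] * 25
y_true = np.array([1, 0, 1, 0] * 25)  # oracle labels, used only to simulate annotation

dataset = SklearnDataset(TfidfVectorizer().fit_transform(texts), y_true)
clf_factory = SklearnClassifierFactory(LogisticRegression(), num_classes=2)
active_learner = PoolBasedActiveLearner(clf_factory, RandomSampling(), dataset)

# Seed the learner with a few labeled examples, then query/label/update.
indices_initial = np.array([0, 1, 2, 3])
active_learner.initialize_data(indices_initial, y_true[indices_initial])

for _ in range(3):
    queried = active_learner.query(num_samples=10)
    # A human annotator would label the queried texts here.
    active_learner.update(y_true[queried])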

Notebooks

1. Intro: Active Learning for Text Classification with Small-Text
2. Using Stopping Criteria for Active Learning
3. Active Learning using SetFit
4. Using SetFit's Zero Shot Capabilities for Cold Start Initialization

Each notebook can be opened directly in Google Colab.

Showcase

A full list of showcases can be found in the docs.

🎀 Would you like to share your use case? Whether it is a paper, an experiment, a practical application, a thesis, a dataset, or something else, let us know and we will add it to the showcase section or even feature it here.

Documentation

Read the latest documentation here.


Alternatives

modAL, ALiPy, libact, ALToolbox

Contribution

Contributions are welcome. Details can be found in CONTRIBUTING.md.

Acknowledgments

This software was created by Christopher Schröder (@chschroeder) at Leipzig University's NLP group, which is part of the Webis research network. The encompassing project was funded by the Development Bank of Saxony (SAB) under project number 100335729.

Citation

Small-Text was introduced in detail in the EACL 2023 system demonstration paper "Small-Text: Active Learning for Text Classification in Python", which can be cited as follows:

@inproceedings{schroeder2023small-text,
    title = "Small-Text: Active Learning for Text Classification in Python",
    author = {Schr{\"o}der, Christopher  and  M{\"u}ller, Lydia  and  Niekler, Andreas  and  Potthast, Martin},
    booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
    month = may,
    year = "2023",
    address = "Dubrovnik, Croatia",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.eacl-demo.11",
    pages = "84--95"
}

License

MIT License



Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

small_text-1.4.1.tar.gz (268.6 kB)

Uploaded Source

Built Distribution

small_text-1.4.1-py3-none-any.whl (211.7 kB)

Uploaded Python 3

File details

Details for the file small_text-1.4.1.tar.gz.

File metadata

  • Download URL: small_text-1.4.1.tar.gz
  • Upload date:
  • Size: 268.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for small_text-1.4.1.tar.gz
Algorithm Hash digest
SHA256 13e3bcdf5d0b405f9b3aed15ce99b317fa408f5f17b7278d5d1fd4d0c5837857
MD5 1d7adfc8535a7625af15492d9fbda9fe
BLAKE2b-256 46d7451047555fda846caa42acd248e0691f3b9b35ca0f8a57e6200764ad2b45

See more details on using hashes here.

File details

Details for the file small_text-1.4.1-py3-none-any.whl.

File metadata

  • Download URL: small_text-1.4.1-py3-none-any.whl
  • Upload date:
  • Size: 211.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.12

File hashes

Hashes for small_text-1.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 c08c41379e4ed7c009113e4c0ac40713d7a334e1dbc699b1dc32f6f27a7bc00d
MD5 71b9ee1908236e1f26d3c85a9ce20c5e
BLAKE2b-256 8e5e8048a14c991619f929b58f9731d5ef42858a69fe92304995ce112b869482

See more details on using hashes here.
