Active Learning for Text Classification in Python.
Installation | Quick Start | Contribution | Changelog | Docs
Small-Text provides state-of-the-art Active Learning for Text Classification. Several pre-implemented Query Strategies, Initialization Strategies, and Stopping Criteria are provided, which can be easily mixed and matched to build active learning experiments or applications.
What is Active Learning?
Active Learning allows you to efficiently label training data in a small data scenario.
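As a minimal illustration of pool-based active learning (the setting small-text targets), the sketch below performs one query step: it picks the unlabeled sample the model is least confident about. The probabilities and function name here are invented for the example and are not part of the small-text API.

```python
# Illustrative pool-based active learning step (not the small-text API):
# given class probabilities for an unlabeled pool, query the sample
# whose most likely class has the lowest probability (least confidence).

def least_confidence(proba):
    """Return the pool index whose top-class probability is lowest."""
    return min(range(len(proba)), key=lambda i: max(proba[i]))

# Mock predicted probabilities for four unlabeled documents (two classes).
pool_proba = [
    [0.95, 0.05],  # model is confident
    [0.55, 0.45],  # model is uncertain -> best candidate to label next
    [0.80, 0.20],
    [0.70, 0.30],
]

print(least_confidence(pool_proba))  # prints 1
```

In a full loop, the queried sample would be labeled by a human, added to the training set, and the model retrained before the next query — so labeling effort goes where it helps most.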
Features
- Provides unified interfaces for Active Learning so that you can easily mix and match query strategies with classifiers provided by scikit-learn, PyTorch, or transformers.
- Supports GPU-based PyTorch models and integrates transformers so that you can use state-of-the-art Text Classification models for Active Learning.
- GPU is supported but not required. For CPU-only use cases, a lightweight installation requires only a minimal set of dependencies.
- Multiple scientifically evaluated components are pre-implemented and ready to use (Query Strategies, Initialization Strategies, and Stopping Criteria).
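To make the query-strategy idea concrete, here is a sketch of a margin-based ("breaking ties") criterion, one common family of uncertainty-based strategies. The function below is a standalone illustration under mock data, not small-text's implementation.

```python
# Margin-based uncertainty ("breaking ties"): the smaller the gap between
# the two most probable classes, the harder the sample is for the model,
# and the more informative a human label is likely to be.

def breaking_ties(proba):
    """Return the pool index with the smallest top-two probability margin."""
    def margin(p):
        top_two = sorted(p, reverse=True)[:2]
        return top_two[0] - top_two[1]
    return min(range(len(proba)), key=lambda i: margin(proba[i]))

# Mock three-class probabilities for three unlabeled documents.
pool_proba = [
    [0.60, 0.30, 0.10],
    [0.40, 0.38, 0.22],  # margin 0.02 -> queried next
    [0.90, 0.07, 0.03],
]

print(breaking_ties(pool_proba))  # prints 1
```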
News

- Version 1.1.0 (v1.1.0): Highlights - October 01, 2022
  - A small-text package on conda-forge is now available.
  - Early stopping and model selection have been reworked.
  - One new query strategy and three new stopping criteria have been added.
- Version 1.0.1 (v1.0.1): Highlights - September 12, 2022
  - Minor bug fix release that fixes notebook and code example links, which caused problems by pointing to the latest main branch.
- Use Small-Text from the Rubrix User Interface - July 16, 2022
  - We are happy to announce that the great team at rubrix has worked hard to provide a comprehensive tutorial on how to use small-text from within the rubrix user interface.
- Version 1.0.0 (v1.0.0): Highlights - June 13, 2022
  - We're out of beta 🎉!
  - This release mainly consists of code cleanup, documentation, and repository organization. For a complete list of changes, see the changelog.
Installation
Small-Text can be easily installed via pip:
pip install small-text
For a full installation, include the transformers extra requirement:
pip install small-text[transformers]
Small-Text requires Python 3.7 or newer. GPU usage additionally requires CUDA 10.1 or newer. More information on installation can be found in the documentation.
Quick Start
For a quick start, see the provided examples for binary classification, PyTorch multi-class classification, and transformer-based multi-class classification, or check out the notebooks.
Notebooks
| # | Notebook |
|---|----------|
| 1 | Intro: Active Learning for Text Classification with Small-Text |
| 2 | Using Stopping Criteria for Active Learning |
Showcase
- Tutorial: 👂 Learn actively, and listen carefully to small-text. (Use small-text conveniently from the rubrix UI.)
Documentation
Read the latest documentation here. Noteworthy pages include:
Alternatives
Contribution
Contributions are welcome. Details can be found in CONTRIBUTING.md.
Acknowledgments
This software was created by Christopher Schröder (@chschroeder) at Leipzig University's NLP group, which is part of the Webis research network. The encompassing project was funded by the Development Bank of Saxony (SAB) under project number 100335729.
Citation
A preprint that introduces small-text is available here:
Small-Text: Active Learning for Text Classification in Python.
@misc{schroeder2021smalltext,
title={Small-Text: Active Learning for Text Classification in Python},
author={Christopher Schröder and Lydia Müller and Andreas Niekler and Martin Potthast},
year={2021},
eprint={2107.10314},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
License
Hashes for small_text-1.1.0-py3-none-any.whl

| Algorithm | Hash digest |
|-----------|-------------|
| SHA256 | d43890207063d23adec94eb6f28632b2725b68b7328bf8ad2ed23c88a234f4c0 |
| MD5 | 602adff32f578400e3b0277dba42b34b |
| BLAKE2b-256 | b18a29ddf0631f3ca1a196bae196ea88736e9759693d89432d1c1bb3c22d48c2 |