A simple, modular active learning library for text classification.
Active Learning for Text Classification in Python.
Installation | Quick Start | Docs
Active Learning allows you to efficiently label training data in a small-data scenario.
This library provides state-of-the-art active learning for text classification and lets you easily mix and match many classifiers and query strategies to build active learning experiments or applications.
Features
- Provides unified interfaces for Active Learning so that you can easily use any classifier provided by sklearn.
- Optionally, you can also use PyTorch classifiers, including transformer models.
- Re-implements multiple strategies from the scientific literature: Query Strategies, Initialization Strategies
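To illustrate what a query strategy does, here is a minimal, library-independent sketch in plain Python. It does not use small-text's API: the function names `least_confidence` and `prediction_entropy` and the sample probabilities are hypothetical and chosen only for illustration. Each function takes predicted class probabilities for the unlabeled pool and returns the indices of the items to label next.

```python
import math
from typing import List, Sequence


def least_confidence(probabilities: Sequence[Sequence[float]], n: int) -> List[int]:
    """Return the indices of the n pool items whose top class probability is lowest."""
    # Confidence of a prediction = probability of its most likely class.
    confidences = [max(p) for p in probabilities]
    # Sort pool indices from least to most confident and keep the first n.
    ranked = sorted(range(len(confidences)), key=lambda i: confidences[i])
    return ranked[:n]


def prediction_entropy(probabilities: Sequence[Sequence[float]], n: int) -> List[int]:
    """Return the indices of the n pool items with the highest predictive entropy."""
    # Shannon entropy of each predicted distribution (zero terms are skipped).
    entropies = [-sum(q * math.log(q) for q in p if q > 0) for p in probabilities]
    # Sort pool indices from highest to lowest entropy and keep the first n.
    ranked = sorted(range(len(entropies)), key=lambda i: entropies[i], reverse=True)
    return ranked[:n]


# Hypothetical class probabilities for four unlabeled documents (binary task).
pool_probabilities = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.80, 0.20],
    [0.51, 0.49],
]

# Both strategies share one signature, so they can be swapped freely.
for strategy in (least_confidence, prediction_entropy):
    print(strategy.__name__, strategy(pool_probabilities, 2))
```

Because both functions share the same signature, an experiment can swap one for the other without touching the classifier, which is the kind of mix-and-match design the feature list describes.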
Installation
Small-text can be easily installed via pip:
pip install small-text
For a full installation, include the transformers extra requirement:
pip install small-text[transformers]
Requires Python 3.7 or newer. To use the GPU, CUDA 10.1 or newer is required. More information regarding the installation can be found in the documentation.
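As a quick sanity check after installing, the following sketch verifies the Python version requirement stated above and reports which packages can be found. It uses only the standard library. The helper names are hypothetical, and the import name `small_text` is an assumption inferred from the wheel filename; `torch` and `transformers` are only relevant if you installed the optional extras.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 7)  # minimum version stated in the installation notes


def python_is_supported(version_info=sys.version_info) -> bool:
    """Check that the interpreter's (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def module_available(name: str) -> bool:
    """Return True if a top-level module can be located without importing it."""
    return importlib.util.find_spec(name) is not None


print("Python supported:", python_is_supported())
# Assumption: the package is importable as `small_text`.
for name in ("small_text", "torch", "transformers"):
    print(name, "installed:", module_available(name))
```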
Quick Start
For a quick start, see the provided examples for binary classification, PyTorch multi-class classification, or transformer-based multi-class classification.
Documentation
Read the latest documentation (currently work in progress) here.
Alternatives
Contribution
Contributions are welcome. Details can be found in CONTRIBUTING.md.
Acknowledgments
This software was created by @chschroeder at Leipzig University's NLP group, which is part of the Webis research network. The encompassing project was funded by the Development Bank of Saxony (SAB) under project number 100335729.
Citation
A preprint which introduces small-text is available here:
Small-text: Active Learning for Text Classification in Python.
@misc{schroeder2021smalltext,
title={Small-text: Active Learning for Text Classification in Python},
author={Christopher Schröder and Lydia Müller and Andreas Niekler and Martin Potthast},
year={2021},
eprint={2107.10314},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
License
Hashes for small_text-1.0.0a5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | e17db39d4b7866c6b46140b639c17df3d84c2b3489c09075cda06071fa92b3a3
MD5 | 53bf6a069d0b8bba738365ace145494c
BLAKE2b-256 | eb2a96d05f4d012bf2c1dd0be656b7e06008e2e2cf76124517d5d1a5b4f7bcc4