Toolkit for Active Learning in Generative Tasks
ATGen: Active Learning for Natural Language Generation
A comprehensive toolkit for applying active learning techniques to natural language generation tasks. This repository contains implementations of various active learning strategies specifically designed for text generation models, helping to reduce annotation costs while maximizing model performance.
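As a rough illustration of the workflow this toolkit automates, the sketch below shows a generic pool-based active learning loop in plain Python. It is not ATGen's implementation: the names `random_strategy`, `active_learning_loop`, and the `labeller` callable are hypothetical and exist only for this example, and the fine-tuning and evaluation steps are left as a comment.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical signatures for this sketch only; they are not ATGen's API.
Strategy = Callable[[Sequence[str], int], List[int]]
Labeller = Callable[[str], str]


def random_strategy(pool: Sequence[str], batch_size: int, seed: int = 0) -> List[int]:
    """Random sampling baseline: pick pool indices uniformly without replacement."""
    rng = random.Random(seed)
    return rng.sample(range(len(pool)), min(batch_size, len(pool)))


def active_learning_loop(
    pool: Sequence[str],
    strategy: Strategy,
    labeller: Labeller,
    batch_size: int,
    iterations: int,
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Repeatedly move a strategy-selected batch from the unlabeled pool to the labeled set."""
    unlabeled = list(pool)
    labeled: List[Tuple[str, str]] = []
    for _ in range(iterations):
        if not unlabeled:
            break
        picked = set(strategy(unlabeled, batch_size))
        # Query the labeller only for the selected inputs; this is where annotation cost is spent.
        labeled.extend((text, labeller(text)) for i, text in enumerate(unlabeled) if i in picked)
        unlabeled = [text for i, text in enumerate(unlabeled) if i not in picked]
        # A real experiment would fine-tune the model on `labeled` and evaluate it here.
    return labeled, unlabeled


pool = [f"document {i}" for i in range(10)]
labeled, remaining = active_learning_loop(pool, random_strategy, str.upper, batch_size=3, iterations=2)
print(len(labeled), len(remaining))  # prints "6 4"
```

Swapping `random_strategy` for a smarter selection function is the only change needed to compare strategies under the same annotation budget, which is the kind of comparison the toolkit is built around.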
🌟 Features
- Multiple Active Learning Strategies: Implementation of strategies like HUDS, HADAS, FAC-LOC, IDDS, and more
- Flexible Model Support: Compatible with various language models (Qwen, Llama, etc.)
- Comprehensive Evaluation: Supports multiple evaluation metrics including ROUGE, BLEU, BERTScore, AlignScore, etc.
- Interactive Visualization: Streamlit dashboard for exploring results and comparing strategies
- Hydra Configuration: Easily configurable experiments through Hydra's YAML-based configuration system
- PEFT Integration: Efficient fine-tuning using Parameter-Efficient Fine-Tuning methods
📋 Requirements
- Python 3.10+
- CUDA-compatible GPU (for model training)
- Dependencies listed in requirements.txt
🔧 Installation
pip install atgen
🚀 Usage
Running Active Learning Experiments
Experiments can be launched using the run-al command:
CUDA_VISIBLE_DEVICES=0 HYDRA_CONFIG_NAME=base run-al
Parameters:
- CUDA_VISIBLE_DEVICES: Specify which GPU to use
- HYDRA_CONFIG_NAME: Configuration file (e.g., base, custom, test)
Additional parameters can be overridden via the command line following Hydra's syntax:
CUDA_VISIBLE_DEVICES=0 HYDRA_CONFIG_NAME=base run-al al.strategy=huds model.checkpoint=Qwen/Qwen2.5-7B
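When several runs differ only in their overrides, it can be convenient to assemble the command from Python. The sketch below is one way to do that, assuming `run-al` is on the PATH after installation; the helper `build_run_al_call` is hypothetical and not part of the package. It only builds the argument list and environment, and the actual launch is left as a comment.

```python
import os
from typing import Dict, List, Mapping, Tuple


def build_run_al_call(
    config_name: str, gpu: str, overrides: Mapping[str, str]
) -> Tuple[List[str], Dict[str, str]]:
    """Return (argv, env) for one run-al invocation with Hydra-style key=value overrides."""
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = gpu        # which GPU to use
    env["HYDRA_CONFIG_NAME"] = config_name   # which configuration file to load
    argv = ["run-al"] + [f"{key}={value}" for key, value in overrides.items()]
    return argv, env


# One call per strategy, using the same checkpoint as the example above.
for strategy in ("huds", "random"):
    argv, env = build_run_al_call(
        "base", "0", {"al.strategy": strategy, "model.checkpoint": "Qwen/Qwen2.5-7B"}
    )
    print(" ".join(argv))
    # To launch for real: subprocess.run(argv, env=env, check=True)
```

Passing the overrides as separate list items avoids shell quoting issues, since each `key=value` pair reaches Hydra as a single argument.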
Interactive Dashboard
Launch the Streamlit application to explore and visualize your experiments:
streamlit run Welcome.py
Navigate to http://localhost:8501 in your web browser to access the dashboard.
📁 Project Structure
- configs/: Configuration files for experiments
  - al/: Active learning strategy configurations
  - data/: Dataset configurations
  - labeller/: Labeller configurations
- src/atgen/: Main package
  - strategies/: Implementation of active learning strategies
  - metrics/: Code for evaluation metrics
  - utils/: Utility functions
  - run_scripts/: Scripts for running experiments
  - labellers/: Labelling mechanisms
  - visualize/: Visualization tools
- pages/: Streamlit application pages
- outputs/: Experimental results storage
- cache/: Cached computations to speed up repeated runs
📚 Supported Active Learning Strategies
- huds: Hypothetical Document Scoring
- hadas: Harmonic Diversity Scoring
- random: Random sampling baseline
- fac-loc: Facility Location strategy
- idds: Improved Diverse Density Scoring
- And more...
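To give a feel for what a facility-location style strategy optimizes, here is a minimal, generic sketch of greedy facility-location selection over embedding vectors. It is a textbook formulation and should not be read as the toolkit's fac-loc code: the function names, the use of cosine similarity, and the clipping of similarities at zero are all assumptions made for this illustration.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def facility_location_select(embeddings: Sequence[Sequence[float]], k: int) -> List[int]:
    """Greedily pick up to k indices maximizing sum_j max_{i in S} sim(i, j)."""
    n = len(embeddings)
    sim = [[cosine(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    cover = [0.0] * n  # best similarity of each point to anything selected so far
    selected: List[int] = []
    for _ in range(min(k, n)):
        best_gain, best_idx = -1.0, -1
        for cand in range(n):
            if cand in selected:
                continue
            # Marginal gain: how much this candidate improves coverage of every point.
            gain = sum(max(sim[cand][j] - cover[j], 0.0) for j in range(n))
            if gain > best_gain:
                best_gain, best_idx = gain, cand
        selected.append(best_idx)
        cover = [max(cover[j], sim[best_idx][j]) for j in range(n)]
    return selected


# Two tight clusters; with k=2 the selection contains one point from each cluster.
points = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(facility_location_select(points, 2))
```

The greedy loop favors points that represent regions of the pool not yet covered, which is why the second pick lands in the other cluster instead of next to the first one.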
📊 Supported Datasets
The toolkit comes pre-configured for several datasets including summarization, question answering, and other generative tasks. Custom datasets can be added by creating new configuration files.
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
📜 License
This project is licensed under the MIT License - see the LICENSE.md file for details.
🔗 Citation
If you use this toolkit in your research, please cite:
@software{atgen,
title = {ATGen: Active Learning for Natural Language Generation},
url = {https://github.com/Aktsvigun/atgen},
year = {2025},
}
File details
Details for the file atgen-0.1.0.tar.gz.
File metadata
- Download URL: atgen-0.1.0.tar.gz
- Upload date:
- Size: 57.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4b7a5ef88f40bc51ec7cb9b1a6388ef8e4c3caacbf7a53dd0f6af1f947dbc490 |
| MD5 | d098f6391429d0b1c457f2f92e72e2e8 |
| BLAKE2b-256 | 96b7af045af83200dd3a47059ea401fb17b9777681608d5be93cc31a7e4f36a1 |
File details
Details for the file atgen-0.1.0-py3-none-any.whl.
File metadata
- Download URL: atgen-0.1.0-py3-none-any.whl
- Upload date:
- Size: 75.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c73f845341568237e0fe3584c247cc85423ccffbaabf44ba31caedf3ec15dca6 |
| MD5 | 5cb80ef819631cd9b5b8a35ff1d6bf56 |
| BLAKE2b-256 | 009df645cc6251fc1613a1d1a78993f643d185db1ad4919b6c94c3fe9f9dee8b |