Sequence labeling active learning framework for Python
Project description
SeqAL
SeqAL is a sequence labeling active learning framework based on Flair.
Installation
SeqAL is available on PyPI:
```shell
pip install seqal
```
SeqAL officially supports Python 3.8+.
Usage
To understand what SeqAL can do, we first introduce the pool-based active learning cycle.
- Step 0: Prepare seed data (a small amount of labeled data used for training)
- Step 1: Train the model with the seed data
- Step 2: Predict on unlabeled data with the trained model
- Step 3: Query informative samples based on the predictions
- Step 4: The annotator (oracle) annotates the selected samples
- Step 5: Add the newly labeled samples to the labeled dataset
- Step 6: Retrain the model
- Repeat steps 2–6 until the model's F1 score exceeds the threshold or the annotation budget is exhausted
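The cycle above can be sketched in plain Python. This is a toy simulation, not SeqAL's API: `train`, `predict`, `query`, and `annotate` are hypothetical stand-ins for the real model, sampler, and oracle.

```python
import random

def train(model, labeled):           # Steps 1 / 6: (re)train on labeled data
    model["n_seen"] = len(labeled)   # stand-in for a real training step
    return model

def predict(model, pool):            # Step 2: score unlabeled samples
    # random confidences stand in for real model probabilities
    return {s: random.random() for s in pool}

def query(scores, k):                # Step 3: pick the k least-confident samples
    return sorted(scores, key=scores.get)[:k]

def annotate(samples):               # Step 4: the oracle supplies gold labels
    return [(s, "LABEL") for s in samples]

random.seed(0)
labeled = [("seed sentence", "LABEL")]            # Step 0: seed data
pool = {f"unlabeled sentence {i}" for i in range(20)}
model = train({}, labeled)

for iteration in range(5):                        # repeat steps 2-6
    scores = predict(model, pool)
    picked = query(scores, k=2)
    labeled += annotate(picked)                   # Step 5: grow the labeled set
    pool -= set(picked)
    model = train(model, labeled)

print(len(labeled), len(pool))  # → 11 10
```

Each iteration moves the queried samples from the unlabeled pool into the labeled set and retrains, which is exactly the loop SeqAL automates.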
SeqAL covers all steps except step 0 and step 4. Since no third-party annotation tool is bundled, we can run the script below to simulate the active learning cycle.
```shell
python examples/run_al_cycle.py --text_column 0 --tag_column 1 --data_folder ./data/sample_bio --train_file train_seed.txt --dev_file dev.txt --test_file test.txt --pool_file labeled_data_pool.txt --tag_type ner --hidden_size 256 --embeddings glove --use_rnn False --max_epochs 1 --mini_batch_size 32 --learning_rate 0.1 --sampler MaxNormLogProbSampler --query_number 2 --token_based False --iterations 5 --research_mode True
```
We set `research_mode=True`, which means we simulate the active learning cycle. You can also find the script in `examples/run_al_cycle.py` or `examples/active_learning_cycle_research_mode.py`. If you want to connect SeqAL with an annotation tool, see the script in `examples/active_learning_cycle_annotation_mode.py`.
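The `MaxNormLogProbSampler` passed above ranks unlabeled sentences by their length-normalized sequence log-probability, so that long sentences are not penalized simply for having more tokens. Below is a minimal sketch of that scoring idea; the token probabilities are made-up numbers and this is not SeqAL's actual implementation.

```python
import math

def max_norm_log_prob(token_probs):
    # Length-normalized sequence log-probability: lower values mean the
    # model is less confident, i.e. the sentence is more informative.
    return sum(math.log(p) for p in token_probs) / len(token_probs)

# Hypothetical per-token model confidences for three unlabeled sentences
candidates = {
    "sent_a": [0.99, 0.98, 0.97],
    "sent_b": [0.60, 0.95, 0.90],
    "sent_c": [0.85, 0.80],
}

# Query the 2 sentences with the lowest normalized log-probability
ranked = sorted(candidates, key=lambda s: max_norm_log_prob(candidates[s]))
print(ranked[:2])  # → ['sent_b', 'sent_c']
```

Sentences containing a low-confidence token (like `sent_b`) float to the top of the query list even when their other tokens are predicted confidently.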
You can find more explanations about the parameters in the following tutorials.
Tutorials
We provide a set of quick tutorials to get you started with the library.
- Tutorials on Github Page
- Tutorials on Markdown
- Tutorial 1: Introduction
- Tutorial 2: Prepare Corpus
- Tutorial 3: Active Learner Setup
- Tutorial 4: Prepare Data Pool
- Tutorial 5: Research and Annotation Mode
- Tutorial 6: Query Setup
- Tutorial 7: Annotated Data
- Tutorial 8: Stopper
- Tutorial 9: Output Labeled Data
- Tutorial 10: Performance Recorder
- Tutorial 11: Multiple Language Support
Performance
On the CoNLL 2003 English dataset, active learning reaches 97% of the performance of the best deep model trained on the full data while using only 30% of the training data. The CPU model greatly reduces time cost while sacrificing only a little performance.
See performance for more details on performance and time cost.
Contributing
If you have suggestions for how SeqAL could be improved, or want to report a bug, open an issue! We'd love any and all contributions.
For more, check out the Contributing Guide.
Credits
Project details
Release history
Download files
Source Distribution
Built Distribution
File details
Details for the file `seqal-0.3.5.tar.gz`.
File metadata
- Download URL: seqal-0.3.5.tar.gz
- Upload date:
- Size: 23.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.13 CPython/3.8.13 Darwin/19.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 55baf1d30fec3c43e4abb35241e5135b179d06e1a2bc4d0337704b328d93b684 |
| MD5 | b8a5148b84912ca80a8a412baab4940a |
| BLAKE2b-256 | 8a11ab3c73ae10a442a2f935441cae5e995c3778f7b8af5d669b99e1b0b7e8e0 |
File details
Details for the file `seqal-0.3.5-py3-none-any.whl`.
File metadata
- Download URL: seqal-0.3.5-py3-none-any.whl
- Upload date:
- Size: 24.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.13 CPython/3.8.13 Darwin/19.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e5dee6fc74007897a91b470ffc7c21be585da6551447fa6ff8dabb06534c2837 |
| MD5 | b8e7fac8f3e6fea16fced9629a0bba69 |
| BLAKE2b-256 | a5d9330b3590147ee5c44ee373c06c9f9cb37ae0d9e2b4867eb777d7489f7a9c |