
Sequence labeling active learning framework for Python


SeqAL


SeqAL is a sequence labeling active learning framework based on Flair.

Installation

SeqAL is available on PyPI:

pip install seqal

SeqAL officially supports Python 3.8+.

Usage

To understand what SeqAL can do, we first introduce the pool-based active learning cycle.

[Figure: the pool-based active learning cycle]

  • Step 0: Prepare seed data (a small amount of labeled data used for initial training)
  • Step 1: Train the model with the seed data
  • Step 2: Predict on the unlabeled data with the trained model
  • Step 3: Query informative samples based on the predictions
  • Step 4: An annotator (oracle) annotates the selected samples
  • Step 5: Add the newly labeled samples to the labeled dataset
  • Step 6: Retrain the model
  • Repeat steps 2-6 until the model's F1 score exceeds a threshold or the annotation budget is exhausted (sketched in the code below)
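The loop below is a minimal, self-contained sketch of this cycle. The helper functions (train_model, predict_scores, query, annotate) and the toy data are hypothetical stand-ins written for illustration only; they are not SeqAL's or Flair's API.

import random

# Hypothetical stand-ins for the real components; SeqAL and Flair calls are
# omitted so the sketch stays self-contained and runnable.
def train_model(labeled):
    """Step 1 / Step 6: (re)train a tagger on the labeled set (stubbed)."""
    return {"num_labeled": len(labeled)}

def predict_scores(model, pool):
    """Step 2: score each unlabeled sentence (random scores as a stand-in)."""
    return [(sentence, random.random()) for sentence in pool]

def query(scored, k):
    """Step 3: pick the k most informative sentences (lowest score here)."""
    return [sentence for sentence, _ in sorted(scored, key=lambda x: x[1])[:k]]

def annotate(samples, gold):
    """Step 4: in a simulation the oracle is just a gold-label lookup."""
    return [(sentence, gold[sentence]) for sentence in samples]

# Step 0: seed data plus an unlabeled pool (toy placeholders)
gold = {f"sentence {i}": f"labels {i}" for i in range(20)}
labeled = [("sentence 0", gold["sentence 0"])]
pool = [s for s in gold if s != "sentence 0"]

for iteration in range(5):                # repeat steps 2-6
    model = train_model(labeled)          # step 1 / step 6
    scored = predict_scores(model, pool)  # step 2
    picked = query(scored, k=2)           # step 3
    labeled += annotate(picked, gold)     # steps 4-5
    pool = [s for s in pool if s not in picked]
    print(f"iteration {iteration}: {len(labeled)} labeled, {len(pool)} in pool")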

SeqAL covers all steps except step 0 and step 4. Since no third-party annotation tool is bundled, you can run the script below to simulate the active learning cycle.

python examples/run_al_cycle.py \
  --text_column 0 --tag_column 1 \
  --data_folder ./data/sample_bio \
  --train_file train_seed.txt --dev_file dev.txt --test_file test.txt \
  --pool_file labeled_data_pool.txt \
  --tag_type ner \
  --hidden_size 256 --embeddings glove --use_rnn False \
  --max_epochs 1 --mini_batch_size 32 --learning_rate 0.1 \
  --sampler MaxNormLogProbSampler --query_number 2 --token_based False \
  --iterations 5 --research_mode True

We set research_mode=True, which means the active learning cycle is simulated: the pool is already labeled, so the annotation step is answered automatically. You can also find the script in examples/run_al_cycle.py or examples/active_learning_cycle_research_mode.py. If you want to connect SeqAL to an annotation tool, see the script in examples/active_learning_cycle_annotation_mode.py.
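As a rough illustration of the difference between the two modes, the snippet below contrasts a simulated oracle with a hand-off to an external annotation tool. The function names and the file name to_annotate.txt are hypothetical, chosen for this sketch; they are not part of SeqAL's API.

# Hypothetical sketch only; not SeqAL's API.
def oracle_research_mode(samples, gold_labels):
    # research_mode=True: the pool already has gold labels, so the
    # "annotation" step is just a lookup.
    return [(s, gold_labels[s]) for s in samples]

def oracle_annotation_mode(samples, path="to_annotate.txt"):
    # research_mode=False: hand the queried samples to a human, e.g. by
    # exporting them to a file an annotation tool can import, then read
    # the labels back in before the next iteration.
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(s + "\n" for s in samples)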

You can find more explanations about the parameters in the following tutorials.

Tutorials

We provide a set of quick tutorials to get you started with the library.

Performance

On the CoNLL 2003 English dataset, the active learning algorithms reach 97% of the performance of the best deep model trained on the full data while using only 30% of the training data. The CPU model greatly reduces the time cost while sacrificing only a little performance.

See performance for more details about accuracy and time cost.

Contributing

If you have suggestions for how SeqAL could be improved, or want to report a bug, open an issue! We'd love any and all contributions.

For more, check out the Contributing Guide.

Credits
