
Sequence labeling active learning framework for Python

Project description

SeqAL



SeqAL is a sequence labeling active learning framework based on Flair.

Installation

SeqAL is available on PyPI:

pip install seqal

SeqAL officially supports Python 3.8+.

Usage

To understand what SeqAL can do, we first introduce the pool-based active learning cycle.


  • Step 0: Prepare seed data (a small amount of labeled data used for training)
  • Step 1: Train the model on the seed data
    • Step 2: Use the trained model to predict labels for the unlabeled data
    • Step 3: Query informative samples based on the predictions
    • Step 4: The annotator (oracle) annotates the selected samples
    • Step 5: Add the newly labeled samples to the labeled dataset
    • Step 6: Retrain the model
  • Repeat steps 2 to 6 until the model's F1 score exceeds the threshold or the annotation budget runs out
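The cycle above can be sketched as a plain Python loop. This is a minimal illustration, not SeqAL's implementation: `active_learning_cycle` and its `train`, `evaluate`, `query`, and `annotate` callbacks are hypothetical stand-ins for the model, the sampler, and the annotator, and the budget is assumed to count sentences.

```python
from typing import Callable, List, Tuple

# Hypothetical types: a sentence is a token list; a labeled sentence pairs it with tags.
Sentence = List[str]
Labeled = Tuple[Sentence, List[str]]


def active_learning_cycle(
    seed: List[Labeled],
    pool: List[Sentence],
    train: Callable[[List[Labeled]], object],
    evaluate: Callable[[object], float],
    query: Callable[[object, List[Sentence], int], List[int]],
    annotate: Callable[[Sentence], List[str]],
    query_number: int,
    f1_threshold: float,
    budget: int,
) -> Tuple[object, List[Labeled]]:
    """Repeat steps 2-6 until the F1 threshold is reached or the budget is spent."""
    labeled = list(seed)                 # Step 0: seed data is prepared beforehand
    pool = list(pool)
    model = train(labeled)               # Step 1: train on the seed data
    spent = 0                            # number of sentences annotated so far
    while pool and spent < budget and evaluate(model) < f1_threshold:
        n = min(query_number, budget - spent, len(pool))
        picked = query(model, pool, n)   # Steps 2-3: predict on the pool, pick indices
        if not picked:                   # nothing informative left to query
            break
        # Step 4: the annotator (oracle) labels the picked sentences
        new = [(pool[i], annotate(pool[i])) for i in picked]
        # Step 5: move them from the pool into the labeled set
        for i in sorted(picked, reverse=True):
            del pool[i]
        labeled.extend(new)
        spent += len(new)
        model = train(labeled)           # Step 6: retrain on the enlarged labeled set
    return model, labeled
```

Passing the sampler and annotator in as callbacks keeps the loop independent of any particular model, which mirrors why step 4 can be swapped between a simulation and a real annotation tool.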

SeqAL covers all steps except step 0 and step 4. Because no third-party annotation tool is connected here, we can run the script below to simulate the active learning cycle.

$ python examples/run_al_cycle.py \
    --text_column 0 --tag_column 1 \
    --data_folder ./data/sample_bio \
    --train_file train_seed.txt --dev_file dev.txt --test_file test.txt \
    --pool_file labeled_data_pool.txt \
    --tag_type ner --hidden_size 256 --embeddings glove --use_rnn False \
    --max_epochs 1 --mini_batch_size 32 --learning_rate 0.1 \
    --sampler MaxNormLogProbSampler --query_number 2 --token_based False \
    --iterations 5 --research_mode True
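The command selects samples with `--sampler MaxNormLogProbSampler` and `--query_number 2`. As a rough illustration of step 3, here is a minimal sketch of maximum normalized log-probability (MNLP) scoring, which we assume this sampler is modeled on; it is not SeqAL's code. It assumes independent per-token tag probabilities (e.g. softmax outputs without a CRF), so the best tag sequence is the per-token argmax. The names `mnlp_score` and `query_mnlp` are hypothetical, and so is the reading of `token_based` (sentences vs. tokens counted against `query_number`).

```python
import math
from typing import List, Sequence


def mnlp_score(token_probs: Sequence[Sequence[float]]) -> float:
    """Length-normalized log-probability of the most likely tag sequence.

    Assumes a non-empty sentence and independent token predictions, so the
    best sequence's log-probability is the sum of per-token max log-probs.
    """
    total = sum(math.log(max(dist)) for dist in token_probs)
    return total / len(token_probs)


def query_mnlp(
    pool_probs: List[List[List[float]]],
    query_number: int,
    token_based: bool = False,
) -> List[int]:
    """Return pool indices of the least confident sentences (lowest MNLP first).

    Hypothetical semantics: with token_based=False, query_number counts
    sentences; with token_based=True, sentences are added until their total
    token count reaches query_number.
    """
    order = sorted(range(len(pool_probs)), key=lambda i: mnlp_score(pool_probs[i]))
    if not token_based:
        return order[:query_number]
    picked, tokens = [], 0
    for i in order:
        if tokens >= query_number:
            break
        picked.append(i)
        tokens += len(pool_probs[i])
    return picked
```

Normalizing by sentence length keeps long sentences from being selected merely because their summed log-probability is more negative.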

We set --research_mode True, which means the active learning cycle is simulated instead of being run with a human annotator. The script is examples/run_al_cycle.py; examples/active_learning_cycle_research_mode.py also simulates the cycle. To connect SeqAL with an annotation tool, see examples/active_learning_cycle_annotation_mode.py.
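In the command above, the pool file is named labeled_data_pool.txt, and the text and tags are read from columns 0 and 1. A minimal sketch of how an already-labeled pool can stand in for the annotator in a simulation: the tags are hidden from the selection step and revealed only for queried sentences. It assumes a whitespace-separated, CoNLL-style column format with a blank line between sentences; `read_column_file` and `SimulatedOracle` are hypothetical names, not SeqAL's API.

```python
from typing import Dict, List, Tuple

Sentence = List[str]


def read_column_file(
    text: str, text_column: int = 0, tag_column: int = 1
) -> List[Tuple[Sentence, List[str]]]:
    """Parse column-format data: one token per line, blank line between sentences."""
    sentences, tokens, tags = [], [], []
    for line in text.splitlines():
        fields = line.split()
        if not fields:                  # a blank line ends the current sentence
            if tokens:
                sentences.append((tokens, tags))
                tokens, tags = [], []
            continue
        tokens.append(fields[text_column])
        tags.append(fields[tag_column])
    if tokens:                          # the last sentence may lack a trailing blank line
        sentences.append((tokens, tags))
    return sentences


class SimulatedOracle:
    """Exposes only tokens to the sampler and reveals gold tags on request."""

    def __init__(self, labeled_pool: List[Tuple[Sentence, List[str]]]):
        self._gold: Dict[int, List[str]] = {i: t for i, (_, t) in enumerate(labeled_pool)}
        # Unlabeled view of the pool: sentence id -> tokens only
        self.pool: Dict[int, Sentence] = {i: s for i, (s, _) in enumerate(labeled_pool)}

    def annotate(self, ids: List[int]) -> List[Tuple[Sentence, List[str]]]:
        """Simulated step 4: remove queried ids from the pool, return them with gold tags."""
        return [(self.pool.pop(i), self._gold.pop(i)) for i in ids]
```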

You can find more explanations about the parameters in the following tutorials.

Tutorials

We provide a set of quick tutorials to get you started with the library.

Performance

On the CoNLL 2003 English dataset, active learning algorithms reach 97% of the performance of the best deep model trained on the full data while using only 30% of the training data. The CPU model greatly reduces the time cost at the price of a small drop in performance.
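A "97% of full-data performance with 30% of the data" figure is read off a learning curve. A minimal sketch of that calculation, assuming the curve is a list of (fraction of training data used, F1) pairs from one active learning run; `data_needed_for_target` is a hypothetical helper, and no SeqAL results are reproduced here.

```python
from typing import List, Optional, Tuple


def data_needed_for_target(
    curve: List[Tuple[float, float]],
    full_data_f1: float,
    target_ratio: float = 0.97,
) -> Optional[float]:
    """Smallest data fraction whose F1 reaches target_ratio * full-data F1.

    curve holds (fraction_of_training_data_used, f1) pairs; returns None
    if the run never reaches the target.
    """
    target = target_ratio * full_data_f1
    for fraction, f1 in sorted(curve):  # scan from the smallest fraction upward
        if f1 >= target:
            return fraction
    return None
```

A return value of 0.3 with the default `target_ratio` corresponds to the kind of claim made above.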

See the performance page for more details on performance and time cost.

Contributing

If you have suggestions for how SeqAL could be improved, or want to report a bug, open an issue! We welcome any and all contributions.

For more, check out the Contributing Guide.

Credits



Download files


Source Distribution

seqal-0.3.5.tar.gz (23.6 kB)

Uploaded Source

Built Distribution

seqal-0.3.5-py3-none-any.whl (24.5 kB)

Uploaded Python 3

File details

Details for the file seqal-0.3.5.tar.gz.

File metadata

  • Download URL: seqal-0.3.5.tar.gz
  • Upload date:
  • Size: 23.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.8.13 Darwin/19.6.0

File hashes

Hashes for seqal-0.3.5.tar.gz
Algorithm     Hash digest
SHA256        55baf1d30fec3c43e4abb35241e5135b179d06e1a2bc4d0337704b328d93b684
MD5           b8a5148b84912ca80a8a412baab4940a
BLAKE2b-256   8a11ab3c73ae10a442a2f935441cae5e995c3778f7b8af5d669b99e1b0b7e8e0


File details

Details for the file seqal-0.3.5-py3-none-any.whl.

File metadata

  • Download URL: seqal-0.3.5-py3-none-any.whl
  • Upload date:
  • Size: 24.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.13 CPython/3.8.13 Darwin/19.6.0

File hashes

Hashes for seqal-0.3.5-py3-none-any.whl
Algorithm     Hash digest
SHA256        e5dee6fc74007897a91b470ffc7c21be585da6551447fa6ff8dabb06534c2837
MD5           b8e7fac8f3e6fea16fced9629a0bba69
BLAKE2b-256   a5d9330b3590147ee5c44ee373c06c9f9cb37ae0d9e2b4867eb777d7489f7a9c

