Sequence labeling active learning framework for Python
SeqAL
SeqAL is a sequence labeling active learning framework based on Flair.
Installation
SeqAL is available on PyPI:
```
pip install seqal
```
SeqAL officially supports Python 3.8+.
Usage
To understand what SeqAL can do, we first introduce the pool-based active learning cycle.
- Step 0: Prepare seed data (a small number of labeled data used for training)
- Step 1: Train the model with seed data
- Step 2: Predict unlabeled data with the trained model
- Step 3: Query informative samples based on predictions
- Step 4: The annotator (oracle) labels the queried samples
- Step 5: Add the newly labeled samples to the labeled dataset
- Step 6: Retrain the model
- Repeat steps 2~6 until the model's F1 score exceeds the threshold or the annotation budget is exhausted
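The cycle above can be sketched as a minimal simulation. All names here (`train`, `predict_confidence`, and so on) are illustrative stand-ins, not SeqAL's API:

```python
import random

# Step 0: seed data and an unlabeled pool (plain ints stand in for samples)
labeled = list(range(10))
pool = list(range(10, 30))
budget = 3          # number of annotation rounds we can afford
query_number = 5    # samples queried per round

def train(labeled):
    # stand-in for model training: returns a dummy "model"
    return len(labeled)

def predict_confidence(model, sample):
    # stand-in for model predictions: pseudo-random confidence per sample
    return random.Random(sample).random()

for _ in range(budget):
    model = train(labeled)                              # steps 1/6: (re)train
    scored = sorted(pool, key=lambda s: predict_confidence(model, s))
    queried, pool = scored[:query_number], scored[query_number:]  # steps 2-3
    labeled.extend(queried)                             # steps 4-5: oracle labels
```

Each round moves the `query_number` least-confident samples from the pool into the labeled set, which is exactly the bookkeeping SeqAL automates below.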
SeqAL covers every step except step 0 and step 4. Since no third-party annotation tool is attached here, we can run the script below to simulate the active learning cycle.
```python
from flair.embeddings import WordEmbeddings

from seqal.active_learner import ActiveLearner
from seqal.datasets import ColumnCorpus, ColumnDataset
from seqal.samplers import LeastConfidenceSampler

# 1. get the corpus
columns = {0: "text", 1: "ner"}
data_folder = "./data/sample_bio"
corpus = ColumnCorpus(
    data_folder,
    columns,
    train_file="train_seed.txt",
    dev_file="dev.txt",
    test_file="test.txt",
)

# 2. tagger params
tagger_params = {}
tagger_params["tag_type"] = "ner"
tagger_params["hidden_size"] = 256
embeddings = WordEmbeddings("glove")
tagger_params["embeddings"] = embeddings
tagger_params["use_rnn"] = False

# 3. trainer params
trainer_params = {}
trainer_params["max_epochs"] = 1
trainer_params["mini_batch_size"] = 32
trainer_params["learning_rate"] = 0.1
trainer_params["patience"] = 5

# 4. setup active learner
sampler = LeastConfidenceSampler()
learner = ActiveLearner(corpus, sampler, tagger_params, trainer_params)

# 5. initialize active learner
learner.initialize(dir_path="output/init_train")

# 6. prepare data pool
pool_file = data_folder + "/labeled_data_pool.txt"
data_pool = ColumnDataset(pool_file, columns)
unlabeled_sentences = data_pool.sentences

# 7. query setup
query_number = 2
token_based = False
iterations = 5

# 8. iteration
for i in range(iterations):
    # 9. query unlabeled sentences
    queried_samples, unlabeled_sentences = learner.query(
        unlabeled_sentences, query_number, token_based=token_based, research_mode=True
    )

    # 10. retrain model; queried_samples are added to corpus.train
    learner.teach(queried_samples, dir_path=f"output/retrain_{i}")
```
When calling learner.query(), we set research_mode=True, which means the active learning cycle is simulated: the data pool already contains gold labels, so no human annotation is needed. You can also find this script in examples/active_learning_cycle_research_mode.py. If you want to connect SeqAL with an annotation tool, see the script in examples/active_learning_cycle_annotation_mode.py.
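In annotation mode, the queried samples must instead go out to a human annotator. A minimal, hypothetical export helper is sketched below; it is not part of SeqAL, and it only assumes that each queried sample is iterable over tokens with a `.text` attribute (as Flair Sentence objects are):

```python
# Hypothetical helper: write queried sentences one token per line, with a
# blank line between sentences, so a CoNLL-style annotation tool can add
# the tag column. `queried_samples` is assumed to be an iterable of
# sentences, each iterable over tokens that have a `.text` attribute.
def export_for_annotation(queried_samples, path):
    with open(path, "w", encoding="utf-8") as f:
        for sentence in queried_samples:
            for token in sentence:
                f.write(token.text + "\n")  # annotator fills in the tag column
            f.write("\n")                   # blank line separates sentences
```

Once the annotator returns the labeled file, it can be read back with ColumnDataset and passed to learner.teach() as in the research-mode script.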
Tutorials
We provide a set of quick tutorials to get you started with the library.
- Tutorials on Github Page
- Tutorials on Markdown
- Tutorial 1: Introduction
- Tutorial 2: Prepare Corpus
- Tutorial 3: Active Learner Setup
- Tutorial 4: Prepare Data Pool
- Tutorial 5: Research and Annotation Mode
- Tutorial 6: Query Setup
- Tutorial 7: Annotated Data
- Tutorial 8: Stopper
- Tutorial 9: Output Labeled Data
- Tutorial 10: Performance Recorder
- Tutorial 11: Multiple Language Support
Performance
On the CoNLL 2003 English dataset, the active learning setup reaches 97% of the performance of the best deep model trained on the full data while using only 30% of the training data. The CPU model greatly reduces the time cost while sacrificing only a little performance.
See the performance page for more details on performance and time cost.
Contributing
If you have suggestions for how SeqAL could be improved, or want to report a bug, open an issue! We'd love any and all contributions.
For more, check out the Contributing Guide.
Credits
File details
Details for the file seqal-0.3.4.tar.gz.
File metadata
- Download URL: seqal-0.3.4.tar.gz
- Upload date:
- Size: 24.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.13 CPython/3.8.13 Darwin/19.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 30d6bb4f410bb0199efa38ec6f238f9c1936516c8756ee84b35d6140c22b88b3 |
| MD5 | f908148c6112720093b13b00cc3c1e96 |
| BLAKE2b-256 | 6e4420a07a48a2ada22c58d895384104ece551f3043c7da2b8d1242e0f48f260 |
File details
Details for the file seqal-0.3.4-py3-none-any.whl.
File metadata
- Download URL: seqal-0.3.4-py3-none-any.whl
- Upload date:
- Size: 24.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.1.13 CPython/3.8.13 Darwin/19.6.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 7bed2337c1c93e3fa54f0d496a5b203967a44dc2b8e21613ebaf4c16c317ca83 |
| MD5 | 0b2f5150097273c4b209d73898d76c08 |
| BLAKE2b-256 | 016fcc011561423103e69dcc991f89154589f2399732bc0b3d595d71a4d1145f |