Distributed Python Active Learning library
DPyACL
Distributed Python Framework for Active Learning
May 2020
Alfredo Lorie Bernardo
version 0.3.3
Introduction
DPyACL is a flexible distributed active learning library written in Python that aims to make active learning experiments
simpler and faster. It leverages Dask's distributed features to run the computations of an active learning experiment across a
cluster of computers, which speeds up computation and makes it possible to handle data that does not fit on a single machine.
It has been developed with a modular, object-oriented design to provide an intuitive, easy-to-use interface and
to allow reuse, modification, and extension. It also offers full compatibility with libraries such as NumPy, SciPy,
Pandas, Scikit-learn and Keras. The library is available on PyPI and is distributed under the GNU license.
To date, DPyACL relies heavily on Dask to implement, in a distributed and parallel fashion, the most significant strategies that have appeared for single-label learning. Future releases are expected to include strategies for multi-label learning paradigms.
Download
GitHub: https://github.com/a24lorie/DPyACL
Using DPyACL
The fastest way to use DPyACL is from a Jupyter Notebook.
Preparing an experiment
When defining an active learning experiment, DPyACL offers a set of pre-defined components that the user can configure and
combine to fit their needs. The components required to set up an experiment are listed below:
- The Dataset
- Labelled and unlabelled sets: Optional. The experiment can be configured to randomly choose the initial labelled and unlabelled sets
- An Experiment: HoldOut and KFold experiments are provided
- The AL scenario: The current release provides a Pool Based Scenario
- The Machine Learning Technique: Any machine learning technique from a library whose API is compatible with the fit, predict and predict_proba definitions. Scikit-learn, Dask-ML and Keras are compatible
- The Evaluation Method(s)
- The Query Strategy
- The Stopping Criteria
- The Oracle: The current release provides a Simulated Oracle
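The fit, predict and predict_proba requirement on the machine learning technique can be illustrated with a toy model. The sketch below is a hypothetical class, not part of DPyACL, and it only shows the three-method shape that scikit-learn style estimators share. It returns plain Python lists for brevity; whether DPyACL needs array return types or further attributes is not stated here, so treat that as an open assumption.

```python
from collections import Counter


class MajorityClassifier:
    """Toy model exposing fit, predict and predict_proba.

    Hypothetical example, not part of DPyACL: it ignores the features and
    predicts from the class frequencies seen during fit.
    """

    def fit(self, X, y):
        counts = Counter(y)
        # Sorted class labels, mirroring scikit-learn's `classes_` attribute.
        self.classes_ = sorted(counts)
        total = len(y)
        self._proba = [counts[c] / total for c in self.classes_]
        return self

    def predict_proba(self, X):
        # One row of class probabilities per sample; columns follow classes_.
        return [list(self._proba) for _ in X]

    def predict(self, X):
        # Every sample gets the most frequent training class.
        best = self.classes_[self._proba.index(max(self._proba))]
        return [best for _ in X]


model = MajorityClassifier().fit([[0.0], [1.0], [2.0]], ["a", "b", "b"])
labels = model.predict([[5.0], [6.0]])
probabilities = model.predict_proba([[5.0]])
```

In practice a real estimator such as the LogisticRegression used in the configuration example would take the place of this toy class.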
Configuring the experiment
ml_technique = LogisticRegression(solver='liblinear')
stopping_criteria = MaxIteration(50)
query_strategy = QueryMarginSampling()
performance_metrics = [
    Accuracy(),
    F1(average='macro'),
    Precision(average='macro'),
    Recall(average='macro')]
experiment = HoldOutExperiment(
    client=None,
    X=_X,
    Y=_y,
    scenario_type=PoolBasedSamplingScenario,
    train_idx=train_idx,
    test_idx=test_idx,
    label_idx=label_idx,
    unlabel_idx=unlabel_idx,
    ml_technique=ml_technique,
    performance_metrics=performance_metrics,
    query_strategy=query_strategy,
    oracle=SimulatedOracle(labels=_y),
    stopping_criteria=stopping_criteria,
    self_partition=False
)
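The configuration above selects a margin sampling query strategy. As a rough illustration of the general technique, and not of how QueryMarginSampling is implemented inside DPyACL, the sketch below ranks unlabeled samples by the gap between their two highest predicted class probabilities and picks the smallest gaps first. The function names and the probability values are made up for the example, and every row is assumed to hold at least two classes.

```python
def margin_scores(probabilities):
    """Return, per sample, the gap between the two highest class probabilities."""
    scores = []
    for row in probabilities:
        top_two = sorted(row, reverse=True)[:2]
        scores.append(top_two[0] - top_two[1])
    return scores


def select_by_margin(probabilities, batch_size=1):
    """Return positions of the `batch_size` samples with the smallest margins."""
    scores = margin_scores(probabilities)
    ranked = sorted(range(len(scores)), key=lambda i: scores[i])
    return ranked[:batch_size]


# Predicted class probabilities for four unlabeled samples (made-up values).
pool_probabilities = [
    [0.90, 0.05, 0.05],
    [0.40, 0.35, 0.25],
    [0.50, 0.49, 0.01],
    [0.70, 0.20, 0.10],
]
selected = select_by_margin(pool_probabilities, batch_size=2)
```

A small margin means the model can barely separate its two best guesses, so those samples are the ones sent to the oracle first.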
Execute the experiment
result = experiment.evaluate(verbose=True)
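Conceptually, a pool-based experiment with a simulated oracle and an iteration cap repeats a fit, evaluate, query and label cycle. The sketch below shows that generic cycle in plain Python under stated assumptions: the model class, the function name and the data are hypothetical, accuracy is measured on the full dataset for brevity, and nothing here reflects what experiment.evaluate does internally.

```python
class NearestCentroid1D:
    """Tiny two-class model on scalar features (hypothetical stand-in)."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.centroids_ = {
            c: sum(x for x, label in zip(X, y) if label == c) / y.count(c)
            for c in self.classes_
        }
        return self

    def predict_proba(self, X):
        rows = []
        for x in X:
            # Closer centroids receive a larger share of the probability mass.
            weights = [1.0 / (1e-9 + abs(x - self.centroids_[c])) for c in self.classes_]
            total = sum(weights)
            rows.append([w / total for w in weights])
        return rows

    def predict(self, X):
        return [self.classes_[row.index(max(row))] for row in self.predict_proba(X)]


def run_pool_based_loop(X, y_true, labeled, unlabeled, max_iterations):
    """Run a generic pool-based loop and return (labeled indices, accuracy history)."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    history = []
    model = NearestCentroid1D()
    for _ in range(max_iterations):  # stopping criterion: iteration cap
        if not unlabeled:
            break
        model.fit([X[i] for i in labeled], [y_true[i] for i in labeled])
        # Evaluation: accuracy of the current model on the whole dataset.
        predictions = model.predict(X)
        correct = sum(p == t for p, t in zip(predictions, y_true))
        history.append(correct / len(X))
        # Query strategy: smallest gap between the two most likely classes.
        margins = []
        for row in model.predict_proba([X[i] for i in unlabeled]):
            top_two = sorted(row, reverse=True)[:2]
            margins.append(top_two[0] - top_two[1])
        pick = unlabeled[margins.index(min(margins))]
        # Simulated oracle: the true label is already known, so labeling
        # amounts to moving the index from the pool to the labeled set.
        labeled.append(pick)
        unlabeled.remove(pick)
    return labeled, history


X = [i / 10 for i in range(20)]
y_true = [0 if x < 1.0 else 1 for x in X]
# The initial labeled set must contain both classes for the margin to exist.
labeled_after, history = run_pool_based_loop(
    X, y_true, labeled=[0, 19], unlabeled=range(1, 19), max_iterations=5
)
```

The accuracy values collected in history are the kind of per-query measurements a learning curve is drawn from.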
Analyze the experiment results
query_analyser = ExperimentAnalyserFactory.experiment_analyser(
    performance_metrics=[metric.metric_name for metric in performance_metrics],
    method_name=query_strategy.query_function_name,
    method_results=result,
    type="queries"
)
# plot the learning curves of the experiment
query_analyser.plot_learning_curves(title='Active Learning experiment results')
Contribution
If you find a bug, send a pull request and we'll discuss it. If you are not familiar with the term "pull request", I recommend reading an introductory article on the topic first.
Project details
Download files
Source Distribution
File details
Details for the file dpyacl-0.3.3.tar.gz.
File metadata
- Download URL: dpyacl-0.3.3.tar.gz
- Upload date:
- Size: 52.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.1.1 pkginfo/1.4.2 requests/2.22.0 setuptools/45.2.0 requests-toolbelt/0.8.0 tqdm/4.30.0 CPython/3.8.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 911a87c7935793e43c191241cae1e5260421ae457296896e6967b573a34fc183
MD5 | 32b4d7175deb6fe9241a6f66eab31d9c
BLAKE2b-256 | a9adfabac2cf0e0ae36afcbaa7bb205b30523f9023f7ad173f9d266b3f44ec28