Distributed Python Active Learning library


                             DPyACL
            Distributed Python Framework for Active Learning

                           May 2020
                     Alfredo Lorie Bernardo

                         version 0.3.3

Introduction

DPyACL is a flexible Distributed Active Learning library written in Python, aimed at making active learning experiments simpler and faster. It leverages Dask's distributed features to run the computations of an active learning experiment across a cluster of computers, which speeds up computation and makes it possible to tackle scenarios where the data doesn't fit on a single computer. It has been developed with a modular, object-oriented design to provide an intuitive, easy-to-use interface and to allow reuse, modification, and extension. It also offers full compatibility with libraries like NumPy, SciPy, Pandas, Scikit-learn and Keras. This library is available on PyPI and distributed under the GNU license.

To date, DPyACL relies heavily on the Dask library to implement, in a distributed and parallel fashion, the most significant strategies that have appeared for the single-label setting. In future releases, we hope to include strategies for multi-label learning paradigms.

Download

GitHub: https://github.com/a24lorie/DPyACL

Using DPyACL

The fastest way to use DPyACL is from a Jupyter Notebook.

Preparing an experiment

When defining an Active Learning experiment, DPyACL offers a set of pre-defined components that the user can configure and combine to fit their needs. The components required to set up an experiment are listed below:

  1. The Dataset
  2. Labelled and unlabelled sets: Optional. The experiment can be configured to randomly choose the initial labelled and unlabelled sets
  3. An Experiment: HoldOut and KFold experiments are provided
  4. The AL scenario: The current release provides a Pool Based Scenario
  5. The Machine Learning Technique: Any model from a library whose API is compatible with the fit, predict and predict_proba definitions. Scikit-learn, Dask-ML and Keras are compatible
  6. The Evaluation Method(s)
  7. The Query Strategy
  8. The Stopping Criteria
  9. The Oracle: The current release provides a Simulated Oracle
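
To show how the components listed above fit together, here is a minimal, library-independent sketch of a pool-based loop in plain Python. It does not use DPyACL's API: `CentroidClassifier`, `simulated_oracle` and `run_pool_based_loop` are hypothetical names made up for this illustration, the query strategy is a simple least-confidence rule, and accuracy is measured on the whole dataset rather than on a held-out test set to keep the example short.

```python
class CentroidClassifier:
    """Toy 1-D model exposing fit / predict / predict_proba (hypothetical, for illustration)."""

    def fit(self, X, y):
        self.classes_ = sorted(set(y))
        self.centroids_ = {}
        for c in self.classes_:
            members = [x for x, label in zip(X, y) if label == c]
            self.centroids_[c] = sum(members) / len(members)
        return self

    def predict_proba(self, X):
        rows = []
        for x in X:
            # Closer centroids get larger weights; each row is normalised to sum to 1.
            weights = [1.0 / (1e-9 + abs(x - self.centroids_[c])) for c in self.classes_]
            total = sum(weights)
            rows.append([w / total for w in weights])
        return rows

    def predict(self, X):
        return [self.classes_[row.index(max(row))] for row in self.predict_proba(X)]


def simulated_oracle(index, labels):
    """Simulated oracle: reveals the already-known true label of one instance."""
    return labels[index]


def run_pool_based_loop(X, hidden_labels, initial_labeled, max_iterations, model):
    """Fit, evaluate, query the least-confident pool instance, and repeat."""
    known = {i: simulated_oracle(i, hidden_labels) for i in initial_labeled}
    unlabeled = [i for i in range(len(X)) if i not in known]
    history = []
    for _ in range(max_iterations):  # stopping criterion: maximum number of iterations
        if not unlabeled:
            break
        labeled = list(known)
        model.fit([X[i] for i in labeled], [known[i] for i in labeled])
        # Evaluation method: accuracy over all instances (simplified, no test split).
        predictions = model.predict(X)
        correct = sum(p == t for p, t in zip(predictions, hidden_labels))
        history.append(correct / len(X))
        # Query strategy: pick the pool instance whose top class probability is lowest.
        proba = model.predict_proba([X[i] for i in unlabeled])
        confidence = [max(row) for row in proba]
        pick = unlabeled[confidence.index(min(confidence))]
        known[pick] = simulated_oracle(pick, hidden_labels)
        unlabeled.remove(pick)
    return list(known), history


X = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5, 5.0]
y = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
labeled, history = run_pool_based_loop(
    X, y, initial_labeled=[0, 9], max_iterations=3, model=CentroidClassifier()
)
print(labeled, history)
```

Each iteration adds exactly one queried index to the labelled set, so three iterations starting from two labelled instances end with five.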

Configuring the experiment

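# Assumes the DPyACL classes used below are already imported, and that
# _X, _y and the train/test/label/unlabel index collections are defined.
from sklearn.linear_model import LogisticRegression

ml_technique = LogisticRegression(solver='liblinear')
stopping_criteria = MaxIteration(50)
query_strategy = QueryMarginSampling()
performance_metrics = [
    Accuracy(),
    F1(average='macro'),
    Precision(average='macro'),
    Recall(average='macro'),
]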

experiment = HoldOutExperiment(
    client=None,
    X=_X,
    Y=_y,
    scenario_type=PoolBasedSamplingScenario,
    train_idx=train_idx,
    test_idx=test_idx,
    label_idx=label_idx,
    unlabel_idx=unlabel_idx,
    ml_technique=ml_technique,
    performance_metrics=performance_metrics,
    query_strategy=query_strategy,
    oracle=SimulatedOracle(labels=_y),
    stopping_criteria=stopping_criteria,
    self_partition=False
)
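
The configuration above selects a margin-sampling query strategy. As a rough illustration of the idea, the sketch below implements the textbook definition: for each unlabelled instance, take the difference between the two highest class probabilities and query the instances with the smallest difference. This is an assumption about the general technique, not a copy of DPyACL's `QueryMarginSampling`; the function `margin_sampling` and its `batch_size` parameter are hypothetical names, and each probability row is assumed to have at least two classes.

```python
def margin_sampling(probabilities, batch_size=1):
    """Return the indices of the rows with the smallest top-two probability margin."""
    margins = []
    for index, row in enumerate(probabilities):
        best, second = sorted(row, reverse=True)[:2]
        margins.append((best - second, index))
    margins.sort()  # smallest margin (most ambiguous instance) first
    return [index for _, index in margins[:batch_size]]


# Class probabilities as a model's predict_proba might return them (made-up values).
pool_probabilities = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # top two classes almost tied, smallest margin
    [0.50, 0.30, 0.20],
]
selected = margin_sampling(pool_probabilities, batch_size=2)
print(selected)  # [1, 2]
```

A small margin means the model can barely separate its two most likely classes, which is why those instances are sent to the oracle first.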

Execute the experiment

result = experiment.evaluate(verbose=True)

Analyze the experiment results

query_analyser = ExperimentAnalyserFactory.experiment_analyser(
    performance_metrics=[metric.metric_name for metric in performance_metrics],
    method_name=query_strategy.query_function_name,
    method_results=result,
    type="queries"
)

# plot the learning curves of the experiment
query_analyser.plot_learning_curves(title='Active Learning experiment results')

Contribution

If you find a bug, send a pull request and we'll discuss things. If you are not familiar with the term "pull request", I recommend reading the following article for a better understanding

