QuaPy: a framework for Quantification in Python

QuaPy

QuaPy is an open source framework for quantification (a.k.a. supervised prevalence estimation, or learning to quantify) written in Python.

QuaPy is based on the concept of "data sample", and provides implementations of the most important aspects of the quantification workflow, such as (baseline and advanced) quantification methods, quantification-oriented model selection mechanisms, evaluation measures, and evaluation protocols for assessing quantification methods. QuaPy also makes available commonly used datasets, and offers visualization tools that facilitate the analysis and interpretation of experimental results.

Last updates:

  • Version 0.2.0 is released! Major changes can be consulted here.
  • The developer API documentation is available here.
  • Manuals are available here

Installation

pip install quapy

Cite QuaPy

If you find QuaPy useful (and we hope you will), please consider citing the original paper in your research:

@inproceedings{moreo2021quapy,
  title={QuaPy: a python-based framework for quantification},
  author={Moreo, Alejandro and Esuli, Andrea and Sebastiani, Fabrizio},
  booktitle={Proceedings of the 30th ACM International Conference on Information \& Knowledge Management},
  pages={4534--4543},
  year={2021}
}

A quick example:

The following script fetches a binary dataset from the UCI Machine Learning repository ("yeast"), then trains, applies, and evaluates a quantifier based on the Adjusted Classify & Count quantification method, using as the evaluation measure the Mean Absolute Error (MAE) between the predicted and the true class prevalence values of the test set.

import quapy as qp

training, test = qp.datasets.fetch_UCIBinaryDataset("yeast").train_test

# create an "Adjusted Classify & Count" quantifier
model = qp.method.aggregative.ACC()
Xtr, ytr = training.Xy
model.fit(Xtr, ytr)

estim_prevalence = model.predict(test.X)
true_prevalence = test.prevalence()

error = qp.error.mae(true_prevalence, estim_prevalence)
print(f'Mean Absolute Error (MAE)={error:.3f}')
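
To give an intuition of what the "adjusted" in Adjusted Classify & Count refers to, here is a minimal, library-free sketch of the textbook binary correction. It is not QuaPy's implementation: the function names are hypothetical, and the true-positive and false-positive rates are assumed to have been estimated beforehand on held-out labelled data.

```python
def classify_and_count(predictions):
    """Fraction of items predicted positive (plain Classify & Count)."""
    return sum(predictions) / len(predictions)


def adjusted_classify_and_count(predictions, tpr, fpr):
    """Correct the Classify & Count estimate using the classifier's
    true-positive rate (tpr) and false-positive rate (fpr), both assumed
    to have been estimated on held-out labelled data."""
    cc = classify_and_count(predictions)
    if tpr == fpr:
        # the correction is undefined; fall back to the raw estimate
        return cc
    adjusted = (cc - fpr) / (tpr - fpr)
    # clip to the valid range of a prevalence value
    return min(1.0, max(0.0, adjusted))


# toy binary predictions (1 = positive) from a hypothetical classifier
predictions = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
cc_estimate = classify_and_count(predictions)
acc_estimate = adjusted_classify_and_count(predictions, tpr=0.8, fpr=0.1)
print(f"CC={cc_estimate:.3f}  ACC={acc_estimate:.3f}")
```

The correction inverts the relation cc = tpr * p + fpr * (1 - p), which links the observed fraction of positive predictions to the true prevalence p.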

Quantification is useful in scenarios characterized by prior probability shift. Indeed, if the IID assumption held, estimating the class prevalence values of the test set would be of little interest, since they would be roughly equal to those of the training set. For this reason, any quantification model should be tested across many samples, including ones whose class prevalence values differ, even widely, from those found in the training set. QuaPy implements sampling procedures and evaluation protocols that automate this workflow. See the documentation for detailed examples.
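
The idea of testing across samples with controlled prevalence can be sketched without the library. The snippet below is an illustration under stated assumptions, not QuaPy's protocol code: `sample_at_prevalence` is a hypothetical helper, the labels and classifier predictions are toy data, and the quantifier being evaluated is plain Classify & Count.

```python
import random


def sample_at_prevalence(labels, prevalence, sample_size, rng):
    """Return indices of a sample whose positive-class prevalence is
    (up to rounding) `prevalence`, drawing with replacement within each class."""
    positives = [i for i, y in enumerate(labels) if y == 1]
    negatives = [i for i, y in enumerate(labels) if y == 0]
    n_pos = round(prevalence * sample_size)
    n_neg = sample_size - n_pos
    return rng.choices(positives, k=n_pos) + rng.choices(negatives, k=n_neg)


rng = random.Random(0)

# toy test set: true labels and the outputs of a hypothetical classifier
# (it recognizes 40 of the 50 positives and mislabels 5 of the 50 negatives)
labels = [1] * 50 + [0] * 50
predictions = [1] * 40 + [0] * 10 + [1] * 5 + [0] * 45

grid = [i / 10 for i in range(11)]  # prevalence values 0.0, 0.1, ..., 1.0
errors = []
for prevalence in grid:
    idx = sample_at_prevalence(labels, prevalence, sample_size=100, rng=rng)
    true_prev = sum(labels[i] for i in idx) / len(idx)
    estim_prev = sum(predictions[i] for i in idx) / len(idx)  # classify & count
    errors.append(abs(true_prev - estim_prev))

mae = sum(errors) / len(errors)
print(f"MAE of Classify & Count over {len(grid)} samples: {mae:.3f}")
```

Each entry of `errors` is the absolute error on the positive class for one sample; in the binary case this coincides with the absolute error averaged over both classes. Inspecting the errors per prevalence value, rather than only their mean, shows where along the grid a quantifier degrades.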

Features

  • Implementation of many popular quantification methods (Classify-&-Count and its variants, Expectation Maximization, quantification methods based on structured output learning, HDy, QuaNet, quantification ensembles, among others).
  • Versatile functionality for performing evaluation based on sampling generation protocols (e.g., APP, NPP, etc.).
  • Implementation of the most commonly used evaluation metrics (e.g., AE, RAE, NAE, NRAE, SE, KLD, NKLD, etc.).
  • Datasets frequently used in quantification (textual and numeric), including:
    • 32 UCI Machine Learning datasets.
    • 11 Twitter quantification-by-sentiment datasets.
    • 3 product reviews quantification-by-sentiment datasets.
    • 4 tasks from the LeQua 2022 competition and 4 tasks from the LeQua 2024 competition.
    • IFCB for plankton quantification.
  • Native support for binary and single-label multiclass quantification scenarios.
  • Model selection functionality that minimizes quantification-oriented loss functions.
  • Visualization tools for analysing the experimental results.
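
As an illustration of three of the evaluation metrics named above (AE, SE, and RAE), the following sketch computes them for a single pair of prevalence vectors. These are standalone definitions written for this example, not the functions shipped in QuaPy; the smoothing factor used for RAE is a common choice in the quantification literature and is an assumption here.

```python
def absolute_error(true_prev, estim_prev):
    """AE: mean over classes of |p - p_hat|."""
    return sum(abs(p - q) for p, q in zip(true_prev, estim_prev)) / len(true_prev)


def squared_error(true_prev, estim_prev):
    """SE: mean over classes of (p - p_hat)^2."""
    return sum((p - q) ** 2 for p, q in zip(true_prev, estim_prev)) / len(true_prev)


def smooth(prev, eps):
    """Additive smoothing, so that no class has zero prevalence."""
    n_classes = len(prev)
    return [(p + eps) / (1 + eps * n_classes) for p in prev]


def relative_absolute_error(true_prev, estim_prev, eps):
    """RAE: mean over classes of |p - p_hat| / p, computed on smoothed values."""
    p_s = smooth(true_prev, eps)
    q_s = smooth(estim_prev, eps)
    return sum(abs(p - q) / p for p, q in zip(p_s, q_s)) / len(p_s)


true_prev = [0.2, 0.8]
estim_prev = [0.3, 0.7]
sample_size = 100
eps = 1 / (2 * sample_size)  # assumed smoothing factor

ae = absolute_error(true_prev, estim_prev)
se = squared_error(true_prev, estim_prev)
rae = relative_absolute_error(true_prev, estim_prev, eps)
print(f"AE={ae:.3f}  SE={se:.3f}  RAE={rae:.3f}")
```

RAE weighs the same absolute deviation more heavily on rare classes, which is why it needs smoothing: without it, a class with true prevalence 0 would cause a division by zero.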

Requirements

  • scikit-learn, numpy, scipy
  • pytorch (for QuaNet)
  • svmperf patched for quantification (see below)
  • joblib
  • tqdm
  • pandas, xlrd
  • matplotlib

Contributing

If you want to contribute improvements to QuaPy, please open a pull request against the "devel" branch.

Documentation

Check out the developer API documentation here.

Check out the Manuals, in which many code examples are provided.

Acknowledgments:

SoBigData++

This work has been supported by the QuaDaSh project "Finanziato dall’Unione europea---Next Generation EU, Missione 4 Componente 2 CUP B53D23026250001".
