Positive-unlabeled learning with Python

Website: https://pulearn.github.io/pulearn/

Documentation: https://pulearn.github.io/pulearn/doc/pulearn/

from pulearn import ElkanotoPuClassifier
from sklearn.svm import SVC
svc = SVC(C=10, kernel='rbf', gamma=0.4, probability=True)
pu_estimator = ElkanotoPuClassifier(estimator=svc, hold_out_ratio=0.2)
pu_estimator.fit(X, y)

1 Documentation

This is the repository of the pulearn package, and this readme file is aimed at helping potential contributors to the project.

To learn more about how to use pulearn, either visit pulearn’s homepage or read the online documentation of pulearn.

2 Installation

Install pulearn with:

pip install pulearn

3 Implemented Classifiers

Elkanoto

Scikit-Learn wrappers for both of the methods described in the paper by Elkan and Noto, “Learning classifiers from only positive and unlabeled data” (published in Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2008).

These wrap the Python code from a fork by AdityaAS (which implements both methods) of the original repository by Alexandre Drouin, which implements one of the methods.

3.1 Classic Elkanoto

To use the classic (unweighted) method, use the ElkanotoPuClassifier class:

from pulearn import ElkanotoPuClassifier
from sklearn.svm import SVC
svc = SVC(C=10, kernel='rbf', gamma=0.4, probability=True)
pu_estimator = ElkanotoPuClassifier(estimator=svc, hold_out_ratio=0.2)
pu_estimator.fit(X, y)
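
As a rough illustration of the idea behind the classic method, the sketch below follows the estimator described in the Elkan and Noto paper: the constant c = p(s=1 | y=1) is estimated as the mean score that a classifier trained on labeled versus unlabeled data gives to held-out labeled positives, and each score g(x) is then divided by c. The function names, the hard-coded scores and the capping at 1.0 are assumptions made for this example; this is not pulearn's internal code.

```python
from statistics import mean


def estimate_c(held_out_positive_scores):
    """Estimate c = p(s=1 | y=1) as the mean score on held-out positives."""
    if not held_out_positive_scores:
        raise ValueError("need at least one held-out positive score")
    return mean(held_out_positive_scores)


def adjust_score(g_x, c):
    """Estimate p(y=1 | x) as g(x) / c, capped at 1.0 (cap is an assumption)."""
    return min(g_x / c, 1.0)


# Hypothetical scores g(x) = p(s=1 | x) for three held-out labeled positives.
held_out_positive_scores = [0.5, 0.25, 0.75]
c = estimate_c(held_out_positive_scores)

# Hypothetical scores for three new examples, rescaled by c.
adjusted = [adjust_score(g, c) for g in [0.125, 0.25, 0.75]]
print(c, adjusted)
```

Because c is at most 1, dividing by it can only raise a score, which reflects that some true positives are sitting unlabeled in the training data.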

3.2 Weighted Elkanoto

To use the weighted method, use the WeightedElkanotoPuClassifier class:

from pulearn import WeightedElkanotoPuClassifier
from sklearn.svm import SVC
svc = SVC(C=10, kernel='rbf', gamma=0.4, probability=True)
pu_estimator = WeightedElkanotoPuClassifier(
    estimator=svc, labeled=10, unlabeled=20, hold_out_ratio=0.2)
pu_estimator.fit(X, y)

See the original paper for details on how the labeled and unlabeled quantities are used to weigh training examples and affect the learning process: https://cseweb.ucsd.edu/~elkan/posonly.pdf.
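
For orientation, the weighting scheme in the paper treats each unlabeled example as a positive with weight w(x) = p(y=1 | x, s=0) = ((1 - c) / c) * (g(x) / (1 - g(x))) and as a negative with weight 1 - w(x), where g(x) estimates p(s=1 | x) and c estimates p(s=1 | y=1). The sketch below computes that weight in plain Python. The function name and the clamping to 1.0 are assumptions for illustration, and the sketch does not claim to reproduce how WeightedElkanotoPuClassifier uses its labeled and unlabeled arguments.

```python
def unlabeled_weight(g_x, c):
    """Weight w(x) = p(y=1 | x, s=0) for an unlabeled example.

    g_x is the classifier's estimate of p(s=1 | x) and c is the estimate
    of p(s=1 | y=1).
    """
    if not 0.0 < c <= 1.0:
        raise ValueError("c must be in (0, 1]")
    if not 0.0 <= g_x < 1.0:
        raise ValueError("g_x must be in [0, 1)")
    w = (1.0 - c) / c * g_x / (1.0 - g_x)
    # Clamp to 1.0 in case a noisy g_x exceeds c (an assumption of this sketch).
    return min(w, 1.0)


c = 0.5
w_positive = unlabeled_weight(0.25, c)  # weight of the "positive" copy
w_negative = 1.0 - w_positive           # weight of the "negative" copy
print(w_positive, w_negative)
```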

4 Examples

A code example of the classic Elkan-Noto classifier applied to the Wisconsin breast cancer dataset, comparing it to a regular random forest classifier, can be found in the examples directory.

To run it, clone the repository and run the following command from the root of the repository, in a Python environment where pulearn is installed:

python examples/BreastCancerElkanotoExample.py

You should see a plot comparing the F1 score of the PU learner with that of a naive learner, demonstrating how PU learning becomes more powerful the more positive examples are “hidden” from the training set.
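
To make the idea of “hiding” positives concrete, here is a minimal sketch in plain Python, not taken from the example script. It relabels a fraction of the known positives as unlabeled and computes an F1 score against the true labels. The helper names, the use of 0 to mark unlabeled examples and the hard-coded predictions are all assumptions for this illustration; check the pulearn documentation for the label values the classifiers expect.

```python
import random


def hide_positives(y, fraction, seed=0):
    """Return a copy of y with a fraction of the positives (1) set to 0.

    A label of 0 stands for "unlabeled" in this sketch only.
    """
    rng = random.Random(seed)
    positives = [i for i, label in enumerate(y) if label == 1]
    n_hidden = int(len(positives) * fraction)
    hidden = set(rng.sample(positives, n_hidden))
    return [0 if i in hidden else label for i, label in enumerate(y)]


def f1_score(y_true, y_pred):
    """F1 = 2 * TP / (2 * TP + FP + FN) for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p != 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pu = hide_positives(y_true, fraction=0.5)  # two positives stay labeled
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]            # hypothetical predictions
score = f1_score(y_true, y_pred)
print(sum(y_pu), score)
```

A learner would be trained on y_pu and scored against y_true, so the score measures how well the hidden positives are recovered.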

5 Contributing

The package author and current maintainer is Shay Palachy (shay.palachy@gmail.com); you are more than welcome to approach him for help. Contributions are very welcome, especially since this package is very much in its infancy and many other pipeline stages can be added.

5.1 Installing for development

Clone:

git clone git@github.com:shaypal5/pulearn.git

Install in development mode with test dependencies:

cd pulearn
pip install -e ".[test]"

5.2 Running the tests

To run the tests, use:

python -m pytest

Note that pytest runs are configured by the pytest.ini file. Read it to understand the exact pytest arguments used.

5.3 Adding tests

At the time of writing, pulearn is maintained with a test coverage of 100%. Although challenging, I hope to maintain this status. If you add code to the package, please make sure you thoroughly test it. Codecov automatically reports changes in coverage on each PR, so PRs that reduce test coverage will not be examined until that is fixed.

Tests reside under the tests directory in the root of the repository. Each model has a separate test folder, with each class - usually a pipeline stage - having a dedicated file (always starting with the string “test”) containing several tests (each a global function starting with the string “test”). Please adhere to this structure, and try to separate test cases into different test functions; this allows us to quickly focus on problem areas and use cases. Thank you! :)

5.4 Code style

pulearn code is written to adhere to the coding style dictated by flake8. Practically, this means that one of the jobs that runs on the project’s Travis for each commit and pull request checks for a successful run of the flake8 CLI command in the repository’s root. As a result, pull requests will be flagged red by the Travis bot if non-flake8-compliant code was added.

To solve this, please run flake8 on your code (whether through your text editor/IDE or using the command line) and fix all resulting errors. Thank you! :)

5.5 Adding documentation

This project is documented using the numpy docstring conventions, which were chosen as they are perhaps the most widespread conventions that are both supported by common tools such as Sphinx and result in human-readable docstrings (in my personal opinion, of course). When documenting code you add to this project, please follow these conventions.
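
As a rough guide to what these conventions look like in practice, the sketch below documents a small function in the numpy docstring style, with Parameters, Returns and Examples sections. The function itself is hypothetical and is not part of pulearn's API; only the docstring layout matters here.

```python
import math


def hold_out_size(n_positives, hold_out_ratio=0.2):
    """Compute how many positive examples to hold out.

    Parameters
    ----------
    n_positives : int
        Number of labeled positive examples available.
    hold_out_ratio : float, default 0.2
        Fraction of the positives to set aside, between 0 and 1.

    Returns
    -------
    int
        Number of positives to hold out, rounded up.

    Examples
    --------
    >>> hold_out_size(10, 0.25)
    3
    """
    return math.ceil(n_positives * hold_out_ratio)
```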

Additionally, if you update this README.rst file, use python setup.py checkdocs to validate that it compiles.
