The Python Data Valuation Library

A library for data valuation.

Installation

To install the latest release use:

pip install pyDVL

You can also install the latest development version from TestPyPI:

pip install pyDVL --index-url https://test.pypi.org/simple/

For more instructions and information refer to the Installing pyDVL section of the documentation.
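To confirm that the installation worked, the installed version can be queried from Python. This is a minimal sketch using only the standard library (`importlib.metadata`, Python 3.8+); the helper name `installed_version` is hypothetical and not part of pyDVL.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


# Prints the version string if pyDVL is installed in this environment, else None.
print(installed_version("pyDVL"))
```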

Usage

pyDVL relies on Memcached to cache certain results and speed up computation.

You need to run it either locally or with Docker:

docker container run -it --rm -p 11211:11211 memcached:latest -v

Caching is enabled by default but can be disabled if not needed or desired.
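Before running a valuation with caching enabled, it can help to check that a Memcached server is actually listening. The sketch below is not part of pyDVL: it uses only the standard library and sends the `version` command of the memcached text protocol. The function name `memcached_is_reachable` is hypothetical, and the defaults assume the host and port from the Docker command above.

```python
import socket


def memcached_is_reachable(
    host: str = "localhost", port: int = 11211, timeout: float = 1.0
) -> bool:
    """Return True if a server at host:port answers the memcached `version` command."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.settimeout(timeout)
            conn.sendall(b"version\r\n")
            reply = conn.recv(128)
    except OSError:
        # Covers refused connections, timeouts and unresolved host names.
        return False
    # A memcached server replies with a line of the form "VERSION <number>".
    return reply.startswith(b"VERSION")
```

If the function returns False, either start the server or disable caching.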

Once that's set up, start by creating a Dataset object with your train and test splits. Then create a model instance and a Utility object that wraps the dataset, the model and the scoring function. Finally, use one of the methods defined in the library to compute the data values. Here we use Truncated Monte Carlo Shapley because it is the most efficient of them.

Putting it all together:

import numpy as np
from pydvl.utils import Dataset, Utility
from pydvl.shapley.montecarlo import truncated_montecarlo_shapley
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Toy data: 50 samples with 2 features each, and integer targets 0..49
X, y = np.arange(100).reshape((50, 2)), np.arange(50)
# Hold out half of the samples as the test split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=16
)
dataset = Dataset(X_train, X_test, y_train, y_test)
model = LinearRegression()
# No scoring function is passed here, so the library default applies
utility = Utility(model, dataset)
# Estimate the value of each training point, stopping after 100 iterations
values, errors = truncated_montecarlo_shapley(u=utility, max_iterations=100)
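For intuition about what Truncated Monte Carlo Shapley estimates, here is a small, library-independent sketch. It is not pyDVL's implementation, which may differ in its details. The function name `tmc_shapley_sketch`, its parameters, and the convention that `utility` accepts a frozenset of indices (including the empty set) are all assumptions made for illustration. Each random permutation adds points one at a time and credits every point with the change in utility it causes; once the running score is within `tolerance` of the full-data score, the remaining points in that permutation are credited zero.

```python
import random
from typing import Callable, Dict, FrozenSet, Iterable


def tmc_shapley_sketch(
    indices: Iterable[int],
    utility: Callable[[FrozenSet[int]], float],
    n_permutations: int = 200,
    tolerance: float = 1e-3,
    seed: int = 0,
) -> Dict[int, float]:
    """Estimate Shapley values by averaging marginal gains over random permutations."""
    rng = random.Random(seed)
    points = list(indices)
    full_score = utility(frozenset(points))
    totals = {i: 0.0 for i in points}

    for _ in range(n_permutations):
        permutation = points[:]
        rng.shuffle(permutation)
        subset = set()
        previous_score = utility(frozenset())
        truncated = False
        for i in permutation:
            if truncated:
                # Truncation: assume the remaining marginal gains are negligible.
                continue
            subset.add(i)
            score = utility(frozenset(subset))
            totals[i] += score - previous_score
            previous_score = score
            if abs(full_score - score) < tolerance:
                truncated = True

    return {i: totals[i] / n_permutations for i in points}


# Toy additive utility: the score of a subset is the sum of its points' weights,
# so each point's estimated value should equal its own weight.
weights = {0: 1.0, 1: 2.0, 2: 3.0}
print(tmc_shapley_sketch(list(weights), lambda s: sum(weights[i] for i in s)))
```

In a real valuation the utility would retrain the model on each subset and score it on the test split, which is why truncating and caching results matter for run time.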

For more instructions and information, refer to the Getting Started section of the documentation.

For more detailed examples, refer to the Examples section of the documentation.

Contributing

Please open new issues for bugs, feature requests and extensions. See more details about the structure and workflow in the developer's readme.

License

pyDVL is distributed under LGPL-3.0. The complete license text can be found in two files: here and here.

All contributions will be distributed under this license.

Download files

Source Distribution

pyDVL-0.1.0.tar.gz (63.5 kB view details)

Uploaded Source

Built Distribution

pyDVL-0.1.0-py3-none-any.whl (71.8 kB view details)

Uploaded Python 3

File details

Details for the file pyDVL-0.1.0.tar.gz.

File metadata

  • Download URL: pyDVL-0.1.0.tar.gz
  • Upload date:
  • Size: 63.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.14

File hashes

Hashes for pyDVL-0.1.0.tar.gz
Algorithm Hash digest
SHA256 0f664aed61f8040fcf96ddcc4f186fdd28bcd0da8ce1c447644123e41b27e8af
MD5 95cc7cf917fa4ea1b18effcf4cdcf84a
BLAKE2b-256 8b82ea4561e788b8f58965967b0ed0e1310f1a6aa19334bc201c3c93125e31d8

File details

Details for the file pyDVL-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: pyDVL-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 71.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.14

File hashes

Hashes for pyDVL-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5d550ce84b0fa4ba1654fa1e1c1cf9e656f3904b05cc6f743444a814d7b9f2bc
MD5 12ffa94307cb3c17ed63b221648ae094
BLAKE2b-256 1c2bae02cf682394543ab1019df71f9e30d921b442aef156f936eead6879df4c
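A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using the standard library's `hashlib`; the helper name `sha256_matches` and the `EXPECTED_SHA256` mapping are hypothetical, and the file paths assume the files were downloaded into the current directory.

```python
import hashlib

# SHA256 digests copied from the file details above.
EXPECTED_SHA256 = {
    "pyDVL-0.1.0.tar.gz": "0f664aed61f8040fcf96ddcc4f186fdd28bcd0da8ce1c447644123e41b27e8af",
    "pyDVL-0.1.0-py3-none-any.whl": "5d550ce84b0fa4ba1654fa1e1c1cf9e656f3904b05cc6f743444a814d7b9f2bc",
}


def sha256_matches(path: str, expected_hex: str, chunk_size: int = 1 << 16) -> bool:
    """Return True if the SHA256 digest of the file at `path` equals `expected_hex`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files do not have to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest() == expected_hex.strip().lower()
```

For example, `sha256_matches("pyDVL-0.1.0.tar.gz", EXPECTED_SHA256["pyDVL-0.1.0.tar.gz"])` should return True for an intact download of the source distribution.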
