A framework for doing stability analysis with PCS.

A library for making stability analysis simple, following the veridical data-science framework.

Why use `vflow`?

vflow's simple wrappers make pipelines easy to write and make it straightforward to follow many data-science best practices.

| Stability | Computation | Reproducibility |
| --- | --- | --- |
| Replace a single function (e.g. preprocessing) with a set of functions and easily assess the stability of downstream results | Automatic parallelization and caching throughout the pipeline | Automatic experiment tracking and saving |
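To make the stability idea concrete, swapping one function for a set of functions can be sketched in plain Python. This is only an illustration of the concept, not vflow's implementation; `apply_all`, `drop_negatives`, and `clip_negatives` are hypothetical names, and the input list is made up.

```python
from statistics import mean


def drop_negatives(values):
    """Preprocessing choice 1: discard negative values."""
    return [v for v in values if v >= 0]


def clip_negatives(values):
    """Preprocessing choice 2: replace negative values with zero."""
    return [max(v, 0) for v in values]


def apply_all(funcs, data):
    """Run every function in `funcs` on the same input, keyed by function name."""
    return {f.__name__: f(data) for f in funcs}


data = [3, -1, 4, -2, 5]  # made-up input for illustration
preprocessed = apply_all([drop_negatives, clip_negatives], data)

# The downstream result (here, the mean) is computed once per preprocessing
# choice, so the values can be compared side by side.
downstream = {name: mean(values) for name, values in preprocessed.items()}
print(downstream)
```

If the downstream values differ a lot between choices, the result is sensitive to that preprocessing decision.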

Here we show a simple example of an entire data-science pipeline with several perturbations (e.g. different data subsamples, models, and metrics) written simply using vflow.

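# import each sklearn submodule explicitly; a bare `import sklearn` does not guarantee they are loaded
import sklearn.datasets
import sklearn.linear_model
import sklearn.model_selection
import sklearn.tree
import sklearn.utils
from sklearn.metrics import accuracy_score, balanced_accuracy_score
from vflow import init_args, Vset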

# initialize data
X, y = sklearn.datasets.make_classification()
X_train, X_test, y_train, y_test = init_args(
    sklearn.model_selection.train_test_split(X, y),
    names=['X_train', 'X_test', 'y_train', 'y_test']  # optionally name the args
)

# subsample data
subsampling_funcs = [
    sklearn.utils.resample for _ in range(3)
]
subsampling_set = Vset(name='subsampling',
                       modules=subsampling_funcs,
                       output_matching=True)
X_trains, y_trains = subsampling_set(X_train, y_train)

# fit models
models = [
    sklearn.linear_model.LogisticRegression(),
    sklearn.tree.DecisionTreeClassifier()
]
modeling_set = Vset(name='modeling',
                    modules=models,
                    module_keys=["LR", "DT"])
modeling_set.fit(X_trains, y_trains)
preds_test = modeling_set.predict(X_test)

# get metrics
binary_metrics_set = Vset(name='binary_metrics',
                          modules=[accuracy_score, balanced_accuracy_score],
                          module_keys=["Acc", "Bal_Acc"])
binary_metrics = binary_metrics_set.evaluate(preds_test, y_test)

Once we've written this pipeline, we can easily see how stable certain metrics (e.g. "Acc") are to our choice of subsampling or model.
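One way to read "stability" here is to group the metric values by one perturbation and look at their spread across the others. The sketch below uses only the standard library and a hand-written results dictionary. The `(subsample, model, metric)` tuple keys and the scores are placeholder values chosen for illustration; they are not vflow's actual output format or real results, and `stability_by_model` is a hypothetical helper.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical results: (subsample, model, metric) -> score. Placeholder numbers only.
results = {
    ("subsample_0", "LR", "Acc"): 0.80,
    ("subsample_1", "LR", "Acc"): 0.84,
    ("subsample_2", "LR", "Acc"): 0.82,
    ("subsample_0", "DT", "Acc"): 0.70,
    ("subsample_1", "DT", "Acc"): 0.90,
    ("subsample_2", "DT", "Acc"): 0.80,
}


def stability_by_model(results, metric):
    """Return {model: (mean, population std)} of `metric` across subsamples."""
    scores = defaultdict(list)
    for (_, model, name), score in results.items():
        if name == metric:
            scores[model].append(score)
    return {model: (mean(vals), pstdev(vals)) for model, vals in scores.items()}


summary = stability_by_model(results, "Acc")
for model, (avg, std) in summary.items():
    print(f"{model}: mean={avg:.3f}, std={std:.3f}")
```

A smaller standard deviation means the metric is less sensitive to the choice of subsample for that model.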

Documentation

See the docs for the API reference.

Examples

Synthetic classification example

Enhancer example

fMRI example

Installation

Install with `pip install vflow` (see here for help). For the dev version (unstable), clone the repo and run `python setup.py develop` from the repo directory.
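After installing, a quick sanity check is to ask Python which version of the package it sees. This sketch relies only on the standard library's `importlib.metadata` (Python 3.8+); `installed_version` is a hypothetical helper name, not part of vflow.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if vflow is installed, otherwise None.
print(installed_version("vflow"))
```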

References

interface: easily build on scikit-learn and dvc (data version control)
computation: integration with ray and caching with joblib
tracking: mlflow
pull requests very welcome! (see contributing.md)

Project details

Release history

0.1.4: Feb 12, 2024
0.1.3: Feb 10, 2024
0.1.2: Mar 14, 2023
0.1.1: Jan 12, 2022
0.1.0: Jan 10, 2022
0.0.2: Nov 3, 2021 (this version)
0.0.1: Sep 6, 2021

Download files

Source Distribution

vflow-0.0.2.tar.gz (452.1 kB), uploaded Nov 3, 2021

Built Distribution

vflow-0.0.2-py3-none-any.whl (18.9 kB), uploaded Nov 3, 2021, Python 3

Hashes for vflow-0.0.2.tar.gz

Algorithm	Hash digest
SHA256	`26eeb3aae591ab5c8343d19f2071c20834b587286d7fcad584a1872d52f34f4c`
MD5	`d5f978729c916c0e129aa4345726e3d1`
BLAKE2b-256	`f791c782333b7d788ecd0994c3eeeee2eb7a983a06f2ab879852daecd1d9b652`

Hashes for vflow-0.0.2-py3-none-any.whl

Algorithm	Hash digest
SHA256	`cbc6d7b0383de0ad6dec372960152643d7d874818cc45be00e7b2fc37afd0ec0`
MD5	`5bff1211ee3b4eb383d5fa1bfe02518d`
BLAKE2b-256	`49de1222f6c27d30d6d75a0807956af62c86c21a690756fc210fac1680204690`
