datacomp

Systematic comparisons of multiple datasets

These details have not been verified by PyPI

Project links

Homepage

Operating System
- OS Independent
Programming Language

Project description

DataComp: A Python Framework for Systematic Dataset Comparisons

License: Apache 2.0. Status: Stable.

Description

DataComp is an open-source Python package for domain-independent comparisons of multimodal longitudinal datasets. It serves as an investigative toolbox for assessing differences between multiple datasets at the feature level. DataComp helps data analysts identify which features differ significantly between datasets and which do not, and thereby helps identify comparable dataset combinations.

Typical application scenarios are:

- Identifying comparable datasets that can be used in machine learning approaches as training and independent test data
- Evaluating if, how, and where simulated or synthetic datasets deviate from real-world data
- Assessing (systematic) differences across multiple datasets (for example, multiple sampling sites)
- Conducting multiple statistical comparisons
- Creating comparative visualizations

The figure above depicts a typical DataComp workflow.

Main Features

DataComp supports:

- Evaluating and visualizing the overlap in features across datasets
- Parametric and nonparametric statistical hypothesis testing to compare feature value distributions
- Creating comparative plots of feature value distributions
- Normalizing time series data to baseline and statistically comparing the progression of features over time
- Comparative visualization of feature progression over time
- Hierarchical clustering of the entities in the datasets to evaluate whether dataset membership labels are evenly distributed across clusters or assigned to distinct clusters
- Performing a MANOVA to assess the influence of features on dataset membership
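
The first two features (feature overlap and nonparametric testing) can be illustrated without DataComp itself. The following is a minimal sketch and not DataComp's API: it uses only the Python standard library, and the function names (`shared_features`, `permutation_test`, `compare_datasets`), the toy data, and the choice of a permutation test with a Bonferroni correction are all assumptions made for illustration.

```python
import random
from statistics import mean


def shared_features(dataset_a, dataset_b):
    """Return the sorted feature names that occur in both datasets."""
    return sorted(set(dataset_a) & set(dataset_b))


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the absolute difference of means.

    Nonparametric: group labels are reshuffled to build the null distribution.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_permutations + 1)


def compare_datasets(dataset_a, dataset_b, alpha=0.05):
    """Test every shared feature; return {feature: (p_value, significant)}."""
    features = shared_features(dataset_a, dataset_b)
    if not features:
        return {}
    # Bonferroni correction (an assumption of this sketch): divide alpha
    # by the number of feature-wise tests performed.
    threshold = alpha / len(features)
    results = {}
    for feature in features:
        p = permutation_test(dataset_a[feature], dataset_b[feature])
        results[feature] = (p, p < threshold)
    return results


# Hypothetical toy data: each dataset maps a feature name to its values.
site_a = {
    "age": [61, 64, 58, 70, 66, 63, 59, 68],
    "score": [10, 12, 11, 13, 12, 11, 10, 12],
    "bmi": [24.1, 27.3, 22.8, 25.0, 26.4, 23.9, 28.2, 24.7],
}
site_b = {
    "age": [62, 65, 57, 69, 67, 62, 60, 67],
    "score": [20, 22, 21, 23, 22, 21, 20, 22],
}

results = compare_datasets(site_a, site_b)
for feature, (p_value, significant) in results.items():
    label = "different" if significant else "not significantly different"
    print(f"{feature}: p={p_value:.4f} ({label})")
```

In this toy data, `bmi` is excluded because only one site records it, `age` has identical means at both sites, and `score` is shifted by a constant between the sites. A real analysis would choose the test per feature type and sample size; the permutation test here simply avoids distributional assumptions.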

Installation

pip install datacomp

Documentation

The full package documentation can be found here.

Application examples

Example notebooks showcasing DataComp workflows and results on simulated data can be found at DataComp_Examples:

Cross-sectional Comparison Example

Longitudinal Comparison Example

Release history

- 0.0.6 (this version): Mar 13, 2019
- 0.0.5: Mar 13, 2019
- 0.0.5.dev0 (pre-release): Mar 13, 2019
- 0.0.4: Mar 13, 2019
- 0.0.3: Mar 13, 2019
- 0.0.1: Mar 13, 2019

Download files

Source Distribution

datacomp-0.0.6.tar.gz (21.5 kB)

Uploaded Mar 13, 2019 Source

Built Distribution

datacomp-0.0.6-py3-none-any.whl (27.5 kB)

Uploaded Mar 13, 2019 Python 3

File details

Details for the file datacomp-0.0.6.tar.gz.

File metadata

Download URL: datacomp-0.0.6.tar.gz
Upload date: Mar 13, 2019
Size: 21.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.5.2

File hashes

Hashes for datacomp-0.0.6.tar.gz
Algorithm	Hash digest
SHA256	`3980c5300702c3561da8e8f61709d8adcc581af14e7007a129c5d8e8e2c8ef6a`
MD5	`7ce0f2bb9766e653711318b051f7bf3b`
BLAKE2b-256	`ef12f528202bdd6edfebd7f56999ba017ebc5519457cd82abbdb5af05055daf4`
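
A downloaded file can be checked against the published SHA256 digest before it is installed. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the local file path is whatever location the file was downloaded to.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files are never loaded whole.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's digest with a published hex digest."""
    expected = expected_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)
```

For example, `matches_published_hash("datacomp-0.0.6.tar.gz", "3980c5300702c3561da8e8f61709d8adcc581af14e7007a129c5d8e8e2c8ef6a")` should return `True` for an unmodified copy of the source distribution, assuming the file sits in the current working directory.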

File details

Details for the file datacomp-0.0.6-py3-none-any.whl.

File metadata

Download URL: datacomp-0.0.6-py3-none-any.whl
Upload date: Mar 13, 2019
Size: 27.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.5.2

File hashes

Hashes for datacomp-0.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`897a43ab8d835d5fc5b7937e8628557a2ab5bba236d308a918d9b4e283fa002c`
MD5	`c7f80c16a762874365e19ff98df1b588`
BLAKE2b-256	`39b8c40f97d8c2d220778c49bb004ceacc84494f9ea008c3f5ec30986a21563f`
