Systematic comparisons of multiple datasets
DataComp: A Python Framework for Systematic Dataset Comparisons
Description
DataComp is an open-source Python package for domain-independent comparisons of multimodal longitudinal datasets. It serves as an investigative toolbox for assessing differences between multiple datasets at the feature level. DataComp helps data analysts identify which features differ significantly between datasets and which do not, and thereby helps to identify comparable dataset combinations.
Typical application scenarios are:
Identifying comparable datasets that can be used as training and independent test data in machine learning approaches
Evaluating if, how, and where simulated or synthetic datasets deviate from real-world data
Assessing (systematic) differences across multiple datasets (for example, multiple sampling sites)
Conducting multiple statistical comparisons
Creating comparative visualizations
The figure above depicts a typical DataComp workflow.
Main Features
DataComp supports:
Evaluating and visualizing the overlap in features across datasets
Parametric and nonparametric statistical hypothesis testing to compare feature value distributions
Creating comparative plots of feature value distributions
Normalizing time series data to baseline and statistically comparing the progression of features over time
Comparative visualization of feature progression over time
Hierarchical clustering of the entities in the datasets to evaluate whether dataset membership labels are evenly distributed across clusters or concentrated in distinct clusters
Performing a MANOVA to assess the influence of features on dataset membership
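To make the first two features more concrete, the following is a minimal sketch of a feature-overlap check followed by a nonparametric comparison of each shared feature. It uses only the Python standard library and does not use or reproduce DataComp's API. The function names `feature_overlap` and `mann_whitney_u`, the dataset names, and all values are hypothetical and chosen purely for illustration. The test uses the normal approximation without tie correction, which is rough for samples this small.

```python
from itertools import product
from math import sqrt
from statistics import NormalDist


def feature_overlap(datasets):
    """Return (features shared by all datasets, features unique to each dataset).

    `datasets` maps a dataset name to a dict of {feature name: list of values}.
    """
    feature_sets = {name: set(features) for name, features in datasets.items()}
    shared = set.intersection(*feature_sets.values())
    unique = {}
    for name, feats in feature_sets.items():
        others = set().union(*(f for n, f in feature_sets.items() if n != name))
        unique[name] = feats - others
    return shared, unique


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction).

    Returns the U statistic (the smaller of the two U values) and the p-value.
    """
    n1, n2 = len(x), len(y)
    # Count pairs where x beats y; ties count as half a win.
    u1 = sum(1.0 if a > b else 0.5 if a == b else 0.0 for a, b in product(x, y))
    u = min(u1, n1 * n2 - u1)
    mean_u = n1 * n2 / 2
    sd_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mean_u) / sd_u  # z <= 0 because u is the smaller U value
    p = min(2 * NormalDist().cdf(z), 1.0)
    return u, p


# Hypothetical toy data: two sampling sites with partially overlapping features.
site_a = {
    "age": [61, 64, 70, 72, 75],
    "bmi": [22.1, 24.3, 25.0, 27.8, 30.2],
    "mmse": [24, 26, 27, 29, 30],
}
site_b = {
    "age": [60, 66, 69, 73, 74],
    "bmi": [31.5, 32.0, 33.4, 35.1, 36.9],
}

shared, unique = feature_overlap({"site_a": site_a, "site_b": site_b})
for feature in sorted(shared):
    u, p = mann_whitney_u(site_a[feature], site_b[feature])
    print(f"{feature}: U={u}, p={p:.4f}")
```

In a real analysis with many features, the resulting p-values would normally also be adjusted for multiple testing before they are interpreted.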
Installation
pip install datacomp
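After installation, one way to confirm that the package is available in the active environment is to query its installed version. This is a small sketch using the standard library's `importlib.metadata` (Python 3.8+). The helper name `installed_version` is hypothetical, and the distribution name `datacomp` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the installed version, or None if datacomp is not installed.
print(installed_version("datacomp"))
```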
Documentation
The full package documentation can be found here.
Application examples
Example notebooks showcasing DataComp workflows and results on simulated data can be found at DataComp_Examples.