CLI interface for Desbordante platform
Desbordante: high-performance data profiler (console interface)
What is it?
Desbordante is a high-performance data profiler oriented towards exploratory data analysis. This is the repository for the Desbordante console interface, which is published as a separate package. This package depends on the desbordante package, which contains the C++ code for pattern discovery and validation. As a result, depending on the algorithm and dataset, runtimes may be 2-10 times shorter than those of alternative tools.
Main Features
Desbordante is a high-performance data profiler that is capable of discovering and validating many different patterns in data using various algorithms.
The Discovery task is designed to identify all instances of a specified pattern type in a given dataset.
The Validation task is different: it is designed to check whether a specified pattern instance is present in a given dataset. This task not only returns True or False, but it also explains why the instance does not hold (e.g. it can list table rows with conflicting values).
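To make the validation idea concrete, here is a minimal pure-Python sketch of checking an exact functional dependency and reporting the conflicting rows. It is only an illustration of the concept, not Desbordante's implementation; the `validate_fd` helper and the sample rows are hypothetical.

```python
from collections import defaultdict


def validate_fd(rows, lhs, rhs):
    """Check whether the FD lhs -> rhs holds in `rows` (a list of dicts).

    Returns (holds, conflicts). `conflicts` maps every LHS value that
    violates the dependency to the indices of the rows that share it.
    """
    # Group row indices by their LHS value.
    groups = defaultdict(list)
    for index, row in enumerate(rows):
        key = tuple(row[column] for column in lhs)
        groups[key].append(index)

    # A group is a conflict if its rows disagree on the RHS value.
    conflicts = {}
    for key, indices in groups.items():
        rhs_values = {rows[i][rhs] for i in indices}
        if len(rhs_values) > 1:
            conflicts[key] = indices
    return not conflicts, conflicts


# Hypothetical sample data, made up for this illustration.
rows = [
    {"Professor": "Smith", "Course": "Math"},
    {"Professor": "Smith", "Course": "Math"},
    {"Professor": "Jones", "Course": "Physics"},
    {"Professor": "Jones", "Course": "Chemistry"},
]
holds, conflicts = validate_fd(rows, ["Professor"], "Course")
print(holds)      # False
print(conflicts)  # {('Jones',): [2, 3]}
```

The second return value plays the role of the explanation: it points at the rows that prevent the dependency from holding.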
The currently supported data patterns are:
- Exact functional dependencies (discovery and validation)
- Approximate functional dependencies, with
  - $g_1$ metric, classic AFDs (discovery and validation)
  - $\mu+$ metric (discovery)
  - $\tau$ metric (discovery)
  - $pdep$ metric (discovery)
  - $\rho$ metric (discovery)
- Probabilistic functional dependencies, with PerTuple and PerValue metrics (discovery and validation)
- Classic soft functional dependencies (with correlations), with $\rho$ metric (discovery and validation)
- Numerical dependencies (validation)
- Graph functional dependencies (validation)
- Conditional functional dependencies (discovery)
- Inclusion dependencies (discovery)
- Order dependencies:
  - set-based axiomatization (discovery)
  - list-based axiomatization (discovery)
- Metric functional dependencies (validation)
- Fuzzy algebraic constraints (discovery)
- Differential Dependencies (discovery)
- Unique column combinations:
  - Approximate unique column combination, with $g_1$ metric (discovery and validation)
- Association rules (discovery)
For more information about the supported patterns, check the main repo.
Installation
Requirements:
- Python 3.11+
- pipx
- desbordante package requirements
PyPI
Run the following command:
pipx install desbordante-cli
Git
pipx install git+https://github.com/desbordante/desbordante-cli
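Before installing, it can help to confirm the first two requirements. The following is a small sketch of such a check; the `missing_requirements` helper is hypothetical, and it does not cover the requirements of the desbordante package itself.

```python
import shutil
import sys

MIN_PYTHON = (3, 11)


def missing_requirements(version_info, pipx_path):
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    # Compare only (major, minor) against the documented minimum.
    if tuple(version_info[:2]) < MIN_PYTHON:
        problems.append("Python 3.11 or newer is required")
    # shutil.which returns None when the executable is not on PATH.
    if pipx_path is None:
        problems.append("pipx was not found on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_requirements(sys.version_info, shutil.which("pipx")):
        print(problem)
```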
Usage examples
Example datasets can be found in the main repo.
- Discover all exact functional dependencies in a table stored in a comma-separated file with a header row. In this example the default FD discovery algorithm (HyFD) is used.
desbordante --task=fd --table=../examples/datasets/university_fd.csv , True
[Course Classroom] -> Professor
[Classroom Semester] -> Professor
[Classroom Semester] -> Course
[Professor] -> Course
[Professor Semester] -> Classroom
[Course Semester] -> Classroom
[Course Semester] -> Professor
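The `--table` option in the command above is followed by two extra tokens. Judging by the description of the file, they appear to be the separator and the header flag, but that reading is an assumption. The sketch below uses the standard `shlex` module to show how a shell splits the command, and adds a hypothetical `parse_fd` helper for the output lines. It assumes the output format shown above and column names without spaces.

```python
import shlex

command = "desbordante --task=fd --table=../examples/datasets/university_fd.csv , True"

# Split the command the way a POSIX shell would.
tokens = shlex.split(command)
print(tokens)
# ['desbordante', '--task=fd', '--table=../examples/datasets/university_fd.csv', ',', 'True']


def parse_fd(line):
    """Split a line such as '[Course Classroom] -> Professor' into (lhs, rhs)."""
    lhs_part, rhs = line.split(" -> ")
    lhs = lhs_part.strip("[]").split()
    return lhs, rhs.strip()


print(parse_fd("[Course Classroom] -> Professor"))
# (['Course', 'Classroom'], 'Professor')
```

When desbordante is installed, the same `tokens` list can be passed to `subprocess.run` to launch the tool from a script.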
- Discover all approximate functional dependencies with error less than or equal to 0.1 in a table represented by a .csv file that uses a comma as the separator and has a header row. In this example the Tane algorithm is selected explicitly with --algo=tane, and the error is measured with the g1 metric (--afd_error_measure=g1).
desbordante --task=afd --algo=tane --table=../examples/datasets/inventory_afd.csv , True --afd_error_measure=g1 --error=0.1
[Id] -> ProductName
[Id] -> Price
[ProductName] -> Price
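For intuition about the error threshold, here is a minimal sketch of the $g_1$ error in its textbook form: the number of ordered row pairs that agree on the left-hand side but differ on the right-hand side, divided by the squared number of rows. This is an assumption about the definition. Desbordante may normalize the value differently, and the `g1_error` helper and sample rows are hypothetical.

```python
from collections import Counter, defaultdict


def g1_error(rows, lhs, rhs):
    """Fraction of ordered row pairs that violate lhs -> rhs."""
    if not rows:
        return 0.0
    # For every LHS value, count how often each RHS value occurs.
    groups = defaultdict(Counter)
    for row in rows:
        key = tuple(row[column] for column in lhs)
        groups[key][row[rhs]] += 1

    violating_pairs = 0
    for counts in groups.values():
        size = sum(counts.values())
        # All ordered pairs in the group minus the pairs that agree on RHS.
        violating_pairs += size * size - sum(c * c for c in counts.values())
    return violating_pairs / len(rows) ** 2


# Hypothetical sample data, made up for this illustration.
rows = [
    {"Id": 1, "ProductName": "Pen", "Price": 2},
    {"Id": 2, "ProductName": "Pen", "Price": 2},
    {"Id": 3, "ProductName": "Pen", "Price": 3},
    {"Id": 4, "ProductName": "Ink", "Price": 5},
]
print(g1_error(rows, ["ProductName"], "Price"))  # 0.25
print(g1_error(rows, ["Id"], "Price"))           # 0.0
```

With a threshold of 0.1, this made-up table would report `[Id] -> Price` but not `[ProductName] -> Price`.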
- Check whether metric functional dependency “Title -> Duration” with radius 5 (using the Euclidean metric) holds in a table represented by a .csv file that uses a comma as the separator and has a header row. In this example the default MFD validation algorithm (BRUTE) is used.
desbordante --task=mfd_verification --table=../examples/datasets/theatres_mfd.csv , True --lhs_indices=0 --rhs_indices=2 --metric=euclidean --parameter=5
True
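The check performed in the metric functional dependency example can be sketched in pure Python. The sketch assumes a single numeric right-hand-side column and reads the parameter as the largest allowed distance between any two right-hand-side values that share a left-hand-side value. The `mfd_holds` helper and the sample rows are hypothetical and do not reflect Desbordante's implementation.

```python
from collections import defaultdict


def mfd_holds(rows, lhs_index, rhs_index, parameter):
    """Check a metric FD on one numeric RHS column (1-D Euclidean distance)."""
    # Collect the RHS values of all rows that share the same LHS value.
    groups = defaultdict(list)
    for row in rows:
        groups[row[lhs_index]].append(float(row[rhs_index]))
    # In one dimension the largest pairwise distance is max - min.
    return all(max(values) - min(values) <= parameter
               for values in groups.values())


# Hypothetical sample data: (title, theatre, duration in minutes).
rows = [
    ("Don Quixote", "Globe", 120),
    ("Don Quixote", "Odeon", 123),
    ("Hamlet", "Globe", 180),
    ("Hamlet", "Odeon", 184),
]
print(mfd_holds(rows, 0, 2, 5))  # True
print(mfd_holds(rows, 0, 2, 3))  # False
```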
For more information check the --help option:
desbordante --help
Contacts and Q&A
If you have any questions about the tool, you can create an issue on GitHub.