
Command-line interface for the Desbordante platform



Desbordante: high-performance data profiler (console interface)

What is it?

Desbordante is a high-performance data profiler oriented towards exploratory data analysis. This is the repository for the Desbordante console interface, which is published as a separate package. This package depends on the desbordante package, which contains the C++ code for pattern discovery and validation. As a result, depending on the algorithm and dataset, runtimes can be 2 to 10 times shorter than with alternative tools.


Main Features

Desbordante is a high-performance data profiler that is capable of discovering and validating many different patterns in data using various algorithms.

The Discovery task is designed to identify all instances of a specified pattern type in a given dataset.

The Validation task is different: it checks whether a specified pattern instance holds in a given dataset. It not only returns True or False but also explains why the instance does not hold (e.g., it can list table rows with conflicting values).
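To make the idea of validation concrete, here is a minimal pure-Python sketch that checks an exact functional dependency X -> Y and collects the conflicting rows. It only illustrates the concept; it is not Desbordante's implementation or API. The function name validate_fd, the list-of-tuples table format and the toy rows are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Hashable, Sequence


def validate_fd(
    rows: Sequence[Sequence[Hashable]],
    lhs: Sequence[int],
    rhs: int,
) -> tuple[bool, dict[tuple, list[int]]]:
    """Hypothetical helper: check whether the exact FD lhs -> rhs holds.

    Returns (holds, conflicts), where conflicts maps every LHS value
    combination that has more than one RHS value to the indices of its rows.
    """
    # Group row indices by their LHS value combination.
    clusters: dict[tuple, list[int]] = defaultdict(list)
    for i, row in enumerate(rows):
        clusters[tuple(row[c] for c in lhs)].append(i)

    conflicts = {}
    for key, indices in clusters.items():
        # The FD is violated if rows that agree on the LHS disagree on the RHS.
        if len({rows[i][rhs] for i in indices}) > 1:
            conflicts[key] = indices
    return not conflicts, conflicts


# Toy table (Course, Classroom, Professor); the values are invented.
rows = [
    ("Math", "101", "Smith"),
    ("Math", "101", "Smith"),
    ("Physics", "101", "Jones"),
    ("Physics", "202", "Brown"),
]
holds, conflicts = validate_fd(rows, lhs=[0], rhs=2)
print(holds, conflicts)  # False {('Physics',): [2, 3]}
```

Grouping by the LHS first means each row is visited once, and the conflicting groups fall out directly as the "explanation" of why the dependency fails.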

The currently supported data patterns are:

  • Functional dependency variants:
    • Exact functional dependencies (discovery and validation)
    • Approximate functional dependencies, with g1 metric (discovery and validation)
    • Probabilistic functional dependencies, with PerTuple and PerValue metrics (discovery)
  • Graph functional dependencies (validation)
  • Conditional functional dependencies (discovery)
  • Inclusion dependencies (discovery)
  • Order dependencies:
    • set-based axiomatization (discovery)
    • list-based axiomatization (discovery)
  • Metric functional dependencies (validation)
  • Fuzzy algebraic constraints (discovery)
  • Unique column combinations:
    • Exact unique column combination (discovery and validation)
    • Approximate unique column combination, with g1 metric (discovery and validation)
  • Association rules (discovery)
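The g1 metric listed above measures how far a dependency is from holding exactly. The sketch below computes it for a functional dependency as the fraction of ordered pairs of distinct rows that agree on the LHS but disagree on the RHS, normalized by n(n - 1). That normalization is an assumption (some definitions divide by n^2 instead), so Desbordante's exact formula may differ. The function name g1_error and the toy rows are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Hashable, Sequence


def g1_error(rows: Sequence[Sequence[Hashable]], lhs: Sequence[int], rhs: int) -> float:
    """Hypothetical helper: fraction of ordered row pairs violating lhs -> rhs.

    Assumes normalization by n * (n - 1), i.e. all ordered pairs of distinct rows.
    """
    n = len(rows)
    if n < 2:
        return 0.0
    # For every LHS value combination, count how often each RHS value occurs.
    clusters = defaultdict(Counter)
    for row in rows:
        clusters[tuple(row[c] for c in lhs)][row[rhs]] += 1

    violating = 0
    for rhs_counts in clusters.values():
        size = sum(rhs_counts.values())
        # size**2 ordered pairs (including a row paired with itself) minus the
        # pairs that also agree on the RHS leaves exactly the violating pairs.
        violating += size * size - sum(c * c for c in rhs_counts.values())
    return violating / (n * (n - 1))


rows = [("a", 1), ("a", 1), ("a", 2), ("b", 3)]  # invented toy data
error = g1_error(rows, lhs=[0], rhs=1)  # 4 violating ordered pairs out of 12
print(error <= 0.1)  # False
```

Counting per cluster avoids comparing every pair of rows, so the cost is linear in the number of rows rather than quadratic.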

For more information about the supported patterns, see the main Desbordante repository.

Installation

Requirements:

PyPI

Run the following command:

pipx install desbordante-cli

Git

pipx install git+https://github.com/desbordante/desbordante-cli

Usage examples

Example datasets can be found in the main Desbordante repository.

  1. Discover all exact functional dependencies in a table stored in a comma-separated file with a header row. In this example, the default FD discovery algorithm (HyFD) is used.

     desbordante --task=fd --table=../examples/datasets/university_fd.csv , True

     [Course Classroom] -> Professor
     [Classroom Semester] -> Professor
     [Classroom Semester] -> Course
     [Professor] -> Course
     [Professor Semester] -> Classroom
     [Course Semester] -> Classroom
     [Course Semester] -> Professor

  2. Discover all approximate functional dependencies with an error less than or equal to 0.1 in a table stored in a comma-separated file with a header row. In this example, the default AFD discovery algorithm (Pyro) is used.

     desbordante --task=afd --table=../examples/datasets/inventory_afd.csv , True --error=0.1

     [Id] -> ProductName
     [Id] -> Price
     [ProductName] -> Price

  3. Check whether the metric functional dependency “Title -> Duration” with radius 5 (using the Euclidean metric) holds in a table stored in a comma-separated file with a header row. In this example, the default MFD validation algorithm (BRUTE) is used.

     desbordante --task=mfd_verification --table=../examples/datasets/theatres_mfd.csv , True --lhs_indices=0 --rhs_indices=2 --metric=euclidean --parameter=5

     True
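The metric FD check in the last example can be pictured with a small sketch. It follows the formulation common in the metric FD literature, where any two rows with equal LHS values must have RHS values at most delta apart. Whether the CLI's --parameter bounds that pairwise distance or a radius around some center point is an assumption here, so results may differ from Desbordante's BRUTE algorithm. For a single numeric column, the Euclidean distance is the absolute difference, so the largest pairwise distance in a group is max - min. The function name validate_mfd and the toy rows are hypothetical and are not taken from theatres_mfd.csv.

```python
from collections import defaultdict
from typing import List, Sequence, Tuple


def validate_mfd(
    rows: Sequence[Sequence],
    lhs: Sequence[int],
    rhs: int,
    delta: float,
) -> Tuple[bool, List[tuple]]:
    """Hypothetical helper: check the metric FD lhs -> rhs for one numeric RHS column.

    Assumes the pairwise formulation: rows that agree on the LHS must have
    RHS values at most `delta` apart (Euclidean distance in one dimension).
    Returns (holds, LHS values of the violating groups).
    """
    clusters = defaultdict(list)
    for row in rows:
        clusters[tuple(row[c] for c in lhs)].append(float(row[rhs]))

    # In one dimension the largest pairwise distance in a group is max - min.
    violating = [key for key, values in clusters.items() if max(values) - min(values) > delta]
    return not violating, violating


# Invented toy rows: (Title, Theatre, Duration).
rows = [
    ("Hamlet", "Globe", 100),
    ("Hamlet", "Royal", 103),
    ("Macbeth", "Globe", 90),
    ("Macbeth", "Royal", 97),
]
print(validate_mfd(rows, lhs=[0], rhs=2, delta=5))  # (False, [('Macbeth',)])
```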

For more information, run the CLI with the --help option:

desbordante --help

Contacts and Q&A

If you have any questions about the tool, you can create an issue on GitHub.



Download files


Source Distribution

desbordante_cli-1.1.0.tar.gz (25.4 kB, source)

Built Distribution


desbordante_cli-1.1.0-py3-none-any.whl (25.1 kB, Python 3)

File details

Details for the file desbordante_cli-1.1.0.tar.gz.

File metadata

  • Download URL: desbordante_cli-1.1.0.tar.gz
  • Size: 25.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for desbordante_cli-1.1.0.tar.gz
Algorithm    Hash digest
SHA256       f81d94cfb4fee234b4167a94333fa6c2485059f03f31de9da12438fdbce71535
MD5          c7b86b574c6f21cc41bc72e67c50c7ae
BLAKE2b-256  313e36b6c8549080ac821599cd501ad99b5d5b4d4a0e5f8b77f0eedd8f14c06f
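The published digests can be used to check that a downloaded file was not corrupted or tampered with. Below is a minimal sketch using Python's standard hashlib module; the helper names compute_digest and verify are hypothetical, and the expected value should be copied from the table for the file being checked. (pip also offers a hash-checking mode, enabled with --require-hashes.)

```python
import hashlib
from pathlib import Path

# Digest algorithms listed on this page, mapped to hashlib constructors.
ALGORITHMS = {
    "SHA256": hashlib.sha256,
    "MD5": hashlib.md5,
    "BLAKE2b-256": lambda: hashlib.blake2b(digest_size=32),
}


def compute_digest(path: Path, algorithm: str = "SHA256", chunk_size: int = 1 << 16) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    h = ALGORITHMS[algorithm]()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: Path, expected_hex: str, algorithm: str = "SHA256") -> bool:
    """Compare a file's digest with a published value (case-insensitive)."""
    return compute_digest(path, algorithm) == expected_hex.strip().lower()


# Example, assuming the sdist was downloaded into the current directory:
# verify(Path("desbordante_cli-1.1.0.tar.gz"),
#        "f81d94cfb4fee234b4167a94333fa6c2485059f03f31de9da12438fdbce71535")
```

SHA256 is the value to prefer; MD5 is listed for completeness but is not collision-resistant.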


Provenance

The following attestation bundles were made for desbordante_cli-1.1.0.tar.gz:

Publisher: release.yml on Desbordante/desbordante-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file desbordante_cli-1.1.0-py3-none-any.whl.


File hashes

Hashes for desbordante_cli-1.1.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       4f511de492f88ba919f6f308715acc317d02f5c0c4920d25ace46ec84e66c416
MD5          38d1fb53d839cf1c15d0c39fcff7b861
BLAKE2b-256  6a88bccdea56f0344f479e9c025b8b2cef73b2efd00c69d8768b8ee1fea364aa


Provenance

The following attestation bundles were made for desbordante_cli-1.1.0-py3-none-any.whl:

Publisher: release.yml on Desbordante/desbordante-cli

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
