
openclean Metanome Python Package


About

This package is an extension for the openclean-core package. It provides access to data profiling algorithms from the Metanome project in openclean. The algorithms are executed via the Metanome Wrapper, which makes it possible to run Metanome algorithms from the command line.

Installation & Configuration

The package can be installed using pip.

pip install openclean-metanome

The openclean-metanome package uses flowServ to run Metanome algorithms as serial workflows in openclean. flowServ supports two modes of execution: (1) using the Python subprocess module, and (2) using Docker.

Python Sub-Process

When running Metanome algorithms as Python sub-processes, you need an installation of the Java Runtime Environment (version 8 or higher) on your local machine. You also need a local copy of the Metanome.jar wrapper. The file can be downloaded from Zenodo (https://zenodo.org/record/4604964#.YE9tif4pBH4). The package also provides the option to download the file from within your Python scripts:

from openclean_metanome.download import download_jar

download_jar(verbose=True)

The example downloads the jar file into the default directory (defined via the METANOME_JARPATH environment variable). If the variable is not set, the user's default cache folder is used. Note that Metanome.jar is currently about 75 MB in size. If you did not download the file into the default location, make sure that the environment variable METANOME_JARPATH references the downloaded jar file.
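Before using the sub-process mode, it can help to check both prerequisites from Python. The following is a minimal sketch and not part of the openclean-metanome API: the helper names parse_java_major, java_ok and metanome_jar are hypothetical, and the default jar location is passed in explicitly because the package's actual default cache folder is not spelled out here. It assumes that java -version prints a banner such as openjdk version "1.8.0_292" to standard error, as common JDK builds do.

```python
import os
import re
import shutil
import subprocess
from pathlib import Path
from typing import Optional


def parse_java_major(version_output: str) -> Optional[int]:
    """Extract the Java major version from the output of `java -version`.

    Java 8 and older report e.g. "1.8.0_292" (major = second number);
    Java 9 and newer report e.g. "11.0.2" or "17".
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        return None
    major = int(match.group(1))
    if major == 1 and match.group(2) is not None:
        return int(match.group(2))  # legacy "1.x" numbering scheme
    return major


def java_ok(min_major: int = 8) -> bool:
    """Return True if a `java` executable on PATH reports version >= min_major."""
    java = shutil.which("java")
    if java is None:
        return False
    # `java -version` prints its banner to stderr rather than stdout.
    proc = subprocess.run([java, "-version"], capture_output=True, text=True)
    major = parse_java_major(proc.stderr)
    return major is not None and major >= min_major


def metanome_jar(default: Path) -> Path:
    """Return METANOME_JARPATH if it is set, otherwise the given default path."""
    value = os.environ.get("METANOME_JARPATH")
    return Path(value) if value else default


# Hypothetical default location; substitute wherever you stored Metanome.jar.
jar = metanome_jar(Path.home() / "Metanome.jar")
print("Java >= 8 found:", java_ok())
print("Metanome.jar present:", jar.is_file())
```

Parsing the version string rather than only checking that java exists catches the common case of an outdated Java 7 installation being first on the PATH.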

Docker

If you have Docker installed on your machine, you can run Metanome using the provided Docker container image. To do so, make sure that the environment variable METANOME_WORKER references the configuration file docker_worker.yaml that is included in the config folder of this repository.
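A minimal sketch of this configuration step in Python, assuming a local checkout of this repository. The helper names configure_docker_worker and docker_available are hypothetical and not part of the package. Because it is not stated when openclean-metanome reads METANOME_WORKER, it is safest to set the variable before importing or running any of its algorithms (or to export it in the shell instead).

```python
import os
import shutil
from pathlib import Path


def configure_docker_worker(repo_root: Path) -> Path:
    """Set METANOME_WORKER to the docker_worker.yaml in repo_root/config.

    repo_root is assumed to be a local checkout of the openclean-metanome
    repository; the function fails early if the file is missing.
    """
    config = (repo_root / "config" / "docker_worker.yaml").resolve()
    if not config.is_file():
        raise FileNotFoundError(f"worker configuration not found: {config}")
    # Only affects this Python process and child processes started from it.
    os.environ["METANOME_WORKER"] = str(config)
    return config


def docker_available() -> bool:
    """Check that a docker executable is on PATH (does not check the daemon)."""
    return shutil.which("docker") is not None
```

For example, configure_docker_worker(Path("openclean-metanome")) would point the variable at openclean-metanome/config/docker_worker.yaml relative to the current working directory (the checkout path here is illustrative).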

Algorithms

The package currently supports two data profiling algorithms.

HyFD

The HyFD algorithm (A Hybrid Approach to Functional Dependency Discovery) is a functional dependency discovery algorithm. Details about the algorithm can be found in:

Thorsten Papenbrock, Felix Naumann
A Hybrid Approach to Functional Dependency Discovery
ACM International Conference on Management of Data (SIGMOD '16)

For an example of how to use the algorithm in openclean, have a look at the example notebook Run HyFD Algorithm - Example.
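To make the notion concrete: a functional dependency X -> A holds in a table if any two rows that agree on all columns in X also agree on column A. The sketch below is a brute-force checker and enumerator over a list of row dictionaries, meant only to illustrate the kind of result HyFD computes. It is not HyFD, whose purpose is precisely to avoid this exhaustive search on realistic data. The function names fd_holds and minimal_fds, the max_lhs limit and the sample rows are hypothetical.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Sequence, Tuple

# Hypothetical row representation: one dict per row, keyed by column name.
Row = Dict[str, Hashable]
FD = Tuple[Tuple[str, ...], str]  # (left-hand side columns, right-hand side column)


def fd_holds(rows: Sequence[Row], lhs: Sequence[str], rhs: str) -> bool:
    """Return True if the functional dependency lhs -> rhs holds in rows."""
    seen: Dict[Tuple[Hashable, ...], Hashable] = {}
    for row in rows:
        key = tuple(row[col] for col in lhs)
        value = row[rhs]
        if key in seen and seen[key] != value:
            return False  # two rows agree on lhs but differ on rhs
        seen.setdefault(key, value)
    return True


def minimal_fds(rows: Sequence[Row], columns: Sequence[str],
                max_lhs: int = 2) -> List[FD]:
    """Enumerate minimal FDs with at most max_lhs columns on the left-hand side.

    Exhaustive search over column combinations: exponential in the number
    of columns, which is what dedicated algorithms such as HyFD avoid.
    """
    found: List[FD] = []
    for rhs in columns:
        others = [c for c in columns if c != rhs]
        for size in range(max_lhs + 1):
            for lhs in combinations(others, size):
                # Skip supersets of an already found lhs for this rhs:
                # the dependency would hold, but it would not be minimal.
                if any(set(prev) <= set(lhs) for prev, r in found if r == rhs):
                    continue
                if fd_holds(rows, lhs, rhs):
                    found.append((lhs, rhs))
    return found


rows = [
    {"zip": "10001", "city": "New York", "street": "5th Ave"},
    {"zip": "10001", "city": "New York", "street": "Broadway"},
    {"zip": "11201", "city": "Brooklyn", "street": "Court St"},
]
print(fd_holds(rows, ["zip"], "city"))     # True
print(fd_holds(rows, ["city"], "street"))  # False
print(minimal_fds(rows, ["zip", "city", "street"]))
# [(('city',), 'zip'), (('street',), 'zip'), (('zip',), 'city'), (('street',), 'city')]
```

Note that on a tiny sample many dependencies hold by accident (street -> city here only because every street is distinct); discovered FDs describe the data at hand, not necessarily the real-world semantics.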

HyUCC

The HyUCC algorithm (A Hybrid Approach for Efficient Unique Column Combination Discovery) is a unique column combination discovery algorithm. Details about the algorithm can be found here.

For an example of how to use the algorithm in openclean, have a look at the example notebook Run HyUCC Algorithm - Example.
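A unique column combination (UCC) is a set of columns whose combined values are distinct in every row; it is minimal if no proper subset is also unique. The following brute-force sketch only illustrates what HyUCC reports and is not the HyUCC algorithm itself. The names is_unique and minimal_uccs and the sample rows are hypothetical, and null values receive no special treatment here.

```python
from itertools import combinations
from typing import Dict, Hashable, List, Optional, Sequence, Set, Tuple

# Hypothetical row representation: one dict per row, keyed by column name.
Row = Dict[str, Hashable]


def is_unique(rows: Sequence[Row], cols: Sequence[str]) -> bool:
    """Return True if no two rows share the same values on cols."""
    seen: Set[Tuple[Hashable, ...]] = set()
    for row in rows:
        key = tuple(row[c] for c in cols)
        if key in seen:
            return False
        seen.add(key)
    return True


def minimal_uccs(rows: Sequence[Row], columns: Sequence[str],
                 max_size: Optional[int] = None) -> List[Tuple[str, ...]]:
    """Enumerate minimal unique column combinations by exhaustive search."""
    limit = len(columns) if max_size is None else max_size
    found: List[Tuple[str, ...]] = []
    for size in range(1, limit + 1):
        for combo in combinations(columns, size):
            # Any superset of a known UCC is also unique, but not minimal.
            if any(set(u) <= set(combo) for u in found):
                continue
            if is_unique(rows, combo):
                found.append(combo)
    return found


rows = [
    {"first": "Ana", "last": "Diaz", "city": "Boston"},
    {"first": "Ana", "last": "Chen", "city": "Denver"},
    {"first": "Ben", "last": "Diaz", "city": "Denver"},
]
print(minimal_uccs(rows, ["first", "last", "city"]))
# [('first', 'last'), ('first', 'city'), ('last', 'city')]
```

Searching level by level (single columns first, then pairs, and so on) is what makes the superset check sufficient for minimality in this sketch.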



Download files


Source Distribution

openclean-metanome-0.2.0.tar.gz (16.4 kB)

Uploaded Source

Built Distribution

openclean_metanome-0.2.0-py3-none-any.whl (16.7 kB)

Uploaded Python 3

File details

Details for the file openclean-metanome-0.2.0.tar.gz.

File metadata

  • Download URL: openclean-metanome-0.2.0.tar.gz
  • Upload date:
  • Size: 16.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.9.5

File hashes

Hashes for openclean-metanome-0.2.0.tar.gz
Algorithm Hash digest
SHA256 20734847c55003bb9596daaa1bc3e73df5dd431ccae10b836bfcd6862f94a350
MD5 186d19325c7c7ee97d92d5fcaac49d39
BLAKE2b-256 3a56ffe107562aa427ba3f09d62f6138b333c4d5fd9ac783cf662a7434be5315


File details

Details for the file openclean_metanome-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: openclean_metanome-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 16.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.9.5

File hashes

Hashes for openclean_metanome-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e1ff2c6160b5dcb2e6fbd7a6068850da9cb09bde71983a5365b5f7482cf8338a
MD5 b3022471467bcb9e722f6e8a6b56edbc
BLAKE2b-256 32ac7f88b3aa07b22463453fc45523b38c952d08bad5015ce08d76fff9d44b2a
