Python data validation library

Project description

Library for data quality processes

nefertem is an extensible framework for monitoring and managing data quality processes. With nefertem you can define your own data quality processes, run them and inspect the results. You can also create plugins that enable the use of your favourite data quality frameworks.

Overview

nefertem adopts a run execution model. A user defines an execution run through a client, which also handles the I/O storages. Every run is executed under an experiment, which serves as an organizational unit for runs.
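The run/experiment model can be pictured with a small, self-contained sketch. This is not nefertem's implementation: `Experiment`, `Run` and the status strings are hypothetical names chosen for illustration. The context manager only mirrors the `with client.create_run(...)` pattern used in the README example, showing how a run can be registered under an experiment and have its outcome recorded.

```python
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Run:
    # Hypothetical run record: a unique id plus a status string.
    run_id: str
    status: str = "created"


@dataclass
class Experiment:
    # Hypothetical organizational unit that groups runs together.
    name: str
    runs: list = field(default_factory=list)

    @contextmanager
    def run(self):
        # Register a new run under this experiment and track its status.
        current = Run(run_id=uuid.uuid4().hex)
        self.runs.append(current)
        current.status = "running"
        try:
            yield current
        except Exception:
            # Any error inside the with block marks the run as failed.
            current.status = "failed"
            raise
        else:
            current.status = "finished"


exp = Experiment(name="my_experiment")
with exp.run() as r:
    pass  # data quality operations would go here
print(exp.runs[0].status)  # finished
```

Tying the lifecycle to a `with` block means a run's final status is recorded even when an operation raises.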

Running nefertem produces in-memory objects from the plugged-in execution frameworks (e.g. frictionless, ydata_profiling), descriptive metadata about the process, and a series of artifacts that can be persisted to various backend storages.

A typical workflow involves configuring three things: the resources; the input storages where the resources are saved (local or remote filesystems, databases and data lakes); and the run itself, where the user specifies the desired operations and the frameworks to use.

Out of the box, nefertem supports the following data quality operations:

  • Validation
  • Inference
  • Profiling
  • Metrics
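Each operation is carried out by one or more frameworks. In the README example, `run_config` pairs an `operation` with the frameworks listed in `exec_config`. One common way to design such a plugin system is a registry keyed by the (operation, framework) pair. The sketch below assumes that design. `register`, `execute` and the `dummy` framework are hypothetical and are not part of nefertem's plugin API.

```python
from typing import Callable, Dict, Tuple

# Hypothetical registry mapping (operation, framework) to a handler function.
Handler = Callable[[dict], dict]
_REGISTRY: Dict[Tuple[str, str], Handler] = {}


def register(operation: str, framework: str):
    """Decorator that registers a handler for one operation/framework pair."""
    def decorator(func: Handler) -> Handler:
        _REGISTRY[(operation, framework)] = func
        return func
    return decorator


@register("inference", "dummy")
def dummy_inference(resource: dict) -> dict:
    # Stand-in for a real framework call; returns a fake, empty schema.
    return {"resource": resource["name"], "fields": []}


def execute(run_config: dict, resource: dict) -> list:
    """Run every framework listed in exec_config for the configured operation."""
    results = []
    for exec_cfg in run_config["exec_config"]:
        key = (run_config["operation"], exec_cfg["framework"])
        if key not in _REGISTRY:
            raise KeyError(f"no plugin registered for {key}")
        results.append(_REGISTRY[key](resource))
    return results


print(execute({"operation": "inference", "exec_config": [{"framework": "dummy"}]},
              {"name": "resource_name"}))
# [{'resource': 'resource_name', 'fields': []}]
```

With this layout, adding support for a new framework means registering new handlers without modifying the dispatch code.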

Example

import nefertem

# Set configurations
output_path = "./nefertem_run"
store = {"name": "local", "store_type": "local"}
data_resource = {
    "name": "resource_name",
    "path": "path/to/resource",
    "store": "local",
}
run_config = {
    "operation": "inference",
    "exec_config": [{"framework": "frictionless"}]
}

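# Create a client that writes its outputs under output_path and knows the "local" store
client = nefertem.create_client(output_path=output_path, store=[store])

# Open a run on the resource; operations are executed inside the with block
with client.create_run([data_resource], run_config) as nt_run:
    nt_run.infer()           # infer the resource schema with the configured framework
    nt_run.log_schema()      # log the inferred schema
    nt_run.persist_schema()  # persist the schema produced by the framework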
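Before creating a run, it can help to check that the configuration dictionaries fit together. The following sketch rests on assumptions drawn from the example: each resource's `store` value must match the `name` of a configured store, and `run_config` needs an `operation` and at least one `exec_config` entry. `check_config` is a hypothetical helper, not a nefertem function, and nefertem may perform its own, different validation.

```python
def check_config(stores: list, resources: list, run_config: dict) -> list:
    """Return a list of problems found in the configuration (empty if none)."""
    problems = []
    store_names = {s.get("name") for s in stores}
    for res in resources:
        label = res.get("name", "?")
        # Keys used by the resource dictionary in the README example.
        for key in ("name", "path", "store"):
            if key not in res:
                problems.append(f"resource {label!r} is missing {key!r}")
        # Assumption: a resource refers to a store by that store's "name".
        if "store" in res and res["store"] not in store_names:
            problems.append(f"resource {label!r} uses unknown store {res['store']!r}")
    if "operation" not in run_config:
        problems.append("run_config is missing 'operation'")
    if not run_config.get("exec_config"):
        problems.append("run_config has no 'exec_config' entries")
    return problems


store = {"name": "local", "store_type": "local"}
data_resource = {"name": "resource_name", "path": "path/to/resource", "store": "local"}
run_config = {"operation": "inference", "exec_config": [{"framework": "frictionless"}]}
print(check_config([store], [data_resource], run_config))  # []
```

Returning a list of problems instead of raising on the first one lets all configuration mistakes be reported together.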

Download files

Source Distribution

nefertem-2.0.3.tar.gz (23.6 kB)

Uploaded Source

Built Distribution

nefertem-2.0.3-py3-none-any.whl (42.2 kB)

Uploaded Python 3

File details

Details for the file nefertem-2.0.3.tar.gz.

File metadata

  • Download URL: nefertem-2.0.3.tar.gz
  • Upload date:
  • Size: 23.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for nefertem-2.0.3.tar.gz
  • SHA256: 37151227dbf6359565febc2a68cf6bbd1472a71c214333ceca8920ee19690cdf
  • MD5: 2bac05aa0bf9286e03a2ab1b40b4b0af
  • BLAKE2b-256: a3ac3fdbcfeda9fb54f40afc5e27c43493cb0e02e2df19f95a9b3b82c7162e5f
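The published digests can be used to confirm that a downloaded file is intact. Here is a minimal sketch using only Python's standard `hashlib`. `file_digest` and `verify` are hypothetical helper names, and `"blake2b_256"` is a label chosen here for BLAKE2b with a 32-byte digest.

```python
import hashlib
import os


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Return the hex digest of a file, reading it in chunks to bound memory use."""
    if algorithm == "blake2b_256":
        # BLAKE2b-256 as listed on this page: BLAKE2b with a 32-byte (256-bit) digest
        h = hashlib.blake2b(digest_size=32)
    else:
        # Any name hashlib.new accepts, e.g. "sha256" or "md5"
        h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Published SHA256 for the source distribution, copied from this page
    path = "nefertem-2.0.3.tar.gz"
    expected = "37151227dbf6359565febc2a68cf6bbd1472a71c214333ceca8920ee19690cdf"
    if os.path.exists(path):
        print("OK" if verify(path, expected) else "MISMATCH")
    else:
        print(f"{path} not found in the current directory")
```

pip can also enforce digests at install time through its hash-checking mode, using `--hash=sha256:...` entries in a requirements file.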

Details for the file nefertem-2.0.3-py3-none-any.whl.

File metadata

  • Download URL: nefertem-2.0.3-py3-none-any.whl
  • Upload date:
  • Size: 42.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.18

File hashes

Hashes for nefertem-2.0.3-py3-none-any.whl
  • SHA256: 0f8210122292d4e12c891890716dfedc852217d2e22f8c70ef8414d4083a7163
  • MD5: 4c9cc975db6a99726f9653f049467d03
  • BLAKE2b-256: 4ae1f36b871bbd58d3f57c5a82459e7afebd6f86a6604d702526791a3a0735e8
