Python data validation library

Project description

Library for data quality process

nefertem is an extensible framework for monitoring and managing data quality processes. With nefertem you can define your own data quality processes, run them, and collect the results. You can also create plugins that let you use your favourite data quality frameworks.

Overview

nefertem adopts a run-based execution model. A user defines an execution run through a client, which also handles the I/O stores. Every run is executed under an experiment, which serves as an organizational unit.

Running nefertem produces in-memory objects derived from the plugged-in execution frameworks (e.g. frictionless, ydata_profiling), a set of metadata describing the process, and a series of artifacts that can be persisted on various storage backends.

The typical workflow involves configuring the resources, the input stores in which the resources are saved (local or remote filesystems, databases, and data lakes), and the run itself, where the user specifies the desired operations and the frameworks to use.

Out of the box, nefertem supports the following data quality operations:

  • Validation
  • Inference
  • Profiling
  • Metrics
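
As a rough illustration, a run configuration for one of these operations could be assembled with a small helper. This is only a sketch: build_run_config is a hypothetical function, not part of nefertem, and it simply builds a plain dictionary in the shape used by the example in this README. Only the "inference" operation with the "frictionless" framework is confirmed by that example; the "validation" identifier below is an assumption and should be checked against the nefertem documentation.

```python
# Hypothetical helper, NOT part of nefertem: it only builds a plain dict
# with the "operation" and "exec_config" keys used in this README's example.
def build_run_config(operation, frameworks):
    """Return a run configuration dict for one operation and its frameworks."""
    return {
        "operation": operation,
        "exec_config": [{"framework": name} for name in frameworks],
    }


# Combination shown in the README example.
inference_config = build_run_config("inference", ["frictionless"])

# ASSUMPTION: "validation" is a guessed identifier for the Validation
# operation; the real string may differ.
validation_config = build_run_config("validation", ["frictionless"])

print(inference_config)
```

Keeping the configuration as plain dictionaries makes it easy to store it in a file or generate one configuration per operation.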

Example

import nefertem

# Set configurations
output_path = "./nefertem_run"
store = {"name": "local", "store_type": "local"}
data_resource = {
    "name": "resource_name",
    "path": "path/to/resource",
    "store": "local",
}
run_config = {
    "operation": "inference",
    "exec_config": [{"framework": "frictionless"}]
}

# Create a client and run
client = nefertem.create_client(output_path=output_path, store=[store])
with client.create_run([data_resource], run_config) as nt_run:
    nt_run.infer()
    nt_run.log_schema()
    nt_run.persist_schema()
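
In the example above, the resource's "store" value matches the "name" of the configured store. Assuming that this is how resources are linked to stores, a quick sanity check can be run on the configuration before creating a client. The helper below is hypothetical and does not call nefertem; it only inspects plain dictionaries, and the "orphan" resource and "remote" store name are made up for illustration.

```python
# Hypothetical helper, NOT part of nefertem. It assumes a resource's "store"
# field refers to the "name" of a configured store, as the example suggests.
def find_unknown_stores(stores, resources):
    """Return the names of resources that reference an undeclared store."""
    declared = {store["name"] for store in stores}
    return [res["name"] for res in resources if res["store"] not in declared]


stores = [{"name": "local", "store_type": "local"}]
resources = [
    {"name": "resource_name", "path": "path/to/resource", "store": "local"},
    # Illustrative resource pointing at a store that was never declared.
    {"name": "orphan", "path": "path/to/other", "store": "remote"},
]

unknown = find_unknown_stores(stores, resources)
print(unknown)  # ['orphan']
```

A check like this catches a misspelled store name early, before any run is started.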

Documentation
