
PyData Constraints

PyData Constraints validates data streams in Python against rules you declare in a config file. It handles both small JSON files and massive CSV dumps, and reports every record that breaks a rule.


🚀 Why PyData Constraints?

  • Rule-Based: Define your rules in simple JSON/YAML files. No coding required.
  • Universal: Works with JSON and CSV out of the box using efficient streaming.
  • Developer Friendly: Written in pure Python with minimal dependencies.
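To illustrate what streaming validation means in practice, the sketch below reads CSV rows one at a time with the standard library and yields only the rows that fail a regex. It does not call PyData Constraints; `iter_csv_issues` and the sample data are hypothetical names made up for this example, and the library's internals may work differently.

```python
import csv
import io
import re


def iter_csv_issues(lines, field, pattern):
    """Yield (row_number, value) for each row whose `field` fails `pattern`.

    `lines` can be any iterable of text lines (for example an open file),
    so rows are processed one by one instead of being loaded all at once.
    """
    reader = csv.DictReader(lines)
    for row_number, row in enumerate(reader, start=1):
        value = row.get(field) or ""
        if not pattern.match(value):
            yield row_number, value


# Hypothetical sample data standing in for a large CSV dump.
sample = io.StringIO("id,email\n1,alice@example.com\n2,bob-has-no-domain\n")
email = re.compile(r"^[^\s@]+@[^\s@]+\.[^\s@]+$")

bad_rows = list(iter_csv_issues(sample, "email", email))
print(bad_rows)
```

Because the function is a generator, memory use stays roughly constant regardless of file size; passing `open("users.csv", newline="")` instead of the `StringIO` object would work the same way.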

⚡ Basic Use Case (At a glance)

Imagine you have a users.json file and you want to ensure all emails are valid.

1. Your Data (users.json):

[
  { "id": 1, "email": "alice@example.com" },
  { "id": 2, "email": "bob-has-no-domain" }
]

2. Your Rules (config.json):

{
  "sources": [
    {
      "service": "users",
      "type": "file",
      "path": "users.json",
      "format": "json"
    }
  ],
  "constraints": [
    {
      "type": "format",
      "id": "valid-email",
      "service": "users",
      "field": "email",
      "regex": "^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$",
      "message": "Invalid email: {{email}}"
    }
  ]
}

3. Run and get results:

$ data-constraints validate --config config.json
[INFO] Validating data...
[valid-email] (format) Invalid email: bob-has-no-domain
Validation finished. Found 1 issues.
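To see what the format rule checks, here is a minimal plain-Python sketch of the same test using only the standard library. It is not the library's implementation: `find_format_issues` is a hypothetical helper, and the `{{email}}` substitution is inferred from the sample output above.

```python
import json
import re

# Same records as users.json above.
USERS_JSON = (
    '[{"id": 1, "email": "alice@example.com"},'
    ' {"id": 2, "email": "bob-has-no-domain"}]'
)

# The regex from config.json after JSON unescaping (\\s becomes \s).
EMAIL_PATTERN = re.compile(r"^[^\s@]+@[^\s@]+\.[^\s@]+$")


def find_format_issues(records, field, pattern, message):
    """Return one message per record whose `field` does not match `pattern`."""
    issues = []
    for record in records:
        value = str(record.get(field, ""))
        if not pattern.match(value):
            # Fill in the {{field}} placeholder with the offending value.
            issues.append(message.replace("{{" + field + "}}", value))
    return issues


issues = find_format_issues(
    json.loads(USERS_JSON), "email", EMAIL_PATTERN, "Invalid email: {{email}}"
)
for issue in issues:
    print(issue)
```

Only the second record is reported, because its value contains no `@` and therefore cannot match the pattern.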

📦 Installation

To install via pip:

pip install pydata-constraints

This installs both the Python package pydata_constraints and the CLI command data-constraints.
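If you want to trigger the CLI from a Python script (for example in a CI job), one option is to call it through `subprocess`. The sketch below uses only the `validate --config` invocation shown earlier; `build_validate_command` and `run_validation` are hypothetical helper names, and the meaning of the exit code is an assumption that should be checked against the docs.

```python
import shutil
import subprocess


def build_validate_command(config_path):
    """Build the argument list for the CLI call shown in the example above."""
    return ["data-constraints", "validate", "--config", config_path]


def run_validation(config_path):
    """Run the CLI if it is on PATH; return (exit_code, stdout) or None."""
    if shutil.which("data-constraints") is None:
        return None  # CLI not installed in this environment
    result = subprocess.run(
        build_validate_command(config_path),
        capture_output=True,
        text=True,
        check=False,  # do not raise; the exit code's meaning is not assumed here
    )
    return result.returncode, result.stdout


print(build_validate_command("config.json"))
```

Passing the command as a list avoids shell quoting problems when the config path contains spaces.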

🧠 Basic Concepts

PyData Constraints works with three core pieces:

  1. Data Files: Your actual data dumps in .json or .csv format.
  2. Config File: A JSON/YAML file pointing the engine to your data and rules.
  3. Constraints (Rules): The definitions of what is valid.
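The relationship between sources and constraints can be sketched with a small consistency check: every constraint names a `service`, and that service should be declared under `sources`. The snippet below is only an illustration built on the config keys from the example above; `unknown_services` is a hypothetical helper, not part of the package.

```python
# A config shaped like config.json above, written as a Python dict.
config = {
    "sources": [
        {"service": "users", "type": "file", "path": "users.json", "format": "json"}
    ],
    "constraints": [
        {"type": "format", "id": "valid-email", "service": "users", "field": "email"}
    ],
}


def unknown_services(config):
    """Return ids of constraints whose `service` is not declared in `sources`."""
    declared = {source["service"] for source in config.get("sources", [])}
    return [
        constraint["id"]
        for constraint in config.get("constraints", [])
        if constraint["service"] not in declared
    ]


print(unknown_services(config))
```

A check like this catches typos in service names before any data file is read.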

Rule Types at a Glance

  • 📝 Format: Ensure strings look correct (e.g. emails).
    • Example: "regex": "^[^\\s@]+@[^\\s@]+\\.[^\\s@]+$"
  • 🆔 Unique: Ensure no duplicate IDs exist across a file.
    • Example: "field": "employee_id"
  • 🔗 Foreign Key: Ensure referenced IDs actually exist in another file.
    • Example: order.userId must exist in users.id
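The unique and foreign key rules boil down to simple set logic. The sketch below shows that logic in plain Python under stated assumptions: `duplicate_values` and `missing_references` are hypothetical helpers, the sample records are invented, and the library's own rule engine may handle these cases differently.

```python
from collections import Counter


def duplicate_values(records, field):
    """Return values of `field` that occur more than once, in first-seen order."""
    counts = Counter(record[field] for record in records)
    return [value for value, count in counts.items() if count > 1]


def missing_references(children, child_field, parents, parent_field):
    """Return child values that have no matching value in the parent records."""
    known = {parent[parent_field] for parent in parents}
    return [
        child[child_field]
        for child in children
        if child[child_field] not in known
    ]


# Invented sample data: user id 2 is duplicated, order "b" points at a missing user.
users = [{"id": 1}, {"id": 2}, {"id": 2}]
orders = [{"orderId": "a", "userId": 1}, {"orderId": "b", "userId": 7}]

duplicates = duplicate_values(users, "id")
dangling = missing_references(orders, "userId", users, "id")
print(duplicates, dangling)
```

Building the set of known parent ids once keeps the foreign key check linear in the number of records.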

📚 Documentation

For full documentation, guides and advanced use cases, please check the docs/ directory.

  • Key Concepts: Easy-to-understand explanation of file types and constraints.
  • User Guide: The comprehensive guide to using the CLI and defining rules.
  • Integration Guide: How to integrate the engine programmatically in Python.
  • Examples: Runnable examples, ranging from simple to e-commerce.

🛠️ Features

Feature             Description
Format Validation   Regex-based validation for strings (Emails, Phones, Codes).
Unique Validation   Ensure IDs and codes are unique across your dataset.
Foreign Keys        Validate relationships between different files (e.g. order.userId -> user.id).
Multiple Reporters  Output results to Console, JSON, or Markdown files.
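To give a feel for what different reporters do with the same results, the sketch below renders one list of issues as JSON and as a Markdown table. The issue fields and both report layouts are assumptions made for this example; the actual output of the built-in reporters is described in the docs and may look different.

```python
import json

# Hypothetical issue record, modelled on the console output shown earlier.
issues = [
    {"id": "valid-email", "type": "format", "message": "Invalid email: bob-has-no-domain"}
]


def to_json_report(issues):
    """Serialize issues with a total count (layout is an assumption)."""
    return json.dumps({"count": len(issues), "issues": issues}, indent=2)


def to_markdown_report(issues):
    """Render issues as a Markdown table, one row per issue."""
    lines = ["| Constraint | Type | Message |", "| --- | --- | --- |"]
    for issue in issues:
        lines.append(f"| {issue['id']} | {issue['type']} | {issue['message']} |")
    return "\n".join(lines)


print(to_json_report(issues))
print(to_markdown_report(issues))
```

Keeping the issue data separate from its rendering is what makes it cheap to support several output formats side by side.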

🤝 Contributing

Contributions, bug reports, and feature requests are welcome. If you have an idea, open a pull request!

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
