A tool for generating and validating data schemas

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License

Project description

Smart Schema

Smart Schema is a Python package for generating and validating data schemas from CSV files, JSON data, and Pandas DataFrames. It builds Pydantic models from sample data and validates records against those models.

Features

Schema Generation: Automatically generate Pydantic models from:
- CSV files
- JSON data
- Pandas DataFrames
Data Validation: Validate data against generated schemas
CSV Processing:
- Split large CSV files
- Infer column types
- Validate CSV data
Model Management: Save and load generated models
Rich CLI: User-friendly command-line interface with detailed output

Installation

As a Binary (CLI Tool)

# Using pip
pip install smart-schema

# Using pipx (recommended for CLI tools)
pipx install smart-schema

As a Library

# Using pip
pip install smart-schema

# Using Poetry
poetry add smart-schema

# From Source
git clone https://github.com/ipriyaaanshu/smart-schema.git
cd smart-schema
pip install -e .

Command Line Interface

Smart Schema provides a CLI for working with data schemas. After installation, the smart-schema command is available on your PATH.

Basic Commands

# Show help and available commands
smart-schema --help

# Show help for a specific command
smart-schema generate-model --help

Generate Models

# Generate a model from a CSV file
smart-schema generate-model data.csv --output models/product_model.py

# Generate a model with specific datetime columns
smart-schema generate-model data.csv --datetime-columns created_at,updated_at --output models/product_model.py

# Generate a model from JSON data
smart-schema generate-model data.json --type json --output models/order_model.py

Validate Data

# Validate a CSV file against a model
smart-schema validate data.csv --model models/product_model.py

# Validate and save valid records
smart-schema validate data.csv --model models/product_model.py --output valid_data.csv

# Show detailed validation errors
smart-schema validate data.csv --model models/product_model.py --verbose

Process CSV Files

# Split a large CSV file into smaller chunks
smart-schema split data.csv --rows 1000 --output split_

# Split a CSV file by column values
smart-schema split data.csv --by-column category --output category_

# Infer column types from a CSV file
smart-schema infer-types data.csv --output types.json

Common Options

# Show progress bar for long operations
smart-schema generate-model data.csv --progress

# Specify input file encoding
smart-schema generate-model data.csv --encoding utf-8

# Use a different delimiter for CSV files
smart-schema generate-model data.csv --delimiter ";"

# Skip header row in CSV files
smart-schema generate-model data.csv --no-header

Output Formats

# Save model as Python file (default)
smart-schema generate-model data.csv --output models/product_model.py

# Save model as JSON schema
smart-schema generate-model data.csv --output schema.json --format json

# Save validation report as HTML
smart-schema validate data.csv --model models/product_model.py --output report.html --format html

Examples

Generate a model from a CSV file and validate it:

# Generate model
smart-schema generate-model products.csv --output models/product_model.py

# Validate the same file
smart-schema validate products.csv --model models/product_model.py

Process a large CSV file:

# Split into 1000-row chunks
smart-schema split large_file.csv --rows 1000 --output chunks/chunk_

# Generate model from first chunk
smart-schema generate-model chunks/chunk_1.csv --output models/data_model.py

# Validate all chunks
for f in chunks/chunk_*.csv; do
    smart-schema validate "$f" --model models/data_model.py
done
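
The shell loop above can also be driven from Python, which helps when you want to collect results per chunk. The following is a minimal sketch: build_validate_commands and validate_all are hypothetical helper names, not part of Smart Schema, and the sketch only assumes the validate command and --model flag shown above. What exit code the CLI returns for invalid data is not documented here, so the sketch simply records whatever code comes back.

```python
import subprocess
from pathlib import Path


def build_validate_commands(chunk_dir, model_path):
    """Build one `smart-schema validate` command per chunk file, in sorted order."""
    chunks = sorted(Path(chunk_dir).glob("chunk_*.csv"))
    return [
        ["smart-schema", "validate", str(chunk), "--model", model_path]
        for chunk in chunks
    ]


def validate_all(chunk_dir, model_path):
    """Run every command and map each chunk path to the CLI's exit code."""
    results = {}
    for cmd in build_validate_commands(chunk_dir, model_path):
        # check=False: record the exit code instead of raising on failure
        completed = subprocess.run(cmd, check=False)
        results[cmd[2]] = completed.returncode
    return results
```

Calling validate_all("chunks", "models/data_model.py") requires the smart-schema command to be installed. Note that sorting is lexicographic, so chunk_10.csv is processed before chunk_2.csv.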

Work with JSON data:

# Generate model from JSON
smart-schema generate-model config.json --type json --output models/config_model.py

# Validate JSON data
smart-schema validate data.json --model models/config_model.py --type json

Quickstart

Basic Usage

import pandas as pd

from smart_schema import ModelGenerator, ModelValidator

# Load the CSV file into a DataFrame
df = pd.read_csv("data.csv")

# Generate a model from the DataFrame
generator = ModelGenerator(name="Product")
model = generator.from_dataframe(
    df,
    datetime_columns=['last_updated']
)

# Validate data against the model
validator = ModelValidator(model)
valid_records, invalid_records = validator.validate_dataframe(df)

Command Line Interface

# Generate a model from a CSV file
smart-schema generate-model data.csv --output models/product_model.py

# Validate a CSV file against a model
smart-schema validate data.csv --model models/product_model.py

# Split a large CSV file
smart-schema split data.csv --rows 1000

Detailed Usage

Generating Models

From CSV Files

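import os

import pandas as pd

from smart_schema import ModelGenerator

# Read CSV file
df = pd.read_csv('data.csv')

# Generate model
generator = ModelGenerator(name="Product")
model = generator.from_dataframe(
    df,
    datetime_columns=['created_at', 'updated_at']
)

# Save model to file (create the models/ directory if it does not exist)
model_file = "models/product_model.py"
os.makedirs(os.path.dirname(model_file), exist_ok=True)
with open(model_file, "w") as f:
    f.write("from pydantic import BaseModel\n\n")
    f.write(f"class {model.__name__}(BaseModel):\n")
    for field_name, field in model.model_fields.items():
        annotation = field.annotation
        # Plain classes (int, str) have a usable __name__; composite annotations
        # such as Optional[int] are written via str() and may need extra imports
        # (typing, datetime) added to the generated file.
        type_name = annotation.__name__ if isinstance(annotation, type) else str(annotation)
        f.write(f"    {field_name}: {type_name}\n")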

From JSON Data

from smart_schema import ModelGenerator

# Sample JSON data
json_data = {
    "user": {
        "id": 1,
        "name": "John Doe",
        "email": "john@example.com"
    },
    "orders": [
        {
            "order_id": "ORD-001",
            "order_created_at": "2025-05-26T10:00:00",
            "items": [
                {"product_id": "P1", "quantity": 2}
            ]
        }
    ]
}

# Generate model
generator = ModelGenerator(name="OrderSystem")
model = generator.from_json(
    json_data,
    datetime_columns=['order_created_at']
)
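
To see why nested JSON leads to nested models, it can help to look at the shape of the data first. The following is a minimal sketch using only the standard library: describe_structure is a hypothetical helper, not part of Smart Schema, and it only reports Python type names rather than building Pydantic models. It assumes every list is homogeneous and describes it by its first element.

```python
def describe_structure(value):
    """Return a nested description of the Python types found in parsed JSON."""
    if isinstance(value, dict):
        return {key: describe_structure(item) for key, item in value.items()}
    if isinstance(value, list):
        # Describe a list by its first element; an empty list has no known item type
        return [describe_structure(value[0])] if value else []
    return type(value).__name__


sample = {
    "user": {"id": 1, "name": "John Doe"},
    "orders": [
        {"order_id": "ORD-001", "items": [{"product_id": "P1", "quantity": 2}]}
    ],
}
structure = describe_structure(sample)
print(structure)
```

Each nested dict in the result corresponds to a place where a generator would plausibly need a sub-model, and each list to a list-typed field.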

Validating Data

from smart_schema import ModelValidator

# Validate DataFrame
validator = ModelValidator(model)
valid_records, invalid_records = validator.validate_dataframe(df)

# Print validation results
print(f"Valid records: {len(valid_records)}")
print(f"Invalid records: {len(invalid_records)}")

if invalid_records:
    print("\nInvalid Records Details:")
    for record in invalid_records:
        print(f"\nRecord: {record['record']}")
        for error in record['errors']:
            print(f"  - {error['msg']}")
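
The valid/invalid split used above can be illustrated without any dependencies. This is a sketch under stated assumptions, not Smart Schema's implementation: validate_records and the plain dict schema are hypothetical, and real validation goes through the generated Pydantic model. The invalid entries mirror the shape consumed by the loop above (a record plus a list of errors, each with a msg key).

```python
def validate_records(records, schema):
    """Split records into valid and invalid lists.

    `schema` maps each required field name to the Python type it must have.
    """
    valid, invalid = [], []
    for record in records:
        errors = []
        for field, expected in schema.items():
            if field not in record:
                errors.append({"field": field, "msg": f"{field}: field required"})
            elif not isinstance(record[field], expected):
                actual = type(record[field]).__name__
                errors.append(
                    {"field": field, "msg": f"{field}: expected {expected.__name__}, got {actual}"}
                )
        if errors:
            invalid.append({"record": record, "errors": errors})
        else:
            valid.append(record)
    return valid, invalid


schema = {"id": int, "name": str}
records = [
    {"id": 1, "name": "Widget"},
    {"id": "two", "name": "Gadget"},
    {"name": "Gizmo"},
]
valid_records, invalid_records = validate_records(records, schema)
print(len(valid_records), len(invalid_records))
```

Collecting every error for a record, rather than stopping at the first, keeps the report useful when a row has several problems at once.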

Working with CSV Files

Splitting Large Files

from smart_schema.adapters.csv_splitter import split_by_rows, split_by_column

# Split by number of rows
split_by_rows(
    "large_file.csv",
    rows_per_file=1000,
    output_prefix="split_"
)

# Split by column value
split_by_column(
    "data.csv",
    column="category",
    output_prefix="category_"
)
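
For readers curious what the two splitting strategies involve, here is a standard-library sketch. It is not the package's implementation: split_rows and split_on_column are hypothetical names, and the output naming (prefix plus a 1-based index, or prefix plus the column value) is an assumption loosely based on the chunk_1.csv file name used in the CLI examples.

```python
import csv
from collections import defaultdict


def _write_chunk(header, rows, out_path):
    """Write a header row followed by data rows, returning the output path."""
    with open(out_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(rows)
    return out_path


def split_rows(path, rows_per_file, output_prefix):
    """Write consecutive chunks of `rows_per_file` data rows, repeating the header."""
    outputs = []
    with open(path, newline="") as src:
        reader = csv.reader(src)
        header = next(reader, None)
        if header is None:  # empty input file: nothing to split
            return outputs
        chunk = []
        for row in reader:
            chunk.append(row)
            if len(chunk) == rows_per_file:
                outputs.append(_write_chunk(header, chunk, f"{output_prefix}{len(outputs) + 1}.csv"))
                chunk = []
        if chunk:  # final partial chunk
            outputs.append(_write_chunk(header, chunk, f"{output_prefix}{len(outputs) + 1}.csv"))
    return outputs


def split_on_column(path, column, output_prefix):
    """Write one file per distinct value of `column` (groups are held in memory)."""
    groups = defaultdict(list)
    with open(path, newline="") as src:
        reader = csv.DictReader(src)
        fieldnames = reader.fieldnames
        for row in reader:
            groups[row[column]].append(row)
    outputs = []
    for value, rows in groups.items():
        out_path = f"{output_prefix}{value}.csv"
        with open(out_path, "w", newline="") as dst:
            writer = csv.DictWriter(dst, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(rows)
        outputs.append(out_path)
    return outputs
```

The row-based split streams the input, so memory use stays bounded by the chunk size. The column-based sketch buffers all rows per group, which is simpler but unsuitable for files that do not fit in memory.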

Inferring Column Types

from smart_schema.adapters.csv_inference import infer_column_types
import pandas as pd

df = pd.read_csv("data.csv")
column_types = infer_column_types(df)
print("Inferred column types:", column_types)
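
Column type inference generally works by trying progressively looser parsers on every value in a column. The sketch below shows that idea with the standard library only. It is an illustration, not the logic inside infer_column_types: infer_type and sketch_column_types are hypothetical names, and the set of candidate types (int, float, ISO 8601 datetime, str) is an assumption.

```python
import csv
import io
from datetime import datetime


def infer_type(values):
    """Return the first of int, float, datetime that parses every non-empty value, else str."""
    present = [v for v in values if v != ""]
    if not present:
        return "str"
    candidates = (("int", int), ("float", float), ("datetime", datetime.fromisoformat))
    for name, parse in candidates:
        try:
            for v in present:
                parse(v)
        except ValueError:
            continue  # at least one value does not fit; try the next candidate
        return name
    return "str"


def sketch_column_types(rows):
    """Infer a type name for each column of a list of dict rows."""
    columns = list(rows[0].keys()) if rows else []
    return {col: infer_type([row[col] for row in rows]) for col in columns}


data = (
    "id,price,created_at,name\n"
    "1,9.99,2025-05-26,Widget\n"
    "2,12,2025-05-27T08:30:00,Gadget\n"
)
rows = list(csv.DictReader(io.StringIO(data)))
sketched_types = sketch_column_types(rows)
print(sketched_types)
```

Ordering the candidates from strictest to loosest matters: every integer string also parses as a float, so int has to be tried first.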

Contributing

We welcome contributions! Here's how you can help:

Setting Up Development Environment

Fork the repository

Clone your fork:

git clone https://github.com/yourusername/smart-schema.git
cd smart-schema

Release history

- 1.0.3: May 28, 2025
- 1.0.2: May 28, 2025
- 1.0.1: May 28, 2025
- 0.1.1 (this version): May 26, 2025

Download files

Source Distribution

smart_schema-0.1.1.tar.gz (35.0 kB)

Uploaded May 26, 2025 Source

Built Distribution

smart_schema-0.1.1-py3-none-any.whl (23.1 kB)

Uploaded May 26, 2025 Python 3

File details

Details for the file smart_schema-0.1.1.tar.gz.

File metadata

Download URL: smart_schema-0.1.1.tar.gz
Upload date: May 26, 2025
Size: 35.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for smart_schema-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`eebe034dc75fef99619947eefd8703a8d8a0983ab64e2a17285c442cb6eec9cb`
MD5	`9bef42b922d26b74cdc29c0a018c1f56`
BLAKE2b-256	`03a5d44ca9770924a6ab2a327b728e9adfbcfff29987cbb5d88f374172a0e25e`

Provenance

The following attestation bundles were made for smart_schema-0.1.1.tar.gz:

Publisher: python-publish.yml on ipriyaaanshu/smart-schema

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: smart_schema-0.1.1.tar.gz
- Subject digest: eebe034dc75fef99619947eefd8703a8d8a0983ab64e2a17285c442cb6eec9cb
- Sigstore transparency entry: 220020342
- Sigstore integration time: May 26, 2025
Source repository:
- Permalink: ipriyaaanshu/smart-schema@36b198ec20f4e69342ba7f182f94cbe737cdeb88
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/ipriyaaanshu
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@36b198ec20f4e69342ba7f182f94cbe737cdeb88
- Trigger Event: release

File details

Details for the file smart_schema-0.1.1-py3-none-any.whl.

File metadata

Download URL: smart_schema-0.1.1-py3-none-any.whl
Upload date: May 26, 2025
Size: 23.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for smart_schema-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3cd96abc237095003f311ce81da193b9fc1e6c718eeed4a7ccccccba0a296779`
MD5	`df714d675b86f292a1492eff4b742de2`
BLAKE2b-256	`c8283e1f63c4546b9ccd7457ecb0f9d9d725104eaa30d641bd2e22798fd41a4c`

Provenance

The following attestation bundles were made for smart_schema-0.1.1-py3-none-any.whl:

Publisher: python-publish.yml on ipriyaaanshu/smart-schema

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: smart_schema-0.1.1-py3-none-any.whl
- Subject digest: 3cd96abc237095003f311ce81da193b9fc1e6c718eeed4a7ccccccba0a296779
- Sigstore transparency entry: 220020344
- Sigstore integration time: May 26, 2025
Source repository:
- Permalink: ipriyaaanshu/smart-schema@36b198ec20f4e69342ba7f182f94cbe737cdeb88
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/ipriyaaanshu
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@36b198ec20f4e69342ba7f182f94cbe737cdeb88
- Trigger Event: release
