CSV file validation against a given data schema.

CSV Schema Validator

A Python library for validating CSV files against a schema written in JSON. It checks data types, regex patterns, allowed values, numeric ranges, and required columns, and reports each failure with its row and column.

Features

  • Type Validation: Support for string, number, integer, and boolean data types
  • Pattern Matching: Regex pattern validation for strings (e.g., email formats, dates)
  • Enum Validation: Restrict values to predefined options
  • Range Validation: Min/max constraints for numeric fields
  • Required Field Checking: Ensure mandatory fields are present
  • Detailed Error Reporting: Comprehensive validation results with row/column information
  • Command Line Interface: Easy-to-use CLI for quick validation
  • Python API: Programmatic access for integration into larger workflows

Installation

pip install csv-schema-validator

Quick Start

Command Line Usage

# Validate a CSV file against a schema
csv-schema-validator employees.csv employee_schema.json

# Show help
csv-schema-validator --help

# Show version
csv-schema-validator --version

Python API Usage

from csv_schema_validator import validate_csv

# Define your schema
schema = {
    "name": "Employee Data Schema",
    "description": "Schema for validating employee CSV files",
    "fields": [
        {
            "name": "employee_id",
            "type": "integer",
            "required": True,
            "description": "Unique employee identifier"
        },
        {
            "name": "email",
            "type": "string",
            "required": True,
            "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
            "description": "Valid email address"
        },
        {
            "name": "department",
            "type": "string",
            "required": True,
            "enum": ["Engineering", "Marketing", "Sales", "HR", "Finance"],
            "description": "Employee department"
        },
        {
            "name": "salary",
            "type": "number",
            "required": True,
            "min": 30000,
            "max": 200000,
            "description": "Annual salary in USD"
        }
    ]
}

# Validate your CSV
result = validate_csv("employees.csv", schema)

if result["is_valid"]:
    print("✅ Validation passed")
else:
    print(f"❌ Validation failed: {len(result['errors'])} errors found")
    for error in result["errors"]:
        print(f"Row {error['row']}, Column {error['column']}: {error['error_type']} - {error['error_message']}")
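The command line takes the schema as a JSON file, while the example above builds it inline. A minimal sketch of bridging the two, assuming `validate_csv` accepts the parsed dictionary exactly as shown above; the helper names `load_schema` and `validate_with_schema_file`, and the `validate` parameter, are hypothetical and not part of the library:

```python
import json
from pathlib import Path


def load_schema(schema_path):
    """Read a JSON schema file and return it as a Python dict."""
    with Path(schema_path).open(encoding="utf-8") as handle:
        return json.load(handle)


def validate_with_schema_file(csv_path, schema_path, validate=None):
    """Validate csv_path against the schema stored in schema_path.

    `validate` is a hypothetical injection point: any callable taking
    (csv_path, schema_dict). It defaults to the library's validate_csv,
    imported lazily so this module can be loaded without the package.
    """
    if validate is None:
        from csv_schema_validator import validate_csv as validate
    return validate(csv_path, load_schema(schema_path))
```

Calling `validate_with_schema_file("employees.csv", "employee_schema.json")` should then behave like the inline example, under the assumption stated above.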

Schema Format

The schema file is a JSON file with the following structure. The "type" value below lists the four alternatives; a real schema uses exactly one of them per field:

{
  "name": "Schema Name",
  "description": "Schema description",
  "fields": [
    {
      "name": "field_name",
      "type": "string|number|integer|boolean",
      "required": true,
      "description": "Field description",
      "pattern": "regex_pattern",
      "enum": ["value1", "value2"],
      "min": 0,
      "max": 100
    }
  ]
}
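When a schema is built in Python and then saved for the command line, note that Python's `True` becomes JSON's `true`. A small sketch using only the standard `json` module; the `age` field is a hypothetical example, not taken from the project:

```python
import json

# Hypothetical single-field schema following the structure above.
schema = {
    "name": "Schema Name",
    "description": "Schema description",
    "fields": [
        {"name": "age", "type": "integer", "required": True, "min": 0, "max": 100},
    ],
}

# json.dumps converts Python's True/False to JSON's true/false.
schema_text = json.dumps(schema, indent=2)
print(schema_text)

# Round-tripping through JSON gives back an equal dict.
assert json.loads(schema_text) == schema
```

Writing `schema_text` to a `.json` file produces something that can be passed as the schema argument on the command line.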

Field Properties

| Property    | Type    | Required | Description                                     |
|-------------|---------|----------|-------------------------------------------------|
| name        | string  | YES      | Field name (must match CSV header)              |
| type        | string  | YES      | Data type: string, number, integer, or boolean  |
| required    | boolean | YES      | Whether the field must be present               |
| description | string  | NO       | Human-readable field description                |
| pattern     | string  | NO       | Regex pattern for string validation             |
| enum        | array   | NO       | Allowed values for the field                    |
| min         | integer | NO       | Minimum value (for numeric fields)              |
| max         | integer | NO       | Maximum value (for numeric fields)              |
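The properties above can be checked before a schema is handed to the validator. The sketch below is an independent illustration, not the library's own schema check (which reports `SchemaValidationError`); the function name `check_field_definition` is hypothetical, and the `min` versus `max` comparison is an extra sanity check that the table does not mention:

```python
ALLOWED_TYPES = {"string", "number", "integer", "boolean"}
REQUIRED_PROPERTIES = ("name", "type", "required")


def check_field_definition(field):
    """Return a list of problems found in one schema field definition."""
    problems = []
    # name, type and required are the three mandatory properties.
    for prop in REQUIRED_PROPERTIES:
        if prop not in field:
            problems.append(f"missing required property '{prop}'")
    if "type" in field and field["type"] not in ALLOWED_TYPES:
        problems.append(f"unknown type '{field['type']}'")
    if "required" in field and not isinstance(field["required"], bool):
        problems.append("'required' must be a boolean")
    # Assumption: a minimum above the maximum is always a mistake.
    if "min" in field and "max" in field and field["min"] > field["max"]:
        problems.append("'min' is greater than 'max'")
    return problems


print(check_field_definition({"name": "hire_date", "type": "date"}))
```

An empty list means the definition has the three mandatory properties and a recognised type.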

Data Types

String

  • Basic string validation
  • Optional regex pattern matching
  • Optional enum value restriction
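Pattern and enum checks can be pictured with Python's standard `re` module. This is a sketch of the idea, not the library's code: whether the library uses `re.search`, `re.match`, or `re.fullmatch` is an assumption here, which is why the patterns in this README anchor themselves with `^` and `$`. The helper names are hypothetical.

```python
import re

# Same email pattern as in the schema examples. In JSON the backslash is
# doubled ("\\."); in a Python raw string it is written once.
EMAIL_PATTERN = r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"
DEPARTMENTS = ["Engineering", "Marketing", "Sales", "HR", "Finance"]


def matches_pattern(value, pattern):
    """True if the regex matches; anchors in the pattern decide strictness."""
    return re.search(pattern, value) is not None


def in_enum(value, allowed):
    """True if value is one of the allowed options (exact, case-sensitive)."""
    return value in allowed


print(matches_pattern("john.doe@company.com", EMAIL_PATTERN))
print(in_enum("HR", DEPARTMENTS))
```

Treating enum membership as case-sensitive is also an assumption; check the library's behaviour before relying on it.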

Number

  • Floating-point number validation
  • Optional min/max range constraints

Integer

  • Whole number validation
  • Optional min/max range constraints
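Every CSV cell arrives as text, so numeric validation is a parse followed by a bounds check. A minimal sketch of that sequence, assuming inclusive bounds and Python's built-in `int` and `float` parsing; the library's exact rules (for example for `1e3`, `nan`, or surrounding whitespace) may differ, and the function names are hypothetical:

```python
def parse_integer(text):
    """Return int(text), or None if text is not a whole number."""
    try:
        return int(text.strip())
    except ValueError:
        return None


def parse_number(text):
    """Return float(text), or None if text is not numeric."""
    try:
        return float(text.strip())
    except ValueError:
        return None


def in_range(value, minimum=None, maximum=None):
    """Inclusive bounds check; a bound of None means no limit."""
    if minimum is not None and value < minimum:
        return False
    if maximum is not None and value > maximum:
        return False
    return True


salary = parse_number("75000")
print(salary is not None and in_range(salary, 30000, 200000))
```

Note that `"75000.5"` parses as a number but not as an integer, which is the practical difference between the two types.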

Boolean

  • Accepts: true, false, 1, 0, yes, no, on, off
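The accepted tokens map naturally onto two sets. The sketch below shows one way to interpret them; lower-casing and trimming the input are assumptions, since the list above does not say whether `TRUE` or ` yes ` are accepted. `parse_boolean` is a hypothetical helper, not a library function:

```python
TRUE_TOKENS = {"true", "1", "yes", "on"}
FALSE_TOKENS = {"false", "0", "no", "off"}


def parse_boolean(text):
    """Return True/False for an accepted token, or None otherwise.

    Lower-casing and stripping are assumptions; the library may be stricter.
    """
    token = text.strip().lower()
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    return None


print(parse_boolean("yes"), parse_boolean("0"), parse_boolean("maybe"))
```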

Examples

Employee Data Validation

CSV File (employees.csv):

employee_id,first_name,last_name,email,department,salary,hire_date,is_active
1,John,Doe,john.doe@company.com,Engineering,75000,2023-01-15,true
2,Jane,Smith,jane.smith@company.com,Marketing,65000,2023-03-22,true

Schema File (employee_schema.json):

{
  "name": "Employee Data Schema",
  "description": "Schema for validating employee CSV files",
  "fields": [
    {
      "name": "employee_id",
      "type": "integer",
      "required": true,
      "description": "Unique employee identifier"
    },
    {
      "name": "first_name",
      "type": "string",
      "required": true,
      "description": "Employee's first name"
    },
    {
      "name": "last_name",
      "type": "string",
      "required": true,
      "description": "Employee's last name"
    },
    {
      "name": "email",
      "type": "string",
      "required": true,
      "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
      "description": "Valid email address"
    },
    {
      "name": "department",
      "type": "string",
      "required": true,
      "enum": ["Engineering", "Marketing", "Sales", "HR", "Finance"],
      "description": "Employee department"
    },
    {
      "name": "salary",
      "type": "number",
      "required": true,
      "min": 30000,
      "max": 200000,
      "description": "Annual salary in USD"
    },
    {
      "name": "hire_date",
      "type": "string",
      "required": true,
      "pattern": "^\\d{4}-\\d{2}-\\d{2}$",
      "description": "Hire date in YYYY-MM-DD format"
    },
    {
      "name": "is_active",
      "type": "boolean",
      "required": true,
      "description": "Whether employee is currently active"
    }
  ]
}
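Because each field name must match a CSV header, a quick pre-check is to compare the header row with the schema's field names. The sketch below does this with the standard `csv` module on the sample data above; it is an illustration of the header rule, not the library's implementation, and `missing_columns` is a hypothetical helper:

```python
import csv
import io

CSV_TEXT = """employee_id,first_name,last_name,email,department,salary,hire_date,is_active
1,John,Doe,john.doe@company.com,Engineering,75000,2023-01-15,true
2,Jane,Smith,jane.smith@company.com,Marketing,65000,2023-03-22,true
"""

# Field names from employee_schema.json, all marked as required there.
REQUIRED_FIELDS = ["employee_id", "first_name", "last_name", "email",
                   "department", "salary", "hire_date", "is_active"]


def missing_columns(csv_text, required_fields):
    """Return required field names absent from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty input yields an empty header
    return [name for name in required_fields if name not in header]


print(missing_columns(CSV_TEXT, REQUIRED_FIELDS))
```

For the sample file every required column is present, so the list is empty.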

Command Line Options

| Option        | Description                      |
|---------------|----------------------------------|
| <csv_file>    | Path to the CSV file to validate |
| <schema_file> | Path to the JSON schema file     |
| -h, --help    | Show help message                |
| -v, --version | Show version information         |
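The CLI can also be driven from a script. The sketch below wraps the two positional arguments with the standard `subprocess` module. What exit code the tool returns on a failed validation is not documented here, so the code only passes the code and output back to the caller; `build_command` and `run_validator` are hypothetical helpers:

```python
import shutil
import subprocess


def build_command(csv_file, schema_file, executable="csv-schema-validator"):
    """Return the argument list for one CLI validation run."""
    return [executable, str(csv_file), str(schema_file)]


def run_validator(csv_file, schema_file):
    """Run the CLI and return (exit_code, output), or None if it is not installed."""
    command = build_command(csv_file, schema_file)
    if shutil.which(command[0]) is None:
        return None
    completed = subprocess.run(command, capture_output=True, text=True)
    # Assumption: the meaning of the exit code is left to the caller to interpret.
    return completed.returncode, completed.stdout + completed.stderr


print(build_command("employees.csv", "employee_schema.json"))
```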

Return Value

validate_csv returns a dictionary with the following structure:

{
    "is_valid": bool,
    "errors": [
        {
            "error_type": str,  # e.g., "RequiredFieldError", "TypeValidationError", etc.
            "error_message": str,
            "row": int,  # Row number (-1 for header errors)
            "column": str,  # Column name
            "value": str,  # The value that caused the error
            "details": dict  # Additional error details
        }
    ]
}
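A result of this shape is easy to turn into readable lines. The sketch below formats each error and treats row `-1` as a header-level problem, as the structure above describes. The sample result and its messages are hypothetical, written only to exercise the function:

```python
def format_error(error):
    """Render one error dict as a single line."""
    # Row -1 marks header-level errors, per the structure above.
    location = "header" if error["row"] == -1 else f"row {error['row']}"
    return (f"{location}, column {error['column']}: "
            f"{error['error_type']} - {error['error_message']}")


# Hypothetical result with made-up messages, following the documented keys.
sample_result = {
    "is_valid": False,
    "errors": [
        {"error_type": "RequiredFieldError", "error_message": "missing column",
         "row": -1, "column": "hire_date", "value": "", "details": {}},
        {"error_type": "RangeValidationError", "error_message": "value too large",
         "row": 2, "column": "salary", "value": "250000", "details": {}},
    ],
}

for error in sample_result["errors"]:
    print(format_error(error))
```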

Error Types

  • RequiredFieldError: Required field is missing from CSV header
  • TypeValidationError: Value doesn't match expected data type
  • PatternValidationError: String doesn't match regex pattern
  • EnumValidationError: Value not in allowed enum values
  • RangeValidationError: Numeric value outside allowed range (too small or too large)
  • EmptyFileError: CSV file is empty or has no data rows
  • CSVFileError: General CSV file reading errors (file not found, permission denied, etc.)
  • SchemaValidationError: Schema structure validation errors
  • InvalidJSONError: Invalid JSON in schema file
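On large files it helps to summarise errors by type before reading them one by one. A minimal sketch using `collections.Counter` over the `error_type` key; the error list is hypothetical sample data, and `count_by_type` is not a library function:

```python
from collections import Counter


def count_by_type(errors):
    """Count how many errors of each error_type are in the list."""
    return Counter(error["error_type"] for error in errors)


# Hypothetical errors carrying only the key this summary needs.
sample_errors = [
    {"error_type": "TypeValidationError"},
    {"error_type": "EnumValidationError"},
    {"error_type": "TypeValidationError"},
]

for error_type, count in count_by_type(sample_errors).most_common():
    print(f"{error_type}: {count}")
```

With a real result, pass `result["errors"]` in place of `sample_errors`.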

Requirements

  • Python 3.7+
  • pydantic >= 2.0.0

License

This project is licensed under the MIT License.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Support

If you encounter any issues or have questions, please file an issue on the GitHub repository.
