CSV file validation against a given data schema.
CSV Schema Validator
A Python library for validating CSV files against schemas written in JSON. It checks that your CSV data meets your requirements by validating data types, regex patterns, allowed (enum) values, numeric ranges, and required columns.
Features
- Type Validation: Support for `string`, `number`, `integer`, and `boolean` data types
- Pattern Matching: Regex pattern validation for strings (e.g., email formats, dates)
- Enum Validation: Restrict values to predefined options
- Range Validation: Min/max constraints for numeric fields
- Required Field Checking: Ensure mandatory fields are present
- Detailed Error Reporting: Comprehensive validation results with row/column information
- Command Line Interface: Easy-to-use CLI for quick validation
- Python API: Programmatic access for integration into larger workflows
Installation
```bash
pip install csv-schema-validator
```
Quick Start
Command Line Usage
```bash
# Validate a CSV file against a schema
csv-schema-validator employees.csv employee_schema.json

# Show help
csv-schema-validator --help

# Show version
csv-schema-validator --version
```
Python API Usage
```python
from csv_schema_validator import validate_csv

# Define your schema
schema = {
    "name": "Employee Data Schema",
    "description": "Schema for validating employee CSV files",
    "fields": [
        {
            "name": "employee_id",
            "type": "integer",
            "required": True,
            "description": "Unique employee identifier",
        },
        {
            "name": "email",
            "type": "string",
            "required": True,
            "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
            "description": "Valid email address",
        },
        {
            "name": "department",
            "type": "string",
            "required": True,
            "enum": ["Engineering", "Marketing", "Sales", "HR", "Finance"],
            "description": "Employee department",
        },
        {
            "name": "salary",
            "type": "number",
            "required": True,
            "min": 30000,
            "max": 200000,
            "description": "Annual salary in USD",
        },
    ],
}

# Validate your CSV
result = validate_csv("employees.csv", schema)

if result["is_valid"]:
    print("✅ Validation passed")
else:
    print(f"❌ Validation failed: {len(result['errors'])} errors found")
    for error in result["errors"]:
        print(f"Row {error['row']}, Column {error['column']}: {error['error_type']} - {error['error_message']}")
```
Schema Format
The schema file should be a JSON file with the following structure:
```json
{
  "name": "Schema Name",
  "description": "Schema description",
  "fields": [
    {
      "name": "field_name",
      "type": "string|number|integer|boolean",
      "required": true,
      "description": "Field description",
      "pattern": "regex_pattern",
      "enum": ["value1", "value2"],
      "min": 0,
      "max": 100
    }
  ]
}
```
Field Properties
| Property | Type | Required | Description |
|---|---|---|---|
| `name` | string | YES | Field name (must match CSV header) |
| `type` | string | YES | Data type: `string`, `number`, `integer`, or `boolean` |
| `required` | boolean | YES | Whether the field must be present |
| `description` | string | NO | Human-readable field description |
| `pattern` | string | NO | Regex pattern for string validation |
| `enum` | array | NO | Allowed values for the field |
| `min` | integer | NO | Minimum value (for numeric fields) |
| `max` | integer | NO | Maximum value (for numeric fields) |
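The table above implies a few structural checks a schema should pass before any CSV is read. The sketch below is a minimal, hypothetical illustration of such checks in plain Python; `check_schema_fields` and its messages are not part of csv_schema_validator's API, and treating `min > max` as an error is an assumption about what counts as a valid schema.

```python
import json
from typing import List

# Hypothetical sketch: names and messages are illustrative, not the library's.
ALLOWED_TYPES = {"string", "number", "integer", "boolean"}
REQUIRED_PROPS = ("name", "type", "required")


def check_schema_fields(schema: dict) -> List[str]:
    """Return human-readable problems found in a schema dict (empty if none)."""
    fields = schema.get("fields")
    if not isinstance(fields, list):
        return ["schema must contain a 'fields' array"]
    problems = []
    for index, field in enumerate(fields):
        # Every field needs the three properties marked YES in the table.
        for prop in REQUIRED_PROPS:
            if prop not in field:
                problems.append(f"field {index}: missing '{prop}'")
        if "type" in field and field["type"] not in ALLOWED_TYPES:
            problems.append(f"field {index}: unknown type {field['type']!r}")
        if "required" in field and not isinstance(field["required"], bool):
            problems.append(f"field {index}: 'required' must be a boolean")
        # Assumption: an inverted range is treated as a schema mistake.
        if "min" in field and "max" in field and field["min"] > field["max"]:
            problems.append(f"field {index}: 'min' is greater than 'max'")
    return problems


schema = json.loads('{"fields": [{"name": "age", "type": "int", "required": true}]}')
print(check_schema_fields(schema))  # ["field 0: unknown type 'int'"]
```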
Data Types
String
- Basic string validation
- Optional regex pattern matching
- Optional enum value restriction
Number
- Floating-point number validation
- Optional min/max range constraints
Integer
- Whole number validation
- Optional min/max range constraints
Boolean
- Accepts: `true`, `false`, `1`, `0`, `yes`, `no`, `on`, `off`
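To make the type rules concrete, here is a minimal sketch of how a raw CSV cell might be converted to each declared type. `coerce_value` is a hypothetical name, and two details are assumptions rather than documented behavior: boolean tokens are compared case-insensitively after trimming whitespace, and numbers are parsed with Python's built-in `int()` and `float()`.

```python
from typing import Union

# Hypothetical sketch of the type rules above; not the library's own code.
# Assumption: boolean tokens are matched case-insensitively after strip().
TRUE_TOKENS = {"true", "1", "yes", "on"}
FALSE_TOKENS = {"false", "0", "no", "off"}


def coerce_value(raw: str, type_name: str) -> Union[str, int, float, bool]:
    """Convert a raw CSV cell to the schema type, raising ValueError on mismatch."""
    if type_name == "string":
        return raw
    if type_name == "integer":
        return int(raw)  # rejects "3.5" and "abc"
    if type_name == "number":
        return float(raw)  # accepts "75000" and "75000.50"
    if type_name == "boolean":
        token = raw.strip().lower()
        if token in TRUE_TOKENS:
            return True
        if token in FALSE_TOKENS:
            return False
        raise ValueError(f"not a boolean: {raw!r}")
    raise ValueError(f"unknown type: {type_name!r}")


print(coerce_value("75000", "number"))  # 75000.0
print(coerce_value("Yes", "boolean"))  # True
```

Note that `float()` also accepts strings such as `"inf"` and `"nan"`, which a stricter validator might want to reject.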
Examples
Employee Data Validation
CSV File (employees.csv):
```csv
employee_id,first_name,last_name,email,department,salary,hire_date,is_active
1,John,Doe,john.doe@company.com,Engineering,75000,2023-01-15,true
2,Jane,Smith,jane.smith@company.com,Marketing,65000,2023-03-22,true
```
Schema File (employee_schema.json):
```json
{
  "name": "Employee Data Schema",
  "description": "Schema for validating employee CSV files",
  "fields": [
    {
      "name": "employee_id",
      "type": "integer",
      "required": true,
      "description": "Unique employee identifier"
    },
    {
      "name": "first_name",
      "type": "string",
      "required": true,
      "description": "Employee's first name"
    },
    {
      "name": "last_name",
      "type": "string",
      "required": true,
      "description": "Employee's last name"
    },
    {
      "name": "email",
      "type": "string",
      "required": true,
      "pattern": "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$",
      "description": "Valid email address"
    },
    {
      "name": "department",
      "type": "string",
      "required": true,
      "enum": ["Engineering", "Marketing", "Sales", "HR", "Finance"],
      "description": "Employee department"
    },
    {
      "name": "salary",
      "type": "number",
      "required": true,
      "min": 30000,
      "max": 200000,
      "description": "Annual salary in USD"
    },
    {
      "name": "hire_date",
      "type": "string",
      "required": true,
      "pattern": "^\\d{4}-\\d{2}-\\d{2}$",
      "description": "Hire date in YYYY-MM-DD format"
    },
    {
      "name": "is_active",
      "type": "boolean",
      "required": true,
      "description": "Whether employee is currently active"
    }
  ]
}
```
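To see how the pattern, enum, and min/max rules in the example schema play out row by row, the following sketch re-implements just those checks in plain Python. It is hypothetical and self-contained: `RULES` and `check_row` are illustrative names, only a subset of the documented error keys is produced, data rows are numbered from 1 (the library's own numbering may differ), and the second CSV row is deliberately altered from the example to break three rules.

```python
import csv
import io
import re

# Hypothetical re-implementation of three rule kinds from the example schema;
# a sketch, not csv_schema_validator's internals.
RULES = {
    "email": {"pattern": r"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$"},
    "department": {"enum": ["Engineering", "Marketing", "Sales", "HR", "Finance"]},
    "salary": {"min": 30000, "max": 200000},
    "hire_date": {"pattern": r"^\d{4}-\d{2}-\d{2}$"},
}

# Row 2 is altered on purpose: bad email domain, unknown department, low salary.
CSV_TEXT = """employee_id,first_name,last_name,email,department,salary,hire_date,is_active
1,John,Doe,john.doe@company.com,Engineering,75000,2023-01-15,true
2,Jane,Smith,jane.smith@company,Legal,25000,2023-03-22,true
"""


def check_row(row_number, row):
    """Return error dicts (a subset of the documented keys) for one CSV row."""
    errors = []

    def add(error_type, column, value):
        errors.append({"error_type": error_type, "row": row_number,
                       "column": column, "value": value})

    for column, rule in RULES.items():
        value = row[column]
        if "pattern" in rule and not re.match(rule["pattern"], value):
            add("PatternValidationError", column, value)
        if "enum" in rule and value not in rule["enum"]:
            add("EnumValidationError", column, value)
        # Range checks assume the value already passed numeric type validation.
        if "min" in rule and float(value) < rule["min"]:
            add("RangeValidationError", column, value)
        if "max" in rule and float(value) > rule["max"]:
            add("RangeValidationError", column, value)
    return errors


all_errors = []
for row_number, row in enumerate(csv.DictReader(io.StringIO(CSV_TEXT)), start=1):
    all_errors.extend(check_row(row_number, row))

for error in all_errors:
    print(error["row"], error["column"], error["error_type"])
```

Run as-is, only the altered second row is reported: once each for `email`, `department`, and `salary`.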
Command Line Options
| Option | Description |
|---|---|
| `<csv_file>` | Path to the CSV file to validate |
| `<schema_file>` | Path to the JSON schema file |
| `-h`, `--help` | Show help message |
| `-v`, `--version` | Show version information |
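The CLI takes exactly two positional arguments, in a fixed order, plus help and version flags. As a sketch of that interface only, here is an equivalent `argparse` parser; `build_parser` is hypothetical, the version string `0.2.0` is taken from the distribution file names on this page, and the real tool's internals and exit codes are not described here.

```python
import argparse


# Hypothetical sketch mirroring only the documented options.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        prog="csv-schema-validator",
        description="Validate a CSV file against a JSON schema.",
    )
    parser.add_argument("csv_file", help="Path to the CSV file to validate")
    parser.add_argument("schema_file", help="Path to the JSON schema file")
    # argparse adds -h/--help automatically; action="version" prints and exits.
    parser.add_argument("-v", "--version", action="version",
                        version="%(prog)s 0.2.0")
    return parser


args = build_parser().parse_args(["employees.csv", "employee_schema.json"])
print(args.csv_file, args.schema_file)  # employees.csv employee_schema.json
```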
Return Value
The validation function returns a dictionary with the following structure:
```python
{
    "is_valid": bool,
    "errors": [
        {
            "error_type": str,     # e.g., "RequiredFieldError", "TypeValidationError", etc.
            "error_message": str,
            "row": int,            # Row number (-1 for header errors)
            "column": str,         # Column name
            "value": str,          # The value that caused the error
            "details": dict,       # Additional error details
        }
    ],
}
```
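Because the result is a plain dictionary, it is easy to post-process. A minimal sketch, assuming the shape documented above, that counts errors by `error_type`; `summarize` and the sample errors are hypothetical, and only the keys the helper reads (plus `row` and `column` for context) are filled in.

```python
from collections import Counter


# Hypothetical helper; `result` is assumed to follow the documented shape.
def summarize(result: dict) -> str:
    if result["is_valid"]:
        return "valid"
    counts = Counter(error["error_type"] for error in result["errors"])
    # most_common() lists the most frequent error types first.
    parts = [f"{name}: {count}" for name, count in counts.most_common()]
    return "invalid (" + ", ".join(parts) + ")"


# Hand-written sample, not real library output; other keys omitted.
sample = {
    "is_valid": False,
    "errors": [
        {"error_type": "RequiredFieldError", "row": -1, "column": "email"},
        {"error_type": "TypeValidationError", "row": 2, "column": "salary"},
        {"error_type": "TypeValidationError", "row": 5, "column": "employee_id"},
    ],
}
print(summarize(sample))  # invalid (TypeValidationError: 2, RequiredFieldError: 1)
```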
Error Types
- `RequiredFieldError`: Required field is missing from the CSV header
- `TypeValidationError`: Value doesn't match the expected data type
- `PatternValidationError`: String doesn't match the regex pattern
- `EnumValidationError`: Value is not among the allowed enum values
- `RangeValidationError`: Numeric value is outside the allowed range (too small or too large)
- `EmptyFileError`: CSV file is empty or has no data rows
- `CSVFileError`: General CSV file reading errors (file not found, permission denied, etc.)
- `SchemaValidationError`: Schema structure validation errors
- `InvalidJSONError`: Invalid JSON in the schema file
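Header-level errors such as `RequiredFieldError` can be detected before any data rows are read. The sketch below shows one way such a check could work, producing entries shaped like the documented error dictionaries with `row` set to -1. `missing_required_columns` and its message text are hypothetical, and the sketch does not reproduce the library's `EmptyFileError` handling.

```python
import csv
import io
from typing import Dict, List


# Hypothetical sketch of a header-level required-column check.
def missing_required_columns(csv_text: str, schema: dict) -> List[Dict]:
    # Read only the first line; an empty file yields an empty header here.
    header = next(csv.reader(io.StringIO(csv_text)), [])
    errors = []
    for field in schema["fields"]:
        if field.get("required") and field["name"] not in header:
            errors.append({
                "error_type": "RequiredFieldError",
                "error_message": f"Required field '{field['name']}' is missing from the header",
                "row": -1,  # header errors use row -1, per the Return Value section
                "column": field["name"],
            })
    return errors


schema = {"fields": [
    {"name": "employee_id", "type": "integer", "required": True},
    {"name": "email", "type": "string", "required": True},
    {"name": "notes", "type": "string", "required": False},
]}
errors = missing_required_columns("employee_id,first_name\n1,John\n", schema)
print([e["column"] for e in errors])  # ['email']
```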
Requirements
- Python 3.7+
- pydantic >= 2.0.0
License
This project is licensed under the MIT License.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Support
If you encounter any issues or have questions, please file an issue on the GitHub repository.
Download files
File details
Details for the file csv_schema_validator-0.2.0.tar.gz.
File metadata
- Download URL: csv_schema_validator-0.2.0.tar.gz
- Upload date:
- Size: 136.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 563a967314256e07f75a8cda97e4c08ee0d93e707f8946a5ffc6eeabbe4adcd0 |
| MD5 | 125bc53562860d5aaecadef847dbf025 |
| BLAKE2b-256 | a6800567dd620a749c887934a793fd91bd5d302d0db33a3a3b6c4f9e4f7fe41a |
File details
Details for the file csv_schema_validator-0.2.0-py3-none-any.whl.
File metadata
- Download URL: csv_schema_validator-0.2.0-py3-none-any.whl
- Upload date:
- Size: 26.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 78f7124d279a4dae4afd56f94856745a47557741812a76618b7003cd7f6cf664 |
| MD5 | d53011098e0d3cfc20ccb972984a4f1b |
| BLAKE2b-256 | 247e12e9491c0126c8d91f1a7bec03ea56d62607da9c08eca8d90f81d332db64 |