CLI tool and Python SDK for importing Excel files to SQLite with advanced transformations and validation
Project description
Excel to SQLite
A powerful CLI tool and Python SDK for importing Excel files into SQLite databases with advanced data transformation, validation, and quality profiling.
โจ Features
Core Functionality
- ๐ฅ Smart Import - Import Excel files into SQLite with automatic schema detection
- ๐ค Flexible Export - Export SQL data back to Excel with formatting
- ๐ Incremental Imports - Only process changed files using content hashing
- ๐ Multi-Sheet Support - Import/export multiple sheets in one operation
- โก High Performance - Powered by Pandas and SQLAlchemy 2.0
Data Transformations
- ๐ Value Mapping - Standardize data values (e.g., "NY" โ "New York")
- โ Calculated Columns - Create derived columns using expressions
- ๐ Reference Validation - Foreign key validation against lookup tables
- ๐ฃ Pre/Post Hooks - Execute custom code during import/export pipeline
Data Validation
- โ Custom Validators - Range, regex, unique, not-null, enum validators
- ๐ Validation Rules - Declarative rule-based validation system
- ๐ Data Profiling - Automatic quality analysis with detailed reports
- ๐ท๏ธ Metadata Tracking - Tag and categorize imports with rich metadata
Developer Experience
- ๐ Python SDK - Full-featured programmatic API
- ๐ฏ Type Hints - Complete type annotations throughout
- ๐ Well Documented - Comprehensive documentation and examples
- ๐งช Well Tested - Extensive test coverage
๐ฆ Installation
# Install from PyPI
pip install excel-to-sql
# Or with uv
uv pip install excel-to-sql
# Or install from source
git clone https://github.com/AliiiBenn/excel-to-sql.git
cd excel-to-sql
uv sync
๐ Quick Start
CLI Usage
# Initialize a new project
excel-to-sql init
# Define a mapping type (interactive)
excel-to-sql config add --type products
# Import an Excel file
excel-to-sql import --file data.xlsx --type products
# Check import status
excel-to-sql status
# Export to Excel
excel-to-sql export --table products --output report.xlsx
# Profile data quality
excel-to-sql profile --table products --output quality-report.html
Python SDK
from excel_to_sql import ExcelToSqlite
# Initialize SDK
sdk = ExcelToSqlite()
# Import data with transformations
result = sdk.import_excel(
file_path="data.xlsx",
type_name="products",
tags=["q1-2024", "verified"]
)
# Query data
df = sdk.query("SELECT * FROM products WHERE price > 100")
# Profile data
profile = sdk.profile_table("products")
print(f"Quality score: {profile['summary']['null_percentage']}% nulls")
# Export with multi-sheet support
sdk.export_to_excel(
output="report.xlsx",
sheet_mapping={
"Products": "products",
"Categories": "SELECT * FROM categories"
}
)
Advanced Transformations
from excel_to_sql import ExcelToSqlite, ValueMapping, CalculatedColumn
from excel_to_sql.validators import ValidationRule, RuleSet
sdk = ExcelToSqlite()
# Configure value mappings
value_mappings = {
"status": {"1": "Active", "0": "Inactive"},
"state": {"NY": "New York", "CA": "California"}
}
# Configure calculated columns
calculated_columns = [
CalculatedColumn("total", "quantity * price"),
CalculatedColumn("tax", "total * 0.1"),
CalculatedColumn("grand_total", "total + tax")
]
# Configure validation rules
validation_rules = [
ValidationRule("id", "unique"),
ValidationRule("email", "regex", {"pattern": r"^[^@]+@[^@]+\.[^@]+$"}),
ValidationRule("age", "range", {"min": 0, "max": 120})
]
Data Quality Reports
from excel_to_sql import QualityReport
# Generate quality report
report = QualityReport()
profile = report.generate(
df=df,
output_path="quality-report.html"
)
# Access quality metrics
print(f"Null percentage: {profile.null_percentage}%")
print(f"Unique values: {profile.unique_count}")
print(f"Issues found: {len(profile.get_issues())}")
๐ Configuration
Mapping configuration is stored in config/mappings.json:
{
"products": {
"target_table": "products",
"primary_key": ["id"],
"column_mappings": {
"ID": {"target": "id", "type": "integer"},
"Name": {"target": "name", "type": "string"},
"Price": {"target": "price", "type": "float"}
},
"value_mappings": [
{
"column": "status",
"mappings": {"1": "Active", "0": "Inactive"}
}
],
"calculated_columns": [
{
"name": "total",
"expression": "quantity * price"
}
],
"validation_rules": [
{
"column": "id",
"type": "unique"
}
],
"tags": ["import", "products"]
}
}
๐ง Available Validators
| Validator | Description | Example |
|---|---|---|
RangeValidator |
Numeric range validation | Age between 0-120 |
RegexValidator |
Pattern matching | Email validation |
UniqueValidator |
Uniqueness check | Primary keys |
NotNullValidator |
Required fields | Mandatory columns |
EnumValidator |
Allowed values | Status codes |
ReferenceValidator |
Foreign key check | Category exists |
CustomValidator |
Custom logic | Any Python function |
๐ Data Profiling
Generate comprehensive data quality reports:
from excel_to_sql import DataProfiler
profiler = DataProfiler()
profile = profiler.profile(df)
# Check for issues
for issue in profile.get_issues():
print(f"{issue['severity']}: {issue['issue']} in {issue['column']}")
Supported report formats:
- JSON - Machine-readable format
- Markdown - Human-readable documentation
- HTML - Interactive reports with styling
๐งช Testing
# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov=excel_to_sql --cov-report=html
# Run specific test file
uv run pytest tests/test_transformations.py -v
๐ Project Structure
excel-to-sqlite/
โโโ excel_to_sql/ # Main package
โ โโโ cli.py # CLI interface
โ โโโ sdk/ # Python SDK
โ โโโ entities/ # Domain entities
โ โโโ transformations/ # Data transformations
โ โโโ validators/ # Data validation
โ โโโ profiling/ # Quality analysis
โ โโโ metadata/ # Metadata management
โ โโโ models/ # Pydantic models
โโโ tests/ # Test suite
โโโ docs/ # Documentation
โโโ config/ # Configuration files
๐ค Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (
git checkout -b feature/AmazingFeature) - Commit your changes (
git commit -m 'Add some AmazingFeature') - Push to the branch (
git push origin feature/AmazingFeature) - Open a Pull Request
๐ License
This project is licensed under the MIT License - see the LICENSE file for details.
๐ Links
๐ Version 0.2.0 Highlights
12 Major Features Added:
- โ Value Mapping for Data Standardization
- โ Calculated/Derived Columns
- โ Custom Validators
- โ Reference/Lookup Validation
- โ Data Profiling & Quality Reports
- โ Multi-Sheet Import
- โ Multi-Sheet Export
- โ Incremental/Delta Import
- โ Data Validation Rules
- โ Pre/Post Processing Hooks
- โ Python SDK / Programmatic API
- โ Metadata & Tags for Imports
68 tests added with comprehensive coverage for all new features.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file excel_to_sql-0.2.0.tar.gz.
File metadata
- Download URL: excel_to_sql-0.2.0.tar.gz
- Upload date:
- Size: 77.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
93f87af7ac9c619abb6f14c45366c4dc378e03dc9973a13ad0053771192ef600
|
|
| MD5 |
e1f672481d95d5cbbca1cad6dd547705
|
|
| BLAKE2b-256 |
e44b8bfb843800192825eb310cc7a516e71a56b2159bba6fc7235dd8ea5eda41
|
File details
Details for the file excel_to_sql-0.2.0-py3-none-any.whl.
File metadata
- Download URL: excel_to_sql-0.2.0-py3-none-any.whl
- Upload date:
- Size: 46.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
aba1e4e101f7b3473880c2a270e34cddd9acb306d764869e37cf0bf809ce0d78
|
|
| MD5 |
06503bf98c177dd7ce55bb807ec15f97
|
|
| BLAKE2b-256 |
f7bfff3461cb111f1ee845fd8f0a6ab1ce728a4483f83298c1eeef6526aa1857
|