Skip to main content

CLI tool and Python SDK for importing Excel files to SQLite with advanced transformations and validation

Project description

Excel to SQLite

A powerful CLI tool and Python SDK for importing Excel files into SQLite databases with advanced data transformation, validation, and quality profiling.

Python PyPI License

โœจ Features

Core Functionality

  • ๐Ÿ“ฅ Smart Import - Import Excel files into SQLite with automatic schema detection
  • ๐Ÿ“ค Flexible Export - Export SQL data back to Excel with formatting
  • ๐Ÿ” Incremental Imports - Only process changed files using content hashing
  • ๐Ÿ“‘ Multi-Sheet Support - Import/export multiple sheets in one operation
  • โšก High Performance - Powered by Pandas and SQLAlchemy 2.0

Data Transformations

  • ๐Ÿ”„ Value Mapping - Standardize data values (e.g., "NY" โ†’ "New York")
  • โž• Calculated Columns - Create derived columns using expressions
  • ๐Ÿ”— Reference Validation - Foreign key validation against lookup tables
  • ๐ŸŽฃ Pre/Post Hooks - Execute custom code during import/export pipeline

Data Validation

  • โœ… Custom Validators - Range, regex, unique, not-null, enum validators
  • ๐Ÿ“ Validation Rules - Declarative rule-based validation system
  • ๐Ÿ” Data Profiling - Automatic quality analysis with detailed reports
  • ๐Ÿท๏ธ Metadata Tracking - Tag and categorize imports with rich metadata

Developer Experience

  • ๐Ÿ Python SDK - Full-featured programmatic API
  • ๐ŸŽฏ Type Hints - Complete type annotations throughout
  • ๐Ÿ“š Well Documented - Comprehensive documentation and examples
  • ๐Ÿงช Well Tested - Extensive test coverage

๐Ÿ“ฆ Installation

# Install from PyPI
pip install excel-to-sql

# Or with uv
uv pip install excel-to-sql

# Or install from source
git clone https://github.com/AliiiBenn/excel-to-sql.git
cd excel-to-sql
uv sync

๐Ÿš€ Quick Start

CLI Usage

# Initialize a new project
excel-to-sql init

# Define a mapping type (interactive)
excel-to-sql config add --type products

# Import an Excel file
excel-to-sql import --file data.xlsx --type products

# Check import status
excel-to-sql status

# Export to Excel
excel-to-sql export --table products --output report.xlsx

# Profile data quality
excel-to-sql profile --table products --output quality-report.html

Python SDK

from excel_to_sql import ExcelToSqlite

# Initialize SDK
sdk = ExcelToSqlite()

# Import data with transformations
result = sdk.import_excel(
    file_path="data.xlsx",
    type_name="products",
    tags=["q1-2024", "verified"]
)

# Query data
df = sdk.query("SELECT * FROM products WHERE price > 100")

# Profile data
profile = sdk.profile_table("products")
print(f"Quality score: {profile['summary']['null_percentage']}% nulls")

# Export with multi-sheet support
sdk.export_to_excel(
    output="report.xlsx",
    sheet_mapping={
        "Products": "products",
        "Categories": "SELECT * FROM categories"
    }
)

Advanced Transformations

from excel_to_sql import ExcelToSqlite, ValueMapping, CalculatedColumn
from excel_to_sql.validators import ValidationRule, RuleSet

sdk = ExcelToSqlite()

# Configure value mappings
value_mappings = {
    "status": {"1": "Active", "0": "Inactive"},
    "state": {"NY": "New York", "CA": "California"}
}

# Configure calculated columns
calculated_columns = [
    CalculatedColumn("total", "quantity * price"),
    CalculatedColumn("tax", "total * 0.1"),
    CalculatedColumn("grand_total", "total + tax")
]

# Configure validation rules
validation_rules = [
    ValidationRule("id", "unique"),
    ValidationRule("email", "regex", {"pattern": r"^[^@]+@[^@]+\.[^@]+$"}),
    ValidationRule("age", "range", {"min": 0, "max": 120})
]

Data Quality Reports

from excel_to_sql import QualityReport

# Generate quality report
report = QualityReport()
profile = report.generate(
    df=df,
    output_path="quality-report.html"
)

# Access quality metrics
print(f"Null percentage: {profile.null_percentage}%")
print(f"Unique values: {profile.unique_count}")
print(f"Issues found: {len(profile.get_issues())}")

๐Ÿ“– Configuration

Mapping configuration is stored in config/mappings.json:

{
  "products": {
    "target_table": "products",
    "primary_key": ["id"],
    "column_mappings": {
      "ID": {"target": "id", "type": "integer"},
      "Name": {"target": "name", "type": "string"},
      "Price": {"target": "price", "type": "float"}
    },
    "value_mappings": [
      {
        "column": "status",
        "mappings": {"1": "Active", "0": "Inactive"}
      }
    ],
    "calculated_columns": [
      {
        "name": "total",
        "expression": "quantity * price"
      }
    ],
    "validation_rules": [
      {
        "column": "id",
        "type": "unique"
      }
    ],
    "tags": ["import", "products"]
  }
}

๐Ÿ”ง Available Validators

Validator Description Example
RangeValidator Numeric range validation Age between 0-120
RegexValidator Pattern matching Email validation
UniqueValidator Uniqueness check Primary keys
NotNullValidator Required fields Mandatory columns
EnumValidator Allowed values Status codes
ReferenceValidator Foreign key check Category exists
CustomValidator Custom logic Any Python function

๐Ÿ“Š Data Profiling

Generate comprehensive data quality reports:

from excel_to_sql import DataProfiler

profiler = DataProfiler()
profile = profiler.profile(df)

# Check for issues
for issue in profile.get_issues():
    print(f"{issue['severity']}: {issue['issue']} in {issue['column']}")

Supported report formats:

  • JSON - Machine-readable format
  • Markdown - Human-readable documentation
  • HTML - Interactive reports with styling

๐Ÿงช Testing

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=excel_to_sql --cov-report=html

# Run specific test file
uv run pytest tests/test_transformations.py -v

๐Ÿ“ Project Structure

excel-to-sqlite/
โ”œโ”€โ”€ excel_to_sql/          # Main package
โ”‚   โ”œโ”€โ”€ cli.py            # CLI interface
โ”‚   โ”œโ”€โ”€ sdk/              # Python SDK
โ”‚   โ”œโ”€โ”€ entities/         # Domain entities
โ”‚   โ”œโ”€โ”€ transformations/  # Data transformations
โ”‚   โ”œโ”€โ”€ validators/       # Data validation
โ”‚   โ”œโ”€โ”€ profiling/        # Quality analysis
โ”‚   โ”œโ”€โ”€ metadata/         # Metadata management
โ”‚   โ””โ”€โ”€ models/           # Pydantic models
โ”œโ”€โ”€ tests/                # Test suite
โ”œโ”€โ”€ docs/                 # Documentation
โ””โ”€โ”€ config/               # Configuration files

๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

๐Ÿ“„ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ”— Links

๐ŸŽ‰ Version 0.2.0 Highlights

12 Major Features Added:

  1. โœ… Value Mapping for Data Standardization
  2. โœ… Calculated/Derived Columns
  3. โœ… Custom Validators
  4. โœ… Reference/Lookup Validation
  5. โœ… Data Profiling & Quality Reports
  6. โœ… Multi-Sheet Import
  7. โœ… Multi-Sheet Export
  8. โœ… Incremental/Delta Import
  9. โœ… Data Validation Rules
  10. โœ… Pre/Post Processing Hooks
  11. โœ… Python SDK / Programmatic API
  12. โœ… Metadata & Tags for Imports

68 tests added with comprehensive coverage for all new features.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

excel_to_sql-0.2.0.tar.gz (77.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

excel_to_sql-0.2.0-py3-none-any.whl (46.0 kB view details)

Uploaded Python 3

File details

Details for the file excel_to_sql-0.2.0.tar.gz.

File metadata

  • Download URL: excel_to_sql-0.2.0.tar.gz
  • Upload date:
  • Size: 77.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for excel_to_sql-0.2.0.tar.gz
Algorithm Hash digest
SHA256 93f87af7ac9c619abb6f14c45366c4dc378e03dc9973a13ad0053771192ef600
MD5 e1f672481d95d5cbbca1cad6dd547705
BLAKE2b-256 e44b8bfb843800192825eb310cc7a516e71a56b2159bba6fc7235dd8ea5eda41

See more details on using hashes here.

File details

Details for the file excel_to_sql-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: excel_to_sql-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 46.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.26 {"installer":{"name":"uv","version":"0.9.26","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":null,"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}

File hashes

Hashes for excel_to_sql-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 aba1e4e101f7b3473880c2a270e34cddd9acb306d764869e37cf0bf809ce0d78
MD5 06503bf98c177dd7ce55bb807ec15f97
BLAKE2b-256 f7bfff3461cb111f1ee845fd8f0a6ab1ce728a4483f83298c1eeef6526aa1857

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page