Skip to main content

DataFrame API with SQL pushdown execution and real SQL CRUD - the missing layer for SQL in Python

Project description

Moltres

CI Python 3.9+ License: MIT Code style: ruff

The Missing DataFrame Layer for SQL in Python

MOLTRES: Modern Operations Layer for Transformations, Relational Execution, and SQL

InstallationQuick StartDocumentationExamples


Moltres combines a DataFrame API (like Pandas/Polars), SQL pushdown execution (no data loading into memory), and real SQL CRUD operations (INSERT, UPDATE, DELETE) in one unified interface.

Transform millions of rows using familiar DataFrame operations—all executed directly in SQL without materializing data. Update, insert, and delete with column-aware, type-safe operations.

✨ Features

  • 🚀 PySpark-Style DataFrame API - Primary API with familiar operations (select, filter, join, groupBy, etc.) for seamless migration from PySpark
  • 🐼 Optional Pandas-Style Interface - Comprehensive Pandas-like API with string accessor, query(), dtypes, shape, pivot, sample, concat, and more
  • 🦀 Optional Polars-Style Interface - Polars LazyFrame-like API with expression-based operations, set operations, file I/O, CTEs, and more
  • 🎯 98% PySpark API Compatibility - Near-complete compatibility for seamless migration
  • 🗄️ SQL Pushdown Execution - All operations compile to SQL and run on your database—no data loading into memory
  • ✏️ Real SQL CRUD - INSERT, UPDATE, DELETE operations with DataFrame-style syntax
  • 📊 Multiple Formats - Read/write CSV, JSON, JSONL, Parquet, and more
  • 🐼 Pandas & Polars Integration - Pass pandas/polars DataFrames directly to moltres operations
  • 🌊 Streaming Support - Handle datasets larger than memory with chunked processing
  • Async Support - Full async/await support for all operations
  • 🔒 Security First - Built-in SQL injection prevention and validation

📦 Installation

pip install moltres

# Optional: For async support
pip install moltres[async-postgresql]  # PostgreSQL
pip install moltres[async-mysql]       # MySQL
pip install moltres[async-sqlite]     # SQLite

# Optional: For pandas/polars result formats
pip install moltres[pandas,polars]

🚀 Quick Start

Basic DataFrame Operations

from moltres import col, connect
from moltres.expressions import functions as F

# Connect to your database
db = connect("sqlite:///example.db")

# DataFrame operations with SQL pushdown (no data loading into memory)
df = (
    db.table("orders")
    .select()
    .join(db.table("customers").select(), on=[col("orders.customer_id") == col("customers.id")])
    .where(col("active") == True)
    .group_by("country")
    .agg(F.sum(col("amount")).alias("total_amount"))
)

# Execute and get results (SQL is compiled and executed here)
results = df.collect()  # Returns list of dicts by default

Pandas-Style Interface

df = db.table("users").pandas()

# Pandas-style operations
df[['id', 'name']]  # Select columns
df.query('age > 25 and country == "USA"')  # Query with AND/OR
df['name'].str.upper()  # String accessor
df.groupby('country').agg(age='mean')  # GroupBy

📚 See the Pandas Interface Guide →

Polars-Style Interface

df = db.table("users").polars()

# Polars-style operations
df.select("id", "name", (col("age") * 2).alias("double_age"))
df.filter((col("age") > 25) & (col("country") == "USA"))
df.group_by("country").agg(F.sum(col("age")))

📚 See the Polars Interface Guide →

CRUD Operations

from moltres.io.records import Records

# Insert rows
Records.from_list([
    {"id": 1, "name": "Alice", "email": "alice@example.com"},
    {"id": 2, "name": "Bob", "email": "bob@example.com"},
], database=db).insert_into("users")

# Update rows
db.update("users", where=col("active") == 0, set={"active": 1})

# Delete rows
db.delete("users", where=col("email").is_null())

📚 See CRUD Operations Guide →

📖 Documentation

Getting Started

Interface Guides

Core Topics

Advanced Topics

Reference

📚 Examples

Comprehensive examples demonstrating all Moltres features:

See the examples directory for all example files.

🛠️ Supported Operations

DataFrame Operations

  • select() / selectExpr() - Project columns or SQL expressions
  • where() / filter() - Filter rows
  • join() - Join with other DataFrames
  • group_by() / groupBy() - Group rows
  • agg() - Aggregate functions
  • order_by() / orderBy() / sort() - Sort rows
  • limit() - Limit number of rows
  • distinct() - Remove duplicate rows
  • withColumn() - Add or rename columns
  • pivot() - Pivot operations
  • explode() - Explode array/JSON columns

Column Expressions

  • Arithmetic: +, -, *, /, %
  • Comparisons: ==, !=, <, >, <=, >=
  • Boolean: &, |, ~
  • Functions: 130+ functions including mathematical, string, date/time, aggregate, window, array, JSON, and utility functions
  • Window Functions: over(), partition_by(), order_by() - Full PySpark compatibility

📚 See Expressions Guide →

Supported SQL Dialects

  • SQLite - Full support
  • PostgreSQL - Full support with dialect-specific optimizations
  • MySQL - Full support with dialect-specific optimizations
  • DuckDB - Full support with PostgreSQL-compatible optimizations
  • Other SQLAlchemy-supported databases - ANSI SQL fallback

🧪 Development

Setup

# Clone the repository
git clone https://github.com/eddiethedean/moltres.git
cd moltres

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Running Tests

# Run all tests
pytest

# Run tests in parallel
pytest -n 9

# Run with coverage
pytest --cov=src/moltres --cov-report=html

Code Quality

# Linting
ruff check .

# Formatting
ruff format .

# Type checking (strict mode enabled)
mypy src

Pre-Commit CI Checks

# Run all CI checks (linting, type checking, tests)
make ci-check

# Quick linting check only
make ci-check-lint

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Quick Start:

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add some amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request

Before submitting:

  • Run tests: pytest
  • Check code quality: ruff check . && mypy src
  • Update documentation if needed

👤 Author

Odos Matthews

🙏 Acknowledgments

  • Inspired by PySpark's DataFrame API style, but focused on SQL feature support rather than PySpark feature parity
  • Built on SQLAlchemy for database connectivity and SQL compilation
  • Thanks to all contributors and users

📄 License

MIT License - see LICENSE file for details.


Made with ❤️ for the Python data community

⬆ Back to Top

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

moltres-0.18.0.tar.gz (264.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

moltres-0.18.0-py3-none-any.whl (302.5 kB view details)

Uploaded Python 3

File details

Details for the file moltres-0.18.0.tar.gz.

File metadata

  • Download URL: moltres-0.18.0.tar.gz
  • Upload date:
  • Size: 264.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for moltres-0.18.0.tar.gz
Algorithm Hash digest
SHA256 4134a5b430ce1e8cf2d36f12afe514015c2cee88aef7ae1915efb2a8ca82f6cf
MD5 c504b2746c325a5302f8e34cc9af6a3e
BLAKE2b-256 313484b91e08f0c188eada622627a24479030ca9ce7706962f35c01fbe9055d6

See more details on using hashes here.

File details

Details for the file moltres-0.18.0-py3-none-any.whl.

File metadata

  • Download URL: moltres-0.18.0-py3-none-any.whl
  • Upload date:
  • Size: 302.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for moltres-0.18.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f0783c6222d9f7220e898955abe65aade5e2981a389316c64fb9bb8988461be6
MD5 dffb77570171461e56c288acd9799ee6
BLAKE2b-256 ea87c7d06ee8b5420e86febd8c4fb5dfe639ed2bd1f4b15b1174d992b24be02b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page