DataFrame API with SQL pushdown execution and real SQL CRUD - the missing layer for SQL in Python


Moltres

The Missing DataFrame Layer for SQL in Python

MOLTRES: Modern Operations Layer for Transformations, Relational Execution, and SQL


Moltres combines a DataFrame API (like Pandas/Polars), SQL pushdown execution (queries run inside the database instead of loading data into Python memory), and real SQL CRUD operations (INSERT, UPDATE, DELETE) in one unified interface.

Transform millions of rows using familiar DataFrame operations, all compiled to SQL and executed by the database, so only the final results are pulled into Python. Update, insert, and delete with column-aware, type-safe operations.

✨ Features

🚀 DataFrame API - Familiar operations (select, filter, join, groupBy, etc.) like Pandas/Polars/PySpark
🎯 98% PySpark API Compatibility - Near-complete compatibility for seamless migration
🗄️ SQL Pushdown Execution - DataFrame operations compile to SQL and run inside your database, so rows are not loaded into Python memory until you collect results
✏️ Real SQL CRUD - INSERT, UPDATE, DELETE operations with DataFrame-style syntax
📊 Multiple Formats - Read/write CSV, JSON, JSONL, Parquet, and more
🌊 Streaming Support - Handle datasets larger than memory with chunked processing
⚡ Async Support - Full async/await support for all operations
🔒 Security First - Built-in SQL injection prevention and validation

📦 Installation

pip install moltres

# Optional: For async support
# (quote the extras so shells such as zsh do not expand the brackets)
pip install "moltres[async-postgresql]"  # PostgreSQL
pip install "moltres[async-mysql]"       # MySQL
pip install "moltres[async-sqlite]"      # SQLite

# Optional: For pandas/polars result formats
pip install "moltres[pandas,polars]"

🚀 Quick Start

Basic DataFrame Operations

from moltres import col, connect
from moltres.expressions import functions as F

# Connect to your database
db = connect("sqlite:///example.db")

# DataFrame operations with SQL pushdown (no data loading into memory)
df = (
    db.table("orders")
    .select()
    .join(db.table("customers").select(), on=[("customer_id", "id")])
    .where(col("active") == True)  # noqa: E712
    .group_by("country")
    .agg(F.sum(col("amount")).alias("total_amount"))
)

# Execute and get results (SQL is compiled and executed here)
results = df.collect()  # Returns list of dicts by default
# Output: [{'country': 'UK', 'total_amount': 150.0}, {'country': 'USA', 'total_amount': 300.0}]
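To make "SQL pushdown" concrete, here is a hand-written sketch, using only Python's built-in sqlite3 module, of roughly the single query the pipeline above stands for. The table contents are made-up sample data, an ORDER BY is added for a stable result, and the SQL Moltres actually generates may be shaped differently. The point is that the join, filter, and aggregation all run inside the database and only the aggregated rows come back to Python.

```python
import sqlite3

# In-memory database with made-up sample rows (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT, active INTEGER);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'USA', 1), (2, 'UK', 1), (3, 'USA', 0);
    INSERT INTO orders VALUES (1, 1, 100.0), (2, 1, 200.0), (3, 2, 150.0), (4, 3, 999.0);
    """
)

# Roughly the SQL the DataFrame pipeline corresponds to: join, filter,
# and aggregation are all evaluated by the database engine.
sql = """
    SELECT c.country, SUM(o.amount) AS total_amount
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.id
    WHERE c.active = 1
    GROUP BY c.country
    ORDER BY c.country
"""
rows = [{"country": country, "total_amount": total} for country, total in conn.execute(sql)]
print(rows)  # [{'country': 'UK', 'total_amount': 150.0}, {'country': 'USA', 'total_amount': 300.0}]
```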

Raw SQL & SQL Expressions

# Raw SQL queries (PySpark-style); like other reads, the result is a lazy DataFrame
df = db.sql("SELECT * FROM users WHERE age > 18")
df.collect()
# Output: [{'id': 1, 'name': 'Alice', 'age': 25}, {'id': 3, 'name': 'Charlie', 'age': 30}]

# Bind values with named parameters, then keep chaining DataFrame operations
df = db.sql("SELECT * FROM orders WHERE id = :id", id=1).where(col("amount") > 100)
df.collect()
# Output: [] (empty if order 1 has amount <= 100)

# SQL expression selection
orders = db.table("orders").select()
orders.selectExpr("amount * 1.1 as with_tax", "amount as amount_original").collect()
# Output (floats rounded for display): [{'with_tax': 55.0, 'amount_original': 50.0}, {'with_tax': 165.0, 'amount_original': 150.0}]
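The `:id` placeholder in `db.sql(...)` uses the named-parameter style that Python's sqlite3 module also supports. The sketch below uses plain sqlite3 (not Moltres) with a made-up table to show what binding buys you: the value travels separately from the SQL text, so a hostile string is compared as data instead of being executed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)", [(1, 50.0), (2, 150.0)])

# Named placeholder: the driver binds the value; it is never spliced into the SQL text.
query = "SELECT id, amount FROM orders WHERE id = :id"
safe = conn.execute(query, {"id": 2}).fetchall()
print(safe)  # [(2, 150.0)]

# An injection-style value is treated as a literal string, so it matches no row.
hostile = conn.execute(query, {"id": "2 OR 1=1"}).fetchall()
print(hostile)  # []

# A computed column, analogous to selectExpr("amount * 1.1 as with_tax").
with_tax = conn.execute("SELECT amount * 1.1 AS with_tax FROM orders ORDER BY id").fetchall()
```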

CRUD Operations

from moltres.io.records import Records

# Insert rows
records = Records(
    _data=[
        {"id": 1, "name": "Alice", "email": "alice@example.com", "active": 1},
        {"id": 2, "name": "Bob", "email": "bob@example.com", "active": 0},
    ],
    _database=db,
)
result = records.insert_into("customers")  # Executes immediately
# Output: 2 (number of rows inserted)

# Update rows
df = db.table("customers").select()
df.write.update(
    "customers",
    where=col("active") == 0,
    set={"active": 1},
)  # Executes immediately and returns None

# Delete rows
df.write.delete("customers", where=col("email").is_null())  # Executes immediately and returns None
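Each of these calls corresponds to an ordinary SQL statement. As a rough sketch in plain sqlite3 (the statements Moltres emits may differ in detail, and the third sample row is made up so the delete has something to match), the three operations look like this. The counts below are the rowcounts reported by the database, independent of what the Moltres methods return.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, email TEXT, active INTEGER)")

# INSERT: one parameterized statement executed for a batch of rows.
rows = [
    {"id": 1, "name": "Alice", "email": "alice@example.com", "active": 1},
    {"id": 2, "name": "Bob", "email": "bob@example.com", "active": 0},
    {"id": 3, "name": "Carol", "email": None, "active": 0},
]
inserted = conn.executemany(
    "INSERT INTO customers (id, name, email, active) VALUES (:id, :name, :email, :active)",
    rows,
).rowcount  # executemany sums the rows modified by each execution

# UPDATE ... WHERE: the SQL shape behind update(where=col("active") == 0, set={"active": 1}).
updated = conn.execute(
    "UPDATE customers SET active = :new WHERE active = :old", {"new": 1, "old": 0}
).rowcount

# DELETE ... WHERE: the SQL shape behind delete(where=col("email").is_null()).
deleted = conn.execute("DELETE FROM customers WHERE email IS NULL").rowcount
conn.commit()

print(inserted, updated, deleted)  # 3 2 1
```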

Async Support

import asyncio
from moltres import async_connect, col

async def main():
    db = await async_connect("postgresql+asyncpg://user:pass@localhost/db")
    
    df = await db.table("orders").select()
    results = await df.collect()
    
    # Streaming support (process_chunk is a placeholder for your own handler)
    async for chunk in await df.collect(stream=True):
        process_chunk(chunk)
    
    await db.close()

asyncio.run(main())
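The `async for chunk in await df.collect(stream=True)` loop consumes an asynchronous iterator of row batches. The sketch below imitates that consumption pattern with the standard library only: a hypothetical `stream_rows` async generator runs blocking sqlite3 `fetchmany` calls in a worker thread via `asyncio.to_thread` (Python 3.9+). It illustrates the pattern; it is not how Moltres wires up its async drivers.

```python
import asyncio
import sqlite3
from typing import AsyncIterator


async def stream_rows(conn: sqlite3.Connection, sql: str, chunk_size: int) -> AsyncIterator[list[tuple]]:
    """Hypothetical helper: yield query results in lists of at most chunk_size rows."""
    cursor = conn.execute(sql)
    while True:
        # fetchmany blocks, so run it in a worker thread to keep the event loop responsive.
        chunk = await asyncio.to_thread(cursor.fetchmany, chunk_size)
        if not chunk:
            break
        yield chunk


async def main() -> list[int]:
    # check_same_thread=False lets the worker thread use this connection.
    conn = sqlite3.connect(":memory:", check_same_thread=False)
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY)")
    conn.executemany("INSERT INTO orders VALUES (?)", [(i,) for i in range(1, 11)])
    sizes = []
    try:
        async for chunk in stream_rows(conn, "SELECT id FROM orders ORDER BY id", chunk_size=4):
            sizes.append(len(chunk))  # stand-in for real per-chunk processing
    finally:
        conn.close()
    return sizes


print(asyncio.run(main()))  # [4, 4, 2]
```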

📖 Core Concepts

Lazy Evaluation

All DataFrame query operations are lazy: they build a logical plan that executes only when you call collect(). DataFrame write operations (insertInto, update, delete) execute eagerly, as soon as they are called, similar to write actions in PySpark.

# This doesn't execute any SQL yet
df = db.table("users").select().where(col("age") > 18)

# SQL is compiled and executed here
results = df.collect()
# Output: [{'id': 1, 'name': 'Alice', 'age': 25}, {'id': 3, 'name': 'Charlie', 'age': 30}]
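Lazy evaluation means each chained call only records a step, and SQL is built and sent once, at `collect()`. The following is a minimal sketch of that idea over sqlite3, assuming a hypothetical `LazyQuery` class (not Moltres's internal plan representation) and trusted, hard-coded table names, columns, and operators. A trace callback records every statement SQLite runs, which makes it visible that nothing executes before `collect()`.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class LazyQuery:
    """Hypothetical lazy query: records steps and runs SQL only in collect()."""

    conn: sqlite3.Connection
    table: str  # must be a trusted name: it is interpolated into the SQL
    conditions: tuple[tuple[str, str, object], ...] = ()

    def where(self, column: str, op: str, value: object) -> "LazyQuery":
        # Returns a new, extended plan; the database is not touched here.
        return LazyQuery(self.conn, self.table, self.conditions + ((column, op, value),))

    def to_sql(self) -> tuple[str, list[object]]:
        sql = f"SELECT * FROM {self.table}"
        if self.conditions:
            sql += " WHERE " + " AND ".join(f"{c} {op} ?" for c, op, _ in self.conditions)
        return sql, [value for _, _, value in self.conditions]

    def collect(self) -> list[dict]:
        sql, params = self.to_sql()
        cursor = self.conn.execute(sql, params)  # the only place SQL is executed
        names = [d[0] for d in cursor.description]
        return [dict(zip(names, row)) for row in cursor]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?, ?)", [(1, "Alice", 25), (2, "Bob", 17), (3, "Charlie", 30)])

executed: list[str] = []
conn.set_trace_callback(executed.append)  # record every statement SQLite runs

df = LazyQuery(conn, "users").where("age", ">", 18)  # only builds a plan
print(len(executed))  # 0
print(df.to_sql())    # ('SELECT * FROM users WHERE age > ?', [18])
rows = df.collect()   # the SELECT runs here
print(rows)  # [{'id': 1, 'name': 'Alice', 'age': 25}, {'id': 3, 'name': 'Charlie', 'age': 30}]
```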

Column Expressions

Moltres supports multiple ways to reference columns:

String names: df.select("id", "name")
Dot notation: df.select(df.id, df.name) (PySpark-style)
col() function: df.select(col("id"), col("name"))
Mix and match: Combine all three methods in the same query
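The `col("age") > 18` style works because comparison and boolean operators are overloaded to build expression objects instead of evaluating to `True`/`False`. A stripped-down sketch of that technique, with hypothetical `Col` and `Expr` classes that compile to SQL text plus bound parameters (Moltres's real expression classes are richer and compile through SQLAlchemy):

```python
class Expr:
    """Hypothetical SQL expression: SQL text plus its bound parameters."""

    def __init__(self, sql: str, params: tuple = ()) -> None:
        self.sql = sql
        self.params = params

    def __and__(self, other: "Expr") -> "Expr":
        return Expr(f"({self.sql} AND {other.sql})", self.params + other.params)

    def __or__(self, other: "Expr") -> "Expr":
        return Expr(f"({self.sql} OR {other.sql})", self.params + other.params)

    def __invert__(self) -> "Expr":
        return Expr(f"(NOT {self.sql})", self.params)


class Col:
    """Hypothetical column reference whose operators build Expr objects."""

    def __init__(self, name: str) -> None:
        self.name = name

    def _compare(self, op: str, value: object) -> Expr:
        # The literal becomes a "?" placeholder plus a bound parameter.
        return Expr(f"{self.name} {op} ?", (value,))

    def __gt__(self, value: object) -> Expr:
        return self._compare(">", value)

    def __lt__(self, value: object) -> Expr:
        return self._compare("<", value)

    def __eq__(self, value: object) -> Expr:  # type: ignore[override]
        return self._compare("=", value)

    __hash__ = object.__hash__  # keep Col hashable despite overriding __eq__


cond = (Col("age") > 18) & ~(Col("country") == "UK")
print(cond.sql)     # (age > ? AND (NOT country = ?))
print(cond.params)  # (18, 'UK')
```

Parentheses around each comparison are required, because Python's `&` and `~` bind more tightly than `>` and `==`; PySpark-style APIs share this quirk.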

📚 See detailed examples:

Column expressions and functions

📥 Reading Data

Moltres supports reading from database tables, raw SQL queries, and files (CSV, JSON, Parquet, etc.). All readers return lazy DataFrame objects that can be transformed before execution.

Key Features:

Read from tables: db.table("table_name").select() or db.read.table("table_name")
Raw SQL queries: db.sql("SELECT * FROM users WHERE age > 18")
SQL expressions: df.selectExpr("amount * 1.1 as with_tax")
File formats: CSV, JSON, JSONL, Parquet, Text
Schema inference or explicit schemas
Lazy evaluation - files materialize only when .collect() is called

📤 Writing Data

Write DataFrames to database tables or files (CSV, JSON, Parquet, etc.) using the write API.

Key Features:

Save to tables: df.write.save_as_table("table_name")
Insert into existing tables: df.write.insertInto("table_name")
Update/Delete operations: df.write.update() / df.write.delete()
Multiple file formats: CSV, JSON, JSONL, Parquet, Text
Write modes: append, overwrite, ignore, error_if_exists
Partitioned writes and streaming support
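The four write modes mirror PySpark's save modes. As a sketch of what each mode means at the SQL level, here is a hypothetical `save_rows` helper over sqlite3; the one-column schema, the helper name, and its exact behavior are illustrative assumptions, not Moltres's implementation.

```python
import sqlite3

MODES = ("append", "overwrite", "ignore", "error_if_exists")


def table_exists(conn: sqlite3.Connection, name: str) -> bool:
    row = conn.execute("SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)).fetchone()
    return row is not None


def save_rows(conn: sqlite3.Connection, name: str, values: list[int], mode: str = "error_if_exists") -> None:
    """Hypothetical helper: write values into a one-column table using a save mode."""
    # NOTE: `name` is interpolated into SQL below, so it must come from trusted code.
    if mode not in MODES:
        raise ValueError(f"unknown mode {mode!r}")
    exists = table_exists(conn, name)
    if exists and mode == "error_if_exists":
        raise ValueError(f"table {name!r} already exists")
    if exists and mode == "ignore":
        return  # leave the existing table untouched
    if exists and mode == "overwrite":
        conn.execute(f"DROP TABLE {name}")
        exists = False
    if not exists:
        conn.execute(f"CREATE TABLE {name} (value INTEGER)")
    conn.executemany(f"INSERT INTO {name} (value) VALUES (?)", [(v,) for v in values])
    conn.commit()


conn = sqlite3.connect(":memory:")
save_rows(conn, "scores", [1, 2])                 # creates the table
save_rows(conn, "scores", [3], mode="append")     # now 1, 2, 3
save_rows(conn, "scores", [9], mode="ignore")     # table exists, so nothing changes
save_rows(conn, "scores", [7], mode="overwrite")  # replaced: only 7 remains
print([v for (v,) in conn.execute("SELECT value FROM scores ORDER BY value")])  # [7]
```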

🌊 Streaming for Large Datasets

Moltres supports streaming for datasets larger than memory. Process data in chunks without loading everything into RAM.

Key Features:

Stream reads: async for chunk in await df.collect(stream=True)
Stream writes: df.write.stream().csv("output.csv")
Configurable chunk sizes
Works with both sync and async operations
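Chunked streaming comes down to fetching a bounded batch of rows at a time instead of calling `fetchall()`. A minimal synchronous sketch with the DB-API `fetchmany` method, using a hypothetical `iter_chunks` generator (Moltres's actual chunk-size defaults and configuration are not shown here):

```python
import sqlite3
from typing import Iterator


def iter_chunks(conn: sqlite3.Connection, sql: str, chunk_size: int = 1000) -> Iterator[list[tuple]]:
    """Hypothetical helper: yield lists of at most chunk_size rows."""
    cursor = conn.execute(sql)
    try:
        while True:
            chunk = cursor.fetchmany(chunk_size)
            if not chunk:
                return
            yield chunk
    finally:
        cursor.close()  # release the cursor even if the caller stops early


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO events VALUES (?)", ((i,) for i in range(2500)))

total = 0
for chunk in iter_chunks(conn, "SELECT id FROM events", chunk_size=1000):
    total += len(chunk)  # Python holds at most one chunk of rows at a time
print(total)  # 2500
```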

🗄️ Table Management

Create, drop, and manage database tables with explicit schemas or from DataFrames.

Key Features:

Create tables: db.create_table("name", [column(...)])
Create from DataFrames: df.write.save_as_table("table_name")
Drop tables: db.drop_table("name", if_exists=True)
Constraints: UNIQUE, CHECK, and FOREIGN KEY constraints
Indexes: Create and drop indexes for better query performance
Temporary tables, primary keys, and schema validation

Example:

from moltres.table.schema import column, unique, check, foreign_key

# Create table with constraints
db.create_table(
    "users",
    [
        column("id", "INTEGER", primary_key=True),
        column("email", "TEXT"),
        column("age", "INTEGER"),
    ],
    constraints=[
        unique("email", name="uq_user_email"),
        check("age >= 0", name="ck_non_negative_age"),
    ],
).collect()

# Create table with foreign key
db.create_table(
    "orders",
    [
        column("id", "INTEGER", primary_key=True),
        column("user_id", "INTEGER"),
        column("total", "REAL"),
        column("status", "TEXT"),
    ],
    constraints=[
        foreign_key("user_id", "users", "id", on_delete="CASCADE"),
    ],
).collect()

# Create indexes
db.create_index("idx_user_email", "users", "email").collect()
db.create_index("idx_order_user", "orders", "user_id").collect()
db.create_index("idx_order_user_status", "orders", ["user_id", "status"]).collect()

# Drop index
db.drop_index("idx_user_email", "users").collect()
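Constraints are enforced by the database itself, not by Python code. The sketch below writes equivalent DDL by hand in sqlite3 to show `UNIQUE`, `CHECK`, and `ON DELETE CASCADE` in action; Moltres builds its DDL through SQLAlchemy, so its generated SQL may differ. One SQLite-specific detail worth knowing: foreign keys are only enforced when `PRAGMA foreign_keys = ON` is set on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key enforcement off by default
conn.executescript(
    """
    CREATE TABLE users (
        id INTEGER PRIMARY KEY,
        email TEXT,
        age INTEGER,
        CONSTRAINT uq_user_email UNIQUE (email),
        CONSTRAINT ck_non_negative_age CHECK (age >= 0)
    );
    CREATE TABLE orders (
        id INTEGER PRIMARY KEY,
        user_id INTEGER REFERENCES users (id) ON DELETE CASCADE,
        total REAL
    );
    CREATE INDEX idx_order_user ON orders (user_id);
    INSERT INTO users VALUES (1, 'a@example.com', 30);
    INSERT INTO orders VALUES (10, 1, 99.5);
    """
)


def violates(sql: str) -> bool:
    """Return True if SQLite rejects the statement with an IntegrityError."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False


dup_rejected = violates("INSERT INTO users VALUES (2, 'a@example.com', 40)")  # UNIQUE
neg_rejected = violates("INSERT INTO users VALUES (3, 'b@example.com', -1)")  # CHECK

conn.execute("DELETE FROM users WHERE id = 1")  # cascades to the user's orders
remaining = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(dup_rejected, neg_rejected, remaining)  # True True 0
```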

🔍 Schema Inspection & Reflection

Inspect and reflect existing database schemas without manually defining them.

Key Features:

List tables: db.get_table_names()
List views: db.get_view_names()
Get column metadata: db.get_columns("table_name")
Reflect single table: db.reflect_table("table_name")
Reflect entire database: db.reflect()
Full async support: All methods available on AsyncDatabase

Example:

# Get list of tables
tables = db.get_table_names()
# Output: ['users', 'orders', 'products']

# Get column information
columns = db.get_columns("users")
for column_info in columns:  # avoid shadowing the imported col() helper
    print(
        f"{column_info.name}: {column_info.type_name} "
        f"(nullable={column_info.nullable}, pk={column_info.primary_key})"
    )

# Reflect a single table
schema = db.reflect_table("users")
# Returns: TableSchema(name='users', columns=[ColumnDef(...), ...])

# Reflect entire database
all_schemas = db.reflect()
# Returns: {'users': TableSchema(...), 'orders': TableSchema(...)}
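Reflection works by querying the database's own catalog. On SQLite, that catalog is the `sqlite_master` table plus the `pragma_table_info` table-valued function (SQLite 3.16+). Here is a rough stdlib sketch of the lookups behind `get_table_names()` and `get_columns()`; the `ColumnInfo` tuple is a hypothetical stand-in, and since Moltres is built on SQLAlchemy its reflection presumably goes through SQLAlchemy's inspection layer, which covers each dialect's catalog. Treat this as a SQLite-only illustration.

```python
import sqlite3
from typing import NamedTuple


class ColumnInfo(NamedTuple):
    """Hypothetical stand-in for the column metadata shown above."""

    name: str
    type_name: str
    nullable: bool
    primary_key: bool


def get_table_names(conn: sqlite3.Connection) -> list[str]:
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name"
    )
    return [name for (name,) in rows]


def get_columns(conn: sqlite3.Connection, table: str) -> list[ColumnInfo]:
    # Each row is (cid, name, type, notnull, dflt_value, pk); the table name is bound safely.
    rows = conn.execute("SELECT * FROM pragma_table_info(?)", (table,))
    return [
        ColumnInfo(name, type_name, not notnull, pk > 0)
        for _cid, name, type_name, notnull, _default, pk in rows
    ]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL, age INTEGER)")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER)")

print(get_table_names(conn))  # ['orders', 'users']
for info in get_columns(conn, "users"):
    print(info)
```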

📚 See detailed examples:

Schema inspection and reflection

✏️ Data Mutations

Type-safe INSERT, UPDATE, DELETE, and MERGE operations with DataFrame-style syntax.

Key Features:

Insert: records.insert_into("table") or df.write.insertInto("table")
Update: update_rows(table, where=..., values={...}) or df.write.update()
Delete: delete_rows(table, where=...) or df.write.delete()
Merge (Upsert): merge_rows(table, data, on=[...], when_matched={...}, when_not_matched={...})
Transactions: with db.transaction() as txn: ...
Automatic batch operations for multiple rows
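A merge (upsert) inserts rows that are new and updates rows whose key already exists. On SQLite 3.24+ and PostgreSQL this maps onto `INSERT ... ON CONFLICT ... DO UPDATE`, while MySQL uses `ON DUPLICATE KEY UPDATE` instead. The sketch below is hand-written sqlite3 with made-up data, not necessarily the SQL `merge_rows` generates. It wraps the batch in a transaction by using the connection as a context manager (commit on success, rollback on exception), which is similar in spirit to `with db.transaction() as txn:`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, visits INTEGER)")
conn.execute("INSERT INTO customers VALUES (1, 'Alice', 1)")
conn.commit()

incoming = [
    {"id": 1, "name": "Alice", "visits": 1},  # key exists: row is updated
    {"id": 2, "name": "Bob", "visits": 1},    # new key: row is inserted
]
upsert = """
    INSERT INTO customers (id, name, visits) VALUES (:id, :name, :visits)
    ON CONFLICT (id) DO UPDATE SET
        name = excluded.name,
        visits = customers.visits + excluded.visits
"""

with conn:  # commits if the block succeeds, rolls back if it raises
    conn.executemany(upsert, incoming)
after_merge = conn.execute("SELECT id, name, visits FROM customers ORDER BY id").fetchall()
print(after_merge)  # [(1, 'Alice', 2), (2, 'Bob', 1)]

# A failure inside the block rolls back every statement in it.
try:
    with conn:
        conn.execute("INSERT INTO customers VALUES (3, 'Carol', 1)")
        raise RuntimeError("simulated failure")
except RuntimeError:
    pass
count_after_rollback = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]
print(count_after_rollback)  # 2
```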

📊 Result Formats

Moltres supports multiple result formats:

Records (default): List of dictionaries [{"id": 1, "name": "Alice"}, ...]
Pandas: df.collect(format="pandas") (requires pandas)
Polars: df.collect(format="polars") (requires polars)

Configure default format: db = connect("sqlite:///example.db", fetch_format="pandas")
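The default "records" format is a list of plain dictionaries keyed by column name. With a bare DB-API cursor, that shape can be built from `cursor.description`; below is a small sqlite3 sketch with a hypothetical `to_records` helper. The same list of dicts is accepted by `pandas.DataFrame(records)` and `polars.from_dicts(records)`, though how Moltres itself builds its pandas and polars results is not stated here.

```python
import sqlite3
from typing import Any


def to_records(cursor: sqlite3.Cursor) -> list[dict[str, Any]]:
    """Hypothetical helper: turn a cursor's result set into a list of dicts."""
    names = [column[0] for column in cursor.description]
    return [dict(zip(names, row)) for row in cursor.fetchall()]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])

records = to_records(conn.execute("SELECT id, name FROM users ORDER BY id"))
print(records)  # [{'id': 1, 'name': 'Alice'}, {'id': 2, 'name': 'Bob'}]
```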

⚙️ Configuration

Configure Moltres programmatically or via environment variables:

Programmatic:

db = connect(
    "sqlite:///example.db",
    echo=False,  # Set to True to log the SQL that is executed
    fetch_format="records",  # Default result format
    pool_size=5,  # Connection pool size
)

Environment Variables:

MOLTRES_DSN - Database connection string
MOLTRES_ECHO - Enable SQL logging (true/false)
MOLTRES_FETCH_FORMAT - Result format: "records", "pandas", or "polars"
MOLTRES_POOL_SIZE, MOLTRES_MAX_OVERFLOW, etc. - Connection pool settings

See connection examples for more details.
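Environment-variable configuration typically means reading each `MOLTRES_*` variable, converting it from a string, and falling back to a default. A sketch of that parsing with a hypothetical `Settings` dataclass and `settings_from_env` function; the defaults, the accepted boolean spellings, and how Moltres resolves conflicts between environment variables and keyword arguments are assumptions, not documented behavior.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    """Hypothetical settings object; fields mirror the MOLTRES_* variables."""

    dsn: Optional[str] = None
    echo: bool = False
    fetch_format: str = "records"
    pool_size: Optional[int] = None


def _parse_bool(value: str) -> bool:
    return value.strip().lower() in {"1", "true", "yes", "on"}


def settings_from_env(env: Mapping[str, str] = os.environ) -> Settings:
    settings = Settings()
    if "MOLTRES_DSN" in env:
        settings.dsn = env["MOLTRES_DSN"]
    if "MOLTRES_ECHO" in env:
        settings.echo = _parse_bool(env["MOLTRES_ECHO"])
    if "MOLTRES_FETCH_FORMAT" in env:
        fmt = env["MOLTRES_FETCH_FORMAT"].strip().lower()
        if fmt not in {"records", "pandas", "polars"}:
            raise ValueError(f"unsupported MOLTRES_FETCH_FORMAT: {fmt!r}")
        settings.fetch_format = fmt
    if "MOLTRES_POOL_SIZE" in env:
        settings.pool_size = int(env["MOLTRES_POOL_SIZE"])
    return settings


example_env = {"MOLTRES_DSN": "sqlite:///example.db", "MOLTRES_ECHO": "true", "MOLTRES_POOL_SIZE": "5"}
print(settings_from_env(example_env))
# Settings(dsn='sqlite:///example.db', echo=True, fetch_format='records', pool_size=5)
```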

📈 Performance Monitoring

Optional performance monitoring hooks to track query execution:

from moltres.engine import register_performance_hook

def log_query(sql: str, elapsed: float, metadata: dict):
    print(f"Query took {elapsed:.3f}s, returned {metadata.get('rowcount', 0)} rows")

register_performance_hook("query_end", log_query)
# Prints one line per completed query, e.g. "Query took 0.000s, returned 2 rows"

See the telemetry module for more details.
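A hook system like `register_performance_hook` is commonly a registry of callbacks keyed by event name, invoked with timing data around each query. The sketch below is a self-contained imitation over sqlite3; `register_hook`, `timed_execute`, and the `rowcount` metadata key are hypothetical and do not describe Moltres's internals. It measures with `time.perf_counter()`, Python's clock intended for elapsed-time measurement.

```python
import sqlite3
import time
from collections import defaultdict
from typing import Callable

Hook = Callable[[str, float, dict], None]
_hooks: dict[str, list[Hook]] = defaultdict(list)


def register_hook(event: str, callback: Hook) -> None:
    """Hypothetical: attach a callback to an event such as 'query_end'."""
    _hooks[event].append(callback)


def timed_execute(conn: sqlite3.Connection, sql: str, params: tuple = ()) -> list[tuple]:
    """Run a query, then notify every 'query_end' hook with the elapsed seconds."""
    start = time.perf_counter()
    rows = conn.execute(sql, params).fetchall()
    elapsed = time.perf_counter() - start
    for callback in _hooks["query_end"]:
        callback(sql, elapsed, {"rowcount": len(rows)})
    return rows


seen: list[tuple[str, int]] = []
register_hook("query_end", lambda sql, elapsed, meta: seen.append((sql, meta["rowcount"])))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (2,)])
timed_execute(conn, "SELECT x FROM t")
print(seen)  # [('SELECT x FROM t', 2)]
```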

🔒 Security

Moltres includes built-in security features to prevent SQL injection:

SQL Identifier Validation - All table and column names are validated
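Parameterized Queries - Values are sent to the database as bound parameters rather than concatenated into SQL strings; with db.sql(), pass values through named parameters such as :id instead of formatting them into the query text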
Input Sanitization - Comprehensive validation of identifiers and inputs

See docs/SECURITY.md for security best practices and guidelines.
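Parameters protect values, but table and column names cannot be bound as parameters, which is why identifier validation matters. A minimal sketch of one common approach: an allow-list regular expression followed by ANSI double-quoting. The `quote_identifier` name and the exact rule are assumptions, not Moltres's actual validator; this strict version also rejects dotted names such as `schema.table`.

```python
import re

# Letters, digits, and underscores, not starting with a digit. Deliberately strict.
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def quote_identifier(name: str) -> str:
    """Hypothetical validator: reject anything that is not a plain identifier."""
    if not _IDENTIFIER.fullmatch(name):
        raise ValueError(f"invalid SQL identifier: {name!r}")
    return f'"{name}"'


print(quote_identifier("orders"))  # "orders"

try:
    quote_identifier("orders; DROP TABLE users")
except ValueError as exc:
    print(exc)  # invalid SQL identifier: 'orders; DROP TABLE users'
```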

📚 Examples

Comprehensive examples demonstrating all Moltres features:

01_connecting.py - Database connections (sync and async)
02_dataframe_basics.py - Basic DataFrame operations (select, filter, order by, limit)
03_async_dataframe.py - Asynchronous DataFrame operations
04_joins.py - Join operations (inner, left, with conditions)
05_groupby.py - GroupBy and aggregation operations
06_expressions.py - Column expressions, functions, and operators
07_file_reading.py - Reading files (CSV, JSON, JSONL, Parquet, Text)
08_file_writing.py - Writing DataFrames to files
09_table_operations.py - Table operations (create, drop, mutations)
10_create_dataframe.py - Creating DataFrames from Python data
11_window_functions.py - Window functions for analytical queries
12_sql_operations.py - Raw SQL and SQL operations (CTEs, unions, etc.)
13_transactions.py - Transaction management
14_reflection.py - Schema inspection and reflection

See the examples directory for all example files.

🛠️ Supported Operations

DataFrame Operations (PySpark-Compatible)

select() / selectExpr() - Project columns or SQL expressions
where() / filter() - Filter rows (supports SQL strings)
join() - Join with other DataFrames
group_by() / groupBy() - Group rows
agg() - Aggregate functions (supports strings and dictionaries)
order_by() / orderBy() / sort() - Sort rows
limit() - Limit number of rows
distinct() - Remove duplicate rows
withColumn() / withColumnRenamed() - Add or rename columns
pivot() - Pivot operations (including groupBy().pivot())
explode() - Explode array/JSON columns
db.sql() - Execute raw SQL queries

DataFrame Write Operations

df.write.insertInto("table") - Insert DataFrame into existing table (eager execution)
df.write.update("table", where=..., set={...}) - Update rows in table (eager execution)
df.write.delete("table", where=...) - Delete rows from table (eager execution)
df.write.save_as_table("table") / saveAsTable() - Write DataFrame to table (eager execution)

Column Expressions

Arithmetic: +, -, *, /, %
Comparisons: ==, !=, <, >, <=, >=
Boolean: &, |, ~
Functions: Comprehensive function library with 130+ functions including:
- Mathematical: pow(), sqrt(), abs(), floor(), ceil(), round(), sin(), cos(), tan(), log(), exp(), etc.
- String: concat(), upper(), lower(), substring(), trim(), length(), replace(), regexp_extract(), split(), etc.
- Date/Time: year(), month(), day(), hour(), minute(), second(), date_format(), to_date(), datediff(), date_add(), etc.
- Aggregate: sum(), avg(), min(), max(), count(), count_distinct(), stddev(), variance(), etc.
  - FILTER clause: Conditional aggregation with .filter() method (e.g., F.sum(col("amount")).filter(col("status") == "active"))
- Window: row_number(), rank(), dense_rank(), lag(), lead(), etc.
- Array: array(), array_length(), array_contains(), array_position(), etc.
- JSON: json_extract(), from_json(), to_json(), etc.
- Utility: coalesce(), greatest(), least(), when(), isnull(), isnotnull(), etc.
Window Functions: over(), partition_by(), order_by()
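The FILTER clause and window functions listed above are standard SQL features. For reference, here is what they look like as raw SQL, run through sqlite3 (SQLite supports window functions from 3.25 and FILTER on aggregates from 3.30). The table and data are made up, and the SQL Moltres emits for `F.sum(...).filter(...)` or `over()` may differ, for example on dialects without FILTER support.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, status TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        (1, "alice", "active", 10.0),
        (2, "alice", "cancelled", 5.0),
        (3, "bob", "active", 7.0),
        (4, "alice", "active", 3.0),
    ],
)

# Conditional aggregation: the second SUM only sees rows matching the FILTER predicate.
totals = conn.execute(
    """
    SELECT customer,
           SUM(amount) AS total,
           SUM(amount) FILTER (WHERE status = 'active') AS active_total
    FROM orders
    GROUP BY customer
    ORDER BY customer
    """
).fetchall()
print(totals)  # [('alice', 18.0, 13.0), ('bob', 7.0, 7.0)]

# Window function: number each customer's orders by amount, largest first.
ranked = conn.execute(
    """
    SELECT customer, id,
           ROW_NUMBER() OVER (PARTITION BY customer ORDER BY amount DESC) AS rn
    FROM orders
    ORDER BY customer, rn
    """
).fetchall()
print(ranked)  # [('alice', 1, 1), ('alice', 2, 2), ('alice', 4, 3), ('bob', 3, 1)]
```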

Supported SQL Dialects

✅ SQLite - Full support
✅ PostgreSQL - Full support with dialect-specific optimizations
✅ MySQL - Full support with dialect-specific optimizations
✅ Other SQLAlchemy-supported databases - ANSI SQL fallback

🧪 Development

Setup

# Clone the repository
git clone https://github.com/eddiethedean/moltres.git
cd moltres

# Install in development mode
pip install -e ".[dev]"

# Install pre-commit hooks
pre-commit install

Running Tests

# Run all tests
pytest

# Run tests in parallel (the -n option is provided by pytest-xdist)
pytest -n 9

# Run with coverage
pytest --cov=src/moltres --cov-report=html

Code Quality

# Linting
ruff check .

# Formatting
ruff format .

# Type checking (strict mode enabled)
mypy src

📖 Documentation

Additional documentation is available:

Examples Directory - 14 comprehensive example files covering all features
Examples Guide - Common patterns and use cases
Why Moltres? - Understanding the gap Moltres fills
Security Guide - Security best practices
Troubleshooting - Common issues and solutions

🤝 Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

Quick Start:

Fork the repository
Create a feature branch (git checkout -b feature/amazing-feature)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

Before submitting:

Run tests: pytest
Check code quality: ruff check . && mypy src
Update documentation if needed

👤 Author

Odos Matthews

GitHub: @eddiethedean
Email: odosmatthews@gmail.com

🙏 Acknowledgments

Inspired by PySpark's DataFrame API style, but focused on SQL feature support rather than PySpark feature parity
Built on SQLAlchemy for database connectivity and SQL compilation
Thanks to all contributors and users

📄 License

MIT License - see LICENSE file for details.

Made with ❤️ for the Python data community


Project details

Release history

1.0.0 (Apr 3, 2026)
0.23.2 (Dec 5, 2025)
0.23.1 (Dec 5, 2025)
0.23.0 (Dec 4, 2025)
0.22.0 (Dec 2, 2025)
0.21.0 (Dec 2, 2025)
0.20.0 (Dec 2, 2025)
0.19.6 (Dec 2, 2025)
0.19.4 (Dec 1, 2025)
0.19.3 (Dec 1, 2025)
0.19.2 (Nov 29, 2025)
0.19.1 (Nov 29, 2025)
0.19.0 (Nov 29, 2025)
0.18.0 (Nov 27, 2025)
0.17.0 (Nov 27, 2025)
0.16.0 (Nov 26, 2025)
0.15.0 (Nov 26, 2025)
0.14.0 (Nov 25, 2025)
0.13.0 (Nov 25, 2025) - this version
0.12.0 (Nov 25, 2025)
0.11.0 (Nov 24, 2025)
0.10.0 (Nov 24, 2025)
0.9.0 (Nov 24, 2025)
0.8.0 (Nov 22, 2025)
0.7.0 (Nov 22, 2025)
0.6.0 (Nov 21, 2025)
0.5.0 (Nov 21, 2025)
0.4.0 (Nov 21, 2025)
0.3.0 (Nov 20, 2025)
0.2.0 (Nov 20, 2025)
0.1.0 (Nov 20, 2025)

Download files

Source Distribution

moltres-0.13.0.tar.gz (174.4 kB)

Uploaded Nov 25, 2025 Source

Built Distribution

moltres-0.13.0-py3-none-any.whl (201.4 kB)

Uploaded Nov 25, 2025 Python 3

File details

Details for the file moltres-0.13.0.tar.gz.

File metadata

Download URL: moltres-0.13.0.tar.gz
Upload date: Nov 25, 2025
Size: 174.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for moltres-0.13.0.tar.gz
Algorithm	Hash digest
SHA256	`cb218123a1dc3256bdf34079af90f4f5c8d2320425a8c3e1bfd624a6e04baa39`
MD5	`a3dcf799c582fc726cfea057d46b5514`
BLAKE2b-256	`41057ef7a501f7ea7cbd8eeec8377472e2b222be3be9ee533974a1bc69d0a313`

File details

Details for the file moltres-0.13.0-py3-none-any.whl.

File metadata

Download URL: moltres-0.13.0-py3-none-any.whl
Upload date: Nov 25, 2025
Size: 201.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for moltres-0.13.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1cd3e004d86dc41cbc004044177be664d658520875283bf4ad85e3b23cd48846`
MD5	`639c6447b091ae43feb0d0354db7f586`
BLAKE2b-256	`5cd8214adbe871846d08852fecb8138e71db942eb3d89a877021e4f0c3a29960`

moltres 0.13.0
