Skip to main content

SQL interface for pandas DataFrames using SQLite with modern Python support

Project description

sequelframe

A lightweight Python library to run SQL queries on pandas DataFrames using SQLite.

Python Version License: MIT

Features

  • Full SQL on any pandas DataFrame — SELECT, INSERT, UPDATE, DELETE, CTEs, window functions
  • Multi-table support — load several DataFrames as separate tables and JOIN across them
  • In-memory mode — skip the temp file entirely for faster, zero-footprint usage
  • Transaction control — commit or roll back a group of writes atomically
  • Schema inspection — list tables, inspect columns
  • Parameterized queries for SQL injection protection
  • File format support: CSV, Excel, JSON, Parquet, Feather
  • Context manager support for automatic cleanup
  • Full type hints

Installation

pip install sequelframe

For Excel file support:

pip install sequelframe[excel]

Quick Start

import pandas as pd
from sequelframe import SequelFrame

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Charlie'],
    'age':  [25, 30, 35],
    'city': ['New York', 'San Francisco', 'Los Angeles'],
})

with SequelFrame(df) as db:
    result = db.query('SELECT * FROM data WHERE age > 25')
    print(result)

Usage

Initialization

from sequelframe import SequelFrame

# From a pandas DataFrame
db = SequelFrame(df)

# From a file (CSV, Excel, JSON, Parquet, Feather)
db = SequelFrame('data.csv')
db = SequelFrame('data.xlsx')
db = SequelFrame('data.parquet')

# Custom table name
db = SequelFrame(df, table_name='customers')

# In-memory database — faster, no temp file written to disk
db = SequelFrame(df, in_memory=True)

# Custom directory for the temp .sqlite file
db = SequelFrame(df, temp_dir='/tmp/mydb')

# Print every SELECT result automatically
db = SequelFrame(df, verbose=True)

SELECT queries

# Basic SELECT — returns a DataFrame
result = db.query('SELECT * FROM data WHERE age > 25')

# runsql() is the same but also handles write statements
result = db.runsql('SELECT name, age FROM data ORDER BY age DESC')

# Parameterized queries (recommended — prevents SQL injection)
result = db.query('SELECT * FROM data WHERE name = ?', ('Alice',))
result = db.query('SELECT * FROM data WHERE age > ? AND city = ?', (25, 'New York'))

# CTEs and window functions work out of the box
result = db.query('''
    WITH ranked AS (
        SELECT name, age,
               ROW_NUMBER() OVER (ORDER BY age DESC) AS rank
        FROM data
    )
    SELECT * FROM ranked WHERE rank <= 3
''')

INSERT, UPDATE, DELETE

db.runsql("INSERT INTO data (name, age, city) VALUES ('David', 28, 'Boston')")

db.runsql("UPDATE data SET age = 31 WHERE name = 'Bob'")

db.runsql("DELETE FROM data WHERE age < 25")

# Parameterized writes
db.runsql("INSERT INTO data (name, age, city) VALUES (?, ?, ?)", ('Eve', 27, 'Seattle'))
db.runsql("UPDATE data SET age = ? WHERE name = ?", (32, 'Alice'))

After any write, db.df is automatically refreshed so it stays in sync.

Multiple tables and JOINs

Use add_table() to register additional DataFrames, then JOIN them in SQL:

customers = pd.DataFrame({
    'customer_id': [1, 2, 3],
    'name': ['Alice', 'Bob', 'Charlie'],
})
orders = pd.DataFrame({
    'order_id':   [101, 102, 103],
    'customer_id': [1, 1, 2],
    'amount':      [100, 150, 200],
})

with SequelFrame(customers, table_name='customers', in_memory=True) as db:
    db.add_table(orders, 'orders')   # chainable: db.add_table(...).add_table(...)

    result = db.query('''
        SELECT c.name,
               COUNT(o.order_id)  AS order_count,
               SUM(o.amount)      AS total
        FROM customers c
        LEFT JOIN orders o ON c.customer_id = o.customer_id
        GROUP BY c.customer_id, c.name
    ''')
    print(result)

Transactions

Wrap multiple writes in a single atomic transaction:

with SequelFrame(df, in_memory=True) as db:
    with db.transaction():
        db.runsql("INSERT INTO data VALUES ('Eve', 28, 'Miami')")
        db.runsql("UPDATE data SET age = age + 1 WHERE name = 'Alice'")
    # committed — db.df is refreshed automatically

    # Any exception inside rolls the whole block back:
    try:
        with db.transaction():
            db.runsql("DELETE FROM data")
            raise RuntimeError("oops")
    except RuntimeError:
        pass
    # table is still intact

Multi-statement scripts

db.execute_script('''
    ALTER TABLE data ADD COLUMN score REAL DEFAULT 0.0;
    UPDATE data SET score = age * 1.5;
''')

Schema inspection

# All tables in the database
db.tables()          # ['data', 'orders']

# Column info for a table
db.schema()          # primary table
db.schema('orders')  # specific table
# returns: cid | name | type | notnull | dflt_value | pk

Utility methods

db.show()           # all rows as a DataFrame
db.show(limit=10)   # first 10 rows

db.columns          # ['name', 'age', 'city']
db.shape            # (3, 3)
db.row_count        # live row count from the database

db.get_dataframe()  # copy of the current primary-table DataFrame
db.update_dataframe(new_df)  # replace primary table entirely

Context manager (recommended)

with SequelFrame('data.csv') as db:
    result = db.query('SELECT * FROM data WHERE age > 25')
    db.runsql('UPDATE data SET age = age + 1')
# connection closed and temp file deleted automatically

Manual cleanup

db = SequelFrame('data.csv')
try:
    result = db.query('SELECT * FROM data')
finally:
    db.kill()

Advanced examples

Aggregations

with SequelFrame('sales.csv') as db:
    report = db.query('''
        SELECT
            strftime('%Y-%m', date)  AS month,
            COUNT(*)                  AS num_sales,
            SUM(amount)               AS total_sales,
            AVG(amount)               AS avg_sale,
            MAX(amount)               AS max_sale
        FROM data
        GROUP BY month
        ORDER BY month DESC
    ''')

Window functions

with SequelFrame(df) as db:
    result = db.query('''
        SELECT
            name,
            age,
            ROW_NUMBER() OVER (ORDER BY age DESC) AS rank,
            AVG(age)     OVER ()                  AS avg_age
        FROM data
    ''')

What's new in 0.5.0

  • in_memory=True — use SQLite :memory: instead of a temp file
  • add_table(data, name) — register additional DataFrames for cross-table JOINs
  • tables() — list all tables in the database
  • schema(table_name) — inspect columns via PRAGMA table_info
  • transaction() — context manager for atomic commit / rollback
  • execute_script(sql) — run multiple ;-separated statements at once
  • row_count property — live row count from the database
  • Fixed: SELECT queries were being executed twice internally
  • Fixed: SQL comments at the start of a query (-- foo\nSELECT ...) were misclassified as write statements
  • Fixed: table names are now validated to prevent SQL injection

Migration from 0.4.x

Fully backward compatible. The only new required change: table names must now be valid SQL identifiers (letters, digits, underscores). Names with spaces or special characters will raise ValueError.

# Old style (still works)
from sequelframe import sequelframe
db = sequelframe('data.csv')
db.runsql('SELECT * FROM data')
db.kill()

# Recommended style
from sequelframe import SequelFrame
with SequelFrame('data.csv') as db:
    result = db.query('SELECT * FROM data')

Limitations

  • SQLite SQL dialect only (no RETURNING, no FULL OUTER JOIN prior to SQLite 3.39, etc.)
  • All data lives in memory — not suitable for datasets larger than available RAM
  • For very large datasets consider DuckDB or Polars

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

License

MIT License — see LICENSE for details.

Author

Sina Mirshahi

Links

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sequelframe-0.5.0.tar.gz (14.3 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sequelframe-0.5.0-py3-none-any.whl (11.0 kB view details)

Uploaded Python 3

File details

Details for the file sequelframe-0.5.0.tar.gz.

File metadata

  • Download URL: sequelframe-0.5.0.tar.gz
  • Upload date:
  • Size: 14.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for sequelframe-0.5.0.tar.gz
Algorithm Hash digest
SHA256 64e28d54e101472f170d1658f40fc767eb8b20f6d53df12d22589ba44647cff2
MD5 37a8259d04ff2d84cbe9721cd9fa1ba8
BLAKE2b-256 b8b32c3fc957d6d2c62f3efc5eb93be6442279e60e90b468115b8ed39e6723f8

See more details on using hashes here.

File details

Details for the file sequelframe-0.5.0-py3-none-any.whl.

File metadata

  • Download URL: sequelframe-0.5.0-py3-none-any.whl
  • Upload date:
  • Size: 11.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.9

File hashes

Hashes for sequelframe-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 91926cc2651321bfca7a09f9bd0c0c1af64e3131936b141cc2ed867b7cdef389
MD5 44cae1080c32e1485aba9effb7f307bd
BLAKE2b-256 cecbb0fd52d41be6c76f0b99179f254382fd7372594094619bea12c62de0d87f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page