
Embedded SQL OLAP engine for Python. Query Parquet, CSV, JSON, Arrow, Avro, Excel, and SQLite files directly with SQL, in-process. Zero server, no import step.

SlothDB

Run analytics faster.

SlothDB is an embedded SQL database that runs everywhere: on your laptop, on a server, and in the browser. Built from scratch as a DuckDB alternative. Up to 5x faster on real workloads (138 ms vs 540 ms on a 5-query warm JOIN batch; 5.43x peak on Avro SUM; 16-query suite median 1.70x). Built-in readers for Parquet, CSV, JSON, Avro, Arrow, Excel, and SQLite.

Try it in 60 seconds

pip install slothdb
python -c "import slothdb; slothdb.demo()"

Generates a 100,000-row CSV, runs three queries, and prints a side-by-side comparison with DuckDB. No files to find, no setup.

Using your own files

import slothdb
db = slothdb.connect()
df = db.sql("SELECT region, SUM(revenue) FROM 'sales.parquet' GROUP BY region").fetchdf()

No server. No import step. No CREATE TABLE. Point SQL at files on disk.
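
To try the file-query pattern without a data file on hand, the sketch below writes a tiny CSV with the standard library and builds the same GROUP BY query against it. The sample rows, the column names, and the helpers `write_sample_csv`, `totals_by_region`, and `build_query` are illustrative assumptions; only `slothdb.connect()`, `.sql()`, and `.fetchdf()` come from the snippet above. The SlothDB call is skipped when the package is not installed, so the pure-Python reference result can still be inspected.

```python
import csv
import os
import tempfile
from collections import defaultdict

# Hypothetical sample data: (region, revenue)
ROWS = [("east", 100.0), ("west", 250.0), ("east", 50.0)]

def write_sample_csv(path):
    """Write a small region/revenue CSV to `path`."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "revenue"])
        writer.writerows(ROWS)

def totals_by_region(path):
    """Pure-Python reference for SUM(revenue) ... GROUP BY region."""
    totals = defaultdict(float)
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            totals[row["region"]] += float(row["revenue"])
    return dict(totals)

def build_query(path):
    # The file to read is named inside the single-quoted literal.
    return f"SELECT region, SUM(revenue) AS total FROM '{path}' GROUP BY region"

csv_path = os.path.join(tempfile.mkdtemp(), "sales.csv")
write_sample_csv(csv_path)
expected = totals_by_region(csv_path)
query = build_query(csv_path)

try:
    import slothdb  # optional: only needed for the engine run
except ImportError:
    slothdb = None

if slothdb is not None:
    # Same call chain as the snippet above.
    print(slothdb.connect().sql(query).fetchdf())
print(expected)
```

Comparing the engine's output with `expected` is a quick sanity check when pointing SQL at a new file for the first time.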

What's new in 0.2.5

  • Nested aggregates work everywhere. ROUND(AVG(x)), AVG(x) + 1, SUM(x) / COUNT(*), CAST(SUM(y) AS DOUBLE) and similar shapes that wrap an aggregate inside a scalar function or arithmetic used to throw "Function execution for: AVG". Fixed.
  • ORDER BY on an aggregate alias works. SELECT region, COUNT(*) AS cnt ... ORDER BY cnt DESC no longer silently sorts by column 0.
  • Arithmetic type promotion fixed. AVG(x) + 1 no longer drops the +1 and AVG(x) / COUNT(*) no longer returns inf.
  • 408 unit tests, 131,537 assertions, green on Windows, Linux, macOS.
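
As a rough way to check the fixed shapes on your own install, the sketch below computes reference values in plain Python for a few of the expressions listed above. The column values and the `reference` and `queries` names are assumptions made for illustration, and `data.csv` is a placeholder file assumed to hold a single column `x` with those values. The sample values avoid exact .5 cases, where Python's `round` and SQL's `ROUND` may disagree.

```python
from statistics import mean

# Hypothetical contents of column x
x = [1.0, 2.0, 4.0]

# SQL expression -> value a correct engine should return for x above
reference = {
    "ROUND(AVG(x))": round(mean(x)),
    "AVG(x) + 1": mean(x) + 1,
    "SUM(x) / COUNT(*)": sum(x) / len(x),
}

# Query strings one could pass to db.sql(...); 'data.csv' is a placeholder.
queries = [f"SELECT {expr} FROM 'data.csv'" for expr in reference]

for expr, value in reference.items():
    print(f"{expr:20s} -> {value}")
```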

Why SlothDB?

Same embedded model as DuckDB and SQLite. You link it into your process and point SQL at files. Different defaults:

  • 7 file formats built in - Parquet, CSV, JSON, Avro, Arrow, SQLite, Excel. DuckDB needs extensions for Avro and SQLite.
  • Faster than DuckDB on real workloads. 5-query warm JOIN batch: 138 ms vs 540 ms (3.9x). Peak speedups: 5.43x on Avro SUM, 5.08x on CSV COUNT(*), 2.83x on Parquet COUNT(*). Median across the 16-query suite: 1.70x. Full numbers on GitHub.
  • Stable C ABI. Numeric error codes don't shift between releases. Bindings built against 0.1.x keep working.
  • ~1-4 MB single binary, fully self-contained.

Quickstart

import slothdb

# In-memory
db = slothdb.connect()

# Query files directly
db.sql("SELECT * FROM 'data.csv' WHERE score > 90").show()
db.sql("SELECT COUNT(*) FROM 'logs.parquet'").show()
db.sql("SELECT * FROM read_json('events.json') LIMIT 5").show()
db.sql("SELECT * FROM sqlite_scan('app.db', 'users')").show()

# Persistent database
db = slothdb.connect("analytics.slothdb")

# DataFrame integration
df = db.sql("SELECT region, SUM(revenue) FROM 'sales.csv' GROUP BY region").fetchdf()

What's not production-ready yet

  • No multi-writer transactions (single writer, crash-safe checkpoint).
  • No distributed execution. Single-node embedded engine.
  • No secondary indexes. Scan-based execution; zone-map pruning helps on sorted data, but no B-tree / hash index for point lookups.
  • Window-function coverage is partial. Plain OVER / PARTITION BY works; ROWS BETWEEN ... frames and cumulative SUM OVER (ORDER BY) shapes have known gaps.
  • Authenticated S3 not implemented. s3:// URLs work for anonymous public-bucket reads only.
  • Some SQL corner cases may still surprise you. Open an issue with a repro.
  • 0.2.x, about a year old. Treat as beta.
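
Where a cumulative SUM ... OVER (ORDER BY ...) query hits one of the window-function gaps listed above, one possible workaround is to fetch the rows already ordered and compute the running total on the client. This is a minimal sketch, not a SlothDB feature: the rows are hypothetical stand-ins for a query result, and `with_running_total` is an illustrative helper name.

```python
from itertools import accumulate

# Hypothetical rows, as if returned by a query ending in ORDER BY day
rows = [("2024-01-01", 10.0), ("2024-01-02", 20.0), ("2024-01-03", 5.0)]

def with_running_total(ordered_rows):
    """Append a cumulative sum of the second column to each row."""
    totals = accumulate(value for _, value in ordered_rows)
    return [(key, value, total)
            for (key, value), total in zip(ordered_rows, totals)]

running = with_running_total(rows)
for row in running:
    print(row)
```

The result depends on the input order, so the ORDER BY belongs in the SQL that produces the rows.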

Performance

Format   Query             SlothDB  DuckDB  Speedup
Parquet  COUNT(*)          12 ms    34 ms   2.83×
CSV      COUNT(*)          33 ms    170 ms  5.08×
CSV      GROUP BY region   100 ms   191 ms  1.91×
JSON     SUM(revenue)      242 ms   314 ms  1.30×
Avro     SUM(revenue)      140 ms   760 ms  5.43×
Avro     GROUP BY region   170 ms   800 ms  4.71×

1M-row dataset, warm cache, median of 5 runs. The full 15-query table and methodology are on GitHub.
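
To reproduce this kind of measurement on your own data, a warm-cache, 5-run median can be sketched with the standard library alone. This is an assumed harness, not the project's benchmark code: `median_ms` and `speedup` are illustrative names, and the callable passed in would wrap whatever query you want to time, for example `lambda: db.sql("SELECT COUNT(*) FROM 'logs.parquet'").fetchdf()`.

```python
import statistics
import time

def median_ms(fn, runs=5, warmup=1):
    """Call fn `warmup` times untimed, then return the median of `runs` timings in ms."""
    for _ in range(warmup):
        fn()  # warm the OS file cache and any engine caches
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)

def speedup(slothdb_ms, duckdb_ms):
    """Ratio used in the Speedup column: DuckDB time divided by SlothDB time."""
    return duckdb_ms / slothdb_ms

# Recomputing two rows of the table from their rounded timings:
print(round(speedup(12, 34), 2))    # Parquet COUNT(*)
print(round(speedup(140, 760), 2))  # Avro SUM(revenue)

# Timing a stand-in workload:
print(median_ms(lambda: sum(range(10_000))))
```

Ratios recomputed from the rounded millisecond values may differ slightly from the published column, which presumably comes from unrounded timings.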

Community

There's a Discord: discord.gg/XJWyGmX5G. Bug reports, install help, weird query plans, "is this slower than it should be", feature ideas - any of it. The maintainer reads everything. GitHub issues are still the canonical tracker; the server is for the questions that come before you file one.

License

MIT
