Embedded SQL OLAP engine for Python. Query Parquet, CSV, JSON, Arrow, Avro, Excel, and SQLite files directly with SQL, in-process. Zero server, no import step.
Run analytics faster.
SlothDB is an embedded SQL database that runs everywhere: on your laptop, on a server, and in the browser. Built from scratch as a DuckDB alternative. Up to 5x faster on real workloads (138 ms vs 540 ms on a 5-query warm JOIN batch; 5.43x peak on Avro SUM; 16-query suite median 1.70x). Built-in readers for Parquet, CSV, JSON, Avro, Arrow, Excel, and SQLite.
Try it in 60 seconds
pip install slothdb
python -c "import slothdb; slothdb.demo()"
Generates a 100 000-row CSV, runs three queries, and prints a side-by-side comparison with DuckDB. No files to find, no setup.
Using your own files
import slothdb
db = slothdb.connect()
df = db.sql("SELECT region, SUM(revenue) FROM 'sales.parquet' GROUP BY region").fetchdf()
No server. No import step. No CREATE TABLE. Point SQL at files on disk.
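To try the file-first workflow without hunting for data, the sketch below writes a tiny CSV and builds the same kind of query against it. The helper names (`make_demo_file`, `build_query`, `query_file`) and the sample rows are hypothetical, and escaping a single quote in a path by doubling it is an assumption borrowed from standard SQL string literals. Only `slothdb.connect()`, `.sql()`, and `.fetchdf()` come from the examples above.

```python
import csv
import tempfile
from pathlib import Path


def make_demo_file() -> Path:
    """Write a tiny CSV with hypothetical columns to a temp directory."""
    path = Path(tempfile.mkdtemp()) / "sales.csv"
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["region", "revenue"])
        writer.writerows([("east", 10), ("west", 5), ("east", 7)])
    return path


def build_query(path) -> str:
    # Assumption: a single quote inside the path is escaped by doubling it,
    # as in standard SQL string literals.
    quoted = str(path).replace("'", "''")
    return f"SELECT region, SUM(revenue) FROM '{quoted}' GROUP BY region"


def query_file(path):
    """Run the aggregate through SlothDB if it is installed, else return None."""
    try:
        import slothdb
    except ImportError:
        return None
    db = slothdb.connect()
    return db.sql(build_query(path)).fetchdf()
```

Calling `query_file(make_demo_file())` should then give one row per region, provided the package is installed.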
What's new in 0.2.7
- ClickBench re-measured. SlothDB completes 40 of the 43 official queries and is faster than DuckDB on 29 of those 40, geomean 1.24x. The previous "33 of 43" framing counted 8 queries where DuckDB rejected the input as wins for SlothDB. Removed. Raw per-query times: official_results.md.
- Four benchmark-fitted shortcuts deleted from the engine. Two were returning wrong results on inputs outside the benchmark.
- DATE and TIMESTAMP columns render as ISO strings (`2013-07-02`, `2013-07-15 12:40:00`) instead of raw epoch integers.
- `slothdb_version()` and the shell `--version` report the release tag.
- 424 doctest cases pass.
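The ClickBench correction is a counting rule: a query counts as a win only when both engines complete it and SlothDB is faster. A minimal sketch of that rule follows. The timings in the usage note are made up for illustration, `summarize` is a hypothetical helper, and taking the geomean over the comparable queries only is an assumption about the methodology, not the project's benchmark script.

```python
from statistics import geometric_mean


def summarize(results):
    """results maps query name -> (slothdb_seconds, duckdb_seconds).

    None means that engine did not complete the query. Returns
    (comparable_queries, slothdb_wins, geomean_speedup).
    """
    comparable = [
        (sloth, duck)
        for sloth, duck in results.values()
        if sloth is not None and duck is not None
    ]
    if not comparable:
        raise ValueError("no query was completed by both engines")
    # A failure on either side is excluded, never counted as a win.
    wins = sum(1 for sloth, duck in comparable if sloth < duck)
    # Speedup > 1 means SlothDB was faster on that query.
    speedups = [duck / sloth for sloth, duck in comparable]
    return len(comparable), wins, geometric_mean(speedups)
```

With the made-up input `{"Q1": (1.0, 2.0), "Q2": (2.0, 1.0), "Q3": (1.0, None)}`, only Q1 and Q2 are comparable, SlothDB wins one of them, and Q3 is left out instead of being scored as a win.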
What's new in 0.2.5
- Nested aggregates work everywhere. `ROUND(AVG(x))`, `AVG(x) + 1`, `SUM(x) / COUNT(*)`, `CAST(SUM(y) AS DOUBLE)` and similar shapes that wrap an aggregate inside a scalar function or arithmetic used to throw "Function execution for: AVG". Fixed.
- `ORDER BY` by aggregate alias works. `SELECT region, COUNT(*) AS cnt ... ORDER BY cnt DESC` no longer silently sorts by column 0.
- Arithmetic type promotion fixed. `AVG(x) + 1` no longer drops the `+1` and `AVG(x) / COUNT(*)` no longer returns `inf`.
- 408 unit tests, 131,537 assertions, green on Windows, Linux, macOS.
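To see what the fixed query shapes should return, the sketch below runs them through Python's built-in `sqlite3` as a stand-in reference engine. It does not call SlothDB. The `sales` table and its three rows are hypothetical, chosen so that `SUM(x) / COUNT(*)` divides evenly and engine differences in integer division do not matter.

```python
import sqlite3


def reference_results():
    """Evaluate the nested-aggregate shapes in SQLite (not SlothDB)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE sales (region TEXT, x INTEGER, y INTEGER)")
    con.executemany(
        "INSERT INTO sales VALUES (?, ?, ?)",
        [("east", 10, 1), ("west", 20, 2), ("west", 30, 3)],
    )
    # Aggregates wrapped in a scalar function, arithmetic, and a cast.
    nested = con.execute(
        "SELECT ROUND(AVG(x)), AVG(x) + 1, SUM(x) / COUNT(*), "
        "CAST(SUM(y) AS DOUBLE) FROM sales"
    ).fetchone()
    # Sorting by the aggregate alias must put 'west' (2 rows) first;
    # sorting by column 0 would put 'east' first instead.
    by_alias = con.execute(
        "SELECT region, COUNT(*) AS cnt FROM sales "
        "GROUP BY region ORDER BY cnt DESC"
    ).fetchall()
    con.close()
    return nested, by_alias
```

The same SQL strings can be passed to `db.sql(...)` against a file with these columns, and the results compared by hand.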
Why SlothDB?
Same embedded model as DuckDB and SQLite. You link it into your process and point SQL at files. Different defaults:
- 7 file formats built in - Parquet, CSV, JSON, Avro, Arrow, SQLite, Excel. DuckDB needs extensions for Avro and SQLite.
- Faster than DuckDB on real workloads. 5-query warm JOIN batch: 138 ms vs 540 ms (3.9x). Peak speedups: 5.43x on Avro SUM, 5.08x on CSV COUNT(*), 2.83x on Parquet COUNT(*). Median across the 16-query suite: 1.70x. Full numbers on GitHub.
- Stable C ABI. Numeric error codes don't shift between releases. Bindings built against 0.1.x keep working.
- ~1-4 MB single binary, fully self-contained.
Quickstart
import slothdb
# In-memory
db = slothdb.connect()
# Query files directly
db.sql("SELECT * FROM 'data.csv' WHERE score > 90").show()
db.sql("SELECT COUNT(*) FROM 'logs.parquet'").show()
db.sql("SELECT * FROM read_json('events.json') LIMIT 5").show()
db.sql("SELECT * FROM sqlite_scan('app.db', 'users')").show()
# Persistent database
db = slothdb.connect("analytics.slothdb")
# DataFrame integration
df = db.sql("SELECT region, SUM(revenue) FROM 'sales.csv' GROUP BY region").fetchdf()
What's not production-ready yet
- No multi-writer transactions (single writer, crash-safe checkpoint).
- No distributed execution. Single-node embedded engine.
- No secondary indexes. Scan-based execution; zone-map pruning helps on sorted data, but no B-tree / hash index for point lookups.
- Window-function coverage is partial. Plain OVER / PARTITION BY works; `ROWS BETWEEN ...` frames and cumulative `SUM OVER (ORDER BY)` shapes have known gaps.
- Authenticated S3 not implemented. `s3://` URLs work for anonymous public-bucket reads only.
- Some SQL corners still surprise you. Open an issue with a repro.
- 0.2.x, about a year old. Treat as beta.
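The note that zone-map pruning helps on sorted data is easier to see with a toy model. The sketch below illustrates the general technique and is not SlothDB's implementation. The `Block` type, the block size, and the function names are all hypothetical. Each block records its min and max, and an equality scan skips any block whose range cannot contain the target.

```python
from dataclasses import dataclass


@dataclass
class Block:
    rows: list[int]
    lo: int  # smallest value in the block
    hi: int  # largest value in the block


def build_blocks(values: list[int], block_size: int) -> list[Block]:
    """Split values into fixed-size blocks and record min/max per block."""
    blocks = []
    for start in range(0, len(values), block_size):
        chunk = values[start:start + block_size]
        blocks.append(Block(chunk, min(chunk), max(chunk)))
    return blocks


def scan_eq(blocks: list[Block], target: int) -> tuple[list[int], int]:
    """Return (matching values, number of blocks actually read)."""
    matches, blocks_read = [], 0
    for block in blocks:
        if target < block.lo or target > block.hi:
            continue  # the zone map proves the target is not in this block
        blocks_read += 1
        matches.extend(v for v in block.rows if v == target)
    return matches, blocks_read
```

On `list(range(1000))` with 100-row blocks, a lookup for 250 reads a single block. If the same values are interleaved so that every block spans most of the range, all ten blocks are read. That is why a scan-only engine still benefits from sorted input, and why point lookups without an index stay expensive otherwise.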
Performance
| Format | Query | SlothDB | DuckDB | Speedup |
|---|---|---|---|---|
| Parquet | COUNT(*) | 12 ms | 34 ms | 2.83× |
| CSV | COUNT(*) | 33 ms | 170 ms | 5.08× |
| CSV | GROUP BY region | 100 ms | 191 ms | 1.91× |
| JSON | SUM(revenue) | 242 ms | 314 ms | 1.30× |
| Avro | SUM(revenue) | 140 ms | 760 ms | 5.43× |
| Avro | GROUP BY region | 170 ms | 800 ms | 4.71× |
1M-row dataset, warm cache, 5-run median. The full 15-query table and methodology are on GitHub.
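The Speedup column is the DuckDB time divided by the SlothDB time. The sketch below recomputes it from the rounded millisecond figures in the table; the names `ROWS` and `speedups` are illustrative. Five of the six rows reproduce the published value exactly. CSV COUNT(*) comes out at 5.15 instead of 5.08, which is consistent with the published ratio having been taken from unrounded timings, though that is an assumption.

```python
# (format, query, slothdb_ms, duckdb_ms), copied from the table above.
ROWS = [
    ("Parquet", "COUNT(*)", 12, 34),
    ("CSV", "COUNT(*)", 33, 170),
    ("CSV", "GROUP BY region", 100, 191),
    ("JSON", "SUM(revenue)", 242, 314),
    ("Avro", "SUM(revenue)", 140, 760),
    ("Avro", "GROUP BY region", 170, 800),
]


def speedups(rows):
    """Map (format, query) to DuckDB time / SlothDB time, rounded to 2 places."""
    return {
        (fmt, query): round(duck_ms / sloth_ms, 2)
        for fmt, query, sloth_ms, duck_ms in rows
    }


if __name__ == "__main__":
    for (fmt, query), ratio in speedups(ROWS).items():
        print(f"{fmt:8} {query:16} {ratio:.2f}x")
```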
Community
There's a Discord: discord.gg/XJWyGmX5G. Bug reports, install help, weird query plans, "is this slower than it should be", feature ideas - any of it. The maintainer reads everything. GitHub issues are still the canonical tracker; the server is for the questions that come before you file one.
Links
- Source: https://github.com/SouravRoy-ETL/slothdb
- Discord: https://discord.gg/XJWyGmX5G
- Changelog: https://github.com/SouravRoy-ETL/slothdb/blob/main/CHANGELOG.md
- Issues: https://github.com/SouravRoy-ETL/slothdb/issues
- SQL reference: https://github.com/SouravRoy-ETL/slothdb/blob/main/docs/DOCUMENTATION.md
License
Download files
File details
Details for the file slothdb-0.2.7.tar.gz.
File metadata
- Download URL: slothdb-0.2.7.tar.gz
- Upload date:
- Size: 14.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a1fabaffd380971fca0d9843f072236a0030e340dc9957e240523e8bc35ad1f7 |
| MD5 | 319b2211e4c860d877c4704e93d8b4e5 |
| BLAKE2b-256 | 64fe21a9f315c8e2bef5ddb01c60b1a73995d3b198977eceddac0f0c2f25a895 |
File details
Details for the file slothdb-0.2.7-py3-none-win_amd64.whl.
File metadata
- Download URL: slothdb-0.2.7-py3-none-win_amd64.whl
- Upload date:
- Size: 1.1 MB
- Tags: Python 3, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 40cfa04d8e16ba933ef5f0d789b8a595a6c74314a5c9ceec870e65e7fa6d356b |
| MD5 | 665ebb86fb6911f5d3fb9088e34e70f7 |
| BLAKE2b-256 | cf61c7747f17a173731f17530399c5c213b51939230a3f81ad46aee0a38bcb89 |
File details
Details for the file slothdb-0.2.7-py3-none-manylinux2014_x86_64.whl.
File metadata
- Download URL: slothdb-0.2.7-py3-none-manylinux2014_x86_64.whl
- Upload date:
- Size: 1.5 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d1c24e858487843a715a10f1d15ed244f89933c5d3a20d095d734ba8da80794b |
| MD5 | 18195cff61f38dfb8cb1c9d15b31d8e4 |
| BLAKE2b-256 | 4118ac05beacd048004c3672464bdefb10bcae78ee47e15f15817e86faa45818 |
File details
Details for the file slothdb-0.2.7-py3-none-macosx_11_0_universal2.whl.
File metadata
- Download URL: slothdb-0.2.7-py3-none-macosx_11_0_universal2.whl
- Upload date:
- Size: 2.7 MB
- Tags: Python 3, macOS 11.0+ universal2 (ARM64, x86-64)
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b46e3f9df097b21d07e4e9f463ae281a62a5c12ff50c58536dcfda8f63aa797e |
| MD5 | 2621948b8ea07ba4826901cc8a51bf0a |
| BLAKE2b-256 | 56639769e10e05a1a7fdfd2515cb1072ffeb6941c0f7927c2314c52316745ee2 |