SlothDB

An embedded analytical database engine
Zero dependencies · Single file · GPU accelerated

SlothDB is a fast, in-process OLAP database for analytics. It runs inside your application with no server, no setup, and no external dependencies. Query CSV, Parquet, JSON, Excel, and more — directly from SQL.

SELECT department, COUNT(*), AVG(salary)
FROM 'employees.parquet'
WHERE hire_year >= 2020
GROUP BY department
ORDER BY AVG(salary) DESC;

Installation

Platform            Command
------------------  -------
Linux / macOS       curl -fsSL https://raw.githubusercontent.com/SouravRoy-ETL/slothdb/main/install.sh | bash
Ubuntu / Debian     sudo dpkg -i slothdb_0.1.0_amd64.deb (download .deb)
Fedora / RHEL       sudo rpm -i slothdb-0.1.0.rpm (build from spec)
Arch Linux          makepkg -si (use PKGBUILD)
macOS (Homebrew)    brew install --build-from-source packaging/homebrew/slothdb.rb
Windows             Download slothdb.exe
Python              pip install slothdb

Then just run:

slothdb

Build from source:

git clone https://github.com/SouravRoy-ETL/slothdb.git
cd slothdb
cmake -B build -DSLOTHDB_BUILD_SHELL=ON
cmake --build build --config Release
./build/src/Release/slothdb

Quick Start

$ ./slothdb

slothdb> CREATE TABLE t (name VARCHAR, score INTEGER);
slothdb> INSERT INTO t VALUES ('Alice', 95), ('Bob', 87), ('Charlie', 92);
slothdb> SELECT name, score, RANK() OVER (ORDER BY score DESC) FROM t;
name            | score           | expr
----------------+-----------------+----------------
Alice           | 95              | 1
Charlie         | 92              | 2
Bob             | 87              | 3

Query files without importing:

SELECT * FROM 'data.csv';                              -- CSV
SELECT * FROM read_parquet('logs/*.parquet');           -- Parquet with globs
SELECT * FROM read_json('events.json');                -- JSON
SELECT * FROM read_xlsx('report.xlsx');                -- Excel
SELECT * FROM sqlite_scan('app.db', 'users');          -- SQLite

COPY results TO 'output.parquet' WITH (FORMAT PARQUET); -- Export

Persistent database:

$ ./slothdb analytics.slothdb    # data saved automatically

Why Switch from DuckDB to SlothDB?

DuckDB is great. SlothDB is what comes next.

1. GPU Acceleration — 20-100x faster on large datasets

DuckDB runs on CPU only. SlothDB offloads aggregation, sorting, and filtering to your GPU — CUDA on NVIDIA, Metal on Apple Silicon. On a 10M-row GROUP BY, that's the difference between 5 seconds and 50 milliseconds.

-- This runs on GPU automatically when data > 100K rows
SELECT department, COUNT(*), AVG(salary) FROM employees GROUP BY department;

2. Your Extensions Will Never Break Again

DuckDB extensions break on every release because they depend on internal C++ APIs. Teams waste days fixing extensions after upgrades. SlothDB's stable C ABI guarantees backward compatibility — an extension built for v1.0 works on v1.1, v2.0, and beyond. Zero maintenance.

3. Errors You Can Actually Handle in Code

DuckDB throws free-form error strings that change between versions, so error-handling code that matches on them breaks silently. SlothDB gives every error a stable numeric code and a category: check for ErrorCode::TABLE_NOT_FOUND (2000) instead of parsing "Table 'foo' not found".

try { db.sql("SELECT * FROM nonexistent"); }
catch (const SlothDBException &e) {
    if (e.GetCode() == ErrorCode::TABLE_NOT_FOUND) { /* handle */ }
    // Works in v1.0, v2.0, v10.0 — the code never changes.
}
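
The same pattern can be sketched in plain Python to show why a numeric code is easier to handle than a message string. This is an illustration only, not SlothDB's Python API: `SketchDBError`, `lookup_table`, and `safe_row_count` are hypothetical names, and the only value taken from the text above is `TABLE_NOT_FOUND = 2000`.

```python
from enum import IntEnum


class ErrorCode(IntEnum):
    # 2000 is the value quoted above for TABLE_NOT_FOUND.
    TABLE_NOT_FOUND = 2000


class SketchDBError(Exception):
    """Hypothetical exception that carries a stable numeric code."""

    def __init__(self, code, message):
        super().__init__(message)
        self.code = code


def lookup_table(catalog, name):
    """Return the rows of a table, or raise with a stable error code."""
    if name not in catalog:
        # The message wording may change freely; the code must not.
        raise SketchDBError(ErrorCode.TABLE_NOT_FOUND, f"Table '{name}' not found")
    return catalog[name]


def safe_row_count(catalog, name):
    """Count rows, treating a missing table as empty."""
    try:
        return len(lookup_table(catalog, name))
    except SketchDBError as err:
        # Branch on the numeric code, never on the message text.
        if err.code == ErrorCode.TABLE_NOT_FOUND:
            return 0
        raise
```

Because the handler compares `err.code` rather than the message, rewording the message in a later release would not change its behavior.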

4. Every File Format Built In — No Extensions to Install

DuckDB requires installing extensions for Excel, Avro, SQLite, and HTTP access. SlothDB ships everything out of the box:

SELECT * FROM 'report.xlsx';                           -- Excel (DuckDB: needs extension)
SELECT * FROM read_avro('events.avro');                -- Avro (DuckDB: needs extension)
SELECT * FROM sqlite_scan('app.db', 'users');          -- SQLite (DuckDB: needs extension)
SELECT * FROM read_csv('data/*.csv');                  -- Glob patterns

5. QUALIFY — Snowflake's Best Feature, Built In

Filter window function results without subqueries. One query instead of three:

-- Get the top earner per department — no subquery needed
SELECT name, department, salary
FROM employees
QUALIFY ROW_NUMBER() OVER (PARTITION BY department ORDER BY salary DESC) = 1;
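
For comparison, here is the subquery form that a QUALIFY clause replaces. The sketch runs it through Python's standard-library `sqlite3` module rather than SlothDB, and it assumes the bundled SQLite is version 3.25 or later (window function support). The sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows; table and column names follow the query above.
ROWS = [
    ("Alice", "Engineering", 120),
    ("Bob", "Engineering", 110),
    ("Carol", "Sales", 90),
    ("Dave", "Sales", 95),
]

# Without QUALIFY, the window function is computed in an inner query
# and its result is filtered in the outer query.
SUBQUERY_FORM = """
    SELECT name, department, salary
    FROM (
        SELECT name, department, salary,
               ROW_NUMBER() OVER (
                   PARTITION BY department ORDER BY salary DESC
               ) AS rn
        FROM employees
    )
    WHERE rn = 1
    ORDER BY department
"""


def top_earners(rows):
    """Return the highest-paid employee of each department."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)"
        )
        conn.executemany("INSERT INTO employees VALUES (?, ?, ?)", rows)
        return conn.execute(SUBQUERY_FORM).fetchall()
    finally:
        conn.close()
```

The QUALIFY version expresses the same filter in a single SELECT, with no inner query and no helper column such as `rn`.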

Full Comparison

Feature                 SlothDB                                         DuckDB
----------------------  ----------------------------------------------  ------
GPU acceleration        CUDA + Apple Metal (20-100x on large data)      CPU only
Extension stability     Stable C ABI — never breaks                     C++ internal API — breaks every release
Error handling          Numeric codes, stable across versions           Free-form strings, change between versions
Built-in formats        CSV, Parquet, JSON, Arrow, Avro, Excel, SQLite  CSV, Parquet, JSON (others need extensions)
QUALIFY clause          Yes                                             Yes
Crash-safe persistence  Atomic checkpoint (write-then-rename)           Yes
Memory safety           Bounds-checked file parsing, DoS limits         Some unchecked paths
Zero dependencies       Yes                                             Yes
SQL features            130+                                            130+
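
The "write-then-rename" technique named in the persistence row can be sketched in a few lines of Python. This is a generic illustration of the pattern, not SlothDB's checkpoint code; `atomic_write` and the `.checkpoint-` temporary-file prefix are hypothetical.

```python
import os
import tempfile


def atomic_write(path, data):
    """Replace the file at `path` with `data` without exposing a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so that the final
    # rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".checkpoint-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        # os.replace swaps the new file in as a single step, overwriting
        # any existing file at `path`.
        os.replace(tmp_path, path)
    except BaseException:
        # On failure, remove the temporary file and leave `path` untouched.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

A reader that opens the target file sees either the old contents or the new contents in full. If the process dies before the rename, the previous file is still intact.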

Python

import slothdb

db = slothdb.connect()                    # in-memory
db = slothdb.connect("analytics.slothdb") # persistent

result = db.sql("""
    SELECT department, COUNT(*), AVG(salary) 
    FROM 'employees.csv' 
    GROUP BY department
""")
print(result)
df = result.fetchdf()  # → pandas DataFrame

C/C++ Embedding

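#include <stdio.h>
#include "slothdb/api/slothdb.h"

int main(void) {
    slothdb_database *db = NULL;
    slothdb_connection *conn = NULL;
    slothdb_result *result = NULL;

    /* Open the persistent database file and connect to it. */
    slothdb_open("analytics.slothdb", &db);
    slothdb_connect(db, &conn);

    /* Run a query and print the first cell as a 32-bit integer. */
    slothdb_query(conn, "SELECT 42 AS answer", &result);
    printf("%d\n", slothdb_value_int32(result, 0, 0));

    /* Release resources in reverse order of acquisition. */
    slothdb_free_result(result);
    slothdb_disconnect(conn);
    slothdb_close(db);
    return 0;
}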

Features

  • 130+ SQL features — SELECT, JOINs, CTEs, window functions, aggregates, MERGE, EXPLAIN, transactions (full reference)
  • QUALIFY clause — filter on window function results (Snowflake-style)
  • 7 file formats — CSV, JSON, Parquet, Arrow, Avro, Excel, SQLite — all built-in, no extensions
  • GPU acceleration — CUDA (NVIDIA) and Metal (Apple Silicon) for large-scale analytics
  • Single-file persistence: .slothdb format with auto-save
  • Query optimizer — constant folding, filter pushdown, TopN optimization
  • Vectorized execution — columnar engine processing 2,048 values per batch
  • Parallel execution — morsel-driven parallelism across all CPU cores
  • Compression — RLE, dictionary, bitpacking with zone maps for scan skipping
  • Extension system — stable C ABI for third-party extensions
  • 325 tests — 131,000+ assertions across all subsystems
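
To make "zone maps for scan skipping" concrete, here is a minimal Python sketch of the general idea: keep the minimum and maximum of each batch of a column, and skip any batch whose range cannot satisfy the filter. It is not SlothDB's storage code. The function names are hypothetical, and the batch size of 2,048 is borrowed from the vectorized-execution bullet as an assumption.

```python
BATCH_SIZE = 2048  # assumed batch size, taken from the feature list


def build_zone_map(values, batch_size=BATCH_SIZE):
    """Return one (start, end, min, max) entry per batch of the column."""
    zones = []
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        zones.append((start, start + len(batch), min(batch), max(batch)))
    return zones


def scan_greater_equal(values, zones, threshold):
    """Evaluate `value >= threshold`, skipping batches that cannot match.

    Returns the matching values and the number of batches actually read.
    """
    matches = []
    scanned = 0
    for start, end, _low, high in zones:
        if high < threshold:
            continue  # every value in this batch is below the threshold
        scanned += 1
        matches.extend(v for v in values[start:end] if v >= threshold)
    return matches, scanned


values = list(range(10_000))
zones = build_zone_map(values)
matches, scanned = scan_greater_equal(values, zones, 9_000)
print(len(zones), scanned, len(matches))  # prints "5 1 1000"
```

With sorted data like this, four of the five batches are skipped without reading a single value. On randomly ordered data the ranges overlap the threshold more often, so fewer batches can be skipped.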

Documentation

Development

cmake -B build -DSLOTHDB_BUILD_SHELL=ON -DSLOTHDB_BUILD_TESTS=ON
cmake --build build --config Release
ctest --test-dir build -C Release    # run 325 tests

Build Option               Description
-------------------------  -----------
-DSLOTHDB_BUILD_SHELL=ON   Build CLI
-DSLOTHDB_CUDA=ON          Enable NVIDIA GPU
-DSLOTHDB_METAL=ON         Enable Apple GPU
-DSLOTHDB_SANITIZERS=ON    Enable ASan/UBSan

License

MIT
