BumbleBee DB
A high-performance, Datalog-based analytics engine with SQL support, usable from Python.
BumbleBee DB is a lightweight, in-memory analytics engine powered by Datalog, a declarative logic language that makes recursive queries, graph analysis, and complex joins natural to express. It also supports SQL as an alternative query language, so you can mix both in the same session. Query CSV files, Parquet files, and pandas DataFrames through a simple Python API. The engine is written in C++ and uses push-based execution, columnar storage, and multithreading to get high performance out of a single machine.
Quick Start
Install
pip install bumblebeedb
Platform support: Pre-built wheels are available for Linux x86_64 and macOS ARM (Apple Silicon). If your platform is not supported, you can still use BumbleBee DB in Google Colab.
Your first query
import bumblebeedb as bb
db = bb.db()
# Query a CSV file with SQL — alias names the output predicate
db.sql("""
SELECT DEPARTMENT_ID, COUNT(*) AS CNT, SUM(SALARY) AS TOTAL
FROM "examples/data/employees.csv"
GROUP BY DEPARTMENT_ID
""", alias="dept_stats")
# Get results as a pandas DataFrame
# 3 is the arity (number of columns) of the dept_stats predicate
df = db.get_table("dept_stats", 3).to_df(
col_names=["dept_id", "count", "total_salary"]
)
print(df)
Recursive Datalog — something SQL can't easily do
BumbleBee supports recursive Datalog rules, enabling graph analysis, transitive closure, and hierarchical queries with a clean, declarative syntax:
import bumblebeedb as bb
import pandas as pd
hierarchy = pd.DataFrame({
"manager": ["alice", "alice", "bob", "bob", "carol", "dave"],
"report": ["bob", "carol", "dave", "eve", "frank", "grace"],
})
db = bb.db()
# Load a pandas DataFrame as a predicate that can be queried
db.load_df(hierarchy, "manages")
# Recursive Datalog: compute all direct and indirect reports
db.run("""
reports_to(M, R) :- manages(M, R).
reports_to(M, R) :- manages(M, X), reports_to(X, R).
reports_to(X, Y)?
""")
df = db.get_table("reports_to", 2).to_df(col_names=["manager", "report"])
print("reports_to:")
print(df.sort_values(["manager", "report"]).to_string(index=False))
print()
# You can also run SQL on top of Datalog results — count reports per manager
db.sql("""
SELECT COL_0, COUNT(*) AS CNT
FROM reports_to
GROUP BY COL_0
""", alias="report_count")
df2 = db.get_table("report_count", 2).to_df(col_names=["manager", "num_reports"])
print("num_reports:")
print(df2.sort_values("num_reports", ascending=False).to_string(index=False))
print()
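To see what the two `reports_to` rules compute, here is a minimal pure-Python sketch of the same fixpoint, using semi-naive iteration (each round only extends the facts derived in the previous round). It illustrates the semantics of the rules and is not a description of how BumbleBee evaluates them; the function name `transitive_reports` is made up for this example.

```python
def transitive_reports(manages):
    """Compute reports_to as the fixpoint of the two Datalog rules above."""
    # Index manages(M, X) by X so a new fact reports_to(X, R) can be
    # extended to reports_to(M, R) for every manager M of X.
    managers_of = {}
    for manager, report in manages:
        managers_of.setdefault(report, set()).add(manager)

    reports_to = set(manages)  # base rule: reports_to(M, R) :- manages(M, R).
    delta = set(manages)       # facts that were new in the previous round
    while delta:
        # recursive rule: reports_to(M, R) :- manages(M, X), reports_to(X, R).
        derived = {(m, r) for (x, r) in delta for m in managers_of.get(x, ())}
        delta = derived - reports_to
        reports_to |= delta
    return reports_to


manages = [
    ("alice", "bob"), ("alice", "carol"), ("bob", "dave"),
    ("bob", "eve"), ("carol", "frank"), ("dave", "grace"),
]
closure = transitive_reports(manages)
for manager, report in sorted(closure):
    print(manager, report)
print(len(closure))  # 11 pairs: 6 direct and 5 indirect
```

The loop stops when a round derives nothing new, which is the point where the recursive rule has nothing left to add.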
Features
- Python client library: `pip install bumblebeedb`, query and get DataFrames back
- Dual query languages: SQL and Datalog, including recursive Datalog
- High-performance engine: push-based execution, columnar storage, multithreading
- Read and write CSV and Parquet: import data from and export results to CSV and Parquet files
- DataFrame interop: load any pandas DataFrame as input via `db.load_df()`, meaning you can connect to virtually any data source (databases, APIs, Excel, in-memory structures) before handing it off for analysis
- Command-line interface: run queries directly from the terminal
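For readers unfamiliar with the terms, the sketch below shows the general idea behind push-based execution over columnar batches: the scan drives the pipeline and pushes batches of columns into downstream operators, rather than the consumer pulling rows one at a time. This is a generic illustration in plain Python; the class and function names are invented here and say nothing about BumbleBee's actual C++ internals.

```python
class SumSink:
    """Terminal operator: accumulates the sum of one column."""

    def __init__(self, column):
        self.column = column
        self.total = 0

    def push(self, batch):
        self.total += sum(batch[self.column])


class Filter:
    """Keeps rows whose `column` value satisfies `predicate`, then pushes on."""

    def __init__(self, column, predicate, downstream):
        self.column = column
        self.predicate = predicate
        self.downstream = downstream

    def push(self, batch):
        keep = [i for i, v in enumerate(batch[self.column]) if self.predicate(v)]
        if keep:
            # Build the filtered batch column by column.
            self.downstream.push(
                {name: [col[i] for i in keep] for name, col in batch.items()}
            )


def scan(table, batch_size, downstream):
    """Source operator: slices each column into batches and pushes them."""
    n_rows = len(next(iter(table.values())))
    for start in range(0, n_rows, batch_size):
        downstream.push(
            {name: col[start:start + batch_size] for name, col in table.items()}
        )


# A table stored as one Python list per column (made-up sample data).
table = {
    "dept": [10, 20, 10, 30, 20, 10],
    "salary": [100, 200, 300, 400, 500, 600],
}
sink = SumSink("salary")
scan(table, batch_size=2, downstream=Filter("dept", lambda d: d == 10, sink))
print(sink.total)  # 1000
```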
Python API
| Method | Description |
|---|---|
| `db = bb.db(args={})` | Create a new engine instance (only one per session). Optional `args` dict for CLI flags, e.g. `{"-t": "4", "-d": ""}` |
| `db.run(program)` | Run a Datalog program |
| `db.sql(query, alias="", overwrite=True)` | Run a SQL query. If `alias` is provided, wraps the query as `(query) AS alias`. If `overwrite` is `True` (default), any existing predicate with the same alias is replaced |
| `db.run_file(filepath)` | Run a program from a file |
| `db.load_df(df, alias)` | Load a pandas DataFrame as a predicate. Alias must start with a lowercase letter |
| `db.explain(program)` | Return the generated Datalog rules as a string without executing |
| `db.get_output_predicates()` | List all output predicates as `(name, arity)` tuples |
| `db.get_table(name, arity=-1)` | Get a result table. Arity is optional |
| `table.tuples()` | Get results as a list of tuples |
| `table.to_df(col_names=[])` | Get results as a pandas DataFrame. Column names are optional |
| `db.remove_table(name, arity)` | Remove a predicate from the engine |
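Because `db.load_df()` requires an alias that starts with a lowercase letter, it can help to normalize arbitrary labels (sheet names, file stems) before loading. The helper below is a hypothetical convenience and not part of the bumblebeedb API. It also assumes that restricting the alias to ASCII letters, digits, and underscores is safe, which the table above does not state.

```python
import re
import string


def to_predicate_alias(label):
    """Turn an arbitrary label into an alias starting with a lowercase letter.

    Hypothetical helper: only the 'starts with a lowercase letter' rule comes
    from the API table; the character whitelist is an assumption.
    """
    cleaned = re.sub(r"[^0-9A-Za-z_]+", "_", label.strip()).strip("_")
    if not cleaned:
        raise ValueError(f"cannot derive an alias from {label!r}")
    if cleaned[0] not in string.ascii_letters:
        cleaned = "t_" + cleaned  # prefix so the alias starts with a letter
    return cleaned[0].lower() + cleaned[1:]


print(to_predicate_alias("Sales 2024"))   # sales_2024
print(to_predicate_alias("2024-orders"))  # t_2024_orders
print(to_predicate_alias("manages"))      # manages
```

The result could then be passed as the second argument of `db.load_df(df, alias)`.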
Examples
Check out the examples/ folder for a collection of ready-to-use examples in Python, SQL, and Datalog, covering data imports, aggregations, joins, recursion, exports, and more.
CLI Quick Start
BumbleBee is also available as a standalone command-line tool.
Prerequisites
- CMake 3.20+
- C++20-compatible compiler (GCC 13+ or Clang 18+ with libc++)
Build
cmake -S . -B cmake-build-release -DCMAKE_BUILD_TYPE=Release
cmake --build cmake-build-release --target BumbleBee -j 8
./cmake-build-release/BumbleBee --help
Run a query
cd examples
# SQL
../cmake-build-release/BumbleBee -i sql/01_import/basic_csv_import.sql
# Datalog
../cmake-build-release/BumbleBee -i dl/01_import/basic_csv_import.dl -a
| Flag | Description |
|---|---|
| `-a` | Print all predicates |
| `-t N` | Use N threads (default: all cores) |
| `-r` | Print profiling data |
| `--print-program` | Print the generated Datalog program and exit |
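If you drive the CLI from a script, a small wrapper that assembles the argument list keeps the flags in one place. The sketch below only builds the command and uses the flags shown above (`-i`, `-a`, `-t`, `-r`). The function `build_command` is a hypothetical helper, not something shipped with BumbleBee.

```python
def build_command(binary, program, threads=None, print_all=False, profile=False):
    """Assemble an argument list for the BumbleBee CLI (hypothetical helper)."""
    cmd = [binary, "-i", program]
    if print_all:
        cmd.append("-a")               # print all predicates
    if threads is not None:
        cmd += ["-t", str(threads)]    # limit the number of threads
    if profile:
        cmd.append("-r")               # print profiling data
    return cmd


cmd = build_command(
    "../cmake-build-release/BumbleBee",
    "dl/01_import/basic_csv_import.dl",
    threads=4,
    print_all=True,
)
print(" ".join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` from the `examples` directory once the binary has been built.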
Optimizer
BumbleBee DB includes a rule-based query optimizer that applies logical rewrites such as filter push-down and column pruning. The current optimizer does not reorder joins — the execution order follows the join sequence as written in the query. A cost-based join reordering optimizer is planned as a future enhancement.
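Since joins run in the order they are written, the order can change how large the intermediate results get. The plain-Python sketch below joins the same three made-up relations in two orders and counts the intermediate rows. It uses a simple hash join written for this example and does not reflect BumbleBee's join implementation.

```python
from collections import defaultdict


def hash_join(left, right, key):
    """Join two lists of dict rows on equality of `key`."""
    index = defaultdict(list)
    for row in right:
        index[row[key]].append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Made-up relations: R(x, y) is large, S(y, z) is medium, T(z) is tiny.
R = [{"x": i, "y": i % 100} for i in range(1000)]
S = [{"y": j, "z": j % 10} for j in range(100)]
T = [{"z": 0}]

# Order A: join the two larger relations first, then apply the selective one.
rs = hash_join(R, S, "y")
final_a = hash_join(rs, T, "z")
inter_a = len(rs)

# Order B: join with the tiny relation first, then bring in the large one.
st = hash_join(S, T, "z")
final_b = hash_join(st, R, "y")
inter_b = len(st)

print(inter_a, inter_b)            # 1000 10
print(len(final_a), len(final_b))  # 100 100
```

Both orders return the same 100 rows, but order B materializes 10 intermediate rows instead of 1000, so placing the most selective join first is worthwhile when writing queries by hand.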
Roadmap
- Code generation: cover complex AND and OR clause combinations in filter expressions
- Predicate table types: allow users to explicitly declare column types for predicates (currently auto-deduced at runtime)
- Left and right joins: support for LEFT JOIN and RIGHT JOIN
- Sort-merge join: alternative to hash joins for pre-sorted or range-based workloads
- NULL handling: support NULL values across all operators, including NULL-safe comparisons and aggregations
- Cost-based join optimizer: reorder joins based on cardinality estimates
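As background for the sort-merge join item, here is a minimal sketch of the technique in plain Python: sort both inputs on the join key, then advance two cursors in step and emit the cross product of each run of equal keys. It is a textbook illustration with made-up data, not the planned BumbleBee implementation.

```python
def sort_merge_join(left, right):
    """Join two lists of (key, value) pairs; returns (key, left_val, right_val)."""
    left = sorted(left, key=lambda t: t[0])
    right = sorted(right, key=lambda t: t[0])
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][0], right[j][0]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of rows sharing this key on each side.
            i_end = i
            while i_end < len(left) and left[i_end][0] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][0] == lk:
                j_end += 1
            # Emit every pairing within the two runs.
            for a in range(i, i_end):
                for b in range(j, j_end):
                    out.append((lk, left[a][1], right[b][1]))
            i, j = i_end, j_end
    return out


left = [(2, "b"), (1, "a"), (2, "c")]
right = [(2, "x"), (3, "y"), (2, "z")]
print(sort_merge_join(left, right))
# [(2, 'b', 'x'), (2, 'b', 'z'), (2, 'c', 'x'), (2, 'c', 'z')]
```

When the inputs are already sorted on the key, the two `sorted` calls become unnecessary, which is where this approach can be attractive compared with building a hash table.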