BumbleBee Datalog analytics engine

BumbleBee DB

A high-performance Datalog-based analytics engine — with SQL support — from Python.

BumbleBee DB is a lightweight, in-memory analytics engine powered by Datalog, a declarative logic language that makes recursive queries, graph analysis, and complex joins natural to express. It also supports SQL as an alternative query language, so you can use both in the same session. Query CSV files, Parquet files, and pandas DataFrames through a simple Python API. The engine is written in C++ and combines push-based execution, columnar storage, and multithreading to get the most out of a single machine.

Quick Start

Install

pip install bumblebeedb

Platform support: Release 0.2.0 ships pre-built wheels for CPython 3.10 through 3.13 on Linux x86_64 (glibc 2.34+) and macOS 14.0+ on ARM (Apple Silicon). If your platform is not supported, you can still use BumbleBee DB in Google Colab.
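Wheel coverage can be checked from Python before installing. The sketch below is a heuristic, assuming wheels exist only for the combinations listed for release 0.2.0 (CPython 3.10 through 3.13 on Linux x86_64 and macOS arm64). The name `has_prebuilt_wheel` is illustrative and not part of BumbleBee DB, and the check ignores the minimum glibc and macOS versions as well as the Python implementation.

```python
import platform
import sys

# Assumption: these combinations mirror the wheels published for release 0.2.0.
SUPPORTED_PLATFORMS = {("Linux", "x86_64"), ("Darwin", "arm64")}
SUPPORTED_PYTHONS = {(3, 10), (3, 11), (3, 12), (3, 13)}


def has_prebuilt_wheel(system, machine, python_version):
    """Return True if a pre-built wheel is expected for this combination.

    Hypothetical helper, not part of bumblebeedb. It does not check the
    minimum glibc (2.34) or macOS (14.0) version encoded in the wheel tags.
    """
    major_minor = tuple(python_version[:2])
    return (system, machine) in SUPPORTED_PLATFORMS and major_minor in SUPPORTED_PYTHONS


if __name__ == "__main__":
    ok = has_prebuilt_wheel(platform.system(), platform.machine(), sys.version_info)
    print("pre-built wheel expected" if ok else "no pre-built wheel; consider Google Colab")
```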

Your first query

import bumblebeedb as bb

db = bb.db()

# Query a CSV file with SQL — alias names the output predicate
db.sql("""
    SELECT DEPARTMENT_ID, COUNT(*) AS CNT, SUM(SALARY) AS TOTAL
    FROM "examples/data/employees.csv"
    GROUP BY DEPARTMENT_ID
""", alias="dept_stats")

# Get results as a pandas DataFrame
# 3 is the arity (number of columns) of the dept_stats predicate
df = db.get_table("dept_stats", 3).to_df(
    col_names=["dept_id", "count", "total_salary"]
)
print(df)

Recursive Datalog — something SQL can't easily do

BumbleBee supports recursive Datalog rules, enabling graph analysis, transitive closure, and hierarchical queries with a clean, declarative syntax:

import bumblebeedb as bb
import pandas as pd

hierarchy = pd.DataFrame({
    "manager": ["alice", "alice", "bob", "bob", "carol", "dave"],
    "report":  ["bob",   "carol", "dave", "eve", "frank", "grace"],
})

db = bb.db()

# Load a pandas DataFrame as a predicate that can be queried
db.load_df(hierarchy, "manages")

# Recursive Datalog: compute all direct and indirect reports
db.run("""
    reports_to(M, R) :- manages(M, R).
    reports_to(M, R) :- manages(M, X), reports_to(X, R).
    reports_to(X, Y)?
""")

df = db.get_table("reports_to", 2).to_df(col_names=["manager", "report"])
print("reports_to:")
print(df.sort_values(["manager", "report"]).to_string(index=False))
print()

# You can also run SQL on top of Datalog results — count reports per manager
db.sql("""
    SELECT COL_0, COUNT(*) AS CNT
    FROM reports_to
    GROUP BY COL_0
""", alias="report_count")

df2 = db.get_table("report_count", 2).to_df(col_names=["manager", "num_reports"])
print("num_reports:")
print(df2.sort_values("num_reports", ascending=False).to_string(index=False))
print()
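
To make the meaning of the two reports_to rules concrete, here is a plain-Python sketch that computes the same relation by naive fixpoint iteration over the manages pairs from the example. It illustrates what the rules mean, not how BumbleBee DB evaluates them internally; the name `transitive_reports` is illustrative.

```python
from collections import Counter

# Same (manager, report) pairs as the `hierarchy` DataFrame in the example.
MANAGES = {
    ("alice", "bob"), ("alice", "carol"),
    ("bob", "dave"), ("bob", "eve"),
    ("carol", "frank"), ("dave", "grace"),
}


def transitive_reports(manages):
    """Naive fixpoint for the two reports_to rules."""
    # Rule 1: reports_to(M, R) :- manages(M, R).
    reports_to = set(manages)
    while True:
        # Rule 2: reports_to(M, R) :- manages(M, X), reports_to(X, R).
        derived = {
            (m, r)
            for (m, x) in manages
            for (x2, r) in reports_to
            if x == x2
        }
        if derived <= reports_to:  # nothing new was derived: fixpoint reached
            return reports_to
        reports_to |= derived


closure = transitive_reports(MANAGES)
counts = Counter(manager for manager, _ in closure)

# Print managers by number of direct and indirect reports, largest first.
for manager, n in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
    print(manager, n)
```

For this input the closure holds 11 pairs: six for alice, three for bob, and one each for carol and dave. The report_count query should agree with these counts.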

Features

  • Python client library: pip install bumblebeedb, then query and get DataFrames back
  • Dual query languages — SQL and Datalog, including recursive Datalog
  • High-performance engine — push-based execution, columnar storage, multithreading
  • Read and write CSV and Parquet — import data from and export results to CSV and Parquet files
  • DataFrame interop — load any pandas DataFrame as input via db.load_df(), meaning you can connect to virtually any data source (databases, APIs, Excel, in-memory structures) before handing it off for analysis
  • Command-line interface — run queries directly from the terminal

Python API

Method | Description
------ | -----------
db = bb.db(args={}) | Create a new engine instance (only one per session). Optional args dict for CLI flags, e.g. {"-t": "4", "-d": ""}
db.run(program) | Run a Datalog program
db.sql(query, alias="", overwrite=True) | Run a SQL query. If alias is provided, wraps the query as (query) AS alias. If overwrite is True (default), any existing predicate with the same alias is replaced
db.run_file(filepath) | Run a program from a file
db.load_df(df, alias) | Load a pandas DataFrame as a predicate. Alias must start with a lowercase letter
db.explain(program) | Return the generated Datalog rules as a string without executing
db.get_output_predicates() | List all output predicates as (name, arity) tuples
db.get_table(name, arity=-1) | Get a result table. Arity is optional
table.tuples() | Get results as a list of tuples
table.to_df(col_names=[]) | Get results as a pandas DataFrame. Column names are optional
db.remove_table(name, arity) | Remove a predicate from the engine
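
The helpers below sketch how the methods in the table might be combined. They rely only on the documented behaviour of get_output_predicates(), get_table(), and tuples(). The function names are illustrative, and the alias check assumes that "lowercase letter" means ASCII a to z, which the table does not spell out.

```python
import string


def is_valid_alias(alias):
    """Check the documented load_df rule: the alias starts with a lowercase letter.

    Assumption: 'lowercase letter' means ASCII a-z.
    """
    return bool(alias) and alias[0] in string.ascii_lowercase


def collect_outputs(db):
    """Return {(name, arity): list of tuples} for every output predicate.

    `db` is expected to be an engine instance created with bb.db().
    """
    results = {}
    for name, arity in db.get_output_predicates():
        results[(name, arity)] = db.get_table(name, arity).tuples()
    return results


# Usage, assuming bumblebeedb is installed and a program has already been run
# on `db`:
#   for (name, arity), rows in collect_outputs(db).items():
#       print(name, arity, len(rows))
```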

Examples

Check out the examples/ folder for a collection of ready-to-use examples in Python, SQL, and Datalog, covering data imports, aggregations, joins, recursion, exports, and more.

CLI Quick Start

BumbleBee is also available as a standalone command-line tool.

Prerequisites

  • CMake 3.20+
  • C++20-compatible compiler (GCC 13+ or Clang 18+ with libc++)

Build

cmake -S . -B cmake-build-release -DCMAKE_BUILD_TYPE=Release
cmake --build cmake-build-release --target BumbleBee -j 8
./cmake-build-release/BumbleBee --help

Run a query

cd examples

# SQL
../cmake-build-release/BumbleBee -i sql/01_import/basic_csv_import.sql

# Datalog
../cmake-build-release/BumbleBee -i dl/01_import/basic_csv_import.dl -a

Flag | Description
---- | -----------
-a | Print all predicates
-t N | Use N threads (default: all cores)
-r | Print profiling data
--print-program | Print the generated Datalog program and exit
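
When the CLI is driven from a script, its argument list can be assembled programmatically. The sketch below uses only flags that appear in this section (-i, -a, -t, -r). `build_command` and `run_query` are illustrative names, and `run_query` assumes that results are written to standard output, which this section does not state.

```python
import subprocess


def build_command(binary, program, threads=None, print_all=False, profile=False):
    """Build an argument list for the BumbleBee CLI (hypothetical helper)."""
    cmd = [binary, "-i", program]
    if print_all:
        cmd.append("-a")             # print all predicates
    if threads is not None:
        if threads < 1:
            raise ValueError("threads must be a positive integer")
        cmd += ["-t", str(threads)]  # without -t the CLI uses all cores
    if profile:
        cmd.append("-r")             # print profiling data
    return cmd


def run_query(binary, program, **options):
    """Run the CLI and return whatever it wrote to standard output."""
    result = subprocess.run(
        build_command(binary, program, **options),
        capture_output=True, text=True, check=True,
    )
    return result.stdout


# Example: the Datalog invocation from "Run a query", limited to 4 threads.
print(build_command("../cmake-build-release/BumbleBee",
                    "dl/01_import/basic_csv_import.dl",
                    threads=4, print_all=True))
```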

Optimizer

BumbleBee DB includes a rule-based query optimizer that applies logical rewrites such as filter push-down and column pruning. The current optimizer does not reorder joins — the execution order follows the join sequence as written in the query. A cost-based join reordering optimizer is planned as a future enhancement.
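
Because joins run in the order written, the size of intermediate results depends on how the query is phrased. The toy sketch below, in plain Python rather than BumbleBee DB, joins three small relations in two different orders and compares the intermediate sizes. The data and the `hash_join` helper are made up for illustration and say nothing about the engine's internals.

```python
def hash_join(left, right, left_key, right_key):
    """Join two lists of tuples on one column each, using a hash table on `right`."""
    index = {}
    for rrow in right:
        index.setdefault(rrow[right_key], []).append(rrow)
    return [lrow + rrow for lrow in left for rrow in index.get(lrow[left_key], [])]


# Hypothetical data: 1000 events spread over 10 users, and 1 flagged user name.
events = [(event_id, event_id % 10) for event_id in range(1000)]  # (event, user)
users = [(user_id, f"user{user_id}") for user_id in range(10)]    # (user, name)
flagged = [("user3",)]                                            # (name,)

# Order 1: join events with users first, then with flagged.
step1 = hash_join(events, users, 1, 0)    # every event matches exactly one user
result1 = hash_join(step1, flagged, 3, 0)

# Order 2: join users with flagged first, then events with that result.
step2 = hash_join(users, flagged, 1, 0)   # only the flagged user survives
result2 = hash_join(events, step2, 1, 0)

print("intermediate sizes:", len(step1), len(step2))
print("final sizes:", len(result1), len(result2))
```

The first order materializes 1000 intermediate tuples and the second only 1, while both produce the same 100 result rows. Until join reordering is available, writing the most selective join first can keep intermediate results small.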

Roadmap

  • Code generation: cover complex AND and OR clause combinations in filter expressions
  • Predicate table types: allow users to explicitly declare column types for predicates (currently auto-deduced at runtime)
  • Left and right joins: support for LEFT JOIN and RIGHT JOIN
  • Sort-merge join: alternative to hash joins for pre-sorted or range-based workloads
  • NULL handling: support NULL values across all operators, including NULL-safe comparisons and aggregations
  • Cost-based join optimizer: reorder joins based on cardinality estimates

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

File | Size | Python | Platform
---- | ---- | ------ | --------
bumblebeedb-0.2.0-cp313-cp313-manylinux_2_34_x86_64.whl | 6.6 MB | CPython 3.13 | manylinux: glibc 2.34+ x86-64
bumblebeedb-0.2.0-cp313-cp313-macosx_14_0_arm64.whl | 5.3 MB | CPython 3.13 | macOS 14.0+ ARM64
bumblebeedb-0.2.0-cp312-cp312-manylinux_2_34_x86_64.whl | 6.6 MB | CPython 3.12 | manylinux: glibc 2.34+ x86-64
bumblebeedb-0.2.0-cp312-cp312-macosx_14_0_arm64.whl | 5.3 MB | CPython 3.12 | macOS 14.0+ ARM64
bumblebeedb-0.2.0-cp311-cp311-manylinux_2_34_x86_64.whl | 6.6 MB | CPython 3.11 | manylinux: glibc 2.34+ x86-64
bumblebeedb-0.2.0-cp311-cp311-macosx_14_0_arm64.whl | 5.3 MB | CPython 3.11 | macOS 14.0+ ARM64
bumblebeedb-0.2.0-cp310-cp310-manylinux_2_34_x86_64.whl | 6.6 MB | CPython 3.10 | manylinux: glibc 2.34+ x86-64
bumblebeedb-0.2.0-cp310-cp310-macosx_14_0_arm64.whl | 5.3 MB | CPython 3.10 | macOS 14.0+ ARM64

File hashes

bumblebeedb-0.2.0-cp313-cp313-manylinux_2_34_x86_64.whl
  SHA256: 889804af61c7b5d726701552638956c8f36da1c97342d2a5a08496531bd5d080
  MD5: 38e7f5d235a65ca0189b7c3bbfeaccec
  BLAKE2b-256: 1a60e519526cd77be4e3656ee4ad3ff668e4beb04c9748e84c878afe0deeeb1b

bumblebeedb-0.2.0-cp313-cp313-macosx_14_0_arm64.whl
  SHA256: af3eff266bfe1dc5f2610e2dcd74ae806eeea8b8833e149248353e23a9811591
  MD5: 3504f0e2397075340f54b3c4a6c1f12f
  BLAKE2b-256: e19d1a0dfda09d25bb65b8975c02f306c6d0ced496bb15d8e1bd08a45792b154

bumblebeedb-0.2.0-cp312-cp312-manylinux_2_34_x86_64.whl
  SHA256: 18dfbc5380a5a3f860935971d18fa72cbb19545030984f6177c451098685258a
  MD5: 6a2be54ec689b05d787cbbdbc19dc7ae
  BLAKE2b-256: 60b09784244a86b373285e7083ffb31b69dc1bbeea64d1406a6deae76674d203

bumblebeedb-0.2.0-cp312-cp312-macosx_14_0_arm64.whl
  SHA256: 31d3c95d2415a1763c0846441c52954e54a874c69199df98d2746d1c9e679366
  MD5: cb26fe7934ea8ce92ccfa066caf0066c
  BLAKE2b-256: da7876c308fc4b8d3442bece1817641ceac32a3cdc1964022d58c8ba18a0a524

bumblebeedb-0.2.0-cp311-cp311-manylinux_2_34_x86_64.whl
  SHA256: 4ef798eb369b4a11673eeb6c401d5a4b2ef03d81584b9ce7e1c4ef509b1ce8de
  MD5: b90117e6741ccfd387c332fd3262bdc3
  BLAKE2b-256: c95ba238d862da245bc66f6abacbdcede7100a95c4f64cd881d778bd6527da4c

bumblebeedb-0.2.0-cp311-cp311-macosx_14_0_arm64.whl
  SHA256: d5df374fe45ac858d2d6f82a52032df08c3bb981904e2ebb96bef4441c2b4891
  MD5: 1905c8739aaa59f32848c383baaf3f29
  BLAKE2b-256: ce233f84c7a490f0034e208c2b923cfe1fccf6d4e7466541e3bd26149870bdea

bumblebeedb-0.2.0-cp310-cp310-manylinux_2_34_x86_64.whl
  SHA256: a16db78b035080c7cb9573b3c2ad42a204a6c9b780d52ed87c951bc7092fc400
  MD5: 6f10efa9dabb2cceb92617f21b98c986
  BLAKE2b-256: 93508b1a98be194ea3a2e85e5400aeb8dae525404f16e689d1782f1b63412d1a

bumblebeedb-0.2.0-cp310-cp310-macosx_14_0_arm64.whl
  SHA256: ef619c899f4d22d1be317c4e02e2d23344d3861d0b1897684c5a1665c42409e0
  MD5: 21ce0ff1eaefa7fc70d322e1e5f265cb
  BLAKE2b-256: b40f357979a0babe52f01d6a4840cb82c63bf6f7f95d7c8704c21cd7c8dd0a97

Provenance

Attestation bundles were made for each of the eight wheels above. Publisher: publish.yml on dave90/BumbleBee. Attestation values reflect the state when the release was signed and may no longer be current.
