BumbleBee Datalog analytics engine

BumbleBee DB

A high-performance Datalog-based analytics engine with SQL support, usable from Python.

BumbleBee DB is a lightweight, in-memory analytics engine powered by Datalog, a declarative logic language that makes recursive queries, graph analysis, and complex joins natural to express. It also supports SQL as an alternative query language, so you can use both in the same session. Query CSV files, Parquet files, and pandas DataFrames through a simple Python API. The engine is written in C++ with push-based execution, columnar storage, and multithreading, and it targets fast analytics on a single machine.

Quick Start

Install

pip install bumblebeedb

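Platform support: Pre-built wheels are available for CPython 3.10 through 3.13 on Linux x86_64 (manylinux, glibc 2.34+) and macOS 14.0+ ARM64 (Apple Silicon). No source distribution is published for this release, so pip cannot build the package on other platforms. If your platform is not supported, you can still use BumbleBee DB in Google Colab.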

Your first query

import bumblebeedb as bb

db = bb.db()

# Query a CSV file with SQL — alias names the output predicate
db.sql("""
    SELECT DEPARTMENT_ID, COUNT(*) AS CNT, SUM(SALARY) AS TOTAL
    FROM "examples/data/employees.csv"
    GROUP BY DEPARTMENT_ID
""", alias="dept_stats")

# Get results as a pandas DataFrame
# 3 is the arity (number of columns) of the dept_stats predicate
df = db.get_table("dept_stats", 3).to_df(
    col_names=["dept_id", "count", "total_salary"]
)
print(df)
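
To make the shape of the dept_stats predicate concrete, here is a minimal pure-Python sketch of the same aggregation. It does not use BumbleBee at all. The sample rows and the EMPLOYEE_ID column are assumptions made for illustration, since the contents of examples/data/employees.csv are not shown here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample data; the real employees.csv may look different.
SAMPLE_CSV = """\
EMPLOYEE_ID,DEPARTMENT_ID,SALARY
1,10,5000
2,10,7000
3,20,4000
"""


def dept_stats(csv_text):
    """Group rows by DEPARTMENT_ID and compute COUNT(*) and SUM(SALARY)."""
    stats = defaultdict(lambda: [0, 0])  # dept_id -> [count, total]
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = stats[int(row["DEPARTMENT_ID"])]
        entry[0] += 1
        entry[1] += int(row["SALARY"])
    # One 3-field tuple per department, sorted by department id.
    return sorted((dept, cnt, total) for dept, (cnt, total) in stats.items())


rows = dept_stats(SAMPLE_CSV)
for row in rows:
    print(row)
```

Each result tuple has three fields (department, count, total), which is why the example above passes an arity of 3 to get_table.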

Recursive Datalog — something SQL can't easily do

BumbleBee supports recursive Datalog rules, enabling graph analysis, transitive closure, and hierarchical queries with a clean, declarative syntax:

import bumblebeedb as bb
import pandas as pd

hierarchy = pd.DataFrame({
    "manager": ["alice", "alice", "bob", "bob", "carol", "dave"],
    "report":  ["bob",   "carol", "dave", "eve", "frank", "grace"],
})

db = bb.db()

# Load a pandas DataFrame as a predicate that can be queried
db.load_df(hierarchy, "manages")

# Recursive Datalog: compute all direct and indirect reports
db.run("""
    reports_to(M, R) :- manages(M, R).
    reports_to(M, R) :- manages(M, X), reports_to(X, R).
    reports_to(X, Y)?
""")

df = db.get_table("reports_to", 2).to_df(col_names=["manager", "report"])
print("reports_to:")
print(df.sort_values(["manager", "report"]).to_string(index=False))
print()

# You can also run SQL on top of Datalog results — count reports per manager
db.sql("""
    SELECT COL_0, COUNT(*) AS CNT
    FROM reports_to
    GROUP BY COL_0
""", alias="report_count")

df2 = db.get_table("report_count", 2).to_df(col_names=["manager", "num_reports"])
print("num_reports:")
print(df2.sort_values("num_reports", ascending=False).to_string(index=False))
print()
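
For readers new to Datalog, the two reports_to rules can be read as a fixpoint computation: start from the direct manages facts, then keep applying the recursive rule until nothing new is derived. The sketch below spells that out in plain Python on the same hierarchy. It is a naive illustration of the semantics only, not a description of how the BumbleBee engine evaluates recursion, and the function name is hypothetical.

```python
from collections import Counter

# Same facts as the `manages` DataFrame above, as (manager, report) pairs.
MANAGES = {
    ("alice", "bob"), ("alice", "carol"),
    ("bob", "dave"), ("bob", "eve"),
    ("carol", "frank"), ("dave", "grace"),
}


def reports_to(manages):
    """Naive fixpoint evaluation of the two reports_to rules."""
    # Rule 1: reports_to(M, R) :- manages(M, R).
    closure = set(manages)
    while True:
        # Rule 2: reports_to(M, R) :- manages(M, X), reports_to(X, R).
        derived = {
            (m, r)
            for (m, x) in manages
            for (x2, r) in closure
            if x == x2
        }
        if derived <= closure:  # nothing new: fixpoint reached
            return closure
        closure |= derived


closure = reports_to(MANAGES)
counts = Counter(manager for manager, _ in closure)
for manager, n in sorted(counts.items()):
    print(manager, n)
```

The per-manager counts mirror what the SQL GROUP BY over reports_to computes in the example above.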

Features

  • Python client library: pip install bumblebeedb, query and get DataFrames back
  • Dual query languages — SQL and Datalog, including recursive Datalog
  • High-performance engine — push-based execution, columnar storage, multithreading
  • Read and write CSV and Parquet — import data from and export results to CSV and Parquet files
  • Native NULL support — NULL across all operators: IS NULL/IS NOT NULL, NULL-aware aggregates, three-valued comparisons, and CSV/Parquet/pandas (None/pd.NA) round-trip
  • DataFrame interop — load any pandas DataFrame as input via db.load_df(), meaning you can connect to virtually any data source (databases, APIs, Excel, in-memory structures) before handing it off for analysis
  • Command-line interface — run queries directly from the terminal
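
The "three-valued comparisons" and "NULL-aware aggregates" items refer to standard SQL-style NULL semantics. As a rough illustration, the sketch below models NULL as Python's None and implements the usual three-valued logic. All function names are hypothetical, and the behavior shown is the conventional SQL one, assumed rather than verified against BumbleBee itself.

```python
def and3(a, b):
    """Three-valued AND: False dominates, otherwise NULL is contagious."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """Three-valued OR: True dominates, otherwise NULL is contagious."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def eq3(a, b):
    """Comparison with NULL on either side yields NULL, not True or False."""
    if a is None or b is None:
        return None
    return a == b


def sum_ignoring_nulls(values):
    """SUM-style aggregate: skip NULLs, return NULL if no value remains."""
    non_null = [v for v in values if v is not None]
    return sum(non_null) if non_null else None


print(and3(None, False), or3(None, True), eq3(None, None))
print(sum_ignoring_nulls([1, None, 2]), sum_ignoring_nulls([None]))
```

This is also why a filter such as X = NULL never matches any row and IS NULL is needed instead.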

Python API

| Method | Description |
| --- | --- |
| db = bb.db(args={}) | Create a new engine instance (only one per session). Optional args dict for CLI flags, e.g. {"-t": "4", "-d": ""} |
| db.run(program) | Run a Datalog program |
| db.sql(query, alias="", overwrite=True) | Run a SQL query. If alias is provided, wraps the query as (query) AS alias. If overwrite is True (default), any existing predicate with the same alias is replaced |
| db.run_file(filepath) | Run a program from a file |
| db.load_df(df, alias) | Load a pandas DataFrame as a predicate. Alias must start with a lowercase letter |
| db.explain(program) | Return the generated Datalog rules as a string without executing |
| db.get_output_predicates() | List all output predicates as (name, arity) tuples |
| db.get_table(name, arity=-1) | Get a result table. Arity is optional |
| table.tuples() | Get results as a list of tuples |
| table.to_df(col_names=[]) | Get results as a pandas DataFrame. Column names are optional |
| db.remove_table(name, arity) | Remove a predicate from the engine |
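
As one way to combine these methods, the helper below walks every output predicate and collects its rows. It is a sketch that relies only on the documented get_output_predicates(), get_table(name, arity), and tuples() calls. The helper name collect_outputs is hypothetical, and the exact return types of the real engine are assumed to match the table above.

```python
def collect_outputs(db):
    """Return {(name, arity): rows} for every output predicate of `db`.

    `db` is expected to be an engine instance as created by bb.db();
    any object exposing the same documented methods also works.
    """
    results = {}
    for name, arity in db.get_output_predicates():
        # Pass the arity explicitly so the predicate lookup is unambiguous.
        table = db.get_table(name, arity)
        results[(name, arity)] = table.tuples()
    return results
```

After running a program with db.run(...) or db.sql(...), calling collect_outputs(db) gives a plain dictionary that is convenient for quick inspection or for assertions in tests.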

Examples

Check out the examples/ folder for a collection of ready-to-use examples in Python, SQL, and Datalog, covering data imports, aggregations, joins, recursion, exports, and more.

CLI Quick Start

BumbleBee is also available as a standalone command-line tool.

Prerequisites

  • CMake 3.20+
  • C++20-compatible compiler (GCC 13+ or Clang 18+ with libc++)

Build

cmake -S . -B cmake-build-release -DCMAKE_BUILD_TYPE=Release
cmake --build cmake-build-release --target BumbleBee -j 8
./cmake-build-release/BumbleBee --help

Run a query

cd examples

# SQL
../cmake-build-release/BumbleBee -i sql/01_import/basic_csv_import.sql

# Datalog
../cmake-build-release/BumbleBee -i dl/01_import/basic_csv_import.dl -a

| Flag | Description |
| --- | --- |
| -a | Print all predicates |
| -t N | Use N threads (default: all cores) |
| -r | Print profiling data |
| --print-program | Print the generated Datalog program and exit |

Optimizer

BumbleBee DB includes a rule-based query optimizer that applies logical rewrites such as filter push-down and column pruning. The current optimizer does not reorder joins — the execution order follows the join sequence as written in the query. A cost-based join reordering optimizer is planned as a future enhancement.
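
Because joins run in the order they are written, the order you choose can change how much intermediate data is produced. The sketch below shows the effect with a toy hash join in plain Python. The relations A, B, and C and the hash_join helper are hypothetical and say nothing about the engine's actual join algorithm; the point is only that two orders give the same answer with very different intermediate sizes.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Join two lists of tuples on left[left_key] == right[right_key]."""
    index = defaultdict(list)
    for r in right:
        index[r[right_key]].append(r)
    return [l + r for l in left for r in index.get(l[left_key], [])]


# Toy relations: A(x, y), B(y, z), C(z, w).
A = [(x, 0) for x in range(100)]   # 100 rows, all with y = 0
B = [(0, z) for z in range(100)]   # 100 rows, all with y = 0
C = [(5, "hit")]                   # a single, very selective row

# Order 1: (A join B) join C. The first join explodes before C filters it.
ab = hash_join(A, B, 1, 0)
order1 = hash_join(ab, C, 3, 0)

# Order 2: A join (B join C). The selective join runs first.
bc = hash_join(B, C, 1, 0)
order2 = hash_join(A, bc, 1, 0)

print("intermediate rows:", len(ab), "vs", len(bc))
print("same result:", sorted(order1) == sorted(order2))
```

Until cost-based reordering is available, writing the most selective joins first is a reasonable rule of thumb when a query runs slower than expected.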

Roadmap

  • Code generation: cover complex AND and OR clause combinations in filter expressions
  • Predicate table types: allow users to explicitly declare column types for predicates (currently auto-deduced at runtime)
  • Left and right joins: support for LEFT JOIN and RIGHT JOIN
  • Cost-based join optimizer: reorder joins based on cardinality estimates
