
ApexBase

High-performance HTAP embedded database with Rust core and Python API

ApexBase is an embedded columnar database designed for Hybrid Transactional/Analytical Processing (HTAP) workloads. It combines a high-throughput columnar storage engine written in Rust with an ergonomic Python API, delivering analytical query performance that surpasses DuckDB and SQLite on most benchmarks — with each table stored in a single .apex file and zero external dependencies.


Features

  • HTAP architecture — V4 Row Group columnar storage with DeltaStore for cell-level updates; fast inserts and fast analytical scans in one engine
  • Multi-database support — multiple isolated databases in one directory; cross-database queries with standard db.table SQL syntax
  • Single-file storage — custom .apex format per table, no server process, no external dependencies
  • Comprehensive SQL — DDL, DML, JOINs (INNER/LEFT/RIGHT/FULL/CROSS), subqueries (IN/EXISTS/scalar), CTEs (WITH ... AS), UNION/UNION ALL/INTERSECT/EXCEPT, window functions, EXPLAIN/ANALYZE, multi-statement execution
  • 70+ built-in functions — math (ABS, SQRT, POWER, LOG, trig), string (UPPER, LOWER, SUBSTR, REPLACE, CONCAT, REGEXP_REPLACE, ...), date (YEAR, MONTH, DAY, DATEDIFF, DATE_ADD, ...), conditional (COALESCE, IFNULL, NULLIF, CASE WHEN, GREATEST, LEAST)
  • Aggregation and analytics — COUNT, SUM, AVG, MIN, MAX, COUNT(DISTINCT), GROUP BY, HAVING, ORDER BY with NULLS FIRST/LAST
  • Window functions — ROW_NUMBER, RANK, DENSE_RANK, NTILE, PERCENT_RANK, CUME_DIST, LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUE, RUNNING_SUM, and windowed SUM/AVG/COUNT/MIN/MAX with PARTITION BY and ORDER BY
  • Transactions — BEGIN / COMMIT / ROLLBACK with OCC (Optimistic Concurrency Control), SAVEPOINT / ROLLBACK TO / RELEASE, statement-level auto-rollback
  • MVCC — multi-version concurrency control with snapshot isolation, version store, and garbage collection
  • Indexing — B-Tree and Hash indexes with CREATE INDEX / DROP INDEX / REINDEX; automatic multi-index AND intersection for compound predicates
  • Full-text search — built-in NanoFTS integration with fuzzy matching
  • Vector search — SIMD-accelerated nearest-neighbour search with 6 distance metrics (L2, cosine, dot, L1, L∞, L2²); heap-based O(n log k) TopK; single-query topk_distance() and batch batch_topk_distance() Python APIs; SQL explode_rename(topk_distance(...)) syntax; 3–4× faster than DuckDB at 1M rows
  • JIT compilation — Cranelift-based JIT for predicate evaluation and SIMD-vectorized aggregations
  • Zero-copy Python bridge — Arrow IPC between Rust and Python; direct conversion to Pandas, Polars, and PyArrow
  • Durability levels — configurable fast / safe / max with WAL support and crash recovery
  • Compact storage — dictionary encoding for low-cardinality strings, LZ4 and Zstd compression
  • File reading table functions — read_csv(), read_parquet(), read_json() directly in SQL FROM clauses; parallel mmap parsing; full SQL (WHERE / GROUP BY / JOIN / UNION) on top of any file
  • Parquet interop — COPY TO / COPY FROM Parquet files
  • PostgreSQL wire protocol — built-in server for DBeaver, psql, DataGrip, pgAdmin, Navicat, and any PostgreSQL-compatible client; two distribution modes (Python CLI or standalone Rust binary)
  • Arrow Flight gRPC server — high-performance columnar data transfer over HTTP/2; streams Arrow IPC RecordBatch directly, 4–7× faster than PG wire for large result sets; accessible via pyarrow.flight, Go arrow, Java arrow, and any Arrow Flight client
  • Cross-platform — Linux, macOS, and Windows; x86_64 and ARM64; Python 3.9–3.13

Installation

pip install apexbase

Build from source (requires Rust toolchain):

maturin develop --release

Quick Start

from apexbase import ApexClient

# Open (or create) a database directory
client = ApexClient("./data")

# Create a table
client.create_table("users")

# Store records
client.store({"name": "Alice", "age": 30, "city": "Beijing"})
client.store([
    {"name": "Bob", "age": 25, "city": "Shanghai"},
    {"name": "Charlie", "age": 35, "city": "Beijing"},
])

# SQL query
results = client.execute("SELECT * FROM users WHERE age > 28 ORDER BY age DESC")

# Convert to DataFrame
df = results.to_pandas()

client.close()

Usage Guide

Database Management

ApexBase supports multiple isolated databases within a single root directory. Each named database lives in its own subdirectory; the default database uses the root directory.

# Switch to a named database (creates it if needed)
client.use_database("analytics")

# Combined: switch database + select/create a table in one call
client.use(database="analytics", table="events")

# List all databases
dbs = client.list_databases()  # ["analytics", "default", "hr"]

# Current database
print(client.current_database)  # "analytics"

# Cross-database SQL — standard db.table syntax
client.execute("SELECT * FROM default.users")
client.execute("SELECT u.name, e.event FROM default.users u JOIN analytics.events e ON u.id = e.user_id")
client.execute("INSERT INTO analytics.events (name) VALUES ('click')")
client.execute("UPDATE default.users SET age = 31 WHERE name = 'Alice'")
client.execute("DELETE FROM default.users WHERE age < 18")

All SQL operations (SELECT, INSERT, UPDATE, DELETE, JOIN, CREATE TABLE, DROP TABLE, ALTER TABLE) support database.table qualified names, allowing cross-database queries in a single statement.

Table Management

Each table is stored as a separate .apex file. Tables must be created before use.

# Create with optional schema
client.create_table("orders", schema={
    "order_id": "int64",
    "product": "string",
    "price": "float64",
})

# Switch tables
client.use_table("users")

# List / drop
tables = client.list_tables()
client.drop_table("orders")

Data Ingestion

import pandas as pd
import polars as pl
import pyarrow as pa

# Columnar dict (fastest for bulk data)
client.store({
    "name": ["D", "E", "F"],
    "age": [22, 32, 42],
})

# From pandas / polars / PyArrow (auto-creates table when table_name given)
client.from_pandas(pd.DataFrame({"name": ["G"], "age": [28]}), table_name="users")
client.from_polars(pl.DataFrame({"name": ["H"], "age": [38]}), table_name="users")
client.from_pyarrow(pa.table({"name": ["I"], "age": [48]}), table_name="users")

SQL

ApexBase supports a broad SQL dialect. Examples:

# DDL
client.execute("CREATE TABLE IF NOT EXISTS products")
client.execute("ALTER TABLE products ADD COLUMN name STRING")
client.execute("DROP TABLE IF EXISTS products")

# DML
client.execute("INSERT INTO users (name, age) VALUES ('Zoe', 29)")
client.execute("UPDATE users SET age = 31 WHERE name = 'Alice'")
client.execute("DELETE FROM users WHERE age < 20")

# SELECT with full clause support
client.execute("""
    SELECT city, COUNT(*) AS cnt, AVG(age) AS avg_age
    FROM users
    WHERE age BETWEEN 20 AND 40
    GROUP BY city
    HAVING cnt > 1
    ORDER BY avg_age DESC
    LIMIT 10
""")

# JOINs
client.execute("""
    SELECT u.name, o.product
    FROM users u
    INNER JOIN orders o ON u._id = o.user_id
""")

# Subqueries
client.execute("SELECT * FROM users WHERE age > (SELECT AVG(age) FROM users)")
client.execute("SELECT * FROM users WHERE city IN (SELECT city FROM cities WHERE pop > 1000000)")

# CTEs
client.execute("""
    WITH seniors AS (SELECT * FROM users WHERE age >= 30)
    SELECT city, COUNT(*) FROM seniors GROUP BY city
""")

# Window functions
client.execute("""
    SELECT name, age,
           ROW_NUMBER() OVER (ORDER BY age DESC) AS rank,
           AVG(age) OVER (PARTITION BY city) AS city_avg
    FROM users
""")

# Set operations
client.execute("""
    SELECT name FROM users WHERE city = 'Beijing'
    UNION ALL
    SELECT name FROM users WHERE city = 'Shanghai'
""")
client.execute("""
    SELECT user_id FROM orders
    INTERSECT
    SELECT user_id FROM wishlist
""")
client.execute("""
    SELECT user_id FROM orders
    EXCEPT
    SELECT user_id FROM support_tickets WHERE status = 'open'
""")

# Multi-statement
client.execute("""
    INSERT INTO users (name, age) VALUES ('New1', 20);
    INSERT INTO users (name, age) VALUES ('New2', 21);
    SELECT COUNT(*) FROM users
""")

# INSERT ... ON CONFLICT (upsert)
client.execute("""
    INSERT INTO users (name, age) VALUES ('Alice', 31)
    ON CONFLICT (name) DO UPDATE SET age = 31
""")

# CREATE TABLE AS
client.execute("CREATE TABLE seniors AS SELECT * FROM users WHERE age >= 30")

# EXPLAIN / EXPLAIN ANALYZE
client.execute("EXPLAIN SELECT * FROM users WHERE age > 25")

# Parquet interop
client.execute("COPY users TO '/tmp/users.parquet'")
client.execute("COPY users FROM '/tmp/users.parquet'")

File Reading Table Functions

Read external files directly in a SQL FROM clause — no import step required. The full SQL engine runs on top: WHERE, GROUP BY, ORDER BY, JOIN, UNION, etc.

# CSV: schema inferred automatically, parallel mmap parser
df = client.execute("SELECT * FROM read_csv('/data/sales.csv')").to_pandas()

# TSV — specify delimiter
df = client.execute("SELECT * FROM read_csv('/data/data.tsv', delimiter='\t')").to_pandas()

# No header row
df = client.execute("SELECT * FROM read_csv('/data/raw.csv', header=false)").to_pandas()

# Parquet: schema from file metadata, parallel column decode
table = client.execute("SELECT * FROM read_parquet('/data/events.parquet')").to_arrow()

# JSON / NDJSON: auto-detects format (NDJSON or pandas column-oriented)
df = client.execute("SELECT * FROM read_json('/data/logs.ndjson')").to_pandas()

# Full SQL on top of a file
result = client.execute("""
    SELECT city, COUNT(*) AS cnt, AVG(price)
    FROM read_csv('/data/orders.csv')
    WHERE price > 100
    GROUP BY city
    ORDER BY cnt DESC
    LIMIT 10
""")

# JOIN a file with a stored table
result = client.execute("""
    SELECT u.name, f.score
    FROM users u
    JOIN read_csv('/data/scores.csv') f ON u.id = f.user_id
    WHERE f.score > 90
""")

# EXCEPT using a file as a blocklist
result = client.execute("""
    SELECT email FROM users
    EXCEPT
    SELECT email FROM read_csv('/data/unsubscribed.csv')
""")

| Function | Options | Description |
|---|---|---|
| read_csv(path) | header=true, delimiter=',' | Read CSV/TSV; auto-infers schema |
| read_parquet(path) | | Read Parquet; schema from file metadata |
| read_json(path) | | Read NDJSON or pandas JSON; auto-detects format |

See docs/API_REFERENCE.md for full details.

Transactions

client.execute("BEGIN")
client.execute("INSERT INTO users (name, age) VALUES ('Tx1', 20)")
client.execute("SAVEPOINT sp1")
client.execute("INSERT INTO users (name, age) VALUES ('Tx2', 21)")
client.execute("ROLLBACK TO sp1")   # undo Tx2 only
client.execute("COMMIT")            # Tx1 persisted

Transactions use OCC validation — concurrent writes are detected at commit time.
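
The commit-time validation idea can be illustrated with a minimal, self-contained sketch. This is a generic OCC illustration, not ApexBase's actual internals: each key carries a version, a transaction remembers the versions it read, and commit fails if any of them changed in the meantime.

```python
# Minimal OCC sketch (illustrative, not ApexBase internals): writes are
# buffered per transaction and validated against read versions at commit.
class OCCStore:
    def __init__(self):
        self.data = {}  # key -> (value, version)

    def begin(self):
        return {"reads": {}, "writes": {}}

    def read(self, txn, key):
        value, version = self.data.get(key, (None, 0))
        txn["reads"][key] = version   # remember the version we saw
        return value

    def write(self, txn, key, value):
        txn["writes"][key] = value    # buffered until commit

    def commit(self, txn):
        # Validation: every key we read must still be at the version we saw.
        for key, seen in txn["reads"].items():
            _, current = self.data.get(key, (None, 0))
            if current != seen:
                return False          # conflict -> caller rolls back / retries
        for key, value in txn["writes"].items():
            _, version = self.data.get(key, (None, 0))
            self.data[key] = (value, version + 1)
        return True


store = OCCStore()
t1 = store.begin()
t2 = store.begin()
store.read(t1, "age")
store.read(t2, "age")
store.write(t1, "age", 31)
store.write(t2, "age", 32)
print(store.commit(t1))  # True: first committer wins
print(store.commit(t2))  # False: t2's read of "age" is now stale
```

A losing transaction simply re-reads and retries, which is why OCC shines when conflicts are rare.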

Indexes

client.execute("CREATE INDEX idx_age ON users (age)")
client.execute("CREATE UNIQUE INDEX idx_name ON users (name)")

# Queries automatically use indexes when applicable
client.execute("SELECT * FROM users WHERE age = 30")  # index scan

client.execute("DROP INDEX idx_age ON users")
client.execute("REINDEX users")

Full-Text Search

ApexBase ships a native full-text search engine (NanoFTS) integrated directly into the SQL executor. FTS is available through all interfaces — Python API, PostgreSQL Wire, and Arrow Flight — without any Python-side middleware.

SQL interface (recommended)

# 1. Create the FTS index via SQL DDL
client.execute("CREATE FTS INDEX ON articles (title, content)")

# Optional: specify lazy loading and cache size
client.execute("CREATE FTS INDEX ON logs WITH (lazy_load=true, cache_size=50000)")

# 2. Query using MATCH() / FUZZY_MATCH() in WHERE
results = client.execute("SELECT * FROM articles WHERE MATCH('rust programming')")
results = client.execute("SELECT title, content FROM articles WHERE FUZZY_MATCH('pytohn')")

# Combine with other predicates
results = client.execute("""
    SELECT * FROM articles
    WHERE MATCH('machine learning') AND published_at > '2024-01-01'
    ORDER BY _id DESC LIMIT 20
""")

# FTS also works in aggregations
count = client.execute("SELECT COUNT(*) FROM articles WHERE MATCH('deep learning')")

# Manage indexes
client.execute("SHOW FTS INDEXES")           # list all FTS-enabled tables
client.execute("ALTER FTS INDEX ON articles DISABLE")  # disable, keep files
client.execute("DROP FTS INDEX ON articles") # remove index + delete files

Python API (alternative)

# Initialize FTS for current table
client.use_table("articles")
client.init_fts(index_fields=["title", "content"])

# Search
ids    = client.search_text("database")
fuzzy  = client.fuzzy_search_text("databse")   # tolerates typos
recs   = client.search_and_retrieve("python", limit=10)
top5   = client.search_and_retrieve_top("neural network", n=5)

# Lifecycle
client.get_fts_stats()
client.disable_fts()   # suspend without deleting files
client.drop_fts()      # remove index + delete files

Tip: The SQL interface (MATCH() / FUZZY_MATCH()) works over PG Wire and Arrow Flight without any extra setup; the Python API methods are Python-process-only.
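
Fuzzy matching tolerates small typos by accepting terms within a bounded edit distance of the query term. The sketch below illustrates that idea with a plain Levenshtein distance; NanoFTS's actual index structure and scoring may differ.

```python
# Illustration of typo-tolerant matching via Levenshtein (edit) distance.
# NanoFTS's real index is more sophisticated; this only shows the concept.
def levenshtein(a: str, b: str) -> int:
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def fuzzy_match(term: str, query: str, max_edits: int = 2) -> bool:
    return levenshtein(term.lower(), query.lower()) <= max_edits

print(fuzzy_match("python", "pytohn"))        # True: transposition = 2 edits
print(fuzzy_match("database", "databse"))     # True: one deletion
print(fuzzy_match("database", "spreadsheet")) # False
```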

Vector Search

ApexBase provides SIMD-accelerated nearest-neighbour search with a zero-copy mmap scan buffer. Supports 6 distance metrics and both single-query and batch modes.

import numpy as np

# Store vectors — numpy arrays are stored as FixedList columns (optimal)
client.create_table("items")
client.store({
    "label": ["a", "b", "c"],
    "vec":   [np.random.rand(128).astype(np.float32) for _ in range(3)],
})

query = np.random.rand(128).astype(np.float32)

# Single-query: returns ResultView with _id and dist columns
results = client.topk_distance('vec', query, k=10)
df = results.to_pandas()           # columns: _id, dist
top_ids = results.get_ids()        # numpy int64 array
records = client.retrieve_many(top_ids.tolist())  # full records

# Custom metric and column names
results = client.topk_distance('vec', query, k=5, metric='cosine',
                                id_col='item_id', dist_col='cosine_dist')

# Batch: N queries in one Rust call (scan_buf loaded once, Rayon parallel)
queries = np.random.rand(100, 128).astype(np.float32)
result  = client.batch_topk_distance('vec', queries, k=10)
# result.shape == (100, 10, 2)
ids   = result[:, :, 0].astype(np.int64)   # (100, 10)
dists = result[:, :, 1]                     # (100, 10)

# SQL: explode_rename(topk_distance(...)) — same query, SQL form
results = client.execute("""
    SELECT explode_rename(topk_distance(vec, [0.1, 0.2, 0.3], 10, 'l2'), '_id', 'dist')
    FROM items
""")

Supported metrics: 'l2' / 'euclidean', 'l2_squared', 'l1' / 'manhattan', 'linf' / 'chebyshev', 'cosine' / 'cosine_distance', 'dot' / 'inner_product'
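
For reference, the listed metrics correspond to the standard textbook formulas, computed below in plain Python on tiny vectors. ApexBase's SIMD kernels compute the same quantities; exact normalization conventions (e.g. for 'dot') are assumptions here.

```python
import math

# Textbook definitions of the supported distance metrics.
def l2(a, b):         return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
def l2_squared(a, b): return sum((x - y) ** 2 for x, y in zip(a, b))
def l1(a, b):         return sum(abs(x - y) for x, y in zip(a, b))
def linf(a, b):       return max(abs(x - y) for x, y in zip(a, b))
def dot(a, b):        return sum(x * y for x, y in zip(a, b))
def cosine_distance(a, b):
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot(a, b) / (na * nb)

a, b = [1.0, 0.0, 2.0], [0.0, 1.0, 2.0]
print(l2_squared(a, b))  # 2.0 (skips the sqrt, same ranking as l2)
print(l1(a, b))          # 2.0
print(linf(a, b))        # 1.0
print(dot(a, b))         # 4.0
```

Note that 'l2_squared' preserves the ordering of 'l2' while avoiding the square root, which is why it is offered as a separate metric.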

Benchmark (1M rows × dim=128, k=10):

| Metric | ApexBase | DuckDB | Speedup |
|---|---|---|---|
| L2 | ~12ms | ~47ms | 3.8× faster |
| Cosine | ~13ms | ~42ms | 3.1× faster |
| Dot | ~13ms | ~36ms | 2.8× faster |

See docs/API_REFERENCE.md#vector-search for full details.

Record-Level Operations

record = client.retrieve(0)               # by internal _id
records = client.retrieve_many([0, 1, 2])
all_data = client.retrieve_all()

client.replace(0, {"name": "Alice2", "age": 31})
client.delete(0)
client.delete([1, 2, 3])

Column Operations

client.add_column("email", "String")
client.rename_column("email", "email_addr")
client.drop_column("email_addr")
client.get_column_dtype("age")    # "Int64"
client.list_fields()              # ["name", "age", "city"]

ResultView

Query results are returned as ResultView objects with multiple output formats:

results = client.execute("SELECT * FROM users")

df = results.to_pandas()       # pandas DataFrame (zero-copy by default)
pl_df = results.to_polars()    # polars DataFrame
arrow = results.to_arrow()     # PyArrow Table
dicts = results.to_dict()      # list of dicts

results.shape                  # (rows, columns)
results.columns                # column names
len(results)                   # row count
results.first()                # first row as dict
results.scalar()               # single value (for aggregates)
results.get_ids()              # numpy array of _id values

Context Manager

with ApexClient("./data") as client:
    client.create_table("tmp")
    client.store({"key": "value"})
    # Automatically closed on exit

Performance

ApexBase vs SQLite vs DuckDB (1M rows)

Three-way comparison on macOS 26.3, Apple Silicon (10 cores), 32 GB RAM. Python 3.11.10, ApexBase v1.8.0, SQLite v3.45.3, DuckDB v1.1.3, PyArrow v19.0.0.

Dataset: 1,000,000 rows × 5 columns (name, age, score, city, category). Average of 5 timed iterations after 2 warmup runs.

| Query | ApexBase | SQLite | DuckDB | vs Best Other |
|---|---|---|---|---|
| Bulk Insert (1M rows) | 289.05ms | 903.62ms | 889.55ms | 3.1x faster |
| COUNT(*) | 0.062ms | 9.08ms | 0.538ms | 8.7x faster |
| SELECT * LIMIT 100 [cold] | 0.027ms | 0.075ms | 0.239ms | 2.8x faster |
| SELECT * LIMIT 100 [warm] | 3.0µs | 0.074ms | 0.227ms | 25x faster |
| SELECT * LIMIT 10K [cold] | 0.794ms | 6.85ms | 4.33ms | 5.5x faster |
| SELECT * LIMIT 10K [warm] | 3.0µs | 6.65ms | 4.16ms | >1000x faster |
| Filter (name = 'user_5000') | 0.041ms | 41.21ms | 1.62ms | 40x faster |
| Filter (age BETWEEN 25 AND 35) | 0.028ms | 165.16ms | 87.88ms | >3000x faster |
| GROUP BY city (10 groups) | 0.029ms | 356.94ms | 3.80ms | 131x faster |
| GROUP BY + HAVING | 0.030ms | 356.05ms | 3.98ms | 133x faster |
| ORDER BY score LIMIT 100 | 0.030ms | 52.55ms | 5.99ms | 200x faster |
| Aggregation (5 funcs) | 0.031ms | 84.26ms | 1.48ms | 48x faster |
| Complex (Filter+Group+Order) | 0.036ms | 161.50ms | 2.88ms | 80x faster |
| Point Lookup (by _id) | 0.033ms | 0.045ms | 2.82ms | 1.4x faster |
| Insert 1K rows | 0.613ms | 1.34ms | 2.63ms | 2.2x faster |
| SELECT * → pandas (full scan) | 0.734ms | 1.14s | 174.94ms | 238x faster |
| GROUP BY city, category (100 grp) | 0.023ms | 667.78ms | 5.25ms | 228x faster |
| LIKE filter (name LIKE 'user_1%') | 32.70ms | 132.27ms | 53.27ms | 1.6x faster |
| Multi-cond (age>30 AND score>50) | 0.033ms | 337.28ms | 188.33ms | >5000x faster |
| ORDER BY city, score DESC LIMIT 100 | 0.033ms | 68.59ms | 7.15ms | 217x faster |
| COUNT(DISTINCT city) | 0.033ms | 87.95ms | 3.81ms | 115x faster |
| IN filter (city IN 3 cities) | 0.035ms | 307.66ms | 149.93ms | >4000x faster |
| UPDATE rows (age = 25) | 7.63ms | 37.84ms | 14.29ms | 1.9x faster |
| Store+DELETE 1K (combined) | 1.16ms | 35.21ms | 3.02ms | 2.6x faster |
| DELETE 1K [pure delete only] | 0.284ms | 32.46ms | 0.463ms | 1.6x faster |
| Window ROW_NUMBER | 0.037ms | 497.17ms | 43.05ms | >1000x faster |
| FTS Index Build (1M rows) | 805ms | 1.49s | 1.07s | 1.3x faster |
| FTS Search ('Electronics') | 0.133ms | 20.79ms | 20.00ms | 150x faster |

Summary: ApexBase wins all 28 benchmarks (28W / 0T / 0L). "Cold" = fresh DB open per iteration; "warm" = cached backend.

The cold comparison is apples-to-apples: all three engines are measured the same way, with no gc.collect() calls skewing the timings.

Reproduce: python benchmarks/bench_vs_sqlite_duckdb.py --rows 1000000


Server Protocols

ApexBase ships two complementary server protocols for external access:

| Protocol | Port | Best for | Binary / CLI |
|---|---|---|---|
| PG Wire | 5432 | DBeaver, psql, DataGrip, BI tools | apexbase-server |
| Arrow Flight | 50051 | Python (pyarrow), Go, Java, Spark | apexbase-flight |

Combined Launcher (Both Servers at Once)

# Start PG Wire + Arrow Flight simultaneously
apexbase-serve --dir /path/to/data

# Custom ports
apexbase-serve --dir /path/to/data --pg-port 5432 --flight-port 50051

# Disable one server
apexbase-serve --dir /path/to/data --no-flight   # PG Wire only
apexbase-serve --dir /path/to/data --no-pg       # Arrow Flight only

| Flag | Default | Description |
|---|---|---|
| --dir, -d | . | Directory containing .apex database files |
| --host | 127.0.0.1 | Bind host for both servers |
| --pg-port | 5432 | PostgreSQL Wire port |
| --flight-port | 50051 | Arrow Flight gRPC port |
| --no-pg | | Disable PG Wire server |
| --no-flight | | Disable Arrow Flight server |

PostgreSQL Wire Protocol Server

ApexBase includes a built-in PostgreSQL wire protocol server, allowing you to connect using DBeaver, psql, DataGrip, pgAdmin, Navicat, and any other tool that supports the PostgreSQL protocol.

Starting the Server

Method 1: Python CLI (after pip install apexbase)

apexbase-server --dir /path/to/data --port 5432

Options:

| Flag | Default | Description |
|---|---|---|
| --dir, -d | . | Directory containing .apex database files |
| --host | 127.0.0.1 | Host to bind to (use 0.0.0.0 for remote access) |
| --port, -p | 5432 | Port to listen on |

Method 2: Standalone Rust binary (no Python required)

# Build
cargo build --release --bin apexbase-server --no-default-features --features server

# Run
./target/release/apexbase-server --dir /path/to/data --port 5432

Connecting with Database Tools

The server emulates PostgreSQL 15.0, provides a pg_catalog- and information_schema-compatible metadata layer, and supports the Simple and Extended Query protocols. No username or password is required (authentication is disabled).

DBeaver

  1. New Database Connection → choose PostgreSQL
  2. Fill in connection details:
    • Host: 127.0.0.1 (or the --host you specified)
    • Port: 5432 (or the --port you specified)
    • Database: apexbase (any value accepted)
    • Authentication: select No Authentication or leave username/password empty
  3. Click Test Connection → Finish
  4. DBeaver will discover tables and columns automatically via pg_catalog / information_schema

psql

psql -h 127.0.0.1 -p 5432 -d apexbase

DataGrip / IntelliJ IDEA

  1. Database tool window → + → Data Source → PostgreSQL
  2. Set Host, Port, Database as above; leave User and Password empty
  3. Click Test Connection → OK

pgAdmin

  1. Add New Server → General tab: give it a name
  2. Connection tab: set Host and Port; leave Username as postgres (ignored) and Password empty
  3. Save — tables appear under Databases > apexbase > Schemas > public > Tables

Navicat for PostgreSQL

  1. Connection → PostgreSQL
  2. Set Host, Port; leave User and Password blank
  3. Test Connection → OK

Other Compatible Tools

Any tool or library that speaks the PostgreSQL wire protocol (libpq) can connect, including:

  • TablePlus, Beekeeper Studio, HeidiSQL
  • Python: psycopg2 / asyncpg
  • Node.js: pg (node-postgres)
  • Go: pgx / lib/pq
  • Rust: tokio-postgres / sqlx
  • Java: JDBC PostgreSQL driver

Example with psycopg2:

import psycopg2

conn = psycopg2.connect(host="127.0.0.1", port=5432, dbname="apexbase")
cur = conn.cursor()
cur.execute("SELECT * FROM users LIMIT 10")
print(cur.fetchall())
conn.close()

Supported SQL over Wire Protocol

The wire protocol server passes SQL directly to the ApexBase query engine. All SQL features listed in Usage Guide are available, including JOINs, CTEs, window functions, transactions, and DDL.

Metadata Compatibility

The server implements a pg_catalog compatibility layer that responds to common catalog queries:

| Catalog / View | Purpose |
|---|---|
| pg_catalog.pg_namespace | Schema listing |
| pg_catalog.pg_database | Database listing |
| pg_catalog.pg_class | Table discovery |
| pg_catalog.pg_attribute | Column metadata |
| pg_catalog.pg_type | Type information |
| pg_catalog.pg_settings | Server settings |
| information_schema.tables | Standard table listing |
| information_schema.columns | Standard column listing |
| SET / SHOW statements | Client configuration probes |

This enables GUI tools to browse tables, inspect columns, and display data types without modification.

Supported Protocol Features

| Feature | Status |
|---|---|
| Simple Query Protocol | ✅ Fully supported |
| Extended Query Protocol (prepared statements) | ✅ Supported — schema cached, binary format for psycopg3 |
| Cross-database SQL (db.table) | ✅ Supported — USE dbname / \c dbname to switch context |
| pg_catalog / information_schema | ✅ Compatible layer for GUI tools |
| All ApexBase SQL (JOINs, CTEs, window functions, DDL) | ✅ Full pass-through to query engine |

Limitations

  • Authentication is not implemented — the server accepts all connections regardless of username/password
  • SSL/TLS is not supported — use an SSH tunnel (ssh -L 5432:127.0.0.1:5432 user@host) for remote access

Arrow Flight gRPC Server

Arrow Flight sends Arrow IPC RecordBatch directly over gRPC (HTTP/2), bypassing per-row text serialization entirely. It is 4–7× faster than PG wire for large result sets (10K+ rows).

| Query | PG Wire | Arrow Flight | Speedup |
|---|---|---|---|
| SELECT 10K rows | 5.1ms | 0.7ms | 7× faster |
| BETWEEN (~33K rows) | 22ms | 5.6ms | 4× faster |
| Single row / point lookup | ~7.5ms | ~7.9ms | equal |

Starting the Flight Server

Python CLI:

apexbase-flight --dir /path/to/data --port 50051

Standalone Rust binary:

cargo build --release --bin apexbase-flight --no-default-features --features flight
./target/release/apexbase-flight --dir /path/to/data --port 50051

Python Client

import pyarrow.flight as fl
import polars as pl

client = fl.connect("grpc://127.0.0.1:50051")

# SELECT — returns Arrow Table
table = client.do_get(fl.Ticket(b"SELECT * FROM users LIMIT 10000")).read_all()
df = table.to_pandas()              # zero-copy to pandas
pl_df = pl.from_arrow(table)        # zero-copy to polars

# DML / DDL
client.do_action(fl.Action("sql", b"INSERT INTO users (name, age) VALUES ('Alice', 30)"))
client.do_action(fl.Action("sql", b"CREATE TABLE logs (event STRING, ts INT64)"))

# List available actions
for action in client.list_actions():
    print(action.type, "—", action.description)

When to Use Arrow Flight vs PG Wire

| Scenario | Recommendation |
|---|---|
| DBeaver / Tableau / BI tools | PG Wire (only option) |
| Python + small queries (<100 rows) | Native API (fastest, in-process) |
| Python + large queries (10K+ rows, remote) | Arrow Flight (4–7× faster than PG wire) |
| Go / Java / Spark workers | Arrow Flight (native Arrow support) |
| Local Python (same machine) | Native API (ApexClient.execute()) |

PyO3 Python API

Both servers can also be started as blocking Python functions (the GIL is released while they run):

import threading
from apexbase._core import start_pg_server, start_flight_server

t1 = threading.Thread(target=start_pg_server,     args=("/data", "0.0.0.0", 5432),  daemon=True)
t2 = threading.Thread(target=start_flight_server, args=("/data", "0.0.0.0", 50051), daemon=True)
t1.start()
t2.start()

Architecture

Python (ApexClient)
  |
  |-- Arrow IPC / columnar dict --------> ResultView (Pandas / Polars / PyArrow)
  |
Rust Core (PyO3 bindings)
  |
  +-- SQL Parser -----> Query Planner -----> Query Executor
  |                                                |
  |   +-- JIT Compiler (Cranelift)                 |
  |   +-- Expression Evaluator (70+ functions)     |
  |   +-- Window Function Engine                   |
  |                                                |
  +-- Storage Engine                               |
  |     +-- V4 Row Group Format (.apex)            |
  |     +-- DeltaStore (cell-level updates)        |
  |     +-- WAL (write-ahead log)                  |
  |     +-- Mmap on-demand reads                   |
  |     +-- LZ4 / Zstd compression                 |
  |     +-- Dictionary encoding                    |
  |                                                |
  +-- Index Manager (B-Tree, Hash)                 |
  +-- TxnManager (OCC + MVCC)                      |
  +-- NanoFTS (full-text search)                   |
  +-- PG Wire Protocol Server (pgwire)             |
  |   +-- DBeaver / psql / DataGrip / pgAdmin      |
  |   +-- pg_catalog & information_schema compat   |
  |                                                |
  +-- Arrow Flight gRPC Server (tonic + HTTP/2)    |
      +-- pyarrow.flight / Go / Java / Spark       |
      +-- Arrow IPC — zero serialization overhead

Storage Format

ApexBase uses a custom V4 Row Group format:

  • Each table is a single .apex file containing a header, row groups, and a footer
  • Row groups store columns contiguously with per-column compression (LZ4 or Zstd)
  • Low-cardinality string columns are dictionary-encoded on disk
  • Null bitmaps are stored per column per row group
  • A DeltaStore file (.deltastore) holds cell-level updates that are merged on read and compacted automatically
  • WAL records provide crash recovery with idempotent replay
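
Dictionary encoding (third bullet) can be pictured as replacing each string with a small integer index into a per-column dictionary. The sketch below shows the idea only, not ApexBase's on-disk layout:

```python
# Dictionary-encode a low-cardinality string column: store each distinct
# value once, and replace the column with compact integer codes.
def dict_encode(column):
    dictionary, codes = [], []
    index = {}
    for value in column:
        if value not in index:
            index[value] = len(dictionary)
            dictionary.append(value)
        codes.append(index[value])
    return dictionary, codes

def dict_decode(dictionary, codes):
    return [dictionary[c] for c in codes]

city = ["Beijing", "Shanghai", "Beijing", "Beijing", "Shanghai"]
dictionary, codes = dict_encode(city)
print(dictionary)  # ['Beijing', 'Shanghai']
print(codes)       # [0, 1, 0, 0, 1]
```

With only a handful of distinct cities per million rows, the codes column compresses far better than the raw strings, which is where the storage win comes from.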

Query Execution

  • The SQL parser produces an AST that the query planner analyzes for optimization strategy
  • Fast paths bypass the full executor for common patterns (COUNT(*), SELECT * LIMIT N, point lookups, single-column GROUP BY)
  • Arrow RecordBatch is the internal data representation; results flow to Python via Arrow IPC with zero-copy when possible
  • Repeated identical read queries are served from an in-process result cache
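
The fast-path idea (recognize a trivial query shape before invoking the full planner) can be sketched as a simple pattern check. This is hypothetical dispatch logic for illustration, not ApexBase's actual parser:

```python
import re

# Hypothetical fast-path dispatcher: trivial query shapes skip the full
# planner/executor pipeline and hit a specialized routine instead.
FAST_PATHS = [
    ("count_star",   re.compile(r"^SELECT\s+COUNT\(\*\)\s+FROM\s+\w+\s*$", re.I)),
    ("limit_scan",   re.compile(r"^SELECT\s+\*\s+FROM\s+\w+\s+LIMIT\s+\d+\s*$", re.I)),
    ("point_lookup", re.compile(r"^SELECT\s+\*\s+FROM\s+\w+\s+WHERE\s+_id\s*=\s*\d+\s*$", re.I)),
]

def classify(sql: str) -> str:
    for name, pattern in FAST_PATHS:
        if pattern.match(sql.strip()):
            return name
    return "full_executor"

print(classify("SELECT COUNT(*) FROM users"))          # count_star
print(classify("SELECT * FROM users LIMIT 100"))       # limit_scan
print(classify("SELECT * FROM users WHERE _id = 42"))  # point_lookup
print(classify("SELECT city, COUNT(*) FROM users GROUP BY city"))  # full_executor
```

This is why the benchmark numbers for COUNT(*) and LIMIT scans are microseconds: those shapes never pay the general planning cost.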

API Reference

ApexClient

Constructor

ApexClient(
    dirpath="./data",           # data directory
    drop_if_exists=False,       # clear existing data on open
    batch_size=1000,            # batch size for operations
    enable_cache=True,          # enable query cache
    cache_size=10000,           # cache capacity
    prefer_arrow_format=True,   # prefer Arrow format for results
    durability="fast",          # "fast" | "safe" | "max"
)

Database Management

| Method | Description |
|---|---|
| use_database(database='default') | Switch to a named database (creates it if needed) |
| use(database='default', table=None) | Switch database and optionally select/create a table |
| list_databases() | List all databases ('default' always included) |
| current_database | Property: current database name |

Table Management

| Method | Description |
|---|---|
| create_table(name, schema=None) | Create a new table, optionally with a pre-defined schema |
| drop_table(name) | Drop a table |
| use_table(name) | Switch active table |
| list_tables() | List all tables in the current database |
| current_table | Property: current table name |

Data Storage

| Method | Description |
|---|---|
| store(data) | Store data (dict, list, DataFrame, Arrow Table) |
| from_pandas(df, table_name=None) | Import from pandas DataFrame |
| from_polars(df, table_name=None) | Import from polars DataFrame |
| from_pyarrow(table, table_name=None) | Import from PyArrow Table |

Data Retrieval

| Method | Description |
|---|---|
| execute(sql) | Execute SQL statement(s) |
| query(where, limit) | Query with WHERE expression |
| retrieve(id) | Get record by _id |
| retrieve_many(ids) | Get multiple records by _id |
| retrieve_all() | Get all records |
| count_rows(table) | Count rows in table |

Data Modification

| Method | Description |
|---|---|
| replace(id, data) | Replace a record |
| batch_replace({id: data}) | Batch replace records |
| delete(id) or delete([ids]) | Delete record(s) |

Column Operations

| Method | Description |
|---|---|
| add_column(name, type) | Add a column |
| drop_column(name) | Drop a column |
| rename_column(old, new) | Rename a column |
| get_column_dtype(name) | Get column data type |
| list_fields() | List all fields |

Full-Text Search

| Method | Description |
|---|---|
| init_fts(fields, lazy_load, cache_size) | Initialize FTS |
| search_text(query) | Search documents |
| fuzzy_search_text(query) | Fuzzy search |
| search_and_retrieve(query, limit, offset) | Search and return records |
| search_and_retrieve_top(query, n) | Top N results |
| get_fts_stats() | FTS statistics |
| disable_fts() / drop_fts() | Disable or drop FTS |

Vector Search

| Method | Description |
|---|---|
| topk_distance(col, query, k=10, metric='l2', id_col='_id', dist_col='dist') | Single-query TopK: returns ResultView with id and distance columns |
| batch_topk_distance(col, queries, k=10, metric='l2') | Batch TopK: ndarray of shape (N, k, 2) — ids and distances |

Utility

| Method | Description |
|---|---|
| flush() | Flush data to disk |
| set_auto_flush(rows, bytes) | Set auto-flush thresholds |
| get_auto_flush() | Get auto-flush config |
| estimate_memory_bytes() | Estimate memory usage |
| close() | Close the client |

ResultView

| Method / Property | Description |
|---|---|
| to_pandas(zero_copy=True) | Convert to pandas DataFrame |
| to_polars() | Convert to polars DataFrame |
| to_arrow() | Convert to PyArrow Table |
| to_dict() | Convert to list of dicts |
| scalar() | Get single scalar value |
| first() | Get first row as dict |
| get_ids(return_list=False) | Get record IDs |
| shape | (rows, columns) |
| columns | Column names |
| __len__() | Row count |
| __iter__() | Iterate over rows |
| __getitem__(idx) | Index access |

Documentation

Additional documentation is available in the docs/ directory.

License

Apache-2.0

Project details


Download files

Download the file for your platform.

Source Distribution

apexbase-1.8.0.tar.gz (792.0 kB)

Built Distributions

Wheels are published for CPython 3.9 through 3.13, three per version:

  • Windows x86-64 (`win_amd64`, 7.4 MB)
  • Linux x86-64 (`manylinux_2_17` / `manylinux2014`, glibc 2.17+, 8.1 MB)
  • macOS 11.0+ ARM64 (`macosx_11_0_arm64`, 6.6 MB)

File details

Details for the file apexbase-1.8.0.tar.gz.

File metadata

  • Download URL: apexbase-1.8.0.tar.gz
  • Size: 792.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 0536c7d9e430ae04c7ab263fd7c0d48d9884db37bfb649bb545bfed9c3eff72e |
| MD5 | eea15732a0aa38c3afa8e7fa3f66415c |
| BLAKE2b-256 | 78fa1b4512596709ed72d700a656168fafbff607abaf52d39c0483c5c0adf338 |

