ApexBase
High-performance HTAP embedded database with Rust core and Python API
ApexBase is an embedded columnar database designed for Hybrid Transactional/Analytical Processing (HTAP) workloads. It combines a high-throughput columnar storage engine written in Rust with an ergonomic Python API. In the project's own 1M-row benchmark (see Performance), it outperforms DuckDB and SQLite on 19 of 21 queries. It runs in-process with no server and no external dependencies, storing each table in a single .apex file.
Features
- HTAP architecture: V4 Row Group columnar storage with DeltaStore for cell-level updates; fast inserts and fast analytical scans in one engine
- Multi-database support: multiple isolated databases in one directory; cross-database queries with standard `db.table` SQL syntax
- Single-file storage: a custom `.apex` file per table, no server process, no external dependencies
- Comprehensive SQL: DDL, DML, JOINs (INNER/LEFT/RIGHT/FULL/CROSS), subqueries (IN/EXISTS/scalar), CTEs (WITH ... AS), UNION/UNION ALL, window functions, EXPLAIN / EXPLAIN ANALYZE, multi-statement execution
- 70+ built-in functions: math (ABS, SQRT, POWER, LOG, trig), string (UPPER, LOWER, SUBSTR, REPLACE, CONCAT, REGEXP_REPLACE, ...), date (YEAR, MONTH, DAY, DATEDIFF, DATE_ADD, ...), conditional (COALESCE, IFNULL, NULLIF, CASE WHEN, GREATEST, LEAST)
- Aggregation and analytics: COUNT, SUM, AVG, MIN, MAX, COUNT(DISTINCT), GROUP BY, HAVING, ORDER BY with NULLS FIRST/LAST
- Window functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE, PERCENT_RANK, CUME_DIST, LAG, LEAD, FIRST_VALUE, LAST_VALUE, NTH_VALUE, RUNNING_SUM, and windowed SUM/AVG/COUNT/MIN/MAX with PARTITION BY and ORDER BY
- Transactions: BEGIN / COMMIT / ROLLBACK with OCC (Optimistic Concurrency Control), SAVEPOINT / ROLLBACK TO / RELEASE, statement-level auto-rollback
- MVCC: multi-version concurrency control with snapshot isolation, a version store, and garbage collection
- Indexing: B-Tree and Hash indexes with CREATE INDEX / DROP INDEX / REINDEX; automatic multi-index AND intersection for compound predicates
- Full-text search: built-in NanoFTS integration with fuzzy matching
- JIT compilation: Cranelift-based JIT for predicate evaluation and SIMD-vectorized aggregations
- Zero-copy Python bridge: Arrow IPC between Rust and Python; direct conversion to Pandas, Polars, and PyArrow
- Durability levels: configurable `fast` / `safe` / `max` with WAL support and crash recovery
- Compact storage: dictionary encoding for low-cardinality strings; LZ4 and Zstd compression
- Parquet interop: COPY TO / COPY FROM Parquet files
- PostgreSQL wire protocol: built-in server for DBeaver, psql, DataGrip, pgAdmin, Navicat, and any PostgreSQL-compatible client; two distribution modes (Python CLI or standalone Rust binary)
- Arrow Flight gRPC server: high-performance columnar data transfer over HTTP/2 that streams Arrow IPC RecordBatches directly, 4–7× faster than PG wire for large result sets; accessible via `pyarrow.flight`, Go Arrow, Java Arrow, and any Arrow Flight client
- Cross-platform: Linux, macOS, and Windows; x86_64 and ARM64; Python 3.9–3.13
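Dictionary encoding, listed under compact storage above, pays off on low-cardinality string columns such as `city`: each distinct string is stored once and every row holds a small integer code. A minimal sketch of the technique follows; it is not ApexBase's on-disk layout, and `dictionary_encode` / `dictionary_decode` are hypothetical names.

```python
from typing import Dict, List, Tuple


def dictionary_encode(values: List[str]) -> Tuple[List[str], List[int]]:
    """Encode strings as integer codes into a table of distinct values (first-seen order)."""
    dictionary: List[str] = []
    code_of: Dict[str, int] = {}
    codes: List[int] = []
    for value in values:
        if value not in code_of:
            code_of[value] = len(dictionary)  # assign the next free code
            dictionary.append(value)
        codes.append(code_of[value])
    return dictionary, codes


def dictionary_decode(dictionary: List[str], codes: List[int]) -> List[str]:
    """Rebuild the original column by looking each code up in the dictionary."""
    return [dictionary[code] for code in codes]


cities = ["Beijing", "Shanghai", "Beijing", "Beijing", "Shanghai"]
dictionary, codes = dictionary_encode(cities)
print(dictionary, codes)  # ['Beijing', 'Shanghai'] [0, 1, 0, 0, 1]
assert dictionary_decode(dictionary, codes) == cities
```

Five strings shrink to two dictionary entries plus five small integers; the saving grows with row count as long as the number of distinct values stays small.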
Installation
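```bash
pip install apexbase
```
Build from source (requires a Rust toolchain and maturin, installable with `pip install maturin`; run it inside an activated virtual environment):
```bash
maturin develop --release
```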
Quick Start
```python
from apexbase import ApexClient

# Open (or create) a database directory
client = ApexClient("./data")

# Create a table
client.create_table("users")

# Store records
client.store({"name": "Alice", "age": 30, "city": "Beijing"})
client.store([
    {"name": "Bob", "age": 25, "city": "Shanghai"},
    {"name": "Charlie", "age": 35, "city": "Beijing"},
])

# SQL query
results = client.execute("SELECT * FROM users WHERE age > 28 ORDER BY age DESC")

# Convert to DataFrame
df = results.to_pandas()

client.close()
```
Usage Guide
Database Management
ApexBase supports multiple isolated databases within a single root directory. Each named database lives in its own subdirectory; the default database uses the root directory.
```python
# Switch to a named database (creates it if needed)
client.use_database("analytics")

# Combined: switch database + select/create a table in one call
client.use(database="analytics", table="events")

# List all databases
dbs = client.list_databases()  # ["analytics", "default", "hr"]

# Current database
print(client.current_database)  # "analytics"

# Cross-database SQL: standard db.table syntax
client.execute("SELECT * FROM default.users")
client.execute("SELECT u.name, e.event FROM default.users u JOIN analytics.events e ON u.id = e.user_id")
client.execute("INSERT INTO analytics.events (name) VALUES ('click')")
client.execute("UPDATE default.users SET age = 31 WHERE name = 'Alice'")
client.execute("DELETE FROM default.users WHERE age < 18")
```
All SQL operations (SELECT, INSERT, UPDATE, DELETE, JOIN, CREATE TABLE, DROP TABLE, ALTER TABLE) support database.table qualified names, allowing cross-database queries in a single statement.
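To make the layout concrete, here is a sketch of how a qualified name could map onto the directory structure described above: `default` lives in the root directory, each other database in a subdirectory, and each table in its own `.apex` file. The `<table>.apex` file naming and the `resolve_table_path` helper are assumptions for illustration, not ApexBase internals.

```python
from pathlib import Path


def resolve_table_path(root: str, name: str, current_db: str = "default") -> Path:
    """Map 'table' or 'database.table' to a hypothetical .apex file path.

    Assumed layout (illustration only): the 'default' database is the root
    directory itself, other databases are subdirectories of the root, and
    each table is stored as '<table>.apex'.
    """
    parts = name.split(".")
    if len(parts) == 2:
        database, table = parts
    elif len(parts) == 1:
        database, table = current_db, parts[0]  # unqualified: use the current database
    else:
        raise ValueError(f"expected 'table' or 'database.table', got {name!r}")
    base = Path(root) if database == "default" else Path(root) / database
    return base / f"{table}.apex"


print(resolve_table_path("./data", "default.users").as_posix())        # data/users.apex
print(resolve_table_path("./data", "events", "analytics").as_posix())  # data/analytics/events.apex
```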
Table Management
Each table is stored as a separate .apex file. Tables must be created before use.
```python
# Create with optional schema
client.create_table("orders", schema={
    "order_id": "int64",
    "product": "string",
    "price": "float64",
})

# Switch tables
client.use_table("users")

# List / drop
tables = client.list_tables()
client.drop_table("orders")
```
Data Ingestion
```python
import pandas as pd
import polars as pl
import pyarrow as pa

# Columnar dict (fastest for bulk data)
client.store({
    "name": ["D", "E", "F"],
    "age": [22, 32, 42],
})

# From pandas / polars / PyArrow (auto-creates table when table_name given)
client.from_pandas(pd.DataFrame({"name": ["G"], "age": [28]}), table_name="users")
client.from_polars(pl.DataFrame({"name": ["H"], "age": [38]}), table_name="users")
client.from_pyarrow(pa.table({"name": ["I"], "age": [48]}), table_name="users")
```
SQL
ApexBase supports a broad SQL dialect. Examples:
```python
# DDL
client.execute("CREATE TABLE IF NOT EXISTS products")
client.execute("ALTER TABLE products ADD COLUMN name STRING")
client.execute("DROP TABLE IF EXISTS products")

# DML
client.execute("INSERT INTO users (name, age) VALUES ('Zoe', 29)")
client.execute("UPDATE users SET age = 31 WHERE name = 'Alice'")
client.execute("DELETE FROM users WHERE age < 20")

# SELECT with full clause support
client.execute("""
    SELECT city, COUNT(*) AS cnt, AVG(age) AS avg_age
    FROM users
    WHERE age BETWEEN 20 AND 40
    GROUP BY city
    HAVING cnt > 1
    ORDER BY avg_age DESC
    LIMIT 10
""")

# JOINs
client.execute("""
    SELECT u.name, o.product
    FROM users u
    INNER JOIN orders o ON u._id = o.user_id
""")

# Subqueries
client.execute("SELECT * FROM users WHERE age > (SELECT AVG(age) FROM users)")
client.execute("SELECT * FROM users WHERE city IN (SELECT city FROM cities WHERE pop > 1000000)")

# CTEs
client.execute("""
    WITH seniors AS (SELECT * FROM users WHERE age >= 30)
    SELECT city, COUNT(*) FROM seniors GROUP BY city
""")

# Window functions (alias avoids the RANK keyword)
client.execute("""
    SELECT name, age,
           ROW_NUMBER() OVER (ORDER BY age DESC) AS age_rank,
           AVG(age) OVER (PARTITION BY city) AS city_avg
    FROM users
""")

# UNION
client.execute("""
    SELECT name FROM users WHERE city = 'Beijing'
    UNION ALL
    SELECT name FROM users WHERE city = 'Shanghai'
""")

# Multi-statement
client.execute("""
    INSERT INTO users (name, age) VALUES ('New1', 20);
    INSERT INTO users (name, age) VALUES ('New2', 21);
    SELECT COUNT(*) FROM users
""")

# INSERT ... ON CONFLICT (upsert)
client.execute("""
    INSERT INTO users (name, age) VALUES ('Alice', 31)
    ON CONFLICT (name) DO UPDATE SET age = 31
""")

# CREATE TABLE AS
client.execute("CREATE TABLE seniors AS SELECT * FROM users WHERE age >= 30")

# EXPLAIN / EXPLAIN ANALYZE
client.execute("EXPLAIN SELECT * FROM users WHERE age > 25")
client.execute("EXPLAIN ANALYZE SELECT * FROM users WHERE age > 25")

# Parquet interop
client.execute("COPY users TO '/tmp/users.parquet'")
client.execute("COPY users FROM '/tmp/users.parquet'")
```
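The window query above returns, for every row, a row number and a per-city average without collapsing rows the way GROUP BY does. A plain-Python sketch of those two semantics on the Quick Start rows follows; it only illustrates what the SQL computes and is not how ApexBase's window engine is implemented (`row_number` and `partition_avg` are hypothetical helpers).

```python
from collections import defaultdict

rows = [
    {"name": "Alice", "age": 30, "city": "Beijing"},
    {"name": "Bob", "age": 25, "city": "Shanghai"},
    {"name": "Charlie", "age": 35, "city": "Beijing"},
]


def row_number(rows, key, descending=False):
    """ROW_NUMBER() OVER (ORDER BY key): 1-based position of each row in sorted order."""
    ordered = sorted(range(len(rows)), key=lambda i: rows[i][key], reverse=descending)
    numbers = [0] * len(rows)
    for position, i in enumerate(ordered, start=1):
        numbers[i] = position  # result stays aligned with the input row order
    return numbers


def partition_avg(rows, partition_key, value_key):
    """AVG(value) OVER (PARTITION BY partition_key): group mean repeated on every row."""
    totals = defaultdict(lambda: [0, 0])  # partition -> [sum, count]
    for r in rows:
        acc = totals[r[partition_key]]
        acc[0] += r[value_key]
        acc[1] += 1
    return [totals[r[partition_key]][0] / totals[r[partition_key]][1] for r in rows]


for r, rn, avg in zip(rows, row_number(rows, "age", descending=True),
                      partition_avg(rows, "city", "age")):
    print(r["name"], r["age"], rn, avg)
```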
Transactions
```python
client.execute("BEGIN")
client.execute("INSERT INTO users (name, age) VALUES ('Tx1', 20)")
client.execute("SAVEPOINT sp1")
client.execute("INSERT INTO users (name, age) VALUES ('Tx2', 21)")
client.execute("ROLLBACK TO sp1")  # undo Tx2 only
client.execute("COMMIT")           # Tx1 persisted
```
Transactions use OCC validation — concurrent writes are detected at commit time.
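ApexBase documents only that concurrent writes are detected at commit time; its exact conflict rules and version store are not described here. The sketch below is a toy model of the general OCC + MVCC technique: each transaction reads from a snapshot taken at `begin`, buffers its writes, and is rejected at commit if another transaction committed a newer version of any key it wrote (first committer wins). `ToyStore` and `ConflictError` are hypothetical names, not ApexBase classes.

```python
import itertools


class ConflictError(Exception):
    """Raised when commit-time validation finds a concurrent write."""


class ToyStore:
    """Toy multi-version key-value store with optimistic commit validation."""

    def __init__(self):
        self.clock = itertools.count(1)  # shared source of start and commit timestamps
        self.versions = {}               # key -> list of (commit_ts, value), oldest first

    def begin(self):
        return {"start_ts": next(self.clock), "writes": {}}

    def read(self, txn, key):
        # Own uncommitted writes win; otherwise read the newest version in the snapshot
        if key in txn["writes"]:
            return txn["writes"][key]
        visible = [v for ts, v in self.versions.get(key, []) if ts <= txn["start_ts"]]
        return visible[-1] if visible else None

    def write(self, txn, key, value):
        txn["writes"][key] = value  # buffered; invisible to others until commit

    def commit(self, txn):
        # Validation: abort if any written key got a newer version after we started
        for key in txn["writes"]:
            history = self.versions.get(key, [])
            if history and history[-1][0] > txn["start_ts"]:
                raise ConflictError(f"write-write conflict on {key!r}")
        commit_ts = next(self.clock)
        for key, value in txn["writes"].items():
            self.versions.setdefault(key, []).append((commit_ts, value))
        return commit_ts


store = ToyStore()
t1, t2 = store.begin(), store.begin()  # two concurrent transactions
store.write(t1, "alice.age", 31)
store.write(t2, "alice.age", 32)
store.commit(t1)                       # first committer wins
try:
    store.commit(t2)
except ConflictError as exc:
    print("t2 aborted:", exc)          # an application would typically retry t2
```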
Indexes
```python
client.execute("CREATE INDEX idx_age ON users (age)")
client.execute("CREATE UNIQUE INDEX idx_name ON users (name)")

# Queries automatically use indexes when applicable
client.execute("SELECT * FROM users WHERE age = 30")  # index scan

client.execute("DROP INDEX idx_age ON users")
client.execute("REINDEX users")
```
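The feature list mentions automatic multi-index AND intersection for compound predicates such as `WHERE age = 30 AND city = 'Beijing'`. A minimal sketch of that idea, with hash indexes modeled as plain dicts from value to a set of `_id`s (list positions stand in for `_id`, and the row for Dana is sample data); ApexBase's real index structures and planner rules are not shown, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Any, Dict, Iterable, List, Set, Tuple


def build_hash_index(rows: List[Dict[str, Any]], column: str) -> Dict[Any, Set[int]]:
    """Hash index sketch: column value -> set of row _ids (list positions here)."""
    index: Dict[Any, Set[int]] = defaultdict(set)
    for row_id, row in enumerate(rows):
        index[row[column]].add(row_id)
    return dict(index)


def and_lookup(indexes: Dict[str, Dict[Any, Set[int]]],
               predicates: Iterable[Tuple[str, Any]]) -> Set[int]:
    """Answer col1 = v1 AND col2 = v2 ... by intersecting per-index _id sets."""
    candidate_sets = sorted(
        (indexes[column].get(value, set()) for column, value in predicates),
        key=len,  # start from the most selective index to keep intersections small
    )
    if not candidate_sets:
        raise ValueError("need at least one indexed predicate")
    result = set(candidate_sets[0])
    for ids in candidate_sets[1:]:
        result &= ids
        if not result:  # nothing can match any more; stop early
            break
    return result


rows = [
    {"name": "Alice", "age": 30, "city": "Beijing"},
    {"name": "Bob", "age": 25, "city": "Shanghai"},
    {"name": "Charlie", "age": 35, "city": "Beijing"},
    {"name": "Dana", "age": 30, "city": "Beijing"},
]
indexes = {col: build_hash_index(rows, col) for col in ("age", "city")}
print(sorted(and_lookup(indexes, [("age", 30), ("city", "Beijing")])))  # [0, 3]
```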
Full-Text Search
ApexBase ships a native full-text search engine (NanoFTS) integrated directly into the SQL executor. FTS is available through all interfaces — Python API, PostgreSQL Wire, and Arrow Flight — without any Python-side middleware.
SQL interface (recommended)
```python
# 1. Create the FTS index via SQL DDL
client.execute("CREATE FTS INDEX ON articles (title, content)")
# Optional: specify lazy loading and cache size
client.execute("CREATE FTS INDEX ON logs WITH (lazy_load=true, cache_size=50000)")

# 2. Query using MATCH() / FUZZY_MATCH() in WHERE
results = client.execute("SELECT * FROM articles WHERE MATCH('rust programming')")
results = client.execute("SELECT title, content FROM articles WHERE FUZZY_MATCH('pytohn')")

# Combine with other predicates
results = client.execute("""
    SELECT * FROM articles
    WHERE MATCH('machine learning') AND published_at > '2024-01-01'
    ORDER BY _id DESC LIMIT 20
""")

# FTS also works in aggregations
count = client.execute("SELECT COUNT(*) FROM articles WHERE MATCH('deep learning')")

# Manage indexes
client.execute("SHOW FTS INDEXES")                     # list all FTS-enabled tables
client.execute("ALTER FTS INDEX ON articles DISABLE")  # disable, keep files
client.execute("DROP FTS INDEX ON articles")           # remove index + delete files
```
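NanoFTS's fuzzy-matching algorithm is not described here. A common way to tolerate typos such as `pytohn` or `databse` is to accept index terms within a small edit distance of the query term; the sketch below shows that technique with Levenshtein distance. The helper names and the default threshold of 2 edits are illustrative choices, not NanoFTS parameters.

```python
from typing import Iterable, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character inserts, deletes, and substitutions."""
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        previous = current
    return previous[-1]


def fuzzy_terms(query: str, vocabulary: Iterable[str], max_distance: int = 2) -> List[str]:
    """Return vocabulary terms within max_distance edits of query, closest first."""
    scored = sorted((edit_distance(query, term), term) for term in vocabulary)
    return [term for dist, term in scored if dist <= max_distance]


print(fuzzy_terms("pytohn", ["python", "pandas", "rust"]))  # ['python']
```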
Python API (alternative)
```python
# Initialize FTS for current table
client.use_table("articles")
client.init_fts(index_fields=["title", "content"])

# Search
ids = client.search_text("database")
fuzzy = client.fuzzy_search_text("databse")  # tolerates typos
recs = client.search_and_retrieve("python", limit=10)
top5 = client.search_and_retrieve_top("neural network", n=5)

# Lifecycle
client.get_fts_stats()
client.disable_fts()  # suspend without deleting files
client.drop_fts()     # remove index + delete files
```
Tip: The SQL interface (`MATCH()` / `FUZZY_MATCH()`) works over PG Wire and Arrow Flight without any extra setup; the Python API methods work only within the Python process.
Record-Level Operations
```python
record = client.retrieve(0)  # by internal _id
records = client.retrieve_many([0, 1, 2])
all_data = client.retrieve_all()
client.replace(0, {"name": "Alice2", "age": 31})
client.delete(0)
client.delete([1, 2, 3])
```
Column Operations
```python
client.add_column("email", "String")
client.rename_column("email", "email_addr")
client.drop_column("email_addr")
client.get_column_dtype("age")  # "Int64"
client.list_fields()            # ["name", "age", "city"]
```
ResultView
Query results are returned as ResultView objects with multiple output formats:
```python
results = client.execute("SELECT * FROM users")
df = results.to_pandas()     # pandas DataFrame (zero-copy by default)
pl_df = results.to_polars()  # polars DataFrame
arrow = results.to_arrow()   # PyArrow Table
dicts = results.to_dict()    # list of dicts
results.shape                # (rows, columns)
results.columns              # column names
len(results)                 # row count
results.first()              # first row as dict
results.scalar()             # single value (for aggregates)
results.get_ids()            # numpy array of _id values
```
Context Manager
```python
with ApexClient("./data") as client:
    client.create_table("tmp")
    client.store({"key": "value"})
# Automatically closed on exit
```
Performance
ApexBase vs SQLite vs DuckDB (1M rows)
Three-way comparison on macOS 26.3, Apple Silicon (ARM64, 10 cores), 32 GB RAM. Python 3.11.10, ApexBase v1.5.0, SQLite v3.45.3, DuckDB v1.1.3, PyArrow v19.0.0.
Dataset: 1,000,000 rows × 5 columns (name, age, score, city, category). Average of 5 timed iterations after 2 warmup runs.
| Query | ApexBase | SQLite | DuckDB | vs Best Other |
|---|---|---|---|---|
| Bulk Insert (1M rows) | 273ms | 905ms | 863ms | 3.2x faster |
| COUNT(*) | 0.049ms | 8.26ms | 0.512ms | 10x faster |
| SELECT * LIMIT 100 [cold] ¹ | 0.113ms | 0.101ms | 0.470ms | 1.1x slower |
| SELECT * LIMIT 10K [cold] | 0.917ms | 6.53ms | 4.51ms | 4.9x faster |
| Filter (name = 'user_5000') | 0.035ms | 38.56ms | 1.58ms | 45x faster |
| Filter (age BETWEEN 25 AND 35) | 0.026ms | 155ms | 88.32ms | >3000x faster |
| GROUP BY city (10 groups) | 0.040ms | 344ms | 2.69ms | 67x faster |
| GROUP BY + HAVING | 0.026ms | 358ms | 2.99ms | 115x faster |
| ORDER BY score LIMIT 100 | 0.029ms | 50.29ms | 4.59ms | 158x faster |
| Aggregation (5 funcs) | 0.034ms | 78.22ms | 1.07ms | 31x faster |
| Complex (Filter+Group+Order) | 0.028ms | 152ms | 2.34ms | 84x faster |
| Point Lookup (by _id) | 0.026ms | 0.039ms | 2.51ms | 1.5x faster |
| Insert 1K rows | 0.602ms | 1.32ms | 2.44ms | 2.2x faster |
| SELECT * → pandas (full scan) | 0.605ms | 1100ms | 162ms | 268x faster |
| GROUP BY city, category (100 grp) | 0.017ms | 646ms | 4.14ms | 244x faster |
| LIKE filter (name LIKE 'user_1%') | 28.18ms | 129ms | 52.55ms | 1.9x faster |
| Multi-cond (age>30 AND score>50) | 0.033ms | 323ms | 189ms | >5000x faster |
| ORDER BY city, score DESC LIMIT 100 | 0.026ms | 65.62ms | 6.00ms | 231x faster |
| COUNT(DISTINCT city) | 0.026ms | 84.02ms | 3.23ms | 124x faster |
| IN filter (city IN 3 cities) | 0.029ms | 294ms | 153ms | >5000x faster |
| UPDATE rows (age = 25) | 207ms | 36.03ms | 14.35ms | 14.4x slower |
Summary: wins 19 of 21 benchmarks. Slower on UPDATE (disk-flush dominated) and cold SELECT * LIMIT 100¹.
¹ Cold-start note: in this benchmark ApexBase re-opens the database from disk on every iteration, while SQLite reuses a warm connection. Measured without GC interference, ApexBase's true cold start is 0.027ms, about 3.7× faster than SQLite's warm 0.101ms.
Reproduce: python benchmarks/bench_vs_sqlite_duckdb.py --rows 1000000
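The methodology above (2 untimed warmup runs, then the mean of 5 timed runs) can be applied to any callable with a small harness like the one below. It is a sketch of that methodology, not the code in `benchmarks/bench_vs_sqlite_duckdb.py`; to time ApexBase you would pass something like `lambda: client.execute(sql)`.

```python
import statistics
import time
from typing import Callable


def bench(fn: Callable[[], object], warmup: int = 2, iterations: int = 5) -> float:
    """Run fn `warmup` times untimed, then return the mean of `iterations` timed runs in ms."""
    for _ in range(warmup):
        fn()  # warm caches; results are discarded and not timed
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples)


# Example with a pure-Python stand-in for a query (not an ApexBase call)
data = list(range(100_000))
print(f"{bench(lambda: sum(1 for x in data if 25 <= x % 100 <= 35)):.3f} ms")
```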
Server Protocols
ApexBase ships two complementary server protocols for external access:
| Protocol | Port | Best for | Binary / CLI |
|---|---|---|---|
| PG Wire | 5432 | DBeaver, psql, DataGrip, BI tools | apexbase-server |
| Arrow Flight | 50051 | Python (pyarrow), Go, Java, Spark | apexbase-flight |
Combined Launcher (Both Servers at Once)
```bash
# Start PG Wire + Arrow Flight simultaneously
apexbase-serve --dir /path/to/data

# Custom ports
apexbase-serve --dir /path/to/data --pg-port 5432 --flight-port 50051

# Disable one server
apexbase-serve --dir /path/to/data --no-flight  # PG Wire only
apexbase-serve --dir /path/to/data --no-pg      # Arrow Flight only
```
| Flag | Default | Description |
|---|---|---|
| `--dir`, `-d` | `.` | Directory containing .apex database files |
| `--host` | `127.0.0.1` | Bind host for both servers |
| `--pg-port` | `5432` | PostgreSQL Wire port |
| `--flight-port` | `50051` | Arrow Flight gRPC port |
| `--no-pg` | not set | Disable PG Wire server |
| `--no-flight` | not set | Disable Arrow Flight server |
PostgreSQL Wire Protocol Server
ApexBase includes a built-in PostgreSQL wire protocol server, allowing you to connect using DBeaver, psql, DataGrip, pgAdmin, Navicat, and any other tool that supports the PostgreSQL protocol.
Starting the Server
Method 1: Python CLI (after pip install apexbase)
```bash
apexbase-server --dir /path/to/data --port 5432
```
Options:
| Flag | Default | Description |
|---|---|---|
| `--dir`, `-d` | `.` | Directory containing .apex database files |
| `--host` | `127.0.0.1` | Host to bind to (use `0.0.0.0` for remote access) |
| `--port`, `-p` | `5432` | Port to listen on |
Method 2: Standalone Rust binary (no Python required)
```bash
# Build
cargo build --release --bin apexbase-server --no-default-features --features server

# Run
./target/release/apexbase-server --dir /path/to/data --port 5432
```
Connecting with Database Tools
The server emulates PostgreSQL 15.0, exposes a metadata layer compatible with `pg_catalog` and `information_schema`, and supports both the Simple and Extended Query protocols. No username or password is required (authentication is disabled).
DBeaver
- New Database Connection → choose PostgreSQL
- Fill in connection details:
  - Host: `127.0.0.1` (or the `--host` you specified)
  - Port: `5432` (or the `--port` you specified)
  - Database: `apexbase` (any value accepted)
  - Authentication: select No Authentication or leave username/password empty
- Click Test Connection → Finish
- DBeaver will discover tables and columns automatically via `pg_catalog` / `information_schema`
psql
```bash
psql -h 127.0.0.1 -p 5432 -d apexbase
```
DataGrip / IntelliJ IDEA
- Database tool window → + → Data Source → PostgreSQL
- Set Host, Port, Database as above; leave User and Password empty
- Click Test Connection → OK
pgAdmin
- Add New Server → General tab: give it a name
- Connection tab: set Host and Port; leave Username as `postgres` (ignored) and Password empty
- Save; tables appear under Databases > apexbase > Schemas > public > Tables
Navicat for PostgreSQL
- Connection → PostgreSQL
- Set Host, Port; leave User and Password blank
- Test Connection → OK
Other Compatible Tools
Any tool or library that speaks the PostgreSQL wire protocol can connect, whether it goes through libpq or a native driver, including:
- TablePlus, Beekeeper Studio, HeidiSQL
- Python: `psycopg2` / `asyncpg`
- Node.js: `pg` (node-postgres)
- Go: `pgx` / `lib/pq`
- Rust: `tokio-postgres` / `sqlx`
- Java: JDBC PostgreSQL driver
Example with psycopg2:
```python
import psycopg2

conn = psycopg2.connect(host="127.0.0.1", port=5432, dbname="apexbase")
cur = conn.cursor()
cur.execute("SELECT * FROM users LIMIT 10")
print(cur.fetchall())
conn.close()
```
Supported SQL over Wire Protocol
The wire protocol server passes SQL directly to the ApexBase query engine. All SQL features listed in the Usage Guide are available, including JOINs, CTEs, window functions, transactions, and DDL.
Metadata Compatibility
The server implements a pg_catalog compatibility layer that responds to common catalog queries:
| Catalog / View | Purpose |
|---|---|
| `pg_catalog.pg_namespace` | Schema listing |
| `pg_catalog.pg_database` | Database listing |
| `pg_catalog.pg_class` | Table discovery |
| `pg_catalog.pg_attribute` | Column metadata |
| `pg_catalog.pg_type` | Type information |
| `pg_catalog.pg_settings` | Server settings |
| `information_schema.tables` | Standard table listing |
| `information_schema.columns` | Standard column listing |
| `SET` / `SHOW` statements | Client configuration probes |
This enables GUI tools to browse tables, inspect columns, and display data types without modification.
Supported Protocol Features
| Feature | Status |
|---|---|
| Simple Query Protocol | ✅ Fully supported |
| Extended Query Protocol (prepared statements) | ✅ Supported (schema cached, binary format for psycopg3) |
| Cross-database SQL (`db.table`) | ✅ Supported; `USE dbname` / `\c dbname` switches context |
| `pg_catalog` / `information_schema` | ✅ Compatible layer for GUI tools |
| All ApexBase SQL (JOINs, CTEs, window functions, DDL) | ✅ Full pass-through to query engine |
Limitations
- Authentication is not implemented — the server accepts all connections regardless of username/password
- SSL/TLS is not supported; use an SSH tunnel (`ssh -L 5432:127.0.0.1:5432 user@host`) for remote access
Arrow Flight gRPC Server
Arrow Flight sends Arrow IPC RecordBatch directly over gRPC (HTTP/2), bypassing per-row text serialization entirely. It is 4–7× faster than PG wire for large result sets (10K+ rows).
| Query | PG Wire | Arrow Flight | Speedup |
|---|---|---|---|
| SELECT 10K rows | 5.1ms | 0.7ms | 7× faster |
| BETWEEN (~33K rows) | 22ms | 5.6ms | 4× faster |
| Single row / point lookup | ~7.5ms | ~7.9ms | equal |
Starting the Flight Server
Python CLI:
```bash
apexbase-flight --dir /path/to/data --port 50051
```
Standalone Rust binary:
```bash
cargo build --release --bin apexbase-flight --no-default-features --features flight
./target/release/apexbase-flight --dir /path/to/data --port 50051
```
Python Client
```python
import polars as pl
import pyarrow.flight as fl

client = fl.connect("grpc://127.0.0.1:50051")

# SELECT: the SQL text is the ticket; returns an Arrow Table
table = client.do_get(fl.Ticket(b"SELECT * FROM users LIMIT 10000")).read_all()
df = table.to_pandas()        # to pandas (zero-copy where column types allow)
pl_df = pl.from_arrow(table)  # to polars (zero-copy where possible)

# DML / DDL
client.do_action(fl.Action("sql", b"INSERT INTO users (name, age) VALUES ('Alice', 30)"))
client.do_action(fl.Action("sql", b"CREATE TABLE logs (event STRING, ts INT64)"))

# List available actions
for action in client.list_actions():
    print(action.type, "-", action.description)
```
When to Use Arrow Flight vs PG Wire
| Scenario | Recommendation |
|---|---|
| DBeaver / Tableau / BI tools | PG Wire (only option) |
| Python + small queries (<100 rows) | Native API (fastest, in-process) |
| Python + large queries (10K+ rows, remote) | Arrow Flight (4–7× faster than PG wire) |
| Go / Java / Spark workers | Arrow Flight (native Arrow support) |
| Local Python (same machine) | Native API (ApexClient.execute()) |
PyO3 Python API
Both servers can also be started from Python as blocking functions that release the GIL while running, so each can run in a background thread:
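```python
import threading

from apexbase._core import start_pg_server, start_flight_server

# Each call blocks for the lifetime of its server, so give each its own thread
t1 = threading.Thread(target=start_pg_server, args=("/data", "0.0.0.0", 5432), daemon=True)
t2 = threading.Thread(target=start_flight_server, args=("/data", "0.0.0.0", 50051), daemon=True)
t1.start()
t2.start()

# Daemon threads are killed when the main thread exits, so keep it alive
t1.join()
```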
Architecture
```text
Python (ApexClient)
    |
    |-- Arrow IPC / columnar dict --------> ResultView (Pandas / Polars / PyArrow)
    |
Rust Core (PyO3 bindings)
    |
    +-- SQL Parser -----> Query Planner -----> Query Executor
    |                                                |
    |   +-- JIT Compiler (Cranelift)                 |
    |   +-- Expression Evaluator (70+ functions)     |
    |   +-- Window Function Engine                   |
    |                                                |
    +-- Storage Engine                               |
    |   +-- V4 Row Group Format (.apex)              |
    |   +-- DeltaStore (cell-level updates)          |
    |   +-- WAL (write-ahead log)                    |
    |   +-- Mmap on-demand reads                     |
    |   +-- LZ4 / Zstd compression                   |
    |   +-- Dictionary encoding                      |
    |                                                |
    +-- Index Manager (B-Tree, Hash)                 |
    +-- TxnManager (OCC + MVCC)                      |
    +-- NanoFTS (full-text search)                   |
    +-- PG Wire Protocol Server (pgwire)             |
    |   +-- DBeaver / psql / DataGrip / pgAdmin      |
    |   +-- pg_catalog & information_schema compat   |
    |                                                |
    +-- Arrow Flight gRPC Server (tonic + HTTP/2)    |
        +-- pyarrow.flight / Go / Java / Spark       |
        +-- Arrow IPC: zero serialization overhead   |
```
Storage Format
ApexBase uses a custom V4 Row Group format:
- Each table is a single `.apex` file containing a header, row groups, and a footer
- Row groups store columns contiguously with per-column compression (LZ4 or Zstd)
- Low-cardinality string columns are dictionary-encoded on disk
- Null bitmaps are stored per column per row group
- A DeltaStore file (`.deltastore`) holds cell-level updates that are merged on read and compacted automatically
- WAL records provide crash recovery with idempotent replay
Query Execution
- The SQL parser produces an AST that the query planner analyzes for optimization strategy
- Fast paths bypass the full executor for common patterns (COUNT(*), SELECT * LIMIT N, point lookups, single-column GROUP BY)
- Arrow RecordBatch is the internal data representation; results flow to Python via Arrow IPC with zero-copy when possible
- Repeated identical read queries are served from an in-process result cache
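The `enable_cache` and `cache_size` constructor options suggest a bounded cache, but its eviction and invalidation policy is not documented here. One plausible design is an LRU keyed by exact SQL text that is cleared on any write; this sketch (hypothetical `QueryResultCache` class) shows that design, not ApexBase's implementation.

```python
from collections import OrderedDict
from typing import Any, Optional


class QueryResultCache:
    """LRU cache of read-query results keyed by exact SQL text (illustrative only)."""

    def __init__(self, capacity: int = 10000):
        self.capacity = capacity
        self._entries: OrderedDict = OrderedDict()

    def get(self, sql: str) -> Optional[Any]:
        """Return the cached result, or None on a miss."""
        if sql not in self._entries:
            return None
        self._entries.move_to_end(sql)  # mark as most recently used
        return self._entries[sql]

    def put(self, sql: str, result: Any) -> None:
        self._entries[sql] = result
        self._entries.move_to_end(sql)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry

    def invalidate_all(self) -> None:
        """Simplest safe policy: any write may change results, so drop everything."""
        self._entries.clear()

    def __len__(self) -> int:
        return len(self._entries)


cache = QueryResultCache(capacity=2)
cache.put("SELECT COUNT(*) FROM users", 3)
cache.put("SELECT AVG(age) FROM users", 30.0)
cache.get("SELECT COUNT(*) FROM users")         # touch: now most recently used
cache.put("SELECT MAX(age) FROM users", 35)     # evicts the AVG query
print(cache.get("SELECT AVG(age) FROM users"))  # None
```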
API Reference
ApexClient
Constructor
```python
ApexClient(
    dirpath="./data",          # data directory
    drop_if_exists=False,      # clear existing data on open
    batch_size=1000,           # batch size for operations
    enable_cache=True,         # enable query cache
    cache_size=10000,          # cache capacity
    prefer_arrow_format=True,  # prefer Arrow format for results
    durability="fast",         # "fast" | "safe" | "max"
)
```
Database Management
| Method | Description |
|---|---|
| `use_database(database='default')` | Switch to a named database (creates it if needed) |
| `use(database='default', table=None)` | Switch database and optionally select/create a table |
| `list_databases()` | List all databases (`'default'` always included) |
| `current_database` | Property: current database name |
Table Management
| Method | Description |
|---|---|
| `create_table(name, schema=None)` | Create a new table, optionally with a predefined schema |
| `drop_table(name)` | Drop a table |
| `use_table(name)` | Switch active table |
| `list_tables()` | List all tables in the current database |
| `current_table` | Property: current table name |
Data Storage
| Method | Description |
|---|---|
| `store(data)` | Store data (dict, list, DataFrame, Arrow Table) |
| `from_pandas(df, table_name=None)` | Import from pandas DataFrame |
| `from_polars(df, table_name=None)` | Import from polars DataFrame |
| `from_pyarrow(table, table_name=None)` | Import from PyArrow Table |
Data Retrieval
| Method | Description |
|---|---|
| `execute(sql)` | Execute SQL statement(s) |
| `query(where, limit)` | Query with WHERE expression |
| `retrieve(id)` | Get record by `_id` |
| `retrieve_many(ids)` | Get multiple records by `_id` |
| `retrieve_all()` | Get all records |
| `count_rows(table)` | Count rows in table |
Data Modification
| Method | Description |
|---|---|
| `replace(id, data)` | Replace a record |
| `batch_replace({id: data})` | Batch replace records |
| `delete(id)` or `delete([ids])` | Delete record(s) |
Column Operations
| Method | Description |
|---|---|
| `add_column(name, type)` | Add a column |
| `drop_column(name)` | Drop a column |
| `rename_column(old, new)` | Rename a column |
| `get_column_dtype(name)` | Get column data type |
| `list_fields()` | List all fields |
Full-Text Search
| Method | Description |
|---|---|
| `init_fts(fields, lazy_load, cache_size)` | Initialize FTS |
| `search_text(query)` | Search documents |
| `fuzzy_search_text(query)` | Fuzzy search |
| `search_and_retrieve(query, limit, offset)` | Search and return records |
| `search_and_retrieve_top(query, n)` | Top N results |
| `get_fts_stats()` | FTS statistics |
| `disable_fts()` / `drop_fts()` | Disable or drop FTS |
Utility
| Method | Description |
|---|---|
| `flush()` | Flush data to disk |
| `set_auto_flush(rows, bytes)` | Set auto-flush thresholds |
| `get_auto_flush()` | Get auto-flush config |
| `estimate_memory_bytes()` | Estimate memory usage |
| `close()` | Close the client |
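`set_auto_flush(rows, bytes)` sets the thresholds at which buffered writes are flushed. How ApexBase counts bytes, and whether it checks thresholds per row or per batch, is not documented here; the following is a generic sketch of a dual-threshold write buffer (hypothetical `AutoFlushBuffer` class, caller-supplied row sizes) to illustrate the idea.

```python
from typing import Any, Callable, Dict, List


class AutoFlushBuffer:
    """Buffer rows in memory and flush when a row-count or byte threshold is reached."""

    def __init__(self, flush_fn: Callable[[List[Dict[str, Any]]], None],
                 max_rows: int, max_bytes: int):
        self.flush_fn = flush_fn
        self.max_rows = max_rows
        self.max_bytes = max_bytes
        self.rows: List[Dict[str, Any]] = []
        self.pending_bytes = 0

    def append(self, row: Dict[str, Any], size_bytes: int) -> None:
        self.rows.append(row)
        self.pending_bytes += size_bytes
        # Whichever threshold is crossed first triggers the flush
        if len(self.rows) >= self.max_rows or self.pending_bytes >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        if self.rows:
            self.flush_fn(self.rows)
        self.rows = []  # flush_fn keeps the old list; start a fresh one
        self.pending_bytes = 0


flushed: List[List[Dict[str, Any]]] = []
buffer = AutoFlushBuffer(flushed.append, max_rows=3, max_bytes=1024)
for age in (20, 21, 22):
    buffer.append({"age": age}, size_bytes=16)        # third row hits max_rows
buffer.append({"blob": "x" * 2000}, size_bytes=2000)  # exceeds max_bytes on its own
print([len(batch) for batch in flushed])  # [3, 1]
```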
ResultView
| Method / Property | Description |
|---|---|
| `to_pandas(zero_copy=True)` | Convert to pandas DataFrame |
| `to_polars()` | Convert to polars DataFrame |
| `to_arrow()` | Convert to PyArrow Table |
| `to_dict()` | Convert to list of dicts |
| `scalar()` | Get single scalar value |
| `first()` | Get first row as dict |
| `get_ids(return_list=False)` | Get record IDs |
| `shape` | (rows, columns) |
| `columns` | Column names |
| `__len__()` | Row count |
| `__iter__()` | Iterate over rows |
| `__getitem__(idx)` | Index access |
Documentation
Additional documentation is available in the docs/ directory.
License
Apache-2.0
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file apexbase-1.5.0.tar.gz.
File metadata
- Download URL: apexbase-1.5.0.tar.gz
- Upload date:
- Size: 671.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
49450f430dca54d8d4fc63a6beb1584d4b05c9887b941e0cc88ad217306a6f54
|
|
| MD5 |
8bca436bafe808cb6228484bd91027f8
|
|
| BLAKE2b-256 |
0609358935c82774291cd484dc28aa7fee220709670ae38e09cf3d952739d154
|
File details
Details for the file apexbase-1.5.0-cp313-cp313-win_amd64.whl.
File metadata
- Download URL: apexbase-1.5.0-cp313-cp313-win_amd64.whl
- Upload date:
- Size: 6.6 MB
- Tags: CPython 3.13, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
b0918def0a018b18388dd0f7e3c673c8370c0871485a57f5a90ce844bbd3beaf
|
|
| MD5 |
85a280e84a9ca9342e3a9ba2bb04ae82
|
|
| BLAKE2b-256 |
b37a62e0036e5d48a480f10a1b4fcc936adc595c5b972e176a8f154f972bc8ad
|
File details
Details for the file apexbase-1.5.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: apexbase-1.5.0-cp313-cp313-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 7.2 MB
- Tags: CPython 3.13, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c747497dfd70dadf5575aeb6994ab2dd0257ef4128ef36e5e28d9134179b2c19
|
|
| MD5 |
edeabee385f8bccac29e140c199ea48e
|
|
| BLAKE2b-256 |
18f019973c74fc7c64ce39bde868fde93a024ddcd5ddc0ba64e536acc6e244e9
|
File details
Details for the file apexbase-1.5.0-cp313-cp313-macosx_11_0_arm64.whl.
File metadata
- Download URL: apexbase-1.5.0-cp313-cp313-macosx_11_0_arm64.whl
- Upload date:
- Size: 6.1 MB
- Tags: CPython 3.13, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9a53ce617be0631cb6dfd0ccb2a0194b3732558b16d28f8964acc4b01437b908
|
|
| MD5 |
e36af4606a2e6d56e68ef957762681a7
|
|
| BLAKE2b-256 |
88c407d86478f48719c5f105b5bab36d9c429a8162810a01ecacf9484c400e64
|
File details
Details for the file apexbase-1.5.0-cp312-cp312-win_amd64.whl.
File metadata
- Download URL: apexbase-1.5.0-cp312-cp312-win_amd64.whl
- Upload date:
- Size: 6.6 MB
- Tags: CPython 3.12, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cae7faa209ad9d6a2d46db517e8f17744f4fb5a7ee03ebe3a80033097101ccfb
|
|
| MD5 |
c5bc54144e8bce6c37cdb79316464b05
|
|
| BLAKE2b-256 |
14313226daad6a272b209b11c01b6343ac7388f8e72f3101391e20e05ca0e46e
|
File details
Details for the file apexbase-1.5.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: apexbase-1.5.0-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 7.2 MB
- Tags: CPython 3.12, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2cd7e94d2abc7efdfbcb874382d8ec73ec6a7d30bd94adc9e756e72109707c06
|
|
| MD5 |
82d240b0393fa077513f868d3db8a9e6
|
|
| BLAKE2b-256 |
0fdc9bff0dd16d735ccf35d93b1327f3a816d750e814e9f0c93658b185ef4d1a
|
File details
Details for the file apexbase-1.5.0-cp312-cp312-macosx_11_0_arm64.whl.
File metadata
- Download URL: apexbase-1.5.0-cp312-cp312-macosx_11_0_arm64.whl
- Upload date:
- Size: 6.1 MB
- Tags: CPython 3.12, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5dcf57250c8234fba57d40c845222dfc13ce28b91fee2beac8492e7c5872cadf
|
|
| MD5 |
129758fdd8c728bb5c77e0ced538e452
|
|
| BLAKE2b-256 |
fa62722f9d16aa804a93e5fee0fdbf1a40a2ecfa96bcec130f635da0e35fc96a
|
File details
Details for the file apexbase-1.5.0-cp311-cp311-win_amd64.whl.
File metadata
- Download URL: apexbase-1.5.0-cp311-cp311-win_amd64.whl
- Upload date:
- Size: 6.6 MB
- Tags: CPython 3.11, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
da568011d833c3a0f03a9991d2b4af5f2e802ffbd7cd5ac84a5e1b457a58fe26
|
|
| MD5 |
0932916e0e6b95d0640aed3d42312723
|
|
| BLAKE2b-256 |
538b0d77a8455625cfe30fb889c60646588853b9aad1d2195d44247d1f16c2ad
|
File details
Details for the file apexbase-1.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: apexbase-1.5.0-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 7.2 MB
- Tags: CPython 3.11, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1c3beb3aa9c0e29f5ff34842217c25e6031006ca1a61ce6409b32ff681d79f6f
|
|
| MD5 |
01e4911c49737a2b7edaeb461c8df76c
|
|
| BLAKE2b-256 |
48effc4a1aafb546a4a7d856dd85c25e55d04162f3ca090e4ff0dbf104648ff5
|
File details
Details for the file apexbase-1.5.0-cp311-cp311-macosx_11_0_arm64.whl.
File metadata
- Download URL: apexbase-1.5.0-cp311-cp311-macosx_11_0_arm64.whl
- Upload date:
- Size: 6.1 MB
- Tags: CPython 3.11, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0007a33c275043c4e91614c23fd303f97a9894fe21ec46afe4d378f69d35f25f |
| MD5 | 63972a0fee5ab5e5b7843d7da4dd4b72 |
| BLAKE2b-256 | 70b79a033e392976c65b83e34869ed283c68101adb90bcee925a2ec3d287cc07 |
File details
Details for the file apexbase-1.5.0-cp310-cp310-win_amd64.whl.
File metadata
- Download URL: apexbase-1.5.0-cp310-cp310-win_amd64.whl
- Upload date:
- Size: 6.6 MB
- Tags: CPython 3.10, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eb4b2669456c2b8939001dbba04df2cf91282d6917d44230499303afef9ea355 |
| MD5 | 5d8541f9c21c557b8dd00554573fab2b |
| BLAKE2b-256 | 66f92ed7ff7f386fa338b2ed92fbba92c24907d46ad594a2ec80f74f18fc3d80 |
File details
Details for the file apexbase-1.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: apexbase-1.5.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 7.2 MB
- Tags: CPython 3.10, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ada2ae9d3766226207deaf9ef547a8213494f61d1452a8447c46f9e377ec153d |
| MD5 | 7bb384bbe04765664853a119786a213c |
| BLAKE2b-256 | ec04ef60123aad574c58a6765719df47c7225b53f5267244ad472e4d58c3eeb9 |
File details
Details for the file apexbase-1.5.0-cp310-cp310-macosx_11_0_arm64.whl.
File metadata
- Download URL: apexbase-1.5.0-cp310-cp310-macosx_11_0_arm64.whl
- Upload date:
- Size: 6.1 MB
- Tags: CPython 3.10, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1a2b2806675c7f99e166a8226f70b1f9254b227f66ab3a10bf2e8aba4f23a5ec |
| MD5 | 1f04c0b18386653340623bba6851a886 |
| BLAKE2b-256 | 70009a32d7b7d9f03737b236ba3c9aa74226a41e745a15dd4acb6beedc15eac7 |
File details
Details for the file apexbase-1.5.0-cp39-cp39-win_amd64.whl.
File metadata
- Download URL: apexbase-1.5.0-cp39-cp39-win_amd64.whl
- Upload date:
- Size: 6.6 MB
- Tags: CPython 3.9, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f5aa234c55d99999d6f1a80e9dc50cf2d61bd86d543ff31013dbcddef8c71d33 |
| MD5 | 8f3361eb02356c56aca443d0d246655c |
| BLAKE2b-256 | 5cf2dc555b0deb00c2210d02b1f14e643628b17bc77a4e3d1c2a81fd2a6a8905 |
File details
Details for the file apexbase-1.5.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: apexbase-1.5.0-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 7.2 MB
- Tags: CPython 3.9, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c31905b7a0b9ed418558a1036dfc2da13b62f711081833a33a9d506314b0fdc1 |
| MD5 | 6968381d316501c183e293fb7c1c41ac |
| BLAKE2b-256 | aaab72da4adbedba22681809dd0e6d3776ad924115a8901731ec7672a8b3c3bb |
File details
Details for the file apexbase-1.5.0-cp39-cp39-macosx_11_0_arm64.whl.
File metadata
- Download URL: apexbase-1.5.0-cp39-cp39-macosx_11_0_arm64.whl
- Upload date:
- Size: 6.1 MB
- Tags: CPython 3.9, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.14
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c584a2b7e23ccb60cbc1588a6daf9916cef7a559fbf2151c95a5258ff9a98df0 |
| MD5 | 846dfd0742617cbefa58a4431c6bfa08 |
| BLAKE2b-256 | b79a5b68520d5dc27b5ad42f737b645edc81c57cab871638f28c3323a3d2e135 |