GalaxDB Python client -- SQL, vector search, and local embeddings in one database
galaxdb-client
The Python client for GalaxDB -- an AI-native database that combines SQL, vector search, and local embeddings in a single binary.
No external API keys. No separate vector database. No data pipeline. One connection string.
Installation
pip install galaxdb-client
Requires Python 3.9+. Pre-built wheels are available for Linux x86_64, macOS (Intel and Apple Silicon), and Windows x86_64.
What GalaxDB gives you
- Full SQL -- CREATE, INSERT, UPDATE, DELETE, SELECT with WHERE filters
- Local embeddings -- text to vector conversion runs inside the process, no API key needed
- Semantic search -- SEMANTIC_MATCH(col, 'query', threshold) in any WHERE clause
- HNSW vector index -- recall@10 = 0.990 on SIFT-1M at ef=200
- Time-travel queries -- SELECT ... AT VERSION 'tag' to query historical snapshots
- Training export -- CREATE VERSION TAG ... FOR TRAINING exports a Lance dataset, zero-copy PyTorch-ready
- Near-dedup -- WHERE NOT DUPLICATE removes near-duplicate rows using MinHash LSH
- Crash safety -- WAL + checksum, 7 chaos scenarios pass in under 11 seconds
- Encryption at rest -- AES-256-GCM on every block and WAL record
Quick start -- embedded mode (no server)
import galaxdb
# Open or create a database at a local path
db = galaxdb.Database("/tmp/mydb")
# Create a table
db.execute("CREATE TABLE products (id INT PRIMARY KEY, name TEXT, price INT)")
# Insert rows
db.execute("INSERT INTO products (id, name, price) VALUES (1, 'Laptop', 1200)")
db.execute("INSERT INTO products (id, name, price) VALUES (2, 'Headphones', 150)")
db.execute("INSERT INTO products (id, name, price) VALUES (3, 'Keyboard', 80)")
# Query with filter
rows = db.execute("SELECT * FROM products WHERE price > 100")
for row in rows:
    print(row)
# {'id': '1', 'name': 'Laptop', 'price': '1200'}
# {'id': '2', 'name': 'Headphones', 'price': '150'}
# Update
db.execute("UPDATE products SET price = 1100 WHERE id = 1")
# Delete
db.execute("DELETE FROM products WHERE id = 3")
# Table info
print(db.table_exists("products")) # True
print(db.table_count) # 1
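The quick start passes literal SQL strings to execute(). When the rows come from Python data, the statements have to be built first. The sketch below is one way to do that, under stated assumptions: sql_literal, build_insert, insert_rows, and RecordingDB are hypothetical helpers (not part of galaxdb-client), only ints and strings are handled, and string quoting follows the standard SQL rule of doubling single quotes, which GalaxDB is assumed to accept. RecordingDB stands in for galaxdb.Database so the example runs without GalaxDB installed.

```python
def sql_literal(value):
    """Render a Python int or str as a SQL literal."""
    if isinstance(value, bool):
        raise TypeError("bool is not handled in this sketch")
    if isinstance(value, int):
        return str(value)
    if isinstance(value, str):
        # Standard SQL escaping: double any embedded single quote.
        return "'" + value.replace("'", "''") + "'"
    raise TypeError(f"unsupported type: {type(value).__name__}")


def build_insert(table, columns, row):
    """Build one INSERT statement. Table and column names must be trusted."""
    col_list = ", ".join(columns)
    values = ", ".join(sql_literal(v) for v in row)
    return f"INSERT INTO {table} ({col_list}) VALUES ({values})"


class RecordingDB:
    """Hypothetical stand-in that records statements instead of running them."""

    def __init__(self):
        self.statements = []

    def execute(self, sql):
        self.statements.append(sql)


def insert_rows(db, table, columns, rows):
    """Issue one INSERT per row through any object with an execute(sql) method."""
    for row in rows:
        db.execute(build_insert(table, columns, row))


db = RecordingDB()
insert_rows(
    db,
    "products",
    ["id", "name", "price"],
    [(1, "Laptop", 1200), (2, "Kid's Tablet", 150)],
)
for statement in db.statements:
    print(statement)
```

With a real database, the same insert_rows call should work by passing the galaxdb.Database object in place of RecordingDB, since it only relies on execute() taking a SQL string as shown above.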
Semantic search with local embeddings
Start the server with the embedding sidecar to enable SEMANTIC_MATCH:
galaxdb-server \
  --data-dir ./data \
  --port 5433 \
  --sidecar /usr/local/bin/galaxdb-sidecar \
  --model sentence-transformers/all-MiniLM-L6-v2
Then connect from Python:
import galaxdb
conn = galaxdb.connect("host=localhost port=5433 dbname=galaxdb sslmode=disable")
# Create a table with an embedding column
conn.execute("""
    CREATE TABLE docs (
        id INT PRIMARY KEY,
        body TEXT EMBEDDING MODEL 'sentence-transformers/all-MiniLM-L6-v2' DIM 384
    )
""")
# Insert rows -- embeddings are computed automatically by the local sidecar
conn.execute("INSERT INTO docs (id, body) VALUES (1, 'machine learning and neural networks')")
conn.execute("INSERT INTO docs (id, body) VALUES (2, 'rust programming language systems')")
conn.execute("INSERT INTO docs (id, body) VALUES (3, 'cooking recipes italian pasta')")
conn.execute("INSERT INTO docs (id, body) VALUES (4, 'deep learning transformers attention')")
# Semantic search -- no external API, no separate vector DB
rows = conn.execute(
    "SELECT id, body FROM docs WHERE SEMANTIC_MATCH(body, 'artificial intelligence', 0.4)"
)
for row in rows:
    print(row)
# Returns rows 1 and 4 -- the AI/ML related documents
conn.close()
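To build intuition for the threshold argument, the sketch below assumes SEMANTIC_MATCH keeps rows whose embedding has a cosine similarity to the query embedding of at least the threshold. This README does not state the exact metric, so treat that as an assumption. The vectors are made-up 3-dimensional toys, not real all-MiniLM-L6-v2 embeddings, and the function names are illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_match(rows, query_vec, threshold):
    """Return ids of rows whose vector is at least `threshold` similar to the query."""
    return [
        row_id
        for row_id, vec in rows
        if cosine_similarity(vec, query_vec) >= threshold
    ]


# Toy stand-ins for embeddings: rows 1 and 4 point roughly the same way as the query.
toy_rows = [
    (1, [0.9, 0.1, 0.0]),
    (2, [0.1, 0.9, 0.1]),
    (3, [0.0, 0.1, 0.9]),
    (4, [0.8, 0.3, 0.1]),
]
query = [1.0, 0.2, 0.0]

print(semantic_match(toy_rows, query, 0.4))  # [1, 4]
```

Raising the threshold narrows the result set and lowering it widens it, which is the knob to adjust if a query returns too many or too few rows.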
Time-travel queries
# Create a named snapshot
conn.execute("CREATE VERSION TAG 'v1' FOR TRAINING WITH TRAINING PRECISION 'float32'")
# Insert more data after the snapshot
conn.execute("INSERT INTO docs (id, body) VALUES (5, 'new document added later')")
# Query the snapshot -- only sees data from before the tag
rows = conn.execute("SELECT * FROM docs AT VERSION 'v1'")
# Returns rows 1-4, not row 5
Training export
import galaxdb
import lance
import torch
db = galaxdb.Database("./data")
# Create a training snapshot
db.execute("CREATE VERSION TAG 'train-v1' FOR TRAINING WITH TRAINING PRECISION 'float32'")
# Export as a Lance dataset
path = db.training_dataset("train-v1")
# Load into PyTorch -- zero-copy, memory-mapped
dataset = lance.dataset(path).to_pytorch()
loader = torch.utils.data.DataLoader(dataset, batch_size=32)
Bulk insert
conn.execute("""
    BULK INSERT INTO products (id, name, price) VALUES
        (10, 'Monitor', 400),
        (11, 'Mouse', 30),
        (12, 'Webcam', 90)
""")
Near-duplicate deduplication
# Select only unique documents (one per near-duplicate cluster)
rows = conn.execute("SELECT * FROM docs WHERE NOT DUPLICATE")
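The feature list says near-dedup uses MinHash LSH. The sketch below illustrates only the MinHash half of that idea in pure Python (no LSH banding): two texts with mostly the same word shingles end up with mostly the same signature entries. The shingle size, the number of hash functions, and the salted SHA-1 scheme are illustrative choices, not GalaxDB's actual parameters.

```python
import hashlib


def shingles(text, k=3):
    """Split text into a set of overlapping k-word shingles."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash_signature(items, num_hashes=64):
    """For each salted hash function, keep the minimum hash over all items."""
    signature = []
    for seed in range(num_hashes):
        salt = str(seed).encode()
        signature.append(
            min(
                int.from_bytes(hashlib.sha1(salt + item.encode()).digest()[:8], "big")
                for item in items
            )
        )
    return signature


def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)


doc_a = "the quick brown fox jumps over the lazy dog"
doc_b = "the quick brown fox jumps over the lazy cat"
doc_c = "cooking recipes italian pasta with tomato sauce"

sig_a = minhash_signature(shingles(doc_a))
sig_b = minhash_signature(shingles(doc_b))
sig_c = minhash_signature(shingles(doc_c))

sim_ab = estimated_jaccard(sig_a, sig_b)  # near-duplicates: high estimate
sim_ac = estimated_jaccard(sig_a, sig_c)  # unrelated texts: estimate near zero
print(sim_ab, sim_ac)
```

A dedup pass would treat pairs whose estimate exceeds some cutoff as one cluster and keep a single row from each cluster.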
Backup and restore
conn.execute("BACKUP TO '/path/to/backup'")
conn.execute("RESTORE FROM '/path/to/backup'")
Server mode -- connect to a running GalaxDB server
import galaxdb
# Connect using a PostgreSQL-style connection string
conn = galaxdb.connect("host=localhost port=5433 dbname=galaxdb sslmode=disable")
conn.execute("CREATE TABLE users (id INT PRIMARY KEY, name TEXT, age INT)")
conn.execute("INSERT INTO users (id, name, age) VALUES (1, 'Alice', 30)")
rows = conn.execute("SELECT * FROM users WHERE age > 25")
for row in rows:
    print(row)
conn.close()
Any PostgreSQL client works -- psycopg2, SQLAlchemy, tokio-postgres, pg (Node.js), JDBC.
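As a sketch of what that can look like with psycopg2, the example below reuses the connection string and the users table from this section. build_dsn and fetch_users_over are hypothetical helper names. psycopg2 is a third-party package imported only when the query function is called, and the query has not been verified against a running GalaxDB server, so treat it as an assumption-laden starting point.

```python
def build_dsn(host="localhost", port=5433, dbname="galaxdb", sslmode="disable"):
    """Build a libpq-style key=value connection string (values without spaces)."""
    return f"host={host} port={port} dbname={dbname} sslmode={sslmode}"


def fetch_users_over(age, dsn=None):
    """Query the users table through psycopg2 instead of galaxdb.connect()."""
    import psycopg2  # third-party: pip install psycopg2-binary

    conn = psycopg2.connect(dsn or build_dsn())
    try:
        with conn.cursor() as cur:
            # psycopg2 substitutes %s parameters on the client side.
            cur.execute("SELECT * FROM users WHERE age > %s", (age,))
            return cur.fetchall()
    finally:
        conn.close()


print(build_dsn())
# host=localhost port=5433 dbname=galaxdb sslmode=disable
```

With a server running as described above, fetch_users_over(25) would be the psycopg2 counterpart of the galaxdb.connect() example.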
Docker
docker run -d -p 5433:5433 -p 9090:9090 \
  -v /data:/data \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  harbi256/galaxdb:latest \
  --data-dir /data \
  --sidecar /usr/local/bin/galaxdb-sidecar \
  --model sentence-transformers/all-MiniLM-L6-v2
Observability
# Health check
curl http://localhost:9090/health
# {"status":"ok","version":"1.0.0-beta.1","subsystems":{"sidecar_healthy":true}}
# Prometheus metrics
curl http://localhost:9090/metrics
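The health endpoint can also be checked from Python, for example in a readiness probe. The sketch below parses the sample payload shown above. The field names come from that sample, while the helper names and the rule "status is ok and sidecar_healthy is true" are assumptions about what a caller might want.

```python
import json
import urllib.request


def sidecar_is_healthy(payload):
    """Return True if the health JSON reports status ok and a healthy sidecar."""
    doc = json.loads(payload)
    if doc.get("status") != "ok":
        return False
    return bool(doc.get("subsystems", {}).get("sidecar_healthy", False))


def fetch_health(url="http://localhost:9090/health", timeout=2.0):
    """Fetch the health payload from a running server (not called in this sketch)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


sample = '{"status":"ok","version":"1.0.0-beta.1","subsystems":{"sidecar_healthy":true}}'
print(sidecar_is_healthy(sample))  # True
```

Against a live server, sidecar_is_healthy(fetch_health()) combines the two steps.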
License
Apache 2.0