GalaxDB Python client -- SQL, vector search, and local embeddings in one database

galaxdb-client

The Python client for GalaxDB -- an AI-native database that combines SQL, vector search, and local embeddings in a single binary.

No external API keys. No separate vector database. No data pipeline. One connection string.

Installation

pip install galaxdb-client

Requires Python 3.9+. Pre-built wheels are available for Linux x86_64, macOS Intel, macOS Apple Silicon, and Windows x86_64.

What GalaxDB gives you

  • Full SQL -- CREATE, INSERT, UPDATE, DELETE, SELECT with WHERE filters
  • Local embeddings -- text to vector conversion runs inside the process, no API key needed
  • Semantic search -- SEMANTIC_MATCH(col, 'query', threshold) in any WHERE clause
  • HNSW vector index -- recall@10 = 0.990 on SIFT-1M at ef=200
  • Time-travel queries -- SELECT ... AT VERSION 'tag' to query historical snapshots
  • Training export -- CREATE VERSION TAG ... FOR TRAINING exports a Lance dataset that PyTorch can load zero-copy
  • Near-dedup -- WHERE NOT DUPLICATE removes near-duplicate rows using MinHash LSH
  • Crash safety -- WAL + checksum, 7 chaos scenarios pass in under 11 seconds
  • Encryption at rest -- AES-256-GCM on every block and WAL record

Quick start -- embedded mode (no server)

import galaxdb

# Open or create a database at a local path
db = galaxdb.Database("/tmp/mydb")

# Create a table
db.execute("CREATE TABLE products (id INT PRIMARY KEY, name TEXT, price INT)")

# Insert rows
db.execute("INSERT INTO products (id, name, price) VALUES (1, 'Laptop', 1200)")
db.execute("INSERT INTO products (id, name, price) VALUES (2, 'Headphones', 150)")
db.execute("INSERT INTO products (id, name, price) VALUES (3, 'Keyboard', 80)")

# Query with filter
rows = db.execute("SELECT * FROM products WHERE price > 100")
for row in rows:
    print(row)
# {'id': '1', 'name': 'Laptop', 'price': '1200'}
# {'id': '2', 'name': 'Headphones', 'price': '150'}

# Update
db.execute("UPDATE products SET price = 1100 WHERE id = 1")

# Delete
db.execute("DELETE FROM products WHERE id = 3")

# Table info
print(db.table_exists("products"))  # True
print(db.table_count)               # 1
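
The sample output above shows every value, including id and price, as a string. If the client returns rows that way in your version, a small helper can cast selected columns back to Python types. This is a minimal sketch: coerce_row and PRODUCT_TYPES are hypothetical names for illustration, not part of galaxdb-client.

```python
from typing import Any, Callable, Dict

# Hypothetical mapping from column name to the Python type you expect.
PRODUCT_TYPES: Dict[str, Callable[[str], Any]] = {"id": int, "price": int}


def coerce_row(row: Dict[str, Any], types: Dict[str, Callable[[str], Any]]) -> Dict[str, Any]:
    """Return a copy of `row` with the listed columns cast to the given types.

    Columns missing from `types` pass through unchanged, and None values
    are never cast.
    """
    out = {}
    for column, value in row.items():
        cast = types.get(column)
        out[column] = cast(value) if cast is not None and value is not None else value
    return out


# Row shaped like the quick start output above
row = {"id": "1", "name": "Laptop", "price": "1200"}
print(coerce_row(row, PRODUCT_TYPES))
# {'id': 1, 'name': 'Laptop', 'price': 1200}
```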

Semantic search with local embeddings

Start the server with the embedding sidecar to enable SEMANTIC_MATCH:

galaxdb-server \
  --data-dir ./data \
  --port 5433 \
  --sidecar /usr/local/bin/galaxdb-sidecar \
  --model sentence-transformers/all-MiniLM-L6-v2

Then connect from Python:

import galaxdb

conn = galaxdb.connect("host=localhost port=5433 dbname=galaxdb sslmode=disable")

# Create a table with an embedding column
conn.execute("""
    CREATE TABLE docs (
        id   INT PRIMARY KEY,
        body TEXT EMBEDDING MODEL 'sentence-transformers/all-MiniLM-L6-v2' DIM 384
    )
""")

# Insert rows -- embeddings are computed automatically by the local sidecar
conn.execute("INSERT INTO docs (id, body) VALUES (1, 'machine learning and neural networks')")
conn.execute("INSERT INTO docs (id, body) VALUES (2, 'rust programming language systems')")
conn.execute("INSERT INTO docs (id, body) VALUES (3, 'cooking recipes italian pasta')")
conn.execute("INSERT INTO docs (id, body) VALUES (4, 'deep learning transformers attention')")

# Semantic search -- no external API, no separate vector DB
rows = conn.execute(
    "SELECT id, body FROM docs WHERE SEMANTIC_MATCH(body, 'artificial intelligence', 0.4)"
)
for row in rows:
    print(row)
# Returns rows 1 and 4 -- the AI/ML related documents

conn.close()
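
The example above inlines the search text into the SQL string. This README does not show whether execute accepts bound parameters, so when the search text comes from user input, one option is to escape it before building the statement. A minimal sketch: sql_literal and semantic_match_sql are hypothetical helpers, and the sketch assumes GalaxDB follows the standard SQL convention of doubling single quotes inside string literals.

```python
def sql_literal(text: str) -> str:
    """Quote `text` as a SQL string literal, doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def semantic_match_sql(table: str, column: str, query: str, threshold: float) -> str:
    """Build a SELECT that filters with SEMANTIC_MATCH(col, 'query', threshold).

    `table` and `column` must be plain identifiers because they cannot be
    protected by literal quoting.
    """
    for name in (table, column):
        if not name.isidentifier():
            raise ValueError(f"not a plain identifier: {name!r}")
    return (
        f"SELECT * FROM {table} "
        f"WHERE SEMANTIC_MATCH({column}, {sql_literal(query)}, {float(threshold)})"
    )


print(semantic_match_sql("docs", "body", "what's new in AI", 0.4))
# SELECT * FROM docs WHERE SEMANTIC_MATCH(body, 'what''s new in AI', 0.4)
```

The resulting string can be passed to conn.execute in the same way as the hand-written query.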

Time-travel queries

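# Assumes `conn` is an open server connection with the docs table from the
# semantic search example (reconnect first if you already called conn.close())

# Create a named snapshot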
conn.execute("CREATE VERSION TAG 'v1' FOR TRAINING WITH TRAINING PRECISION 'float32'")

# Insert more data after the snapshot
conn.execute("INSERT INTO docs (id, body) VALUES (5, 'new document added later')")

# Query the snapshot -- only sees data from before the tag
rows = conn.execute("SELECT * FROM docs AT VERSION 'v1'")
# Returns rows 1-4, not row 5

Training export

import galaxdb
import lance
import torch

db = galaxdb.Database("./data")

# Create a training snapshot
db.execute("CREATE VERSION TAG 'train-v1' FOR TRAINING WITH TRAINING PRECISION 'float32'")

# Export as a Lance dataset
path = db.training_dataset("train-v1")

# Load into PyTorch -- zero-copy, memory-mapped
dataset = lance.dataset(path).to_pytorch()
loader = torch.utils.data.DataLoader(dataset, batch_size=32)

Bulk insert

# Assumes `conn` is an open connection and a products table with these columns exists
conn.execute("""
    BULK INSERT INTO products (id, name, price) VALUES
      (10, 'Monitor', 400),
      (11, 'Mouse', 30),
      (12, 'Webcam', 90)
""")

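When the rows live in Python data structures, the BULK INSERT statement above can be generated rather than typed by hand. The sketch below is one way to do that. bulk_insert_sql and sql_value are hypothetical helpers, not client functions, and the quoting again assumes standard SQL string literals.

```python
from typing import Iterable, Sequence, Union

Value = Union[int, float, str, None]


def sql_value(value: Value) -> str:
    """Render a Python value as a SQL literal."""
    if value is None:
        return "NULL"
    if isinstance(value, bool):
        raise TypeError("booleans are not handled in this sketch")
    if isinstance(value, (int, float)):
        return repr(value)
    return "'" + value.replace("'", "''") + "'"


def bulk_insert_sql(table: str, columns: Sequence[str], rows: Iterable[Sequence[Value]]) -> str:
    """Build one BULK INSERT statement in the shape shown above."""
    rendered = []
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(f"expected {len(columns)} values, got {len(row)}")
        rendered.append("(" + ", ".join(sql_value(v) for v in row) + ")")
    if not rendered:
        raise ValueError("no rows to insert")
    return f"BULK INSERT INTO {table} ({', '.join(columns)}) VALUES " + ", ".join(rendered)


rows = [(10, "Monitor", 400), (11, "Mouse", 30), (12, "Webcam", 90)]
print(bulk_insert_sql("products", ["id", "name", "price"], rows))
# BULK INSERT INTO products (id, name, price) VALUES (10, 'Monitor', 400), (11, 'Mouse', 30), (12, 'Webcam', 90)
```
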
Near-duplicate deduplication

# Select only unique documents (one per near-duplicate cluster)
rows = conn.execute("SELECT * FROM docs WHERE NOT DUPLICATE")
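
The feature list says NOT DUPLICATE relies on MinHash LSH. To give a feel for the idea, here is a pure-Python sketch of MinHash signatures estimating the Jaccard similarity between word shingles. The shingle size, the number of hash functions, and the choice of BLAKE2b are assumptions made for illustration, and the LSH banding step is left out. GalaxDB's actual parameters are not described in this README.

```python
import hashlib
from typing import List, Set


def shingles(text: str, k: int = 2) -> Set[str]:
    """Split text into overlapping k-word shingles."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def minhash(items: Set[str], num_hashes: int = 128) -> List[int]:
    """Keep the minimum hash value per seeded hash function."""
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "big")  # a different salt acts as a different hash function
        signature.append(min(
            int.from_bytes(hashlib.blake2b(s.encode(), digest_size=8, salt=salt).digest(), "big")
            for s in items
        ))
    return signature


def estimated_jaccard(a: List[int], b: List[int]) -> float:
    """The fraction of matching signature slots approximates Jaccard similarity."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


a = minhash(shingles("machine learning and neural networks"))
b = minhash(shingles("machine learning and neural networks today"))
c = minhash(shingles("cooking recipes italian pasta"))

print(estimated_jaccard(a, b))  # high: the texts share 4 of 5 distinct shingles
print(estimated_jaccard(a, c))  # near zero: no shingles in common
```

A deduplication pass would treat pairs whose estimate exceeds some cutoff as near-duplicates and keep one row per cluster.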

Backup and restore

# Write a backup of the database to a directory
conn.execute("BACKUP TO '/path/to/backup'")

# Later, restore the database from that backup
conn.execute("RESTORE FROM '/path/to/backup'")

Server mode -- connect to a running GalaxDB server

import galaxdb

# Connect using a PostgreSQL-style connection string
conn = galaxdb.connect("host=localhost port=5433 dbname=galaxdb sslmode=disable")

conn.execute("CREATE TABLE users (id INT PRIMARY KEY, name TEXT, age INT)")
conn.execute("INSERT INTO users (id, name, age) VALUES (1, 'Alice', 30)")

rows = conn.execute("SELECT * FROM users WHERE age > 25")
for row in rows:
    print(row)

conn.close()

Any PostgreSQL client works -- psycopg2, SQLAlchemy, tokio-postgres, pg (Node.js), JDBC.
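
The server mode example above closes the connection by hand, which is skipped if a query raises first. One way to guarantee the close is contextlib.closing from the standard library. The sketch below takes the connect function as an argument so it can run here without a server. run_query and StubConnection are hypothetical names, and the sketch relies only on the execute and close methods shown above.

```python
from contextlib import closing
from typing import Any, Callable, List


def run_query(connect: Callable[[str], Any], dsn: str, sql: str) -> List[Any]:
    """Open a connection, run one statement, and close the connection
    even if execute() raises."""
    with closing(connect(dsn)) as conn:
        result = conn.execute(sql)
        # Materialize rows before the connection closes.
        return list(result) if result is not None else []


class StubConnection:
    """Stand-in so this sketch runs without a server; not part of galaxdb."""

    def __init__(self) -> None:
        self.closed = False

    def execute(self, sql: str):
        return [{"id": "1", "name": "Alice", "age": "30"}]

    def close(self) -> None:
        self.closed = True


stub = StubConnection()
rows = run_query(lambda dsn: stub, "host=localhost port=5433", "SELECT * FROM users")
print(rows, stub.closed)
```

With the real client, the call would look like run_query(galaxdb.connect, "host=localhost port=5433 dbname=galaxdb sslmode=disable", "SELECT * FROM users WHERE age > 25").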

Docker

docker run -d -p 5433:5433 -p 9090:9090 \
  -v /data:/data \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  harbi256/galaxdb:latest \
  --data-dir /data \
  --sidecar /usr/local/bin/galaxdb-sidecar \
  --model sentence-transformers/all-MiniLM-L6-v2

Observability

# Health check
curl http://localhost:9090/health
# {"status":"ok","version":"1.0.0-beta.1","subsystems":{"sidecar_healthy":true}}

# Prometheus metrics
curl http://localhost:9090/metrics
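
For scripted checks, such as a readiness probe, the health endpoint can be polled from Python with the standard library alone. This sketch parses the JSON shape shown above. It assumes every entry under subsystems is a boolean where true means healthy, although only sidecar_healthy appears in this README. is_healthy and fetch_health are hypothetical helper names.

```python
import json
import urllib.request


def is_healthy(payload: str) -> bool:
    """True when status is "ok" and no listed subsystem reports false."""
    health = json.loads(payload)
    subsystems = health.get("subsystems", {})
    return health.get("status") == "ok" and all(subsystems.values())


def fetch_health(url: str = "http://localhost:9090/health", timeout: float = 2.0) -> str:
    """Fetch the raw health JSON from a running server."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Sample payload copied from the health check output above
sample = '{"status":"ok","version":"1.0.0-beta.1","subsystems":{"sidecar_healthy":true}}'
print(is_healthy(sample))  # True
```

Against a live server, is_healthy(fetch_health()) combines the two steps.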

License

Apache 2.0
