Python SDK for HyperBinder - a neurosymbolic database for AI applications
HyperBinder Python SDK
A Python client for HyperBinder — the compositional semantic database that combines vector search, graph traversal, and SQL-like queries with per-field encoding strategies.
Installation
pip install hybi
This installs the HTTP-only Python SDK — enough to talk to a running HyperBinder server.
Quick Start
from hybi import HyperBinder, RelationalTable, Field, Encoding
import pandas as pd
# Connect to a running HyperBinder server
hb = HyperBinder("http://localhost:8000")
# Sample data
df = pd.DataFrame({
    "id": ["1", "2", "3"],
    "category": ["AI", "Cloud", "Analytics"],
    "text": [
        "Artificial intelligence and machine learning solutions",
        "Cloud computing and infrastructure services",
        "Data analytics and business intelligence",
    ],
    "revenue": [5000000, 3000000, 2000000],
})
# Define a schema with per-field encoding
schema = RelationalTable(
    primary_key="id",
    columns={
        "id": Field(encoding=Encoding.EXACT),
        "category": Field(encoding=Encoding.EXACT),
        "text": Field(encoding=Encoding.SEMANTIC),
        "revenue": Field(encoding=Encoding.NUMERIC),
    },
)
# Ingest
result = hb.ingest(df, collection="companies", schema=schema, dim=384)
print(f"Ingested {result.rows_ingested} rows")
# Semantic search
results = hb.search("AI and machine learning", collection="companies", top_k=3)
for r in results:
    print(f"{r.data['text']}: {r.score:.3f}")
# SQL-like query
filtered = hb.select(
    collection="companies",
    where=[("revenue", ">", 2500000)],
    order_by=[("revenue", True)],  # True = descending
)
for row in filtered.rows:
    print(row)
# Hybrid query (semantic + filters)
results = hb.search(
    "cloud services",
    collection="companies",
    filters=[("revenue", ">", 2000000)],
    top_k=5,
)
Key Features
🎯 Per-Field Encoding Strategies
Unlike vector databases that encode entire documents into a single vector, HyperBinder lets you specify different encoding strategies for each field:
schema = RelationalTable(
    primary_key="product_id",
    columns={
        "product_id": Field(encoding=Encoding.EXACT),      # Hash-based exact match
        "category": Field(encoding=Encoding.EXACT),        # Categorical exact match
        "name": Field(encoding=Encoding.SEMANTIC),         # Embedding-based similarity
        "description": Field(encoding=Encoding.SEMANTIC),  # Embedding-based similarity
        "price": Field(encoding=Encoding.NUMERIC),         # Numeric comparison
        "stock": Field(encoding=Encoding.NUMERIC),         # Numeric comparison
    },
)
This enables queries that blend matching types in one call:
# Find products semantically similar to "laptop computer"
# WHERE category exactly matches "Electronics" (not similar, exact)
# AND price is between 500 and 1500 (numeric range)
# AND stock > 0 (numeric comparison)
results = hb.search(
    "laptop computer",
    collection="products",
    filters=[
        ("category", "==", "Electronics"),
        ("price", ">=", 500),
        ("price", "<=", 1500),
        ("stock", ">", 0),
    ],
    top_k=10,
)
- Exact match where you need it (IDs, categories)
- Semantic search where you need it (descriptions, text)
- Numeric comparisons where you need it (prices, counts)
- All in one query, one database
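Range conditions such as the price bounds above always come as a pair of filter triples. A small helper can build that pair; the sketch below is plain Python and assumes only the `(field, operator, value)` tuple shape shown in the example. The name `between` is hypothetical and not part of the SDK.

```python
def between(field, low, high):
    """Return the two filter triples for an inclusive range on one field."""
    if low > high:
        raise ValueError(f"low ({low}) must not exceed high ({high})")
    return [(field, ">=", low), (field, "<=", high)]


# Rebuild the filter list from the products example above.
filters = [
    ("category", "==", "Electronics"),
    *between("price", 500, 1500),
    ("stock", ">", 0),
]
```

The resulting list can be passed as `filters=` to `hb.search` in the same way as the hand-written version.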
📊 Hybrid Queries (Semantic + Structured)
Combine semantic search with SQL-like filters:
# Semantic search with exact filters
results = hb.search(
    "machine learning research",
    collection="papers",
    filters=[
        ("year", ">=", "2020"),
        ("citations", ">", 1000),
        ("peer_reviewed", "==", "true"),
    ],
    top_k=10,
)
# Pure SQL-like query
result = hb.select(
    collection="papers",
    where=[
        ("author", "==", "Vaswani"),
        ("year", ">=", "2017"),
    ],
    order_by=[("citations", True)],  # True = descending
    limit=10,
)
Supported operators: =, ==, !=, <>, >, >=, <, <=
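To make the filter triples concrete, here is a minimal local sketch of how a list of `(field, operator, value)` conditions might be evaluated against one row. It is an illustration only, not HyperBinder's implementation: it assumes that all conditions are combined with AND (as in the products example) and that `=` and `<>` behave like their SQL counterparts. `OPS` and `matches` are hypothetical names.

```python
import operator

# Operator strings from the list above, mapped to Python comparisons.
# Treating "=" as equality and "<>" as inequality is an assumption from SQL usage.
OPS = {
    "=": operator.eq,
    "==": operator.eq,
    "!=": operator.ne,
    "<>": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
    "<": operator.lt,
    "<=": operator.le,
}


def matches(row, filters):
    """Return True if the row satisfies every (field, op, value) triple."""
    return all(OPS[op](row[field], value) for field, op, value in filters)


papers = [
    {"title": "A", "year": 2017, "citations": 90000},
    {"title": "B", "year": 2021, "citations": 500},
    {"title": "C", "year": 2022, "citations": 1500},
]
conditions = [("year", ">=", 2020), ("citations", ">", 1000)]
selected = [p["title"] for p in papers if matches(p, conditions)]
print(selected)
```

Only paper C passes both conditions, since A fails the year check and B fails the citation check.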
Data Ingestion
Basic ingestion with a schema
Always define a schema with encoding types:
from hybi import HyperBinder, RelationalTable, Field, Encoding
import pandas as pd
hb = HyperBinder("http://localhost:8000")
df = pd.DataFrame({
    "id": ["1", "2", "3"],
    "name": ["Product A", "Product B", "Product C"],
    "category": ["Electronics", "Books", "Clothing"],
    "description": ["High-quality electronics", "Bestselling books", "Fashion items"],
    "price": [299.99, 19.99, 49.99],
})
schema = RelationalTable(
    primary_key="id",
    columns={
        "id": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "category": Field(encoding=Encoding.EXACT),
        "description": Field(encoding=Encoding.SEMANTIC),
        "price": Field(encoding=Encoding.NUMERIC),
    },
)
result = hb.ingest(df, collection="products", schema=schema, dim=384)
print(f"Ingested {result.rows_ingested} rows")
Encoding types
| Encoding | Use for | How it works | Example fields |
|---|---|---|---|
| EXACT | IDs, categories, tags | Hash-based exact match | id, status, category |
| SEMANTIC | Text, descriptions, titles | Embedding-based similarity | title, description, content |
| NUMERIC | Numbers, prices, counts | Numeric comparison | price, quantity, rating |
Without a schema
If you don't provide a schema, HyperBinder will auto-detect encoding per column, but results may be suboptimal:
# Not recommended — auto-detection may not choose the optimal encoding
result = hb.ingest(df, collection="products", dim=384)
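If writing a schema by hand for a wide table is tedious, one option is to draft the encodings with a local heuristic and review them before ingesting. The sketch below is an assumption-laden illustration, not HyperBinder's auto-detection logic: it guesses NUMERIC for numeric columns, SEMANTIC for text containing spaces, and EXACT otherwise. `suggest_encoding` is a hypothetical helper.

```python
from numbers import Number


def suggest_encoding(values):
    """Guess an encoding name for one column from its values (heuristic only)."""
    non_null = [v for v in values if v is not None]
    # Booleans are excluded so that flags are not treated as numbers.
    if non_null and all(
        isinstance(v, Number) and not isinstance(v, bool) for v in non_null
    ):
        return "NUMERIC"
    # Free text tends to contain spaces; space-free values look like IDs or tags.
    if any(" " in str(v).strip() for v in non_null):
        return "SEMANTIC"
    return "EXACT"


columns = {
    "id": ["1", "2", "3"],
    "name": ["Product A", "Product B", "Product C"],
    "category": ["Electronics", "Books", "Clothing"],
    "description": ["High-quality electronics", "Bestselling books", "Fashion items"],
    "price": [299.99, 19.99, 49.99],
}
suggested = {name: suggest_encoding(values) for name, values in columns.items()}
print(suggested)
```

The returned names mirror the `Encoding` members in the table above, so a reviewed draft could be turned into `Field(encoding=getattr(Encoding, name))` entries. The heuristic will misjudge some columns (for example, single-word titles), which is why a manual review step matters.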
Searching
Semantic search
results = hb.search("laptop computers", collection="products", top_k=5)
for r in results:
    print(f"Score: {r.score:.3f}")
    print(f"Name: {r.data['name']}")
    print(f"Desc: {r.data['description']}")
Hybrid: semantic + filters
results = hb.search(
    "artificial intelligence",
    collection="products",
    filters=[
        ("category", "==", "Electronics"),
        ("price", ">=", 100),
        ("price", "<=", 500),
        ("in_stock", "==", "true"),
    ],
    top_k=10,
)
Pure SQL-like
result = hb.select(
    collection="products",
    columns=["name", "price", "category"],
    where=[
        ("category", "==", "Electronics"),
        ("price", ">", 200),
    ],
    order_by=[("price", True)],  # True = descending
    limit=10,
)
for row in result.rows:
    print(row)
Collection management
products = hb.collection("products")
if products.exists():
    print(f"Collection has {products.count()} rows")
    stats = products.stats()
    print(f"Columns: {stats.columns}")
    print(f"Dimension: {stats.dimension}")
for coll in hb.list_collections():
    print(f"{coll.name}: {coll.rows} rows")
# Delete all rows but keep the collection structure.
# Note: the fluent Collection.truncate() form ships with PR
# feat/namespace-row-counts. Until that lands on master, use the
# equivalent hb.truncate(collection="products") instead.
products.truncate()
# Delete the entire collection
products.delete()
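Because the fluent `truncate()` form may not be available in every installed version (see the note on `truncate()` above), a small wrapper can try it first and fall back to the client-level call. This is a sketch under that assumption; `clear_collection` is a hypothetical helper, and it relies only on `hb.collection(name)`, `Collection.truncate()` and `hb.truncate(collection=...)` as named in this README.

```python
def clear_collection(hb, name):
    """Drop all rows in a collection, preferring the fluent truncate() form.

    Returns "fluent" or "client" to indicate which call path was used.
    """
    collection = hb.collection(name)
    truncate = getattr(collection, "truncate", None)
    if callable(truncate):
        truncate()
        return "fluent"
    # Fallback named in the note above: client-level truncate.
    hb.truncate(collection=name)
    return "client"
```

Calling `clear_collection(hb, "products")` before a re-ingest keeps the calling code unchanged once the fluent form is available.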
Advanced features
Multi-hop graph traversal
results = hb.multihop(
    collection="knowledge_graph",
    start={"entity": "Albert Einstein"},
    hops=[("discovered", "theory"), ("influenced", "scientist")],
    top_k=10,
)
RAG context assembly
context = hb.get_context(
    "What are the latest AI developments?",
    collection="research_papers",
    top_k=5,
)
prompt = f"""Context: {context.text}
Question: What are the latest AI developments?
Answer:"""
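The example above repeats the question string in both the `get_context` call and the prompt. One way to keep them in sync is a small prompt builder; the sketch below uses the same Context/Question/Answer layout and takes plain strings, so it makes no assumptions about the SDK beyond `context.text` being a string. `build_prompt` is a hypothetical helper, and the context text here is a placeholder.

```python
def build_prompt(context_text, question):
    """Assemble a RAG prompt in the Context/Question/Answer layout shown above."""
    return (
        f"Context: {context_text.strip()}\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )


question = "What are the latest AI developments?"
# In real use, the first argument would be hb.get_context(question, ...).text
prompt = build_prompt("Placeholder context text.", question)
print(prompt)
```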
Aggregations
result = hb.aggregate(
    collection="sales",
    group_by=["region", "product_type"],
    aggregations=[
        ("revenue", "sum", "total_revenue"),
        ("orders", "count", "order_count"),
        ("revenue", "avg", "avg_order"),
    ],
    order_by=["total_revenue"],
)
for group in result.groups:
    print(f"{group['region']}: ${group['total_revenue']:,.2f}")
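For intuition, the `(field, function, alias)` triples can be read as a group-by over rows. The sketch below computes the same kind of result locally on a list of dicts. It is an illustration of the semantics one would expect from `sum`, `count` and `avg`, not the server's implementation, and `aggregate_rows` is a hypothetical name.

```python
from collections import defaultdict


def aggregate_rows(rows, group_by, aggregations):
    """Group rows by the given keys and apply (field, func, alias) aggregations."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[key] for key in group_by)].append(row)

    results = []
    for key, members in groups.items():
        result = dict(zip(group_by, key))
        for field, func, alias in aggregations:
            values = [member[field] for member in members]
            if func == "sum":
                result[alias] = sum(values)
            elif func == "count":
                result[alias] = len(values)
            elif func == "avg":
                result[alias] = sum(values) / len(values)
            else:
                raise ValueError(f"unsupported aggregation: {func}")
        results.append(result)
    return results


sales = [
    {"region": "EU", "revenue": 100.0, "orders": 1},
    {"region": "EU", "revenue": 300.0, "orders": 1},
    {"region": "US", "revenue": 50.0, "orders": 1},
]
summary = aggregate_rows(
    sales,
    group_by=["region"],
    aggregations=[
        ("revenue", "sum", "total_revenue"),
        ("orders", "count", "order_count"),
        ("revenue", "avg", "avg_order"),
    ],
)
for group in summary:
    print(f"{group['region']}: ${group['total_revenue']:,.2f}")
```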
Common issues
Search returns zero results
- Make sure you ingested with a schema, not just the raw DataFrame.
- Confirm the collection has rows: hb.collection("products").count().
Duplicate results after re-ingest
Clear the collection before re-ingesting:
hb.collection("products").truncate() # keep schema, drop rows
# or
hb.collection("products").delete() # drop everything
Quick reference
from hybi import HyperBinder, RelationalTable, Field, Encoding
hb = HyperBinder("http://localhost:8000")
# Schema
schema = RelationalTable(
    primary_key="id",
    columns={
        "id": Field(encoding=Encoding.EXACT),
        "text": Field(encoding=Encoding.SEMANTIC),
        "category": Field(encoding=Encoding.EXACT),
        "price": Field(encoding=Encoding.NUMERIC),
    },
)
# Ingest
hb.ingest(df, collection="data", schema=schema, dim=384)
# Search
results = hb.search("query", collection="data", top_k=10)
# Hybrid search
results = hb.search(
    "query",
    collection="data",
    filters=[("category", "==", "value"), ("price", ">", 100)],
    top_k=10,
)
# SQL-like
result = hb.select(collection="data", where=[...], order_by=[...])
# Collection management
hb.collection("data").exists()
hb.collection("data").count()
hb.collection("data").truncate() # ships with feat/namespace-row-counts
hb.collection("data").delete()
Contributing
See the Contributing Guide for details.
License
MIT License — see LICENSE for details.