Python SDK for HyperBinder - a neurosymbolic database for AI applications

HyperBinder Python SDK

A Python client for HyperBinder — the compositional semantic database that combines vector search, graph traversal, and SQL-like queries with per-field encoding strategies.

Installation

pip install hybi

This installs the HTTP-only Python SDK — enough to talk to a running HyperBinder server.

Quick Start

from hybi import HyperBinder, RelationalTable, Field, Encoding
import pandas as pd

# Connect to a running HyperBinder server
hb = HyperBinder("http://localhost:8000")

# Sample data
df = pd.DataFrame({
    "id": ["1", "2", "3"],
    "category": ["AI", "Cloud", "Analytics"],
    "text": [
        "Artificial intelligence and machine learning solutions",
        "Cloud computing and infrastructure services",
        "Data analytics and business intelligence",
    ],
    "revenue": [5000000, 3000000, 2000000],
})

# Define a schema with per-field encoding
schema = RelationalTable(
    primary_key="id",
    columns={
        "id": Field(encoding=Encoding.EXACT),
        "category": Field(encoding=Encoding.EXACT),
        "text": Field(encoding=Encoding.SEMANTIC),
        "revenue": Field(encoding=Encoding.NUMERIC),
    },
)

# Ingest
result = hb.ingest(df, collection="companies", schema=schema, dim=384)
print(f"Ingested {result.rows_ingested} rows")

# Semantic search
results = hb.search("AI and machine learning", collection="companies", top_k=3)
for r in results:
    print(f"{r.data['text']}: {r.score:.3f}")

# SQL-like query
filtered = hb.select(
    collection="companies",
    where=[("revenue", ">", 2500000)],
    order_by=[("revenue", True)],  # True = descending
)
for row in filtered.rows:
    print(row)

# Hybrid query (semantic + filters)
results = hb.search(
    "cloud services",
    collection="companies",
    filters=[("revenue", ">", 2000000)],
    top_k=5,
)

Key Features

🎯 Per-Field Encoding Strategies

Unlike vector databases that encode entire documents into a single vector, HyperBinder lets you specify different encoding strategies for each field:

schema = RelationalTable(
    primary_key="product_id",
    columns={
        "product_id": Field(encoding=Encoding.EXACT),     # Hash-based exact match
        "category": Field(encoding=Encoding.EXACT),       # Categorical exact match
        "name": Field(encoding=Encoding.SEMANTIC),        # Embedding-based similarity
        "description": Field(encoding=Encoding.SEMANTIC), # Embedding-based similarity
        "price": Field(encoding=Encoding.NUMERIC),        # Numeric comparison
        "stock": Field(encoding=Encoding.NUMERIC),        # Numeric comparison
    },
)

This enables queries that blend matching types in one call:

# Find products semantically similar to "laptop computer"
# WHERE category exactly matches "Electronics" (not similar, exact)
# AND price is between 500 and 1500 (numeric range)
# AND stock > 0 (numeric comparison)
results = hb.search(
    "laptop computer",
    collection="products",
    filters=[
        ("category", "==", "Electronics"),
        ("price", ">=", 500),
        ("price", "<=", 1500),
        ("stock", ">", 0),
    ],
    top_k=10,
)

With per-field encodings, a single query gives you:

  • Exact match where you need it (IDs, categories)
  • Semantic search where you need it (descriptions, text)
  • Numeric comparisons where you need it (prices, counts)
  • All in one query, one database
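
To reason about what a filter list should return before running it against a server, the same tuples can be evaluated locally over plain dictionaries. The sketch below is not part of the hybi SDK: `matches` and `apply_filters` are hypothetical helpers, and combining the filters with AND is assumed from the comments in the query above.

```python
import operator

# Only the comparison operators used in the filter tuples above.
OPS = {
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
    "<=": operator.le,
}

def matches(row, filters):
    """Return True if the row satisfies every (field, op, value) filter."""
    return all(OPS[op](row[field], value) for field, op, value in filters)

def apply_filters(rows, filters):
    """Keep only rows that satisfy all filters (assumed AND semantics)."""
    return [row for row in rows if matches(row, filters)]

# Hypothetical sample rows, used only to exercise the helpers.
products = [
    {"name": "Laptop A", "category": "Electronics", "price": 999, "stock": 4},
    {"name": "Laptop B", "category": "Electronics", "price": 1999, "stock": 2},
    {"name": "Laptop bag", "category": "Accessories", "price": 600, "stock": 9},
    {"name": "Laptop C", "category": "Electronics", "price": 700, "stock": 0},
]
filters = [
    ("category", "==", "Electronics"),
    ("price", ">=", 500),
    ("price", "<=", 1500),
    ("stock", ">", 0),
]
kept = apply_filters(products, filters)
print([p["name"] for p in kept])
```

Only the structured part of the query is modeled here; the semantic ranking still happens on the server.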

📊 Hybrid Queries (Semantic + Structured)

Combine semantic search with SQL-like filters:

# Semantic search with structured filters
results = hb.search(
    "machine learning research",
    collection="papers",
    filters=[
        ("year", ">=", "2020"),
        ("citations", ">", 1000),
        ("peer_reviewed", "==", "true"),
    ],
    top_k=10,
)

# Pure SQL-like query
result = hb.select(
    collection="papers",
    where=[
        ("author", "==", "Vaswani"),
        ("year", ">=", "2017"),
    ],
    order_by=[("citations", True)],  # True = descending
    limit=10,
)

Supported operators: =, ==, !=, <>, >, >=, <, <=
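
To catch a mistyped operator before a query is sent, the filter list can be checked on the client side. This is a sketch rather than SDK functionality: `check_filters` is a hypothetical helper, and the operator set is copied from the list above.

```python
SUPPORTED_OPERATORS = {"=", "==", "!=", "<>", ">", ">=", "<", "<="}

def check_filters(filters):
    """Raise ValueError for malformed filter tuples or unknown operators."""
    for item in filters:
        if len(item) != 3:
            raise ValueError(f"expected (field, operator, value), got {item!r}")
        field, op, _value = item
        if not isinstance(field, str) or not field:
            raise ValueError(f"field name must be a non-empty string: {item!r}")
        if op not in SUPPORTED_OPERATORS:
            raise ValueError(f"unsupported operator {op!r} in {item!r}")
    return filters

# A valid list passes through unchanged.
ok = check_filters([("year", ">=", "2020"), ("citations", ">", 1000)])

# "=>" is not in the supported set, so this raises.
try:
    check_filters([("citations", "=>", 1000)])
    error = None
except ValueError as exc:
    error = str(exc)
print(error)
```

Because the function returns its input, it can wrap the `filters=` or `where=` argument directly.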

Data Ingestion

Basic ingestion with a schema

Always define a schema with encoding types:

from hybi import HyperBinder, RelationalTable, Field, Encoding
import pandas as pd

hb = HyperBinder("http://localhost:8000")

df = pd.DataFrame({
    "id": ["1", "2", "3"],
    "name": ["Product A", "Product B", "Product C"],
    "category": ["Electronics", "Books", "Clothing"],
    "description": ["High-quality electronics", "Bestselling books", "Fashion items"],
    "price": [299.99, 19.99, 49.99],
})

schema = RelationalTable(
    primary_key="id",
    columns={
        "id": Field(encoding=Encoding.EXACT),
        "name": Field(encoding=Encoding.SEMANTIC),
        "category": Field(encoding=Encoding.EXACT),
        "description": Field(encoding=Encoding.SEMANTIC),
        "price": Field(encoding=Encoding.NUMERIC),
    },
)

result = hb.ingest(df, collection="products", schema=schema, dim=384)
print(f"Ingested {result.rows_ingested} rows")

Encoding types

Encoding  Use for                     How it works                Example fields
EXACT     IDs, categories, tags       Hash-based exact match      id, status, category
SEMANTIC  Text, descriptions, titles  Embedding-based similarity  title, description, content
NUMERIC   Numbers, prices, counts     Numeric comparison          price, quantity, rating

Without a schema

If you don't provide a schema, HyperBinder auto-detects an encoding for each column, but the detected encoding may not match how you intend to query that column:

# Not recommended — auto-detection may not choose the optimal encoding
result = hb.ingest(df, collection="products", dim=384)
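
One way to avoid relying on auto-detection is to draft an encoding per column and review it by hand. The sketch below suggests an encoding name from sample values, following the guidance in the encoding table. It is a rough heuristic under stated assumptions, not HyperBinder's auto-detection logic: `suggest_encoding` and its `max_exact_length` threshold are hypothetical.

```python
def suggest_encoding(values, max_exact_length=20):
    """Suggest 'NUMERIC', 'EXACT', or 'SEMANTIC' for a column's sample values."""
    values = [v for v in values if v is not None]
    # Numbers (but not booleans) suggest a numeric encoding.
    if values and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    ):
        return "NUMERIC"
    # Short strings without spaces look like IDs, categories, or tags.
    if values and all(
        isinstance(v, str) and len(v) <= max_exact_length and " " not in v
        for v in values
    ):
        return "EXACT"
    # Everything else is treated as free text.
    return "SEMANTIC"

# Sample values mirroring the products DataFrame from the ingestion example.
columns = {
    "id": ["1", "2", "3"],
    "name": ["Product A", "Product B", "Product C"],
    "category": ["Electronics", "Books", "Clothing"],
    "description": ["High-quality electronics", "Bestselling books", "Fashion items"],
    "price": [299.99, 19.99, 49.99],
}
suggested = {name: suggest_encoding(vals) for name, vals in columns.items()}
print(suggested)
```

Each suggested name corresponds to a member of `Encoding` (for example `Encoding.EXACT`), so the result can serve as a checklist when writing the `RelationalTable` columns by hand.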

Searching

Semantic search

results = hb.search("laptop computers", collection="products", top_k=5)
for r in results:
    print(f"Score: {r.score:.3f}")
    print(f"Name:  {r.data['name']}")
    print(f"Desc:  {r.data['description']}")

Hybrid: semantic + filters

results = hb.search(
    "artificial intelligence",
    collection="products",
    filters=[
        ("category", "==", "Electronics"),
        ("price", ">=", 100),
        ("price", "<=", 500),
        ("in_stock", "==", "true"),
    ],
    top_k=10,
)

Pure SQL-like

result = hb.select(
    collection="products",
    columns=["name", "price", "category"],
    where=[
        ("category", "==", "Electronics"),
        ("price", ">", 200),
    ],
    order_by=[("price", True)],  # True = descending
    limit=10,
)
for row in result.rows:
    print(row)

Collection management

products = hb.collection("products")
if products.exists():
    print(f"Collection has {products.count()} rows")

stats = products.stats()
print(f"Columns:   {stats.columns}")
print(f"Dimension: {stats.dimension}")

for coll in hb.list_collections():
    print(f"{coll.name}: {coll.rows} rows")

# Delete all rows but keep the collection structure
# Note: the fluent products.truncate() form ships with feat/namespace-row-counts.
# Until that lands on master, use hb.truncate(collection="products") instead.
products.truncate()

# Delete the entire collection
products.delete()

Advanced features

Multi-hop graph traversal

results = hb.multihop(
    collection="knowledge_graph",
    start={"entity": "Albert Einstein"},
    hops=[("discovered", "theory"), ("influenced", "scientist")],
    top_k=10,
)

RAG context assembly

context = hb.get_context(
    "What are the latest AI developments?",
    collection="research_papers",
    top_k=5,
)

prompt = f"""Context: {context.text}

Question: What are the latest AI developments?
Answer:"""
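
Because the question appears both in the `get_context` call and in the prompt, it can help to assemble the prompt in one place. A minimal sketch, assuming only that the context object exposes a `text` attribute as in the example above; `build_prompt` is a hypothetical helper, and `SimpleNamespace` stands in for the object returned by `hb.get_context`.

```python
from types import SimpleNamespace

def build_prompt(context_text, question):
    """Assemble a RAG prompt from retrieved context and the user's question."""
    return (
        f"Context: {context_text.strip()}\n\n"
        f"Question: {question.strip()}\n"
        "Answer:"
    )

# Stand-in for the object returned by hb.get_context(...); the text is made up.
context = SimpleNamespace(text="Paper A surveys transformer models.\n")
question = "What are the latest AI developments?"
prompt = build_prompt(context.text, question)
print(prompt)
```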

Aggregations

result = hb.aggregate(
    collection="sales",
    group_by=["region", "product_type"],
    aggregations=[
        ("revenue", "sum", "total_revenue"),
        ("orders", "count", "order_count"),
        ("revenue", "avg", "avg_order"),
    ],
    order_by=["total_revenue"],
)

for group in result.groups:
    print(f"{group['region']}: ${group['total_revenue']:,.2f}")
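
To sanity-check aggregate results on a small sample, the same grouping can be reproduced locally. The sketch below is plain Python, not SDK code: `aggregate_rows` is a hypothetical helper that assumes `sum`, `count`, and `avg` carry their usual SQL meaning and that each aggregation tuple is (source column, function, output name).

```python
from collections import defaultdict

def aggregate_rows(rows, group_by, aggregations):
    """Group dict rows by the given keys and compute sum/count/avg per group."""
    buckets = defaultdict(list)
    for row in rows:
        buckets[tuple(row[key] for key in group_by)].append(row)

    groups = []
    for key, members in buckets.items():
        group = dict(zip(group_by, key))
        for column, func, alias in aggregations:
            values = [m[column] for m in members]
            if func == "sum":
                group[alias] = sum(values)
            elif func == "count":
                group[alias] = len(values)
            elif func == "avg":
                group[alias] = sum(values) / len(values)
            else:
                raise ValueError(f"unsupported aggregation: {func}")
        groups.append(group)
    return groups

# Hypothetical sample rows, used only to exercise the helper.
sales = [
    {"region": "EU", "product_type": "A", "revenue": 100.0, "orders": 1},
    {"region": "EU", "product_type": "A", "revenue": 300.0, "orders": 1},
    {"region": "US", "product_type": "B", "revenue": 50.0, "orders": 1},
]
groups = aggregate_rows(
    sales,
    group_by=["region", "product_type"],
    aggregations=[
        ("revenue", "sum", "total_revenue"),
        ("orders", "count", "order_count"),
        ("revenue", "avg", "avg_order"),
    ],
)
for group in groups:
    print(f"{group['region']}: ${group['total_revenue']:,.2f}")
```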

Common issues

Search returns zero results

  • Make sure you ingested with a schema, not just the raw DataFrame.
  • Confirm the collection has rows: hb.collection("products").count().
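
The second check can be wrapped in a small helper that distinguishes a missing collection from an empty one. This sketch uses only the `exists()` and `count()` calls shown under Collection management; `describe_collection` is a hypothetical helper, and the stub classes exist only so the example runs without a server.

```python
def describe_collection(hb, name):
    """Return a short diagnosis for a collection that yields no search results."""
    coll = hb.collection(name)
    if not coll.exists():
        return f"collection {name!r} does not exist"
    rows = coll.count()
    if rows == 0:
        return f"collection {name!r} exists but has no rows"
    return f"collection {name!r} has {rows} rows"

# Stubs standing in for a HyperBinder client; they are not part of the SDK.
class FakeCollection:
    def __init__(self, rows):
        self._rows = rows

    def exists(self):
        return self._rows is not None

    def count(self):
        return self._rows

class FakeClient:
    def __init__(self, collections):
        self._collections = collections

    def collection(self, name):
        return FakeCollection(self._collections.get(name))

hb = FakeClient({"products": 3, "drafts": 0})
print(describe_collection(hb, "products"))
print(describe_collection(hb, "drafts"))
print(describe_collection(hb, "missing"))
```

With a real server, pass the `HyperBinder` instance in place of `FakeClient`.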

Duplicate results after re-ingest

Clear the collection before re-ingesting:

hb.collection("products").truncate()  # keep schema, drop rows
# or
hb.collection("products").delete()    # drop everything
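
The collection-management notes above say that the fluent `truncate()` form ships with feat/namespace-row-counts and that `hb.truncate(collection=...)` is the equivalent until then. A helper can prefer the fluent form and fall back when it is missing. This is a sketch: `clear_collection` is hypothetical, and the stub classes only simulate a client without the fluent method.

```python
def clear_collection(hb, name):
    """Drop all rows from a collection, keeping its structure.

    Returns which form was used: "fluent" or "client".
    """
    coll = hb.collection(name)
    truncate = getattr(coll, "truncate", None)
    if callable(truncate):
        truncate()
        return "fluent"
    # Fallback named in the collection-management notes.
    hb.truncate(collection=name)
    return "client"

# Stubs simulating an SDK build without Collection.truncate(); not part of hybi.
class OldCollection:
    pass

class OldClient:
    def __init__(self):
        self.truncated = []

    def collection(self, name):
        return OldCollection()

    def truncate(self, collection):
        self.truncated.append(collection)

old = OldClient()
used = clear_collection(old, "products")
print(used, old.truncated)
```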

Quick reference

from hybi import HyperBinder, RelationalTable, Field, Encoding

hb = HyperBinder("http://localhost:8000")

# Schema
schema = RelationalTable(
    primary_key="id",
    columns={
        "id": Field(encoding=Encoding.EXACT),
        "text": Field(encoding=Encoding.SEMANTIC),
        "category": Field(encoding=Encoding.EXACT),
        "price": Field(encoding=Encoding.NUMERIC),
    },
)

# Ingest
hb.ingest(df, collection="data", schema=schema, dim=384)

# Search
results = hb.search("query", collection="data", top_k=10)

# Hybrid search
results = hb.search(
    "query",
    collection="data",
    filters=[("category", "==", "value"), ("price", ">", 100)],
    top_k=10,
)

# SQL-like
result = hb.select(collection="data", where=[...], order_by=[...])

# Collection management
hb.collection("data").exists()
hb.collection("data").count()
hb.collection("data").truncate()  # ships with feat/namespace-row-counts
hb.collection("data").delete()

Contributing

See the Contributing Guide for details.

License

MIT License — see LICENSE for details.
