Unified SQLAlchemy helpers for Apache AGE + pgvector + BM25/FTS + hybrid search.

Project description

`age_search`

A unified SQLAlchemy extension that combines:

Apache AGE → graph traversal (Cypher)
pgvector → semantic vector search (cosine, HNSW / IVFFLAT)
Postgres FTS → built-in full-text search
BM25 (pg_search / ParadeDB) → high-quality lexical ranking
Hybrid search → lexical + semantic fusion
Graph-constrained search → search + expand via graph topology

All inside one Postgres database, one SQLAlchemy session, one transaction model.

This package does not try to pretend graphs are tables. Instead, it gives you clean primitives that compose.

Why this exists

Most systems need all of the following at once:

semantic similarity (embeddings)
keyword relevance (BM25 / FTS)
graph structure (relationships, hops, communities)
transactional consistency
deployability (migrations, pooling, ORM)

Postgres already supports all of this — but the integration story is painful.

This package provides:

safe engine/session setup
sane defaults
index + migration helpers
ORM-friendly APIs
zero magic that fights SQLAlchemy internals

Core design principles

Relational tables own the data
AGE owns topology
Vectors stay in tables
Graph nodes reference table primary keys
Hybrid search is explicit and debuggable
Everything works under normal SQLAlchemy pooling

Installation

pip install -e .

Dependencies:

Python ≥ 3.10
SQLAlchemy ≥ 2.0
psycopg3
pgvector
Apache AGE installed server-side
Optional: pg_search (BM25)

Engine setup (IMPORTANT)

AGE requires per-connection initialization.

Always create your engine using:

from age_search import create_engine_all_in_one

engine = create_engine_all_in_one(
    DATABASE_URL,
    graph_name="knowledge_graph",
)

This automatically:

registers pgvector adapters
runs LOAD 'age'
sets search_path = ag_catalog, public
is safe under connection pooling
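
To make "per-connection initialization" concrete, here is a minimal sketch of what such a hook could look like. It is not the package's implementation: `init_age_connection` and `AGE_INIT_STATEMENTS` are hypothetical names, and the sketch assumes only the two statements listed above and a DB-API style connection.

```python
# Hypothetical sketch of a per-connection AGE initializer; not age_search's code.
AGE_INIT_STATEMENTS = (
    "LOAD 'age'",
    "SET search_path = ag_catalog, public",
)


def init_age_connection(dbapi_connection, connection_record=None):
    """Run the AGE session setup on a freshly opened DB-API connection."""
    cursor = dbapi_connection.cursor()
    try:
        for statement in AGE_INIT_STATEMENTS:
            cursor.execute(statement)
    finally:
        cursor.close()
    # Commit so the session-level SET is not undone by a later rollback.
    dbapi_connection.commit()
```

With SQLAlchemy, a function with this signature can be registered through `sqlalchemy.event.listen(engine, "connect", init_age_connection)`, so it runs once for each new connection the pool opens rather than once per checkout.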

Canonical `Doc` model

This is the reference model used throughout the examples.

from sqlalchemy.orm import Mapped, mapped_column
from sqlalchemy import Integer, Text

from age_search import (
    Base,
    GraphNodeMixin,
    VectorMixin,
    FTSSearchMixin,
    BM25SearchMixin,
    GraphRelationship,
)

class Doc(
    Base,
    GraphNodeMixin,
    VectorMixin,
    FTSSearchMixin,
    BM25SearchMixin,
):
    __tablename__ = "docs"

    id: Mapped[int] = mapped_column(Integer, primary_key=True)
    content: Mapped[str] = mapped_column(Text, nullable=False)

    # Graph configuration
    graph_label = "Doc"
    graph_id_field = "id"
    vertex_property_key = "id"

    # Vector configuration
    vector_dim = 1536   # cosine by default

    # FTS
    fts_config = "english"

    # BM25
    bm25_key_field = "id"
    bm25_default_field = "content"

    # Graph relationships
    related = GraphRelationship("RELATED_TO", target_label="Doc")

Database initialization (one-time)

Create extensions, graph, and indexes with a single call.

`init_db.py`

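import os

from sqlalchemy import create_engine
from age_search.migrations import install_all, InstallSpec
from models.doc import Doc

# Assumption: the connection URL is supplied via the environment.
DATABASE_URL = os.environ["DATABASE_URL"]

engine = create_engine(DATABASE_URL)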

install_all(
    engine,
    models=[Doc],
    spec=InstallSpec(
        graph_name="knowledge_graph",
        enable_fts=True,
        enable_bm25=True,        # requires pg_search extension
        vector_index="hnsw",     # or "ivfflat"
        analyze_after=True,
    ),
)

print("Database initialized.")

This creates:

age, vector, (optional) pg_search extensions
AGE graph
FTS GIN index
BM25 index
pgvector cosine index (HNSW or IVFFLAT)
and finally runs ANALYZE (when analyze_after=True)

Optional: auto-sync graph vertices

Keep AGE graph nodes in sync with ORM rows automatically.

from age_search.hooks import install_graph_sync
from models.doc import Doc

install_graph_sync(Doc)

Behavior:

insert/update → MERGE (Doc {id})
delete → DETACH DELETE

Safe under normal ORM usage.
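
For orientation, the statements such a hook would need to issue can be sketched with AGE's `cypher()` SQL function. This is an illustration under assumptions, not the package's code: `merge_vertex_sql` and `delete_vertex_sql` are hypothetical helpers, and the vertex key is assumed to be an integer primary key.

```python
import re

# Hypothetical helpers; they only build SQL text and do not touch a database.
_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def _check_ident(name: str) -> str:
    """Reject anything that is not a plain identifier before interpolating it."""
    if not _IDENT.match(name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def merge_vertex_sql(graph_name: str, label: str, key: str, value: int) -> str:
    """SQL that upserts one vertex through AGE's cypher() function."""
    g, lbl, k = (_check_ident(x) for x in (graph_name, label, key))
    return (
        f"SELECT * FROM cypher('{g}', $$ "
        f"MERGE (n:{lbl} {{{k}: {int(value)}}}) RETURN n "
        f"$$) AS (n agtype)"
    )


def delete_vertex_sql(graph_name: str, label: str, key: str, value: int) -> str:
    """SQL that removes one vertex together with its edges."""
    g, lbl, k = (_check_ident(x) for x in (graph_name, label, key))
    return (
        f"SELECT * FROM cypher('{g}', $$ "
        f"MATCH (n:{lbl} {{{k}: {int(value)}}}) DETACH DELETE n "
        f"$$) AS (n agtype)"
    )


print(merge_vertex_sql("knowledge_graph", "Doc", "id", 1))
```

The identifier check matters because graph names and labels cannot be passed as bind parameters inside the dollar-quoted Cypher string.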

Writing data

doc1 = Doc(content="Graph neural networks for fraud detection", embedding=vec1)
doc2 = Doc(content="Vector databases and hybrid search", embedding=vec2)

session.add_all([doc1, doc2])
session.commit()

If graph sync is enabled, vertices are created automatically.

Graph operations (AGE)

Create relationships

doc1.related.add(session, doc2)
session.commit()

Traverse neighbors

neighbors = doc1.related(session).limit(10).all()

Returns JSON-decoded AGE nodes, not ORM objects (by design).

Vector search (pgvector)

Cosine similarity is the default.

hits = Doc.vector_search(
    session,
    query_vec,
    k=20,
    distance="cosine",
)

Uses:

embedding <=> query_vec (pgvector's cosine distance operator; <-> is L2 distance)
HNSW or IVFFLAT index automatically
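
As a reference point, pgvector's cosine distance operator `<=>` returns 1 minus the cosine similarity. The sketch below reproduces that ordering exactly in plain Python; `cosine_distance` and `top_k` are illustrative names, not part of the package, and an HNSW or IVFFLAT index approximates this ranking rather than computing it exhaustively.

```python
import math
from typing import Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_a * norm_b)


def top_k(query: Sequence[float], rows: dict[int, Sequence[float]], k: int) -> list[int]:
    """Exact (brute-force) nearest neighbours by cosine distance."""
    return sorted(rows, key=lambda row_id: cosine_distance(query, rows[row_id]))[:k]


rows = {1: [0.0, 1.0], 2: [1.0, 0.0], 3: [1.0, 1.0]}
print(top_k([1.0, 0.0], rows, k=2))  # [2, 3]
```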

Full-text search (Postgres FTS)

hits = Doc.fts_search(
    session,
    "graph neural networks",
    k=20,
)

Uses:

tsvector
websearch_to_tsquery
GIN index

BM25 search (pg_search / ParadeDB)

rows = Doc.bm25_search(
    session,
    "graph neural networks",
    k=20,
    with_snippet=True,
)

Each row contains:

id
BM25 score
optional snippet
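
To help interpret the score column, here is a small textbook BM25 scorer. It is a sketch only: the tokenizer is a naive whitespace split, `k1` and `b` use common default values, and the IDF variant is the Lucene-style one. The scores pg_search produces will differ in detail, so treat this as an explanation of the shape of the formula, not a reimplementation.

```python
import math
from collections import Counter


def bm25_scores(query: str, docs: dict[int, str], k1: float = 1.2, b: float = 0.75) -> dict[int, float]:
    """Score every document against the query with textbook BM25."""
    if not docs:
        return {}
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized.values()) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq: Counter = Counter()
    for tokens in tokenized.values():
        doc_freq.update(set(tokens))

    scores: dict[int, float] = {}
    for doc_id, tokens in tokenized.items():
        term_freq = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            tf = term_freq[term]
            if tf == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Length normalisation: long documents need more hits for the same score.
            length_norm = 1 - b + b * len(tokens) / avg_len if avg_len else 1.0
            score += idf * tf * (k1 + 1) / (tf + k1 * length_norm)
        scores[doc_id] = score
    return scores


docs = {
    1: "Graph neural networks for fraud detection",
    2: "Vector databases and hybrid search",
}
print(bm25_scores("graph neural networks", docs))
```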

To return ORM objects:

docs = Doc.bm25_search_objects(session, "graph neural networks")

Hybrid search (lexical + semantic)

Simple hybrid (RRF)

from age_search import hybrid_search

results = hybrid_search(
    session,
    Doc,
    query_text="graph neural networks",
    query_vec=query_embedding,
    prefer_bm25=True,
)

This:

runs BM25 (or FTS fallback)
runs vector search
fuses ranks via Reciprocal Rank Fusion
returns ORM objects in fused order
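
Reciprocal Rank Fusion itself is simple enough to sketch in a few lines. The version below is the standard formulation, where each list contributes 1 / (k + rank) per document; the function name `rrf_fuse` and the constant `k=60` are assumptions for illustration, and the package may use a different constant or tie-breaking rule.

```python
from collections import defaultdict
from typing import Hashable, Iterable, Sequence


def rrf_fuse(rankings: Iterable[Sequence[Hashable]], k: int = 60) -> list[tuple[Hashable, float]]:
    """Fuse several ranked id lists; each id is assumed to appear at most once per list."""
    scores: dict[Hashable, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties keep first-seen order because sort is stable.
    return sorted(scores.items(), key=lambda item: -item[1])


lexical_ids = [1, 2, 3]    # e.g. BM25 order
semantic_ids = [2, 4, 1]   # e.g. vector-search order
fused = rrf_fuse([lexical_ids, semantic_ids])
print([doc_id for doc_id, _ in fused])  # [2, 1, 4, 3]
```

Because only ranks enter the formula, the BM25 and cosine scores never need to be normalised against each other, which is what makes this kind of fusion easy to debug.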

Typed hybrid results (scores + metadata)

from age_search.hybrid2 import hybrid_search_results

results = hybrid_search_results(
    session,
    Doc,
    query_text="graph neural networks",
    query_vec=query_embedding,
)

for r in results:
    print(
        r.id,
        r.rrf_score,
        r.bm25_score,
        r.semantic_rank,
        r.snippet,
    )

This is what you want for:

debugging
evals
ranking analysis
explainability

Graph-constrained search

Expand after search

from age_search import graph_expand_ids

seed_ids = [r.id for r in results]

expanded_ids = graph_expand_ids(
    session,
    graph_name="knowledge_graph",
    label="Doc",
    seed_ids=seed_ids,
    edge="RELATED_TO",
    hops=2,
)

You can then:

re-rank
fetch objects
or run another hybrid search inside this subset
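
Conceptually, expanding seeds by a fixed number of hops is a bounded breadth-first search. The sketch below runs it over an in-memory edge list to show the semantics. `expand_ids` is a hypothetical stand-in, and whether `graph_expand_ids` includes the seeds or follows edge direction is not stated here, so both choices below are assumptions.

```python
from collections import deque
from typing import Iterable


def expand_ids(
    edges: Iterable[tuple[int, int]],
    seed_ids: Iterable[int],
    hops: int,
    directed: bool = False,
) -> set[int]:
    """Return the seeds plus every node reachable within `hops` edges."""
    adjacency: dict[int, set[int]] = {}
    for src, dst in edges:
        adjacency.setdefault(src, set()).add(dst)
        if not directed:
            adjacency.setdefault(dst, set()).add(src)

    seen = set(seed_ids)
    frontier = deque((seed, 0) for seed in seen)
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue  # do not walk past the hop limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


edges = [(1, 2), (2, 3), (3, 4), (5, 6)]
print(sorted(expand_ids(edges, seed_ids=[1], hops=2)))  # [1, 2, 3]
```

In Cypher terms this roughly corresponds to a variable-length pattern such as `(a)-[:RELATED_TO*1..2]-(b)`.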

Hierarchical labels (taxonomy)

You typically want two layers:

Relational taxonomy tables (source of truth): fast filtering, constraints, auditing
AGE mirror (optional): traversal/reasoning (PARENT_OF, HAS_LABEL)

Relational taxonomy

Use the built-in Label model (adjacency list via parent_id):

from age_search import Base
from age_search.taxonomy import Label

For a document↔label join table, create it explicitly so your doc table name can be anything:

from age_search.taxonomy import make_doc_labels_table

doc_labels = make_doc_labels_table(Base.metadata, doc_table="docs")

To expand a subtree in pure SQL (recursive CTE):

from age_search.taxonomy import descendant_label_ids

ids = descendant_label_ids(session, root_label_id=42)
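
The recursive-CTE technique can be tried without a Postgres instance. The sketch below uses the standard-library `sqlite3` module, whose `WITH RECURSIVE` syntax matches Postgres for this query. The `labels` table layout and the `descendant_ids` helper are assumptions for illustration, including the choice to return the root itself.

```python
import sqlite3

# UNION (rather than UNION ALL) drops duplicates, so a cyclic parent_id cannot loop forever.
DESCENDANTS_SQL = """
WITH RECURSIVE subtree(id) AS (
    SELECT id FROM labels WHERE id = :root
    UNION
    SELECT l.id FROM labels AS l JOIN subtree AS s ON l.parent_id = s.id
)
SELECT id FROM subtree ORDER BY id
"""


def descendant_ids(conn: sqlite3.Connection, root_label_id: int) -> list[int]:
    """Ids of the root label and everything below it in the adjacency list."""
    rows = conn.execute(DESCENDANTS_SQL, {"root": root_label_id}).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE labels (id INTEGER PRIMARY KEY, parent_id INTEGER REFERENCES labels(id))"
)
conn.executemany(
    "INSERT INTO labels (id, parent_id) VALUES (?, ?)",
    [(1, None), (2, 1), (3, 2), (4, None)],
)
print(descendant_ids(conn, 1))  # [1, 2, 3]
```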

AGE mirror (optional)

Mirror taxonomy into AGE:

(:Label {id, slug, name})
(:Label)-[:PARENT_OF]->(:Label)
(:Doc)-[:HAS_LABEL]->(:Label)

Then you can do graph-constrained hybrid search in one call:

from age_search import hybrid_search_results_in_label_subtree

results = hybrid_search_results_in_label_subtree(
    session,
    Doc,
    graph_name="knowledge_graph",
    root_label_id=42,
    query_text="graph neural networks",
    query_vec=query_embedding,
)

Weighted edges

You can attach properties (including a numeric weight) when creating a relationship:

# adds/updates relationship properties on the AGE edge
doc1.related.add(session, doc2, weight=0.8, props={"source": "cooccur"})
session.commit()

Community detection helpers (connected components)

For a simple baseline "community" definition, you can compute connected components from an AGE edge list:

from age_search.community import graph_connected_components

communities = graph_connected_components(
    session,
    graph_name="knowledge_graph",
    label="Doc",
    edge="RELATED_TO",
)
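
For reference, connected components over an edge list can be computed with a small union-find structure. This is a generic sketch, not the helper's actual implementation; `connected_components` is an illustrative name and the edges are treated as undirected.

```python
from typing import Iterable


def connected_components(nodes: Iterable[int], edges: Iterable[tuple[int, int]]) -> list[set[int]]:
    """Group nodes into components, largest first (ties by smallest member)."""
    parent: dict[int, int] = {node: node for node in nodes}

    def find(x: int) -> int:
        parent.setdefault(x, x)  # tolerate edge endpoints missing from `nodes`
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups: dict[int, set[int]] = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return sorted(groups.values(), key=lambda comp: (-len(comp), min(comp)))


print(connected_components([1, 2, 3, 4, 5, 6], [(1, 2), (2, 3), (4, 5)]))
# [{1, 2, 3}, {4, 5}, {6}]
```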

Benchmark + eval harness

There’s a lightweight, dependency-free eval module (age_search.eval) with common IR metrics. You provide EvalCase objects and a search(case) -> ranked_ids function:

from age_search.eval import EvalCase, evaluate

cases = [
    EvalCase(name="q1", relevant_ids={1, 2, 3}),
]

report = evaluate(cases, search=lambda c: [1, 9, 2, 8], benchmark=True)
print(report)
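
To sanity-check a report by hand, the usual IR metrics are easy to compute directly. The functions below are standard definitions written as a standalone sketch. They are not the API of age_search.eval, whose metric names and report layout are not shown here.

```python
from typing import Sequence


def precision_at_k(ranked: Sequence[int], relevant: set[int], k: int) -> float:
    """Fraction of the top-k results that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for doc_id in ranked[:k] if doc_id in relevant) / k


def recall_at_k(ranked: Sequence[int], relevant: set[int], k: int) -> float:
    """Fraction of the relevant set found in the top-k results."""
    if not relevant:
        return 0.0
    return len(set(ranked[:k]) & relevant) / len(relevant)


def reciprocal_rank(ranked: Sequence[int], relevant: set[int]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is returned."""
    for rank, doc_id in enumerate(ranked, start=1):
        if doc_id in relevant:
            return 1.0 / rank
    return 0.0


ranked, relevant = [1, 9, 2, 8], {1, 2, 3}
print(precision_at_k(ranked, relevant, k=4))  # 0.5
print(reciprocal_rank(ranked, relevant))      # 1.0
```

For the same case, recall at 4 is 2/3: two of the three relevant ids (1 and 2) are retrieved, and id 3 is missed.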

Development + release notes

CI runs ruff + pytest on PRs.
Releases: run the GitHub Actions workflow "Publish Python distribution to PyPI" (manual dispatch) to bump the version, tag, build, and publish.

Index strategies (cosine)

HNSW (default)

best recall/latency
heavier build
good general default

IVFFLAT

faster build
smaller
requires ANALYZE
tune with:

SET ivfflat.probes = 10;

Switch via:

InstallSpec(vector_index="ivfflat")

CLI (optional)

agegraph doctor
agegraph init --bm25 --vector-index hnsw
agegraph index --models-module your_app.models

Useful for:

ops
CI
smoke tests

Mental model summary

Layer         Technology   Role
Tables        SQLAlchemy   source of truth
Vectors       pgvector     semantic similarity
Lexical       FTS / BM25   keyword relevance
Graph         Apache AGE   topology
Fusion        RRF          hybrid ranking
Transactions  Postgres     consistency

Nothing is hidden. Everything composes.

What this is good for

RAG systems
knowledge graphs
recommendation engines
fraud / AML
search + reasoning
graph-aware retrieval
eval pipelines

What this deliberately does NOT do

pretend graphs are tables
auto-load graph neighbors via lazy ORM relationships
hide Cypher behind magic joins
force a specific embedding model
lock you into one search strategy

Project details

This version

0.1.0

Jan 12, 2026

Download files

Source Distribution

age_search-0.1.0.tar.gz (74.5 kB view details)

Uploaded Jan 12, 2026 Source

Built Distribution

age_search-0.1.0-py3-none-any.whl (30.2 kB view details)

Uploaded Jan 12, 2026 Python 3

File details

Details for the file age_search-0.1.0.tar.gz.

File metadata

Download URL: age_search-0.1.0.tar.gz
Upload date: Jan 12, 2026
Size: 74.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for age_search-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`da72652a6f317817b2ff2253331c9df693d5e85d9f45029d2eff96a75b604a7f`
MD5	`40386a91bb5f76b55ce918d0eef83ab6`
BLAKE2b-256	`76c2c205b0cb1e564e7776bcaf5320b2aaa9cd4d2411f08b95f128c702f85809`

Provenance

The following attestation bundles were made for age_search-0.1.0.tar.gz:

Publisher: release.yml on webcoderz/age-search

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: age_search-0.1.0.tar.gz
- Subject digest: da72652a6f317817b2ff2253331c9df693d5e85d9f45029d2eff96a75b604a7f
- Sigstore transparency entry: 815215815
- Sigstore integration time: Jan 12, 2026
Source repository:
- Permalink: webcoderz/age-search@e2eef2d30b0d108e35be98a05c27f2eca783e5bc
- Branch / Tag: refs/heads/main
- Owner: https://github.com/webcoderz
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@e2eef2d30b0d108e35be98a05c27f2eca783e5bc
- Trigger Event: workflow_dispatch

File details

Details for the file age_search-0.1.0-py3-none-any.whl.

File metadata

Download URL: age_search-0.1.0-py3-none-any.whl
Upload date: Jan 12, 2026
Size: 30.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for age_search-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`68a2d30adb34750c58d5d2f9527c5ae8e39986849db59445a7e41054a874c768`
MD5	`a4184bd66c2b48833678162bf6fa56dd`
BLAKE2b-256	`38f746054e34f8cab5b20eddb2ffd7a436d54034658c47f38b1d250cf89b07c8`

Provenance

The following attestation bundles were made for age_search-0.1.0-py3-none-any.whl:

Publisher: release.yml on webcoderz/age-search

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: age_search-0.1.0-py3-none-any.whl
- Subject digest: 68a2d30adb34750c58d5d2f9527c5ae8e39986849db59445a7e41054a874c768
- Sigstore transparency entry: 815215817
- Sigstore integration time: Jan 12, 2026
Source repository:
- Permalink: webcoderz/age-search@e2eef2d30b0d108e35be98a05c27f2eca783e5bc
- Branch / Tag: refs/heads/main
- Owner: https://github.com/webcoderz
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@e2eef2d30b0d108e35be98a05c27f2eca783e5bc
- Trigger Event: workflow_dispatch
