Ontology-aligned middleware between agents and graph databases

SEOCHO

Ontology-aligned middleware between your agents and your graph database.

You declare the ontology. You call add() and ask(). SEOCHO keeps graph writes, semantic artifacts, and agent behavior aligned to that one schema contract across local SDK and runtime paths.

flowchart LR
    D["📄 Your docs"] --> E["Extraction"]
    O{{"🧬 Ontology<br/>your schema"}} -.governs.-> E
    O -.governs.-> V
    E --> V["Validate<br/>+ readiness gate"]
    V --> G[("Graph<br/>LadybugDB / DozerDB")]
    G --> A["Ontology-grounded<br/>answers"]
    style O fill:#fef3c7,stroke:#f59e0b,stroke-width:2px

SEOCHO is a fit when:

  • you need extraction, Cypher generation, and answers to stay in-schema
  • you want one ontology to drive SDK, runtime, and graph contracts together
  • you need files, artifacts, and traces to stay visible instead of disappearing behind a managed memory black box

Start here:

| If you want to... | Go here |
| --- | --- |
| get a first local success path | Quickstart |
| follow a runnable notebook walkthrough | examples/quickstart.ipynb |
| understand SEOCHO with a guided beginner walkthrough | Beginner Guide |
| see a runnable usecase demo | Usecases |
| bring your own ontology and files | Apply Your Data |
| use the Python SDK directly | Python SDK Quickstart |
| declare graph-model-aware indexing in YAML | Indexing Design Specs |
| inspect files, artifacts, and traces | Files and Artifacts |
| understand the system design | Architecture Deep Dive |
| present the product and architecture | Overview Deep-Dive Deck |

Quick Start

uv pip install "seocho[local]"       # zero-config local SDK, embedded LadybugDB by default
# or: uv pip install "seocho[embedded]" # minimal embedded graph path

from seocho import Seocho, Ontology, NodeDef, RelDef, Property

# 1. Define your schema
ontology = Ontology(
    name="my_domain",
    nodes={
        "Person":  NodeDef(properties={"name": Property(str, unique=True)}),
        "Company": NodeDef(properties={"name": Property(str, unique=True)}),
    },
    relationships={
        "WORKS_AT": RelDef(source="Person", target="Company"),
    },
)

# 2. Zero-config local client: uses embedded LadybugDB, no server needed
s = Seocho.local(ontology)

# 3. Index
s.add("Marie Curie worked at the University of Paris.")

# 4. Query
print(s.ask("Where did Marie Curie work?"))

Remote runtime client:

from seocho import Seocho

client = Seocho.remote("http://localhost:8001")
print(client.ask("What do we know about ACME?"))

client.ask(...) above is the HTTP chat convenience surface. It is not the same execution engine as runtime client.react(...) or client.advanced(...).

Run the local platform stack:

make setup-env
make up

Install Paths

| Path | Install | What else you need |
| --- | --- | --- |
| HTTP client mode | pip install seocho | a running SEOCHO runtime (base_url=...) |
| Local SDK engine | pip install "seocho[local]" | provider credentials; Neo4j/DozerDB only if you pass a Bolt URI |
| Repository development | pip install -e ".[dev]" | local clone + test/tooling deps |
| Offline ontology governance | pip install "seocho[ontology]" | local ontology files only |

  • pip install seocho is intentionally thin: enough for HTTP client mode.
  • Seocho.local(ontology) defaults to embedded LadybugDB at .seocho/local.lbug.
  • DozerDB/Neo4j is the production graph path: pass graph="bolt://..." or construct Neo4jGraphStore(...) explicitly.
  • The fastest full local stack is make setup-env && make up.
  • examples/quickstart.ipynb reads provider keys from .env, stays on LadybugDB by default, and switches to Bolt-backed Neo4j/DozerDB only when both NEO4J_URI and NEO4J_PASSWORD are set.
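
The backend switch described in the last bullet can be pictured with a few lines of standard-library Python. This is a minimal sketch, not the notebook's code: choose_graph_target is a hypothetical helper, and only the NEO4J_URI / NEO4J_PASSWORD variable names and the .seocho/local.lbug default path come from this README.

```python
import os

def choose_graph_target(env=None):
    """Return ("bolt", uri) only when both Neo4j variables are set.

    Hypothetical helper for illustration; it is not part of the seocho package.
    """
    env = os.environ if env is None else env
    uri = env.get("NEO4J_URI", "").strip()
    password = env.get("NEO4J_PASSWORD", "").strip()
    if uri and password:
        return ("bolt", uri)
    # Otherwise stay on the embedded LadybugDB file named in this README.
    return ("embedded", ".seocho/local.lbug")

# A URI without a password is not enough to leave the embedded default.
print(choose_graph_target({"NEO4J_URI": "bolt://localhost:7687"}))
```

Requiring both variables means a half-configured .env falls back to the embedded path rather than attempting a Bolt connection without credentials.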

Execution Surfaces

The same Seocho facade exposes different execution engines. This is the single most important thing to understand before benchmarking or comparing providers.

| Surface | Where it runs | What it actually does | Tool use |
| --- | --- | --- | --- |
| Seocho.local(...).ask(...) | in-process local SDK | ontology-aware local query + answer synthesis | no runtime agent loop |
| Seocho(base_url=...).ask(...) | HTTP runtime /api/chat | memory/chat convenience endpoint | not the explicit react/debate path |
| client.semantic(...) | HTTP runtime | deterministic semantic graph QA with optional bounded repair | no agentic tool loop |
| client.react(...) | HTTP runtime | router agent path backed by the Agents runtime | yes |
| client.advanced(...) / client.debate(...) | HTTP runtime | multi-agent debate with semantic preflight + supervisor synthesis | yes |

If you want provider-native reasoning and tool-use comparisons, use client.react(...) or client.advanced(...) against a running runtime. Do not use local ask() as that benchmark target.
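
One way to keep benchmark scripts honest is to encode the Tool use column of the table above as data and refuse non-agentic surfaces. The sketch below is illustrative only: SURFACE_TOOL_USE and is_benchmark_target are hypothetical names, and the values are transcribed by hand from this README rather than read from the SDK.

```python
# Tool use column from the Execution Surfaces table, transcribed by hand.
# The dict and helper are hypothetical; they are not part of the seocho package.
SURFACE_TOOL_USE = {
    "Seocho.local(...).ask": "no runtime agent loop",
    "Seocho(base_url=...).ask": "not the explicit react/debate path",
    "client.semantic": "no agentic tool loop",
    "client.react": "yes",
    "client.advanced": "yes",
    "client.debate": "yes",
}

def is_benchmark_target(surface):
    """True only for surfaces whose Tool use entry is a plain 'yes'."""
    return SURFACE_TOOL_USE.get(surface) == "yes"

targets = sorted(name for name in SURFACE_TOOL_USE if is_benchmark_target(name))
print(targets)  # ['client.advanced', 'client.debate', 'client.react']
```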

Why SEOCHO

Built for graph-native teams that need a stronger contract between ontology, runtime, and agent behavior.

  • ontology-first, not prompt-first
  • graph-native, not vector-only
  • schemaless property graph plus agent-visible semantic overlay
  • governed artifacts, not ad hoc schema drift
  • local SDK authoring and runtime consumption on one contract

Architecture Overview

Two planes share one ontology:

  • Data Plane (seocho/index/): files → extraction → validation → graph write
  • Control Plane (seocho/query/): ontology → prompt strategy → Cypher → answer synthesis
  • Ontology (seocho/ontology.py): single source of truth for both planes, and for the runtime artifact contract
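
To make the validation step on the data plane concrete, here is a minimal sketch of checking extracted relationships against declared endpoints before a graph write. It assumes a plain dict as a stand-in for the ontology; RELATIONSHIPS and split_by_schema are hypothetical names, and SEOCHO's actual validation (SHACL shapes plus a readiness gate) is richer than this.

```python
# Hypothetical stand-in for an ontology: relationship type -> (source label, target label).
RELATIONSHIPS = {"WORKS_AT": ("Person", "Company")}

def split_by_schema(triples, relationships):
    """Partition (source_label, rel_type, target_label) triples into in-schema and rejected."""
    accepted, rejected = [], []
    for triple in triples:
        source_label, rel_type, target_label = triple
        if relationships.get(rel_type) == (source_label, target_label):
            accepted.append(triple)
        else:
            rejected.append(triple)
    return accepted, rejected

extracted = [
    ("Person", "WORKS_AT", "Company"),   # matches the declared endpoints
    ("Company", "WORKS_AT", "Person"),   # direction reversed
    ("Person", "FOUNDED", "Company"),    # relationship type not declared
]
accepted, rejected = split_by_schema(extracted, RELATIONSHIPS)
print(len(accepted), len(rejected))  # prints: 1 2
```

Only the accepted list would move on to the graph write; rejected triples are the ones a retry or manual review would look at.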

The Seocho class is a thin public facade. Canonical engine logic lives under seocho/local_engine.py, seocho/client_remote.py, and seocho/client_bundle.py so the facade stays small. Runtime transport is runtime/agent_server.py; shared runtime composition lives in runtime/server_runtime.py.

For the full story (control plane vs data plane, internal orchestration seams such as DomainEvent, IngestionFacade, QueryProxy, AgentFactory, and AgentStateMachine, and the staged extraction/ → runtime/ migration), see docs/ARCHITECTURE.md and docs/RUNTIME_PACKAGE_MIGRATION.md.

Choose Your Runtime Shape

| Mode | Constructor | Best for |
| --- | --- | --- |
| HTTP client | Seocho(base_url="http://localhost:8001", workspace_id="default") | consume an existing runtime over HTTP |
| Embedded local | Seocho.local(ontology) | serverless hello world, SDK authoring, experiments |
| Explicit local engine | Seocho(ontology=..., graph_store=..., llm=...) | direct graph-store control |
| Local platform runtime | make up or seocho serve | UI + API + DozerDB on one machine |

Core parameters you will hit early:

  • base_url: remote SEOCHO runtime root for HTTP client mode
  • workspace_id: logical scope passed through runtime-facing requests
  • graph_store: explicit graph store for local engine mode
  • reasoning_mode + repair_budget: bounded semantic repair loop for hard questions
  • max_steps: runtime agent turn limit for react / debate
  • tool_budget: runtime tool-call budget for react / debate
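
The idea behind a bounded repair loop can be sketched without a graph or an LLM. The code below is an assumption about the general shape, not SEOCHO's implementation: ask_with_repair, fake_run, and fake_repair are hypothetical, and only the repair_budget name comes from the parameter list above.

```python
def ask_with_repair(run_query, repair_query, query, repair_budget=2):
    """Run a query; while it returns no rows, repair and retry at most repair_budget times."""
    repairs_used = 0
    rows = run_query(query)
    while not rows and repairs_used < repair_budget:
        query = repair_query(query)
        repairs_used += 1
        rows = run_query(query)
    return rows, repairs_used

# Toy stand-ins so the sketch runs on its own.
def fake_run(query):
    return [{"company": "University of Paris"}] if "WORKS_AT" in query else []

def fake_repair(query):
    return query.replace("WORKED_AT", "WORKS_AT")

rows, used = ask_with_repair(fake_run, fake_repair, "MATCH (p)-[:WORKED_AT]->(c) RETURN c")
print(used, rows)
```

The budget caps the number of extra attempts, so a question that can never be answered costs at most repair_budget additional queries.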

For a production local engine, Neo4jGraphStore works against both Neo4j and DozerDB over Bolt:

from seocho.store import Neo4jGraphStore

store = Neo4jGraphStore("bolt://localhost:7687", "neo4j", "password")

Common Use Cases

1. Consume an existing SEOCHO runtime over HTTP

from seocho import Seocho

client = Seocho(base_url="http://localhost:8001", workspace_id="default")
print(client.ask("What do we know about ACME?"))

Use ask() here as a convenience chat surface. When you need explicit runtime graph QA or agentic behavior, call client.semantic(...), client.react(...), or client.advanced(...) directly.

2. Build locally against your own ontology with no graph server

from seocho import Seocho, Ontology

client = Seocho.local(Ontology.from_jsonld("schema.jsonld"))
client.add("ACME acquired Beta in 2024.")
print(client.ask("Who did ACME acquire?", reasoning_mode=True, repair_budget=2))

3. Build locally against a production graph server

from seocho import Seocho, Ontology
from seocho.store import Neo4jGraphStore, OpenAIBackend

client = Seocho(
    ontology=Ontology.from_jsonld("schema.jsonld"),
    graph_store=Neo4jGraphStore("bolt://localhost:7687", "neo4j", "password"),
    llm=OpenAIBackend(model="gpt-4o-mini"),
    workspace_id="default",
)
client.add("ACME acquired Beta in 2024.")
print(client.ask("Who did ACME acquire?", reasoning_mode=True, repair_budget=2))

4. Promote the same ontology into runtime artifacts

artifacts = client.approved_artifacts_from_ontology()
prompt_context = client.prompt_context_from_ontology(
    instructions=["Prefer finance ontology labels and relationships."]
)
draft = client.artifact_draft_from_ontology(name="finance_core_v1")

5. Run the local platform stack with UI + API + graph DB

make setup-env
make up

  • UI: http://localhost:8501
  • API docs: http://localhost:8001/docs
  • DozerDB browser: http://localhost:7474

See docs/FILES_AND_ARTIFACTS.md for where schema.jsonld, graph data, rule profiles, semantic artifacts, and traces live.

What the Ontology Controls

| Stage | What happens |
| --- | --- |
| Extraction | Entity types + relationships in LLM prompt |
| Querying | Schema-aware Cypher generation and repair prompts |
| Validation | SHACL shapes derived → catches type/cardinality errors |
| Constraints | UNIQUE/INDEX generated from ontology, applied to Neo4j |
| Denormalization | Cardinality rules determine safe flattening |
| Glossary | SKOS-style vocabulary terms, aliases, and hidden labels compiled into the ontology context identity |
| Reasoning | Optional low-quality retry re-extracts with ontology guidance |
| Runtime parity | Same ontology can be converted into approved semantic artifacts and typed prompt context |
| Agent context | Stable ontology context hash follows indexing, graph writes, query traces, and agent hand-off metadata |
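
As an illustration of the Constraints row, the sketch below derives uniqueness constraints from a dict-shaped schema. It is not SEOCHO's generator: unique_constraints and the dict layout are hypothetical, and the emitted statements use the Neo4j 5 CREATE CONSTRAINT ... REQUIRE ... IS UNIQUE form.

```python
def unique_constraints(nodes):
    """Build one Cypher uniqueness constraint per property flagged unique.

    nodes maps a label to {property_name: {"unique": bool}}; this layout is a
    hypothetical stand-in for the Ontology object shown in the Quick Start.
    """
    statements = []
    for label, props in sorted(nodes.items()):
        for prop, spec in sorted(props.items()):
            if spec.get("unique"):
                name = f"{label.lower()}_{prop}_unique"
                statements.append(
                    f"CREATE CONSTRAINT {name} IF NOT EXISTS "
                    f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
                )
    return statements

schema = {
    "Person": {"name": {"unique": True}, "born": {"unique": False}},
    "Company": {"name": {"unique": True}},
}
for statement in unique_constraints(schema):
    print(statement)
```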

Local SDK writes persist compact _ontology_* graph properties on nodes and relationships. Queries and agent tools compare the active ontology context hash with hashes in the graph and surface any mismatch as ontology_context_mismatch in trace/tool metadata, a guardrail that signals when a graph may need re-indexing under a new ontology profile.
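
The mismatch check can be pictured with a small standard-library sketch. This README does not document how SEOCHO computes its context hash, so the canonical-JSON SHA-256 below is an assumption chosen for illustration; context_hash and check_context are hypothetical names, and only the ontology_context_mismatch key comes from the text above.

```python
import hashlib
import json

def context_hash(ontology):
    """Hash a dict-shaped ontology so that key order does not affect the result."""
    canonical = json.dumps(ontology, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

def check_context(active_ontology, stored_hashes):
    """Compare the active hash with hashes found in the graph and report stale ones."""
    active = context_hash(active_ontology)
    stale = sorted(h for h in set(stored_hashes) if h != active)
    return {"ontology_context_mismatch": bool(stale), "active": active, "stale": stale}

v1 = {"name": "my_domain", "nodes": ["Person", "Company"]}
v2 = {"name": "my_domain", "nodes": ["Person", "Company", "Product"]}

# A graph indexed under v1 but queried under v2 should raise the flag.
report = check_context(v2, [context_hash(v1)])
print(report["ontology_context_mismatch"])  # True
```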

Key Features

# Index a directory (supports .txt, .md, .csv, .json, .jsonl, .pdf)
s.index_directory("./my_data/")

# Category-aware extraction (8 filing-domain presets)
s.add(text, category="Financials")

# Query with reasoning mode
s.ask("question", reasoning_mode=True, repair_budget=2)

# Swappable LLM providers (OpenAI, DeepSeek, Kimi, Grok, Qwen)
from seocho.store import OpenAIBackend, DeepSeekBackend
llm = OpenAIBackend(model="gpt-4o-mini")

# Agent session: context persists across add/ask within one session
with s.session("my_analysis") as sess:
    sess.add("ACME acquired Beta in 2024.")
    sess.add("Beta provides risk analytics to ACME.")
    answer = sess.ask("What does ACME own or use?")

# Schema as code (JSON-LD canonical storage + SHACL export)
ontology.to_jsonld("schema.jsonld")
ontology = Ontology.from_jsonld("schema.jsonld")

# Ontology merge + diff (for migration)
combined = finance_onto.merge(legal_onto)

For the rest (experiment workbench, tracing backends, supervisor + hand-off config, offline governance CLI, multi-ontology per database), see seocho.blog/sdk.

SDK Package Structure

seocho/
├── index/              ← Data Plane: putting data IN
│   ├── pipeline.py     ← chunk → extract → validate → rule inference → write
│   ├── linker.py       ← embedding-based entity relatedness
│   └── file_reader.py  ← .txt/.md/.csv/.json/.jsonl/.pdf
├── query/              ← Control Plane: getting data OUT
│   ├── strategy.py     ← ontology → LLM prompt generation (cached)
│   └── cypher_builder.py ← deterministic Cypher from intent
├── store/              ← Storage backends
│   ├── graph.py        ← Neo4j/DozerDB + LadybugDB
│   ├── vector.py       ← FAISS / LanceDB
│   └── llm.py          ← OpenAI, DeepSeek, Kimi, Grok, Qwen
├── rules.py            ← SHACL-like rule inference + validation
├── ontology.py         ← Schema: JSON-LD + SHACL + merge + migration
├── session.py          ← Agent session: context cache + hand-off
├── agents.py           ← IndexingAgent / QueryAgent / Supervisor
├── local_engine.py     ← Local-mode orchestration behind the SDK facade
├── client_remote.py    ← HTTP transport behind the facade
├── client_bundle.py    ← Runtime-bundle glue behind the facade
└── client.py           ← Public SDK facade

Three Ways to Use

Python SDK

from seocho import Seocho, Ontology, NodeDef, P

CLI

seocho init                    # create ontology interactively
seocho index ./data/           # index files
seocho ask "your question"     # query
seocho status                  # graph stats

Jupyter Notebook

examples/quickstart.ipynb
examples/bring_your_data.ipynb
examples/finance-compliance/quickstart.py

LPG and RDF Support

# LPG (default): Cypher queries
onto = Ontology(name="finance", graph_model="lpg", ...)

# RDF: n10s Cypher (DozerDB + neosemantics)
onto = Ontology(name="fibo", graph_model="rdf",
                namespace="https://spec.edmcouncil.org/fibo/", ...)

Documentation

| Doc | Description |
| --- | --- |
| seocho.blog | Full documentation site |
| SDK Overview | SDK features and quick start |
| Ontology Guide | Schema design, JSON-LD, SHACL |
| API Reference | Complete method reference |
| docs/USECASES.md | Runnable usecase demos |
| docs/BEGINNER_GUIDE.md | Guided first-run path with architecture snippets |
| docs/ARCHITECTURE.md | System architecture |
| docs/presentations/SEOCHO_OVERVIEW_DEEP_DIVE.md | Beginner-friendly architecture deck |
| docs/FILES_AND_ARTIFACTS.md | Where ontology, rule, trace, and runtime files live |
| docs/BENCHMARKS.md | Private finance corpus and GraphRAG-Bench evaluation tracks |
| docs/WORKFLOW.md | Operational workflow |
| docs/ISSUE_TASK_SYSTEM.md | Sprint/task governance |
| CONTRIBUTING.md | How to contribute |

Observability

Pluggable tracing backends selectable at runtime or via SEOCHO_TRACE_BACKEND:

  • none: no tracing; smallest surface
  • console: ephemeral stdout for local dev
  • jsonl: canonical neutral trace artifact; file-based retention
  • opik: optional exporter (hosted or self-hosted); SEOCHO_TRACE_OPIK_MODE=self_host for private infra

Sensitive workloads: prefer none or jsonl. Prompts, retrieval evidence, and metadata may appear in traces, so route remote exporters through your governance review. More detail at docs/FILES_AND_ARTIFACTS.md.
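
For deployment scripts that want to fail fast on a mistyped backend, the selection can be validated before the SDK is even imported. The sketch below is a hypothetical wrapper, not SEOCHO's loader: resolve_trace_backend and write_jsonl_event are illustrative names, the fallback to none when the variable is unset is an assumption, and only the SEOCHO_TRACE_BACKEND variable and the four backend names come from this README.

```python
import io
import json
import os

ALLOWED_BACKENDS = ("none", "console", "jsonl", "opik")

def resolve_trace_backend(env=None, default="none"):
    """Read SEOCHO_TRACE_BACKEND and reject values outside the documented set."""
    env = os.environ if env is None else env
    value = env.get("SEOCHO_TRACE_BACKEND", default).strip().lower()
    if value not in ALLOWED_BACKENDS:
        raise ValueError(f"unknown trace backend: {value!r}")
    return value

def write_jsonl_event(stream, event):
    """Append one trace event as a single JSON line (the general JSONL shape)."""
    stream.write(json.dumps(event, sort_keys=True) + "\n")

backend = resolve_trace_backend({"SEOCHO_TRACE_BACKEND": "jsonl"})
buffer = io.StringIO()
if backend == "jsonl":
    # Hypothetical event fields; the real trace schema is not shown in this README.
    write_jsonl_event(buffer, {"step": "ask", "ok": True})
print(buffer.getvalue(), end="")
```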

Server Mode (Platform Operators)

For the full platform with multi-agent debate, web UI, and Docker services:

make setup-env && make up
# UI: http://localhost:8501
# API: http://localhost:8001/docs
# DozerDB: http://localhost:7474

Default make up starts the core local stack: neo4j, extraction-service, evaluation-interface. The legacy semantic-service is opt-in:

docker compose --profile legacy-semantic up -d semantic-service

Scheduled Codex workflows skip cleanly when OPENAI_API_KEY / SEOCHO_GITHUB_APP_ID / SEOCHO_GITHUB_APP_PRIVATE_KEY are unset. Basic CI remains the required repository check surface.

See docs/QUICKSTART.md for the full server setup guide.

Contributing

git clone git@github.com:tteon/seocho.git && cd seocho
pip install -e ".[dev]"
scripts/pm/install-git-hooks.sh
python -m pytest seocho/tests/ -q

Pick a usecase to build around: docs/USECASES.md. Full guide in CONTRIBUTING.md.

License

MIT. See LICENSE.

