Developer-friendly Python SDK for the SEOCHO graph-memory runtime

SEOCHO

Ontology-driven knowledge graph library for Python


Define your schema once: the same contract drives extraction, querying, validation, and graph-governance artifacts.

Install

pip install seocho

Optional offline ontology governance tooling:

pip install "seocho[ontology]"

Quick Start

from seocho import Seocho, Ontology, NodeDef, RelDef, P
from seocho.store import Neo4jGraphStore, OpenAIBackend

# 1. Define your schema
ontology = Ontology(
    name="my_domain",
    package_id="org.example.my_domain",
    nodes={
        "Person":  NodeDef(properties={"name": P(str, unique=True)}),
        "Company": NodeDef(properties={"name": P(str, unique=True)}),
    },
    relationships={
        "WORKS_AT": RelDef(source="Person", target="Company"),
    },
)

# 2. Connect
s = Seocho(
    ontology=ontology,
    graph_store=Neo4jGraphStore("bolt://localhost:7687", "neo4j", "password"),
    llm=OpenAIBackend(model="gpt-4o"),
)

# 3. Index
s.add("Marie Curie worked at the University of Paris.")

# 4. Query
print(s.ask("Where did Marie Curie work?"))

What the Ontology Controls

  • Extraction: entity types and relationships are injected into the LLM prompt.
  • Querying: schema-aware Cypher generation and repair prompts.
  • Validation: SHACL shapes derived from the schema catch type and cardinality errors.
  • Constraints: UNIQUE/INDEX statements are generated from the ontology and can be applied to Neo4j.
  • Denormalization: cardinality rules determine safe flattening.
  • Reasoning: an optional low-quality retry re-extracts with ontology guidance.
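
The constraint stage can be pictured with a minimal sketch in plain Python. The dictionary shape and the `make_constraints` helper below are illustrative assumptions, not seocho's internal API:

```python
# Illustrative only: derive Neo4j-style uniqueness constraints from a
# minimal schema description. Names here are hypothetical, not seocho's API.
def make_constraints(nodes):
    """Emit one CREATE CONSTRAINT statement per unique property."""
    statements = []
    for label, props in nodes.items():
        for prop, spec in props.items():
            if spec.get("unique"):
                statements.append(
                    f"CREATE CONSTRAINT IF NOT EXISTS "
                    f"FOR (n:{label}) REQUIRE n.{prop} IS UNIQUE"
                )
    return statements

nodes = {
    "Person": {"name": {"type": "str", "unique": True}},
    "Company": {"name": {"type": "str", "unique": True}},
}
stmts = make_constraints(nodes)  # two CREATE CONSTRAINT statements
```

The point is that the schema is the single source of truth: the same node definitions that shape extraction prompts also determine which database constraints exist.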

Key Features

# Index files from a directory
s.index_directory("./my_data/")         # .txt, .md, .csv, .json, .jsonl, .pdf

# Category-specific extraction (auto-selects prompt)
s.add(text, category="Financials")      # 8 FinDER domain presets

# Query with reasoning mode
s.ask("question", reasoning_mode=True, repair_budget=2)

# Multiple LLM providers
from seocho.store import OpenAIBackend
llm = OpenAIBackend(model="gpt-4o-mini")                              # OpenAI
llm = OpenAIBackend(model="deepseek-chat", base_url="https://api.deepseek.com/v1")  # DeepSeek

# Multi-ontology per database
s.register_ontology("finance_db", finance_ontology)

# Schema as code (JSON-LD canonical storage)
ontology.to_jsonld("schema.jsonld")
ontology = Ontology.from_jsonld("schema.jsonld")

# Apply generated Neo4j constraints explicitly in local mode
s.ensure_constraints(database="neo4j")

# Offline ontology governance helpers
# seocho ontology check --schema schema.jsonld
# seocho ontology export --schema schema.jsonld --format shacl --output shacl.json
# seocho ontology diff --left schema_v1.jsonld --right schema_v2.jsonld
# diff output now includes package_id, recommended version bump, and migration warnings
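
The diff behavior can be pictured with a short sketch comparing node labels between two schema versions. The `diff_labels` helper is illustrative, not the CLI's implementation:

```python
# Illustrative sketch of an ontology diff: compare node labels between
# two schema versions. Not the actual `seocho ontology diff` logic.
def diff_labels(left, right):
    left_set, right_set = set(left), set(right)
    return {
        "added": sorted(right_set - left_set),
        "removed": sorted(left_set - right_set),
    }

v1 = ["Person", "Company"]
v2 = ["Person", "Company", "Subsidiary"]
delta = diff_labels(v1, v2)  # {"added": ["Subsidiary"], "removed": []}
```

A removed label would typically imply a breaking change (major version bump), while an added label is usually additive (minor bump).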

# Experiment workbench
from seocho.experiment import Workbench
wb = Workbench(input_texts=["text..."])
wb.vary("ontology", ["v1.jsonld", "v2.jsonld"])
wb.vary("model", ["gpt-4o", "gpt-4o-mini"])
results = wb.run_all()
print(results.leaderboard())

# Pluggable tracing
from seocho import enable_tracing, configure_tracing_from_env
enable_tracing(backend="none")          # disable tracing explicitly
enable_tracing(backend="console")       # stdout only
enable_tracing(backend="jsonl")         # canonical neutral trace artifact
enable_tracing(backend="opik")          # optional exporter (hosted or self-hosted)
configure_tracing_from_env()            # SEOCHO_TRACE_BACKEND=none|console|jsonl|opik

# Agent design configuration
from seocho import AgentConfig, AGENT_PRESETS
s = Seocho(ontology=onto, ..., agent_config=AGENT_PRESETS["strict"])

# Agent-level session (context persists across operations)
with s.session("my_analysis") as sess:
    sess.add("Samsung CEO Jay Y. Lee reported $234B revenue.")
    sess.add("Apple CEO Tim Cook reported $383B revenue.")
    answer = sess.ask("Compare Samsung and Apple revenue")
    # → structured entity context passed to QueryAgent

# Supervisor with sub-agent hand-off (explicit opt-in)
from seocho import RoutingPolicy
s = Seocho(ontology=onto, ..., agent_config=AgentConfig(
    execution_mode="supervisor", handoff=True,
    routing_policy=RoutingPolicy(latency=0.1, token_efficiency=0.3, information_quality=0.6),
))
with s.session("auto") as sess:
    sess.run("Samsung CEO is Jay Y. Lee")    # → IndexingAgent
    sess.run("Who is Samsung's CEO?")        # → QueryAgent
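
The routing weights above can be read as a weighted score over candidate agents. A minimal sketch, with hypothetical metric names and not seocho's actual router:

```python
# Illustrative weighted routing: pick the agent with the highest weighted
# score. The weight keys mirror RoutingPolicy; everything else is hypothetical.
def route(candidates, weights):
    def score(metrics):
        return sum(weights[k] * metrics[k] for k in weights)
    return max(candidates, key=lambda name: score(candidates[name]))

weights = {"latency": 0.1, "token_efficiency": 0.3, "information_quality": 0.6}
candidates = {
    "IndexingAgent": {"latency": 0.9, "token_efficiency": 0.7, "information_quality": 0.2},
    "QueryAgent":    {"latency": 0.6, "token_efficiency": 0.5, "information_quality": 0.9},
}
chosen = route(candidates, weights)  # "QueryAgent" (0.75 vs 0.42)
```

With information_quality weighted at 0.6, quality-heavy candidates dominate even when they are slower, which matches the policy shown above.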

# Ontology merge (combine two schemas)
finance = Ontology.from_jsonld("finance.jsonld")
legal = Ontology.from_jsonld("legal.jsonld")
combined = finance.merge(legal)  # union of nodes + relationships
combined.to_jsonld("combined.jsonld")
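
The union semantics of merge can be sketched with plain dictionaries. The `merge_schemas` helper is a hypothetical stand-in, not the `Ontology.merge` implementation:

```python
# Illustrative union-merge of two schema dicts; later definitions win on
# label collisions. Not the actual Ontology.merge implementation.
def merge_schemas(a, b):
    return {
        "nodes": {**a["nodes"], **b["nodes"]},
        "relationships": {**a["relationships"], **b["relationships"]},
    }

finance = {"nodes": {"Company": {}}, "relationships": {"OWNS": {}}}
legal = {"nodes": {"Contract": {}}, "relationships": {"GOVERNED_BY": {}}}
combined = merge_schemas(finance, legal)
```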

SDK Package Structure

seocho/
├── index/           ← Data Plane: putting data IN
│   ├── pipeline.py  ← chunk → extract → validate → write
│   └── file_reader.py ← .txt/.md/.csv/.json/.jsonl/.pdf
├── query/           ← Control Plane: getting data OUT
│   ├── strategy.py  ← ontology → LLM prompt generation
│   └── cypher_builder.py ← deterministic Cypher from intent
├── store/           ← Storage backends
│   ├── graph.py     ← Neo4j/DozerDB
│   ├── vector.py    ← FAISS / LanceDB
│   └── llm.py       ← OpenAI, DeepSeek, Kimi, Grok
├── ontology.py      ← Schema: JSON-LD + SHACL + denormalization + merge
├── session.py       ← Agent session: context cache + hand-off
├── agents.py        ← IndexingAgent / QueryAgent / Supervisor
├── tools.py         ← @function_tool definitions for agents
├── agent_config.py  ← AgentConfig, RoutingPolicy, presets
├── experiment.py    ← Workbench for parameter exploration
├── tracing.py       ← Pluggable observability
└── client.py        ← Seocho unified interface

Three Ways to Use

Python SDK (developers)

from seocho import Seocho, Ontology, NodeDef, P

CLI (no code needed)

seocho init                    # create ontology interactively
seocho index ./data/           # index files
seocho ask "your question"     # query
seocho status                  # graph stats
seocho experiment --input ...  # parameter exploration

Jupyter Notebook (data analysts)

examples/quickstart.ipynb
examples/bring_your_data.ipynb

LPG and RDF Support

# LPG mode (default) — Cypher queries
onto = Ontology(name="finance", graph_model="lpg", ...)

# RDF mode — n10s Cypher (DozerDB + neosemantics)
onto = Ontology(name="fibo", graph_model="rdf",
                namespace="https://spec.edmcouncil.org/fibo/", ...)
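
The difference between the two modes can be pictured by mapping one LPG edge onto an RDF triple under a base namespace. This is illustrative string formatting only, not seocho internals, and the identifiers are made up:

```python
# Illustrative mapping of an LPG edge to an RDF triple under a base
# namespace, as in RDF mode. Formatting only; identifiers are hypothetical.
NS = "https://spec.edmcouncil.org/fibo/"

def lpg_edge_to_rdf(source, rel, target):
    return f"<{NS}{source}> <{NS}{rel}> <{NS}{target}> ."

triple = lpg_edge_to_rdf("Person/marie_curie", "WORKS_AT", "Org/u_paris")
```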

Documentation

  • seocho.blog: full documentation site
  • SDK Overview: SDK features and quick start
  • Ontology Guide: schema design, JSON-LD, SHACL
  • API Reference: complete method reference
  • Examples: real-world patterns
  • CONTRIBUTING.md: how to contribute
  • docs/ARCHITECTURE.md: system architecture
  • docs/WORKFLOW.md: operational workflow
  • docs/ISSUE_TASK_SYSTEM.md: sprint/task governance

Observability Modes

  • none: no tracing; smallest surface and lowest data retention risk.
  • console: ephemeral stdout debugging for local development.
  • jsonl: canonical neutral trace artifact for local files, replay, and vendor-neutral retention.
  • opik: optional exporter/backend for hosted or self-hosted team observability.

Recommended defaults:

  • sensitive data or simple local usage: none or jsonl
  • team debugging and evaluation: jsonl + opik
  • private infra: self-hosted Opik with SEOCHO_TRACE_OPIK_MODE=self_host

Retention and privacy guidance:

  • JSONL retention follows your filesystem policy; rotate or delete trace files explicitly.
  • Opik retention follows the target Opik deployment policy, whether hosted or self-hosted.
  • Prompts, retrieval evidence, and metadata may appear in traces; avoid remote exporters for sensitive workloads unless governance is approved.
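
Rotation of JSONL trace files can be handled with a few lines of stdlib Python. This is a sketch of the retention guidance above, not a seocho utility:

```python
# Illustrative retention helper: delete JSONL trace files older than
# max_age_days. A sketch of the guidance above, not part of seocho.
import time
from pathlib import Path

def purge_old_traces(trace_dir, max_age_days=30):
    cutoff = time.time() - max_age_days * 86400
    removed = []
    for path in Path(trace_dir).glob("*.jsonl"):
        if path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.name)
    return removed
```

Run it from a cron job or CI step so trace retention is an explicit policy rather than an accident of disk usage.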

Server Mode (Platform Operators)

For the full platform with multi-agent debate, web UI, and Docker services:

make setup-env && make up
# UI: http://localhost:8501
# API: http://localhost:8001/docs
# DozerDB: http://localhost:7474

See docs/QUICKSTART.md for the full server setup guide.

Contributing

git clone git@github.com:tteon/seocho.git && cd seocho
pip install -e ".[dev]"
python -m pytest seocho/tests/ -q

See CONTRIBUTING.md for the full guide.

License

MIT — see LICENSE.
