Graph-like planning → context fetching → synthesis agent (library-style).
fetchgraph
Universal, library-style agent that plans what to fetch, fetches context from pluggable providers, and synthesizes an output.
Pipeline: PLAN → FETCH → (ASSESS/REFETCH)* → SYNTH → VERIFY → (REFINE)* → SAVE
Why fetchgraph?
fetchgraph is a library-style LLM agent orchestrator.
You bring:
- your LLM (OpenAI, local, whatever),
- your data providers (DBs, APIs, files),
and fetchgraph handles:
- planning what context to fetch,
- calling providers with JSON selectors,
- packing context into the prompt,
- verifying / refining the result.
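The pipeline stages can be pictured as a plain control loop. The sketch below is not fetchgraph's implementation: every name in it (`run_pipeline`, `plan`, `fetch`, `synth`, `verify`, `save`, `max_refine`) is hypothetical, and the ASSESS/REFETCH step is left out for brevity.

```python
from typing import Callable, Dict, List


def run_pipeline(
    feature_name: str,
    plan: Callable[[str], List[str]],
    fetch: Callable[[str, str], str],
    synth: Callable[[str, Dict[str, str], List[str]], str],
    verify: Callable[[str], List[str]],
    save: Callable[[str, str], None],
    max_refine: int = 2,
) -> str:
    """PLAN -> FETCH -> SYNTH -> VERIFY -> (REFINE)* -> SAVE, in miniature."""
    providers = plan(feature_name)                              # PLAN
    context = {p: fetch(p, feature_name) for p in providers}    # FETCH
    output = synth(feature_name, context, [])                   # SYNTH
    for _ in range(max_refine):
        errors = verify(output)                                 # VERIFY
        if not errors:
            break
        output = synth(feature_name, context, errors)           # REFINE
    save(feature_name, output)                                  # SAVE
    return output


# Toy callables: the first draft fails verification, the refined one passes.
saved: Dict[str, str] = {}
result = run_pipeline(
    "FeatureX",
    plan=lambda name: ["spec"],
    fetch=lambda provider, name: f"{provider} text for {name}",
    synth=lambda name, ctx, errs: "result: ok" if errs else "draft",
    verify=lambda out: [] if out.startswith("result:") else ["missing 'result:' key"],
    save=lambda name, out: saved.update({name: out}),
)
print(result)
```

The verifier returns a list of error strings, and the refine step feeds those errors back into synthesis. That is the same shape the Quick Start uses for `verifiers`.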
Features
- JSON-only selectors with JSON Schema hints for planners
- Pluggable context providers (APIs, relational sources, etc.)
- Relational providers with semantic clauses
- CSV semantic backend (TF-IDF) for pandas providers
- pgvector / LangChain vector store integration
- Library-style API: no framework lock-in
Install
pip install fetchgraph
Quick Start
Selectors are JSON-only
Providers receive a selectors argument that must be JSON-serializable. The
shared alias SelectorsDict (see fetchgraph/json_types.py) represents
Dict[str, JSONValue] and is used across protocols and models. The planner/LLM
produces this structure, so do not place runtime-only Python objects (e.g.
connections, DataFrames) into selectors; pass such hints through **kwargs
instead. Providers can publish the expected shape via ProviderInfo.selectors_schema
(a JSON Schema) and optional examples containing stringified JSON payloads.
Relational providers require selectors to include a string field "op" that
chooses the operation (e.g., "schema", "semantic_only", "query"). The
complete set of supported shapes is described by the schema returned from
RelationalDataProvider.describe().
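A provider can cheaply guard against non-JSON selectors before doing any work. The helper below is a standard-library sketch; `check_selectors` and its `require_op` flag are hypothetical names, not part of fetchgraph.

```python
import json
from typing import Any, Dict


def check_selectors(selectors: Dict[str, Any], require_op: bool = False) -> Dict[str, Any]:
    """Reject selectors that are not JSON-serializable or lack a string "op"."""
    try:
        # allow_nan=False also rejects NaN/Infinity, which are not valid JSON.
        json.dumps(selectors, allow_nan=False)
    except (TypeError, ValueError) as exc:
        raise ValueError(f"selectors must be JSON-serializable: {exc}") from exc
    if require_op and not isinstance(selectors.get("op"), str):
        raise ValueError('selectors["op"] must be a string')
    return selectors


check_selectors({"op": "schema"}, require_op=True)  # passes

try:
    # A runtime-only object belongs in **kwargs, not in selectors.
    check_selectors({"op": "query", "conn": object()})
except ValueError as err:
    print(err)
```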
from fetchgraph import (
    BaseGraphAgent, ContextPacker, BaselineSpec, ContextFetchSpec,
    TaskProfile, RawLLMOutput
)
from fetchgraph.core import make_llm_plan_generic, make_llm_synth_generic

# Define providers (implement ContextProvider protocol)
class SpecProvider:
    name = "spec"
    def fetch(self, feature_name, selectors=None, **kw): return {"content": f"Spec for {feature_name}"}
    def serialize(self, obj): return obj.get("content", "") if isinstance(obj, dict) else str(obj)

def dummy_llm(prompt: str, sender: str) -> str:
    if sender == "generic_plan":
        return '{"required_context":["spec"],"context_plan":[{"provider":"spec","mode":"full"}]}'
    if sender == "generic_synth":
        return "result: ok"
    return ""

profile = TaskProfile(
    task_name="Demo",
    goal="Produce YAML doc from spec",
    output_format="YAML: result: <...>"
)

agent = BaseGraphAgent(
    llm_plan=make_llm_plan_generic(dummy_llm, profile, {"spec": SpecProvider()}),
    llm_synth=make_llm_synth_generic(dummy_llm, profile),
    domain_parser=lambda raw: raw.text,          # RawLLMOutput -> Any
    saver=lambda feature_name, parsed: None,     # save side-effect
    providers={"spec": SpecProvider()},
    verifiers=[type("Ok",(),{"name":"ok","check":lambda self,out: []})()],
    packer=ContextPacker(max_tokens=2000, summarizer_llm=lambda t: t[:200]),
    baseline=[BaselineSpec(ContextFetchSpec(provider="spec"))],
)
print(agent.run("FeatureX"))
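The inline `type("Ok", ...)` verifier in the Quick Start always passes. A real verifier is usually easier to read as a small class. The sketch below assumes only what the Quick Start shows: a verifier has a `name` and a `check(out)` method that returns a list of error messages, empty when the output is acceptable. The class name `ResultKeyVerifier` is hypothetical.

```python
from typing import Any, List


class ResultKeyVerifier:
    """Checks that the synthesized text contains a 'result:' line."""

    name = "result_key"

    def check(self, out: Any) -> List[str]:
        text = out if isinstance(out, str) else str(out)
        has_key = any(line.strip().startswith("result:") for line in text.splitlines())
        return [] if has_key else ["output has no 'result:' line"]


verifier = ResultKeyVerifier()
print(verifier.check("result: ok"))
print(verifier.check("no yaml here"))
```

Under that assumption, it would be passed to the agent as `verifiers=[ResultKeyVerifier()]`.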
Working with selectors
- Plan-time inputs: The planner/LLM crafts selectors (a SelectorsDict) for each ContextFetchSpec. These inputs must be JSON-serializable and should be validated by providers using their published JSON Schema.
- Provider contract: Implementations of ContextProvider.fetch should accept selectors: Optional[SelectorsDict] = None and treat **kwargs as optional runtime hints that may be non-serializable.
- Schema + examples: Providers can guide planners by returning ProviderInfo(selectors_schema=..., examples=[...]) from describe().
Example for a relational provider that requires an "op" selector:
from fetchgraph.json_types import SelectorsDict
from fetchgraph.models import ProviderInfo

class RelationalDataProvider:
    name = "relational"

    def fetch(self, feature_name: str, selectors: SelectorsDict, **kwargs):
        op = selectors.get("op")
        if not op:
            raise ValueError("selectors.op is required")
        ...  # existing logic for schema/semantic_only/query

    def describe(self) -> ProviderInfo:
        schema = {
            "oneOf": [
                {"type": "object", "required": ["op"], "properties": {"op": {"const": "schema"}}},
                {"type": "object", "required": ["op", "sql"], "properties": {"op": {"const": "query"}, "sql": {"type": "string"}}},
            ]
        }
        return ProviderInfo(
            name=self.name,
            selectors_schema=schema,
            examples=["{\"op\":\"schema\"}", "{\"op\":\"query\",\"sql\":\"select 1\"}"],
        )
During planning you can feed selectors into ContextFetchSpec to fix the
operation:
fetch_spec = ContextFetchSpec(provider="relational", selectors={"op": "schema"})
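Because `examples` are stringified JSON, it is easy to test that each one parses and matches exactly one branch of the published schema. The checker below handles only the keywords used in the schema above (`oneOf`, `required`, `const`, `type: string`). It is a hand-rolled sketch, not a JSON Schema implementation, and `matches_branch` and `validate_selectors` are hypothetical names. A full validator library is likely the better choice outside of a quick test.

```python
import json
from typing import Any, Dict

SCHEMA = {
    "oneOf": [
        {"type": "object", "required": ["op"], "properties": {"op": {"const": "schema"}}},
        {"type": "object", "required": ["op", "sql"],
         "properties": {"op": {"const": "query"}, "sql": {"type": "string"}}},
    ]
}


def matches_branch(selectors: Any, branch: Dict[str, Any]) -> bool:
    """True if selectors satisfy one oneOf branch (subset of keywords only)."""
    if not isinstance(selectors, dict):
        return False
    if any(key not in selectors for key in branch.get("required", [])):
        return False
    for key, rule in branch.get("properties", {}).items():
        if key not in selectors:
            continue
        if "const" in rule and selectors[key] != rule["const"]:
            return False
        if rule.get("type") == "string" and not isinstance(selectors[key], str):
            return False
    return True


def validate_selectors(selectors: Any, schema: Dict[str, Any] = SCHEMA) -> bool:
    # oneOf semantics: exactly one branch must match.
    return sum(matches_branch(selectors, b) for b in schema["oneOf"]) == 1


examples = ['{"op":"schema"}', '{"op":"query","sql":"select 1"}']
print([validate_selectors(json.loads(e)) for e in examples])
print(validate_selectors({"op": "query"}))  # "sql" is missing
```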
CSV semantic backend for Pandas providers
fetchgraph.semantic_backend ships a lightweight TF-IDF backend that turns a CSV
file into semantic embeddings and reuses them across runs. The flow is:
- Build embeddings from a CSV once using CsvEmbeddingBuilder and persist them alongside the CSV.
- Configure a CsvSemanticBackend with one or more CsvSemanticSource entries (one per entity) pointing at the CSV and saved embeddings.
- Pass that backend into PandasRelationalDataProvider so semantic clauses can delegate matching to the precomputed vectors.
Example setup:
from pathlib import Path
from fetchgraph.semantic_backend import (
    EmbeddingModel,
    CsvEmbeddingBuilder,
    CsvSemanticBackend,
    CsvSemanticSource,
)
from fetchgraph.relational_models import EntityDescriptor, ColumnDescriptor
from fetchgraph.relational_pandas import PandasRelationalDataProvider

csv_path = Path("products.csv")
embedding_path = Path("products_embeddings.json")

# Build once (e.g., during deployment) to avoid recomputing embeddings at runtime.
CsvEmbeddingBuilder(
    csv_path=csv_path,
    entity="product",
    id_column="id",
    text_fields=["name", "description"],
    output_path=embedding_path,
).build()

semantic_backend = CsvSemanticBackend(
    {"product": CsvSemanticSource("product", csv_path, embedding_path)}
)

entities = [
    EntityDescriptor(
        name="product",
        columns=[ColumnDescriptor(name="id", role="primary_key"), ColumnDescriptor(name="name"), ColumnDescriptor(name="description")],
    )
]

provider = PandasRelationalDataProvider(
    name="products", entities=entities, relations=[], frames={"product": ...}, semantic_backend=semantic_backend
)
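To make the "build once, reuse across runs" idea concrete, here is a self-contained TF-IDF sketch using only the standard library. It illustrates the general technique rather than fetchgraph's internals: the function names, the whitespace tokenizer, the smoothed IDF formula, and the JSON layout are all assumptions.

```python
import json
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def vectorize(text: str, idf: Dict[str, float]) -> Dict[str, float]:
    """L2-normalized sparse TF-IDF vector; tokens without an IDF are ignored."""
    tf = Counter(tok for tok in tokenize(text) if tok in idf)
    vec = {tok: count * idf[tok] for tok, count in tf.items()}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {tok: w / norm for tok, w in vec.items()}


def build_index(rows: Dict[str, str]) -> Dict[str, Dict]:
    """Compute IDF weights and one vector per row id (JSON-serializable)."""
    n = len(rows)
    df = Counter(tok for text in rows.values() for tok in set(tokenize(text)))
    idf = {tok: math.log((1 + n) / (1 + count)) + 1.0 for tok, count in df.items()}
    return {"idf": idf, "vectors": {rid: vectorize(t, idf) for rid, t in rows.items()}}


def search(index: Dict[str, Dict], query: str, top_k: int = 3) -> List[Tuple[str, float]]:
    """Rank rows by cosine similarity (dot product of normalized vectors)."""
    q = vectorize(query, index["idf"])
    scores = {
        rid: sum(w * vec.get(tok, 0.0) for tok, w in q.items())
        for rid, vec in index["vectors"].items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return [(rid, score) for rid, score in ranked[:top_k] if score > 0]


rows = {"1": "red running shoes", "2": "blue denim jacket", "3": "trail running backpack"}
# The JSON round-trip stands in for writing the index to disk and loading it later.
index = json.loads(json.dumps(build_index(rows)))
print(search(index, "running shoes"))
```

Storing the IDF table next to the vectors matters: queries must be weighted with the same IDF values that were used at build time, otherwise the similarities are not comparable.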
You can plug in an embedding model (for example, an OpenAI client) to build and query dense embeddings instead of the default TF-IDF vectors:
from pathlib import Path
from fetchgraph.semantic_backend import (
    EmbeddingModel,
    CsvSemanticSource,
    CsvEmbeddingBuilder,
    CsvSemanticBackend,
)

class OpenAIEmbeddingModel:
    def __init__(self, client):
        self.client = client

    def embed_documents(self, texts):
        # replace with client.embeddings(...)
        return [[1.0, 0.0] for _ in texts]

    def embed_query(self, text):
        return self.embed_documents([text])[0]

# `client` is your already-configured embeddings client, created elsewhere.
embedding = OpenAIEmbeddingModel(client)

CsvEmbeddingBuilder(
    csv_path="fbs.csv",
    entity="fbs",
    id_column="id",
    text_fields=["name", "description"],
    output_path="fbs_embeddings.json",
    embedding_model=embedding,
).build()

csv_backend = CsvSemanticBackend(
    {
        "fbs": CsvSemanticSource(
            entity="fbs",
            csv_path=Path("fbs.csv"),
            embedding_path=Path("fbs_embeddings.json"),
        )
    },
    embedding_model=embedding,
)
At query time, SemanticClause filters sent to the relational provider will
call semantic_backend.search(...) with the requested entity, fields, and
query text. Fields must be a subset of the indexed CSV columns (not including
the reserved __all__ combined projection). By default, field similarities are
summed; adjust the backend if you need a different aggregation strategy.
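The default aggregation can be pictured as follows. This sketch assumes per-field scores have already been computed and are keyed by row id. `aggregate_field_scores` and its `strategy` parameter are hypothetical and only illustrate how a sum differs from an alternative such as a maximum.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def aggregate_field_scores(
    per_field: Dict[str, Dict[str, float]], strategy: str = "sum"
) -> List[Tuple[str, float]]:
    """Combine {field: {row_id: similarity}} into one ranked list of rows."""
    collected: Dict[str, List[float]] = defaultdict(list)
    for scores in per_field.values():
        for row_id, score in scores.items():
            collected[row_id].append(score)
    combine = {"sum": sum, "max": max}[strategy]
    totals = {row_id: combine(values) for row_id, values in collected.items()}
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


per_field = {
    "name": {"1": 0.5, "2": 0.75},
    "description": {"1": 0.5},
}
print(aggregate_field_scores(per_field))         # sum favors row "1" (matches in two fields)
print(aggregate_field_scores(per_field, "max"))  # max favors row "2" (best single field)
```

Summing rewards rows that match in several fields, while a maximum rewards the single strongest field. Which one suits a dataset depends on how much the fields overlap.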
pgvector / LangChain vector stores
If you already manage embeddings in PostgreSQL with pgvector via LangChain,
you can supply your existing vector stores directly:
from langchain_community.vectorstores.pgvector import PGVector
from fetchgraph.semantic_backend import PgVectorSemanticBackend, PgVectorSemanticSource

vector_store = PGVector.from_existing_index(
    collection_name="product_vectors", connection_string="postgresql+psycopg://..."
)

semantic_backend = PgVectorSemanticBackend(
    {
        "product": PgVectorSemanticSource(
            entity="product",
            vector_store=vector_store,
            metadata_entity_key="entity",  # optional, defaults to "entity"
            metadata_field_key="field",    # optional, defaults to "field"
            id_metadata_keys=("id",),      # optional metadata key(s) to read the row identifier
            score_kind="distance",         # convert pgvector distances into similarity scores
        )
    }
)
The backend filters returned documents by entity and requested fields using
Document metadata before converting scores into SemanticMatch entries.
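The filtering and score conversion described above might look roughly like this. The sketch uses plain `(metadata, score)` tuples in place of LangChain `Document` objects, and the `1 / (1 + distance)` conversion is an assumption chosen for illustration; the library's actual formula may differ. `to_matches` is a hypothetical helper.

```python
from typing import Any, Dict, Iterable, List, Optional, Tuple


def to_matches(
    results: Iterable[Tuple[Dict[str, Any], float]],
    entity: str,
    fields: Optional[List[str]] = None,
    score_kind: str = "distance",
) -> List[Tuple[Any, float]]:
    """Filter (metadata, raw_score) pairs and return (row_id, similarity) pairs."""
    matches = []
    for metadata, raw in results:
        if metadata.get("entity") != entity:
            continue  # document belongs to another entity
        if fields is not None and metadata.get("field") not in fields:
            continue  # document indexes a field that was not requested
        # Smaller distance means closer, so map it to a larger similarity.
        similarity = 1.0 / (1.0 + raw) if score_kind == "distance" else raw
        matches.append((metadata.get("id"), similarity))
    return sorted(matches, key=lambda m: m[1], reverse=True)


results = [
    ({"entity": "product", "field": "name", "id": 7}, 0.25),
    ({"entity": "product", "field": "description", "id": 9}, 1.0),
    ({"entity": "order", "field": "name", "id": 3}, 0.0),
]
print(to_matches(results, entity="product", fields=["name", "description"]))
```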
LICENSE
MIT License
Copyright (c) 2025 ...
Permission is hereby granted, free of charge, to any person obtaining a copy
...