
Graph-like planning → context fetching → synthesis agent (library-style).


fetchgraph

Universal, library-style agent that plans what to fetch, fetches context from pluggable providers, and synthesizes an output.

Pipeline: PLAN → FETCH → (ASSESS/REFETCH)* → SYNTH → VERIFY → (REFINE)* → SAVE
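The control flow implied by the pipeline line above can be sketched with plain callables. This is only an illustration of the stage ordering and the two optional loops, not fetchgraph's actual implementation: run_pipeline, its parameters, and the loop limits max_refetch and max_refine are hypothetical names chosen for this sketch, and the meaning of each stage is inferred from its name.

```python
from typing import Any, Callable, Dict, List


def run_pipeline(
    feature: str,
    plan: Callable[[str], List[str]],
    fetch: Callable[[str], Any],
    assess: Callable[[Dict[str, Any]], List[str]],
    synth: Callable[[str, Dict[str, Any]], Any],
    verify: Callable[[Any], List[str]],
    refine: Callable[[Any, List[str]], Any],
    save: Callable[[str, Any], None],
    max_refetch: int = 2,  # hypothetical bound on ASSESS/REFETCH rounds
    max_refine: int = 2,   # hypothetical bound on REFINE rounds
) -> Any:
    specs = plan(feature)                             # PLAN
    context = {spec: fetch(spec) for spec in specs}   # FETCH
    for _ in range(max_refetch):                      # (ASSESS/REFETCH)*
        missing = assess(context)
        if not missing:
            break
        context.update({spec: fetch(spec) for spec in missing})
    output = synth(feature, context)                  # SYNTH
    issues = verify(output)                           # VERIFY
    for _ in range(max_refine):                       # (REFINE)*
        if not issues:
            break
        output = refine(output, issues)
        issues = verify(output)
    save(feature, output)                             # SAVE
    return output
```

Both loops are bounded so that a verifier or assessor that never reports success cannot stall the run.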

Install (dev)

pip install -e .

Quick Start

Selectors are JSON-only

Providers receive a selectors argument that must be JSON-serializable. The shared alias SelectorsDict (see fetchgraph/json_types.py) represents Dict[str, JSONValue] and is used across protocols and models. The planner/LLM produces this structure, so do not place runtime-only Python objects (e.g. connections, DataFrames) into selectors; pass such hints through **kwargs instead. Providers can publish the expected shape via ProviderInfo.selectors_schema (a JSON Schema) and optional examples containing stringified JSON payloads.

Relational providers require selectors to include a string field "op" that chooses the operation (e.g., "schema", "semantic_only", "query"). The complete set of supported shapes is described by the schema returned from RelationalDataProvider.describe().

from fetchgraph import (
  BaseGraphAgent, ContextPacker, BaselineSpec, ContextFetchSpec,
  TaskProfile, RawLLMOutput
)
from fetchgraph.core import make_llm_plan_generic, make_llm_synth_generic

# Define providers (implement ContextProvider protocol)
class SpecProvider:
    name = "spec"
    def fetch(self, feature_name, selectors=None, **kw): return {"content": f"Spec for {feature_name}"}
    def serialize(self, obj): return obj.get("content", "") if isinstance(obj, dict) else str(obj)

def dummy_llm(prompt: str, sender: str) -> str:
    if sender == "generic_plan":
        return '{"required_context":["spec"],"context_plan":[{"provider":"spec","mode":"full"}]}'
    if sender == "generic_synth":
        return "result: ok"
    return ""

profile = TaskProfile(
  task_name="Demo",
  goal="Produce YAML doc from spec",
  output_format="YAML: result: <...>"
)

# A verifier that accepts every output (an empty list means no issues).
class OkVerifier:
    name = "ok"
    def check(self, out): return []

# Share one provider registry between the planner and the agent.
providers = {"spec": SpecProvider()}

agent = BaseGraphAgent(
  llm_plan=make_llm_plan_generic(dummy_llm, profile, providers),
  llm_synth=make_llm_synth_generic(dummy_llm, profile),
  domain_parser=lambda raw: raw.text,  # RawLLMOutput -> Any
  saver=lambda feature_name, parsed: None,  # save side-effect
  providers=providers,
  verifiers=[OkVerifier()],
  packer=ContextPacker(max_tokens=2000, summarizer_llm=lambda t: t[:200]),
  baseline=[BaselineSpec(ContextFetchSpec(provider="spec"))],
)

print(agent.run("FeatureX"))

Working with selectors

  • Plan-time inputs: The planner/LLM crafts selectors (a SelectorsDict) for each ContextFetchSpec. These inputs must be JSON-serializable and should be validated by providers using their published JSON Schema.
  • Provider contract: Implementations of ContextProvider.fetch should accept selectors: Optional[SelectorsDict] = None and treat **kwargs as optional runtime hints that may be non-serializable.
  • Schema + examples: Providers can guide planners by returning ProviderInfo(selectors_schema=..., examples=[...]) from describe().

Example for a relational provider that requires an "op" selector:

from typing import Optional

from fetchgraph.json_types import SelectorsDict
from fetchgraph.models import ProviderInfo

class RelationalDataProvider:
    name = "relational"

    def fetch(self, feature_name: str, selectors: Optional[SelectorsDict] = None, **kwargs):
        # selectors may be None under the provider contract; treat it as empty.
        op = (selectors or {}).get("op")
        if not op:
            raise ValueError("selectors.op is required")
        ...  # existing logic for schema/semantic_only/query

    def describe(self) -> ProviderInfo:
        schema = {
            "oneOf": [
                {"type": "object", "required": ["op"], "properties": {"op": {"const": "schema"}}},
                {"type": "object", "required": ["op", "sql"], "properties": {"op": {"const": "query"}, "sql": {"type": "string"}}},
            ]
        }
        return ProviderInfo(
            name=self.name,
            selectors_schema=schema,
            examples=['{"op":"schema"}', '{"op":"query","sql":"select 1"}'],
        )
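
To show how a planner-produced payload could be checked against a schema like the one in describe(), here is a hand-rolled validator covering only the keywords that schema uses (oneOf, type, required, properties, const). The matches function is a hypothetical helper written for this sketch; a full JSON Schema validator library would be the usual choice in practice.

```python
from typing import Any, Dict

# Same shape as the schema built in describe() above.
SCHEMA: Dict[str, Any] = {
    "oneOf": [
        {"type": "object", "required": ["op"],
         "properties": {"op": {"const": "schema"}}},
        {"type": "object", "required": ["op", "sql"],
         "properties": {"op": {"const": "query"}, "sql": {"type": "string"}}},
    ]
}


def matches(instance: Any, schema: Dict[str, Any]) -> bool:
    """Check an instance against a tiny subset of JSON Schema."""
    if "oneOf" in schema:
        # oneOf: exactly one alternative must accept the instance.
        return sum(matches(instance, alt) for alt in schema["oneOf"]) == 1
    if "const" in schema and instance != schema["const"]:
        return False
    json_type = schema.get("type")
    if json_type == "object" and not isinstance(instance, dict):
        return False
    if json_type == "string" and not isinstance(instance, str):
        return False
    if isinstance(instance, dict):
        if any(key not in instance for key in schema.get("required", [])):
            return False
        for key, sub_schema in schema.get("properties", {}).items():
            if key in instance and not matches(instance[key], sub_schema):
                return False
    return True


print(matches({"op": "query", "sql": "select 1"}, SCHEMA))
print(matches({"op": "query"}, SCHEMA))
```

The first payload satisfies the "query" alternative; the second is rejected because "sql" is missing and the "schema" alternative requires a different op value.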

During planning you can feed selectors into ContextFetchSpec to fix the operation:

fetch_spec = ContextFetchSpec(provider="relational", selectors={"op": "schema"})

CSV semantic backend for Pandas providers

fetchgraph.semantic_backend ships a lightweight TF-IDF backend that turns a CSV file into semantic embeddings and reuses them across runs. The flow is:

  1. Build embeddings from a CSV once using CsvEmbeddingBuilder and persist them alongside the CSV.
  2. Configure a CsvSemanticBackend with one or more CsvSemanticSource entries (one per entity) pointing at the CSV and saved embeddings.
  3. Pass that backend into PandasRelationalDataProvider so semantic clauses can delegate matching to the precomputed vectors.

Example setup:

from pathlib import Path
from fetchgraph.semantic_backend import (
    CsvEmbeddingBuilder,
    CsvSemanticBackend,
    CsvSemanticSource,
)
from fetchgraph.relational_models import EntityDescriptor, ColumnDescriptor
from fetchgraph.relational_pandas import PandasRelationalDataProvider

csv_path = Path("products.csv")
embedding_path = Path("products_embeddings.json")

# Build once (e.g., during deployment) to avoid recomputing embeddings at runtime.
CsvEmbeddingBuilder(
    csv_path=csv_path,
    entity="product",
    id_column="id",
    text_fields=["name", "description"],
    output_path=embedding_path,
).build()

semantic_backend = CsvSemanticBackend(
    {"product": CsvSemanticSource("product", csv_path, embedding_path)}
)

entities = [
    EntityDescriptor(
        name="product",
        columns=[ColumnDescriptor(name="id", role="primary_key"), ColumnDescriptor(name="name"), ColumnDescriptor(name="description")],
    )
]

provider = PandasRelationalDataProvider(
    name="products", entities=entities, relations=[], frames={"product": ...}, semantic_backend=semantic_backend
)

At query time, SemanticClause filters sent to the relational provider will call semantic_backend.search(...) with the requested entity, fields, and query text. Fields must be a subset of the indexed CSV columns (not including the reserved __all__ combined projection). By default, field similarities are summed; adjust the backend if you need a different aggregation strategy.
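
The idea of summing per-field similarities can be illustrated with a small standard-library TF-IDF sketch. This is not the code inside CsvSemanticBackend: the tokenizer, the smoothed IDF formula, the use of cosine similarity, and the names build_idf, vectorize, cosine, and search are all assumptions made for this example.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    return text.lower().split()


def build_idf(texts: List[str]) -> Dict[str, float]:
    """Smoothed inverse document frequency per term."""
    n = len(texts)
    doc_freq = Counter(term for text in texts for term in set(tokenize(text)))
    return {term: math.log((1 + n) / (1 + df)) + 1.0 for term, df in doc_freq.items()}


def vectorize(text: str, idf: Dict[str, float]) -> Dict[str, float]:
    counts = Counter(tokenize(text))
    return {term: tf * idf[term] for term, tf in counts.items() if term in idf}


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(rows: Dict[int, Dict[str, str]], fields: List[str], query: str) -> List[Tuple[int, float]]:
    """Score each row as the sum of its per-field similarities to the query."""
    scores = {row_id: 0.0 for row_id in rows}
    for field in fields:
        idf = build_idf([row[field] for row in rows.values()])
        query_vec = vectorize(query, idf)
        for row_id, row in rows.items():
            scores[row_id] += cosine(vectorize(row[field], idf), query_vec)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ROWS = {
    1: {"name": "red kettle", "description": "electric kettle for tea"},
    2: {"name": "blue mug", "description": "ceramic mug for tea"},
}
print(search(ROWS, ["name", "description"], "kettle"))
```

Summing rewards rows that match in several fields; taking the maximum instead would favor a single strong field match, which is the kind of alternative aggregation the paragraph above refers to.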

pgvector / LangChain vector stores

If you already manage embeddings in PostgreSQL with pgvector via LangChain, you can supply your existing vector stores directly:

from langchain_community.vectorstores.pgvector import PGVector
from fetchgraph.semantic_backend import PgVectorSemanticBackend, PgVectorSemanticSource

vector_store = PGVector.from_existing_index(
    collection_name="product_vectors", connection_string="postgresql+psycopg://..."
)

semantic_backend = PgVectorSemanticBackend(
    {
        "product": PgVectorSemanticSource(
            entity="product",
            vector_store=vector_store,
            metadata_entity_key="entity",  # optional, defaults to "entity"
            metadata_field_key="field",    # optional, defaults to "field"
            id_metadata_keys=("id",),       # optional metadata key(s) to read the row identifier
            score_kind="distance",          # convert pgvector distances into similarity scores
        )
    }
)

The backend will filter returned documents by entity and requested fields using Document metadata before converting scores into SemanticMatch entries.
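
The filtering and score conversion described above might look roughly like the sketch below. It uses plain (metadata, distance) tuples as stand-ins for vector store results. The to_matches function and the 1 / (1 + distance) conversion are assumptions for illustration only; the formula PgVectorSemanticBackend actually applies is not stated here.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple


def to_matches(
    scored_docs: Iterable[Tuple[Dict[str, Any], float]],
    entity: str,
    fields: Set[str],
    entity_key: str = "entity",
    field_key: str = "field",
    id_keys: Tuple[str, ...] = ("id",),
) -> List[Tuple[Any, float]]:
    """Keep documents for the entity and fields, then map distance to similarity."""
    matches = []
    for metadata, distance in scored_docs:
        if metadata.get(entity_key) != entity:
            continue
        if metadata.get(field_key) not in fields:
            continue
        # Use the first metadata key that carries a row identifier.
        row_id = next((metadata[k] for k in id_keys if k in metadata), None)
        if row_id is None:
            continue
        # Assumed conversion: smaller distance gives a score closer to 1.
        matches.append((row_id, 1.0 / (1.0 + distance)))
    return sorted(matches, key=lambda match: match[1], reverse=True)
```

Documents without a recognizable identifier are skipped in this sketch, since a match that cannot be tied back to a row is of no use to a relational query.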


LICENSE

MIT License

Copyright (c) 2025 ...

Permission is hereby granted, free of charge, to any person obtaining a copy
...
