Streaming knowledge-graph reasoner with a typed query layer (multilingual SPLADE + Dempster-Shafer MCTS + cassette persistence).


Infon

Streaming knowledge-graph reasoner with a typed query layer. Stream text in over HTTP, get typed triples back, query them with composable operators (covers, place_overlaps, compose, evaluate, plan). The default deployment is a Redis-style FastAPI service. Persist to a content-addressed cassette when you outgrow RAM.

The point isn't "ask a question, get an answer". The point is that the things the question is about (a duration, an ongoing interval, a country, a clause) are first-class records in the index, and the operators that combine them are pure-Python predicates with no neural model in the loop.

30-second hello world

pip install infon
infon schema init defence -o schema.json
infon serve --schema schema.json &
infon feed   doc_a "BAE Systems delivered six long-endurance UAVs to the RAAF in 2025."
infon flush  doc_a --doc-id press.bae.2025_03 --timestamp 2025-03-12
infon query  candidates --actor bae --top-k 5

Or in Python:

from infon import AnchorSchema, Service, CandidateQuery

schema = AnchorSchema.from_file("schema.json")
svc = Service(schema=schema)

svc.feed("doc_a", "Toyota partnered with Panasonic on solid-state batteries.")
svc.feed("doc_a", "The Japanese automaker plans to launch them by 2027.")
rep = svc.flush("doc_a", doc_id="press.toyota.2025_09",
                timestamp="2025-09-01")
# rep.n_infons, rep.infon_ids

rows = svc.query(CandidateQuery(actor="toyota", min_confidence=0.6))
for c in rows:
    inf = svc.store.get(c.hit.loc.infon_id)
    print(c.score, inf.subject, inf.predicate, inf.object)

In-process queries return Candidate(hit, score, provenance); hydrate the Infon via svc.store.get(...). Over HTTP, the wire format is CandidateRow(subject, predicate, object, score, ...) — no hydration step needed.

Or over the wire, against a live server:

from infon import Client

with Client("http://localhost:8000") as c:
    c.feed("doc_a", "Toyota partnered with Panasonic on solid-state batteries.")
    rep = c.flush("doc_a")
    rows = c.candidates(actor="toyota", min_confidence=0.6, top_k=20)
    verdict = c.connect("toyota", "samsung", max_hops=3)
    counts = c.aggregate("count", group_by="actor")

What's inside

              feed text                   query typed predicates
                  │                                │
                  ▼                                ▼
        ┌────────────────────────────────────────────────┐
        │   FastAPI service (Redis-style streaming)      │
        │   POST /feed   /flush   /query   /snapshot     │
        └────────────────────────────────────────────────┘
                              │
                              ▼
        ┌────────────────────────────────────────────────┐
        │   Service kernel (in-process, RAM-only)        │
        │   feed(stream_id, text)  → per-stream buffer   │
        │   flush(stream_id)       → fastcoref+extract   │
        │                              → MemoryStore     │
        └────────────────────────────────────────────────┘
                              │
                              ▼
        ┌────────────────────────────────────────────────┐
        │   Operators over typed records                 │
        │   covers / ongoing_at / bind_event   (temporal)│
        │   matches / filter_to_index_args    (threshold)│
        │   place_overlaps / within_radius     (spatial) │
        │   convert / convert_currency         (quantity)│
        │   evaluate / restrict / compose      (modal)   │
        │   plan / route                       (planner) │
        └────────────────────────────────────────────────┘

feed and flush are separate calls because coreference is a document-level operation. Resolving "it" or "the company" within a single sentence is wrong by design: the actor isn't named in that sentence. Single-sentence coref would be a fallback, and we don't ship fallbacks. So feed only buffers, flush runs fastcoref over the whole buffered window, and sentences become queryable only after they have been flushed.
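The feed/flush contract can be sketched as a per-stream buffer. This is a minimal illustration, not infon's Service kernel: `StreamBuffer` and `toy_process` are hypothetical names, and `toy_process` is a deliberately crude stand-in for fastcoref plus extraction that treats the first word of the window as the antecedent of "The Japanese automaker". What it does show is that the resolver sees the whole buffered window, so the second sentence can be resolved against the first.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List


class StreamBuffer:
    """Hypothetical per-stream buffer: feed() only appends, flush() drains the window."""

    def __init__(self, process: Callable[[str], List[str]]) -> None:
        self._process = process  # stand-in for document-level coref + extraction
        self._pending: DefaultDict[str, List[str]] = defaultdict(list)
        self.queryable: List[str] = []

    def feed(self, stream_id: str, text: str) -> None:
        # Text accumulates; nothing becomes queryable yet.
        self._pending[stream_id].append(text)

    def flush(self, stream_id: str) -> int:
        window = " ".join(self._pending.pop(stream_id, []))
        if not window:
            return 0
        records = self._process(window)  # sees every buffered sentence at once
        self.queryable.extend(records)
        return len(records)


def toy_process(window: str) -> List[str]:
    # Toy coref: treat the window's first word as the antecedent of one fixed phrase.
    antecedent = window.split()[0]
    resolved = window.replace("The Japanese automaker", antecedent)
    return [s.strip() for s in resolved.split(".") if s.strip()]


buf = StreamBuffer(toy_process)
buf.feed("doc_a", "Toyota partnered with Panasonic on solid-state batteries.")
buf.feed("doc_a", "The Japanese automaker plans to launch them by 2027.")
before_flush = list(buf.queryable)  # buffered text is not queryable yet
n_flushed = buf.flush("doc_a")
```

Fed one sentence at a time, `toy_process` would have no antecedent for the second sentence; buffering until flush is what makes resolution possible.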

Documentation


Schema authoring	Anchor types, JSON shape, hierarchies, multilingual surfaces.
Operators	Full reference: numeric, temporal, spatial, modality, planner, MCTS, JSON DSL.
Examples	Tested transcripts for every operator family + Service end-to-end.
Multilingual	XLM-R SPLADE coverage, when to add explicit anchor tokens, performance.
Deployment	`infon serve`, env vars, snapshot/reload, S3, production checklist.
CLI	All `infon` subcommands.

Try it on a vertical

Six end-to-end Jupyter notebooks, each running the full streaming pipeline against a real corpus:

Notebook	Domain
`notebooks/01_supply_chain.ipynb`	Multi-tier supplier risk (ERP/AIS/customs ingest).
`notebooks/02_legal_contracts.ipynb`	Clause refs, defined terms, modality, conditional contexts.
`notebooks/03_defence_industry.ipynb`	Eight operators per sentence; persona-conditioned MCTS.
`notebooks/04_compliance_regulatory.ipynb`	Threshold predicates, currency over time, jurisdictions.
`notebooks/05_kano_conjoint.ipynb`	Product voice → Kano + conjoint structural analysis.
`notebooks/06_drug_discovery.ipynb`	Bio entities, mechanism-of-action, evidence tiers.

Install the demo extras before running any of them: pip install 'infon[demo]'.

Multilingual support

The default encoder is multilingual XLM-R SPLADE (opensearch-project/opensearch-neural-sparse-encoding-multilingual-v1; Apache-2.0, ~1.1 GB). Latin-script languages and most Cyrillic- and Greek-script languages work out of the box. Japanese, Chinese, and Korean need explicit tokens for each script in the schema:

{
  "toyota": {
    "type": "actor",
    "tokens": ["toyota", "トヨタ", "丰田", "도요타"],
    "canonical_name": "Toyota Motor Corporation",
    "aliases": ["TMC", "トヨタ自動車", "丰田汽车"]
  }
}

See docs/multilingual.md for the full story.
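To see which scripts an anchor's `tokens` already cover, a small check over the schema JSON is enough. This is an illustrative sketch, not part of infon: `scripts_in`, `script_coverage` and `CJK_FAMILY` are hypothetical helpers, and labelling a character's script by the first word of its Unicode name (LATIN, KATAKANA, CJK, HANGUL, ...) is a coarse heuristic rather than the full Unicode Script property. The `bae` anchor is a hypothetical Latin-only entry.

```python
import json
import unicodedata
from typing import Dict, List, Set

# Script labels for the languages the README says need explicit tokens.
CJK_FAMILY = {"HIRAGANA", "KATAKANA", "CJK", "HANGUL"}


def scripts_in(token: str) -> Set[str]:
    """Coarse script label per letter: the first word of its Unicode character name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in token if ch.isalpha()}


def script_coverage(schema: Dict[str, dict]) -> Dict[str, Set[str]]:
    """Map each anchor to the union of scripts its `tokens` list covers."""
    return {
        anchor: set().union(*(scripts_in(t) for t in spec.get("tokens", [])))
        for anchor, spec in schema.items()
    }


SCHEMA_JSON = """
{
  "toyota": {"type": "actor", "tokens": ["toyota", "トヨタ", "丰田", "도요타"]},
  "bae":    {"type": "actor", "tokens": ["bae"]}
}
"""
cov = script_coverage(json.loads(SCHEMA_JSON))
# Anchors with no CJK-family token at all are candidates for explicit tokens.
needs_cjk_tokens: List[str] = [a for a, s in cov.items() if not s & CJK_FAMILY]
```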

Operators — what we compute

A taste; full reference in docs/operators.md.

Threshold — unit-aware numeric predicates

from infon.threshold import Threshold, matches, filter_to_index_args

t = Threshold(kind="duration", op=">=", value=14, unit_surface="hours")
filter_to_index_args(t)        # {'kind': 'duration', 'value_min': 50400.0}
matches(t, value=50400.0, unit_surface="hours")    # True
matches(t, value=50400.0, unit_surface="km")       # False — wrong dimension
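A unit-aware threshold comes down to normalising both sides to a base unit and refusing comparisons across dimensions. The sketch below is a hypothetical re-implementation, not infon.threshold: `SimpleThreshold`, `index_args`, `matches` and the unit table are assumptions, and here `matches` takes the observed value in its own surface unit (so 840 minutes satisfies ≥14 hours). Normalising durations to seconds is chosen to agree with the `value_min` of 50400.0 shown above.

```python
import operator
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical unit table: surface form -> (dimension, factor to the base unit).
UNITS: Dict[str, Tuple[str, float]] = {
    "seconds": ("duration", 1.0),
    "minutes": ("duration", 60.0),
    "hours": ("duration", 3600.0),
    "days": ("duration", 86400.0),
    "m": ("length", 1.0),
    "km": ("length", 1000.0),
}
OPS = {">=": operator.ge, ">": operator.gt, "<=": operator.le, "<": operator.lt}


@dataclass(frozen=True)
class SimpleThreshold:
    kind: str          # dimension, e.g. "duration"
    op: str            # one of OPS
    value: float
    unit_surface: str

    def base_value(self) -> float:
        dim, factor = UNITS[self.unit_surface]
        if dim != self.kind:
            raise ValueError(f"{self.unit_surface!r} is not a {self.kind} unit")
        return self.value * factor


def index_args(t: SimpleThreshold) -> dict:
    """Range bound for an index prefilter (inclusive; matches() re-checks strict ops)."""
    bound = "value_min" if t.op in (">=", ">") else "value_max"
    return {"kind": t.kind, bound: t.base_value()}


def matches(t: SimpleThreshold, value: float, unit_surface: str) -> bool:
    """Compare an observed value, given in its own surface unit, to the threshold."""
    dim, factor = UNITS.get(unit_surface, ("unknown", 0.0))
    if dim != t.kind:
        return False  # wrong dimension never matches, rather than guessing
    return OPS[t.op](value * factor, t.base_value())


t = SimpleThreshold(kind="duration", op=">=", value=14, unit_surface="hours")
index_args(t)  # {'kind': 'duration', 'value_min': 50400.0}
```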

compose / covers / bind_event — temporal inference

from infon.temporal_inference import compose, covers, ongoing_at
# TemporalReference is used below too; its import path is not shown in this excerpt.

refs = [
    TemporalReference(op="in",    start_iso="1982-01-01",
                                  end_iso="1982-12-31"),
    TemporalReference(op="since", start_iso="1982-01-01", end_iso=None),
]
compose(refs)                  # InferredInterval(..., ongoing=True)
covers(refs, "2010-06-15")     # True
ongoing_at(refs, "2026-05-26") # True
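One simple reading of these three calls is sketched below under stated assumptions: `Ref` and `Interval` are hypothetical stand-ins for TemporalReference and InferredInterval, `compose` takes the hull of all references (earliest start, latest end, open if any reference is open-ended), `covers` asks whether any single reference contains the date, and `ongoing_at` asks whether the composed interval is open and had already started. infon's actual semantics may differ, for example in how gaps between references are handled.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Ref:
    """Hypothetical stand-in for TemporalReference."""
    op: str                  # e.g. "in", "since"
    start_iso: str
    end_iso: Optional[str]   # None means open-ended


@dataclass(frozen=True)
class Interval:
    start: date
    end: Optional[date]

    @property
    def ongoing(self) -> bool:
        return self.end is None


def _bounds(r: Ref) -> Tuple[date, Optional[date]]:
    end = None if r.end_iso is None else date.fromisoformat(r.end_iso)
    return date.fromisoformat(r.start_iso), end


def compose(refs: Sequence[Ref]) -> Interval:
    """Hull of all references: earliest start, latest end, open if any is open."""
    bounds = [_bounds(r) for r in refs]
    start = min(s for s, _ in bounds)
    ends = [e for _, e in bounds]
    end = None if any(e is None for e in ends) else max(ends)
    return Interval(start, end)


def covers(refs: Sequence[Ref], day_iso: str) -> bool:
    """True if some single reference contains the day."""
    d = date.fromisoformat(day_iso)
    return any(s <= d and (e is None or d <= e) for s, e in map(_bounds, refs))


def ongoing_at(refs: Sequence[Ref], day_iso: str) -> bool:
    """True if the composed interval is open-ended and had started by that day."""
    iv = compose(refs)
    return iv.ongoing and iv.start <= date.fromisoformat(day_iso)


refs = [
    Ref(op="in", start_iso="1982-01-01", end_iso="1982-12-31"),
    Ref(op="since", start_iso="1982-01-01", end_iso=None),
]
```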

Quantity — distribution-aware arithmetic

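from infon.quantity import Quantity, FXTable, convert_currency
# Distribution, KILOGRAM and currency_unit are used below too; their import paths are not shown in this excerpt.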

# Uncertain quantity: "around 1000 kg" → normal(μ=1000, σ=50)
mass = Quantity(Distribution.normal(1000.0, 50.0), KILOGRAM)

# FX is dated. No silent fallback to "today's rate".
fx = FXTable()
fx.add("EUR", "USD", "2026-01-15", 1.10)
revenue_eur = Quantity.from_value(1_000_000.0, currency_unit("EUR"))
convert_currency(revenue_eur, "USD", on="2025-01-01", fx=fx)   # → None
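The dated-FX rule can be illustrated with a lookup keyed by (base, quote, date) that returns None instead of borrowing another day's rate. This is a hypothetical sketch, not infon.quantity: `DatedFX`, `NormalAmount` and `convert` are invented names, the lookup requires an exact date match (an on-or-before rule would also return None for 2025-01-01 here, since the only rate is from 2026), and the uncertain amount is modelled as a normal distribution whose mean and standard deviation both scale by the positive rate.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class DatedFX:
    """Hypothetical rate table keyed by (base, quote, ISO date)."""
    rates: Dict[Tuple[str, str, str], float] = field(default_factory=dict)

    def add(self, base: str, quote: str, on: str, rate: float) -> None:
        self.rates[(base, quote, on)] = rate

    def rate(self, base: str, quote: str, on: str) -> Optional[float]:
        if base == quote:
            return 1.0
        direct = self.rates.get((base, quote, on))
        if direct is not None:
            return direct
        inverse = self.rates.get((quote, base, on))
        return None if inverse is None else 1.0 / inverse


@dataclass(frozen=True)
class NormalAmount:
    mean: float
    sd: float
    currency: str


def convert(amount: NormalAmount, to: str, on: str, fx: DatedFX) -> Optional[NormalAmount]:
    r = fx.rate(amount.currency, to, on)
    if r is None:
        return None  # no rate for that exact date: refuse rather than guess
    # Scaling a normal by a positive constant scales both mean and sd.
    return NormalAmount(amount.mean * r, amount.sd * r, to)


fx = DatedFX()
fx.add("EUR", "USD", "2026-01-15", 1.10)
revenue = NormalAmount(1_000_000.0, 50_000.0, "EUR")
missing = convert(revenue, "USD", on="2025-01-01", fx=fx)  # None
converted = convert(revenue, "USD", on="2026-01-15", fx=fx)
```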

plan — multi-constraint search

from infon import CandidateQuery, NumericPredicate, TemporalPredicate

q = CandidateQuery(
    predicate="produces",
    numeric=(NumericPredicate(kind="duration", value_min=50400.0),),  # ≥14h
    temporal=(TemporalPredicate(mode="overlap", t_start="2023-01-01"),),
    actor_in=("AU", "GB", "JP"),
    actor_not_in=("US",),
    evidentiality="primary",
    min_confidence=0.6,
)
cands = svc.query(q)
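Because each constraint is a pure predicate over typed fields, a multi-constraint query can be read as a conjunction of small, independent checks. The sketch below is illustrative only: `Record` is a hypothetical flattened view of an extracted triple and `plan` is not infon's planner. It mirrors the CandidateQuery fields above, reading the overlap predicate as "the record's interval reaches 2023-01-01 or later".

```python
from dataclasses import dataclass
from datetime import date
from typing import Callable, Iterable, List, Optional, Sequence


@dataclass(frozen=True)
class Record:
    """Hypothetical flattened view of one extracted triple."""
    actor: str
    predicate: str
    duration_s: Optional[float]
    start: date
    end: Optional[date]          # None means still ongoing
    evidentiality: str
    confidence: float


def plan(records: Iterable[Record], *, predicate: Optional[str] = None,
         value_min: Optional[float] = None, overlap_from: Optional[str] = None,
         actor_in: Optional[Sequence[str]] = None, actor_not_in: Sequence[str] = (),
         evidentiality: Optional[str] = None, min_confidence: float = 0.0) -> List[Record]:
    checks: List[Callable[[Record], bool]] = []
    if predicate is not None:
        checks.append(lambda r: r.predicate == predicate)
    if value_min is not None:
        checks.append(lambda r: r.duration_s is not None and r.duration_s >= value_min)
    if overlap_from is not None:
        t0 = date.fromisoformat(overlap_from)
        checks.append(lambda r: r.end is None or r.end >= t0)
    if actor_in is not None:
        allowed = set(actor_in)
        checks.append(lambda r: r.actor in allowed)
    blocked = set(actor_not_in)
    checks.append(lambda r: r.actor not in blocked)
    if evidentiality is not None:
        checks.append(lambda r: r.evidentiality == evidentiality)
    checks.append(lambda r: r.confidence >= min_confidence)
    # A record survives only if every check passes.
    return [r for r in records if all(check(r) for check in checks)]


records = [
    Record("AU", "produces", 54_000.0, date(2022, 6, 1), None, "primary", 0.8),
    Record("US", "produces", 60_000.0, date(2024, 1, 1), None, "primary", 0.9),  # excluded actor
    Record("JP", "produces", 36_000.0, date(2023, 3, 1), None, "primary", 0.9),  # only 10 h
    Record("GB", "produces", 50_400.0, date(2020, 1, 1), date(2022, 12, 31), "primary", 0.9),  # ended 2022
    Record("GB", "produces", 50_400.0, date(2023, 1, 1), None, "secondary", 0.9),  # not primary
    Record("JP", "produces", 50_400.0, date(2021, 1, 1), date(2023, 6, 30), "primary", 0.6),
]
hits = plan(records, predicate="produces", value_min=50_400.0, overlap_from="2023-01-01",
            actor_in=("AU", "GB", "JP"), actor_not_in=("US",),
            evidentiality="primary", min_confidence=0.6)
```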

connect / any_of — multi-hop reachability

from infon import ConnectQuery, AnyOfQuery

v = svc.query(ConnectQuery(source="toyota", target="samsung", max_hops=3))
# Verdict(label='SUPPORTS',
#         mass=Mass(supports=0.403, refutes=0.000, theta=0.597),
#         sources=[Edge(toyota -supply→ catl),
#                  Edge(catl   -license→ samsung)],
#         n_candidates=2, n_hydrated=2)

connect runs MCTS over the hypergraph and scores paths with a Dempster–Shafer chain mass. It is refutation-aware: a refuting edge cancels affirmation on the same triple.
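The refutation behaviour follows from Dempster's rule of combination. Below is the standard rule for the two-element frame {supports, refutes}, with `theta` holding uncommitted mass. `BinaryMass` and `combine` are hypothetical names, and this sketch does not reproduce how infon builds chain mass along a path (it will not recover the 0.403 above). It shows that agreeing evidence reinforces support, while equal support and refutation for the same triple leave the two balanced, so the affirmation no longer dominates.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class BinaryMass:
    """Mass on {supports}, {refutes} and the whole frame (theta = uncommitted)."""
    supports: float
    refutes: float
    theta: float

    def __post_init__(self) -> None:
        if not math.isclose(self.supports + self.refutes + self.theta, 1.0):
            raise ValueError("masses must sum to 1")


def combine(a: BinaryMass, b: BinaryMass) -> BinaryMass:
    """Dempster's rule of combination on the frame {supports, refutes}."""
    # Conflict: one source commits to supports while the other commits to refutes.
    conflict = a.supports * b.refutes + a.refutes * b.supports
    if math.isclose(conflict, 1.0):
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = 1.0 - conflict
    s = (a.supports * b.supports + a.supports * b.theta + a.theta * b.supports) / norm
    r = (a.refutes * b.refutes + a.refutes * b.theta + a.theta * b.refutes) / norm
    t = (a.theta * b.theta) / norm
    return BinaryMass(s, r, t)


affirm = BinaryMass(0.6, 0.0, 0.4)
refute = BinaryMass(0.0, 0.6, 0.4)
agreeing = combine(affirm, BinaryMass(0.5, 0.0, 0.5))  # supports ≈ 0.8, theta ≈ 0.2
cancelled = combine(affirm, refute)                     # supports ≈ refutes ≈ 0.375
```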

evaluate — JSON DSL

from infon.logical_tool import evaluate

evaluate(cog, {
    "and": [
        {"triple": {"s": "toyota", "p": "invest", "o": "solid_state"}},
        {"not":   {"triple": {"s": "toyota", "p": "invest", "o": "lithium_ion"}}},
    ]
})
# {'verdict': 'SUPPORTS', 'mass': {...}, 'trace': [...]}

The DSL supports triple, and, or, not, if, exists and forall. Each sub-expression returns its own Dempster–Shafer mass plus an audit trace.
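The recursive structure of such a DSL can be shown with a boolean evaluator over a set of known triples. This is a simplified, hypothetical sketch, not infon.logical_tool.evaluate: it covers only triple/and/or/not, uses closed-world booleans instead of Dempster–Shafer masses (so a missing triple counts as false rather than as uncommitted mass), and records one trace line per evaluated node.

```python
from typing import Any, Dict, FrozenSet, List, Optional, Tuple

Triple = Tuple[str, str, str]


def evaluate(facts: FrozenSet[Triple], expr: Dict[str, Any],
             trace: Optional[List[str]] = None) -> bool:
    """Recursive-descent evaluation of a {op: argument} JSON expression."""
    if trace is None:
        trace = []
    if len(expr) != 1:
        raise ValueError("each node must have exactly one operator key")
    op, arg = next(iter(expr.items()))
    if op == "triple":
        result = (arg["s"], arg["p"], arg["o"]) in facts
    elif op == "and":
        result = all(evaluate(facts, sub, trace) for sub in arg)  # short-circuits
    elif op == "or":
        result = any(evaluate(facts, sub, trace) for sub in arg)
    elif op == "not":
        result = not evaluate(facts, arg, trace)
    else:
        raise ValueError(f"unsupported operator: {op!r}")
    trace.append(f"{op} -> {result}")  # children are appended before their parent
    return result


expr = {"and": [
    {"triple": {"s": "toyota", "p": "invest", "o": "solid_state"}},
    {"not": {"triple": {"s": "toyota", "p": "invest", "o": "lithium_ion"}}},
]}
facts = frozenset({("toyota", "invest", "solid_state")})
trace: List[str] = []
verdict = evaluate(facts, expr, trace)
```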

One sentence, eight operators

"Per §3.2(a), the Supplier shall, within 30 days of the Effective Date (as defined in §1.4), deliver to the Buyer's facility within 50 km of Canberra a system meeting the endurance specification (≥14 hours), provided the system is not manufactured in the United States."

clause-ref §3.2(a)             →  IKLIst("clause:3.2(a)", obligation)
defined-term Effective Date    →  IKLThat("defined:effective_date")
modality "shall"               →  ModalClaim(operator='O', ...)
duration "30 days"             →  bind_event(within, effective_date)
spatial radius "within 50 km"  →  PlaceReference(op='within', radius_km=50)
threshold "≥14 hours"          →  Threshold(kind='duration', op='>=', 14h)
conditional "provided"         →  ConditionalContext(antecedent=...)
spatial negation "not in US"   →  actor_not_in=("US",)

The planner takes the eight typed records and resolves them in one call. None of the operators ever re-read the sentence string.
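The idea that operators work on typed records rather than on the raw sentence can be illustrated with a toy extractor that reads the sentence once and hands later steps plain fields. infon's extraction is model-based; the regular expressions, the `extract` and `deadline` helpers, and the 2026-01-01 effective date below are all hypothetical and only cover this one sentence.

```python
import re
from datetime import date, timedelta

SENTENCE = ("Per §3.2(a), the Supplier shall, within 30 days of the Effective Date "
            "(as defined in §1.4), deliver to the Buyer's facility within 50 km of Canberra "
            "a system meeting the endurance specification (≥14 hours), provided the system "
            "is not manufactured in the United States.")


def extract(sentence: str) -> dict:
    """Toy regex extractor: read the sentence once, emit typed fields.

    Assumes every pattern is present (true for SENTENCE only).
    """
    return {
        "clause_refs": re.findall(r"§(\d+(?:\.\d+)*(?:\([a-z]\))?)", sentence),
        "modality": re.findall(r"\b(shall|must|may)\b", sentence),
        "deadline_days": int(re.search(r"within (\d+) days", sentence).group(1)),
        "radius_km": float(re.search(r"within (\d+(?:\.\d+)?) km", sentence).group(1)),
        "min_hours": float(re.search(r"≥\s*(\d+(?:\.\d+)?) hours", sentence).group(1)),
        "excluded_country": re.search(r"not manufactured in the ([A-Z][\w ]+?)\.",
                                      sentence).group(1),
    }


def deadline(rec: dict, effective: date) -> date:
    # Uses only the extracted record; the sentence string is never consulted again.
    return effective + timedelta(days=rec["deadline_days"])


rec = extract(SENTENCE)
due = deadline(rec, date(2026, 1, 1))  # hypothetical effective date
```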

Persistence — the snapshot upgrade hatch

The streaming service is RAM-only by contract — kill the process, lose the data. Same shape as redis-server without RDB/AOF. To persist, snapshot the live store:

path = svc.snapshot_to_cassette("./data/run_2026_05_27")

Reload any time:

from infon import Service, AnchorSchema
schema = AnchorSchema.from_file("schema.json")
svc = Service.from_cassette("./data/run_2026_05_27", schema=schema)

The cassette substrate gives you what RAM doesn't: time-travel snapshots, S3-backed delta ingest, Kan-pushforward schema migration (62× faster than re-extracting), and a manifest pruner that cuts shard opens by 7–16× at scale. For cloud storage, swap the URI: InfonStore("s3://bucket/prefix").
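The README does not describe the cassette's on-disk format, so the following is only a generic illustration of content addressing: each record is serialised to canonical JSON and stored under its SHA-256 digest, so identical content always maps to the same key and is written once. `put` and `get` are hypothetical helpers, not InfonStore methods.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Tuple


def _canonical(record: dict) -> Tuple[str, bytes]:
    # Sorted keys and fixed separators so equal records serialise identically.
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"),
                         ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest(), payload


def put(root: Path, record: dict) -> str:
    digest, payload = _canonical(record)
    path = root / digest[:2] / digest  # fan out by prefix to keep directories small
    if not path.exists():              # identical content is stored only once
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(payload)
    return digest


def get(root: Path, digest: str) -> dict:
    return json.loads((root / digest[:2] / digest).read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    k1 = put(root, {"s": "toyota", "p": "partner", "o": "panasonic"})
    k2 = put(root, {"o": "panasonic", "p": "partner", "s": "toyota"})  # same content, other key order
    roundtrip = get(root, k1)
    n_files = sum(1 for p in root.rglob("*") if p.is_file())
```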

Install

pip install infon                  # core
pip install 'infon[demo]'          # + ddgs (web search) + strands-agents
pip install 'infon[aws]'           # + boto3 + s3fs (cassette on S3)
pip install 'infon[all]'           # everything

Dependency	Purpose	Required?
`torch` ≥ 2.0	GNN + SSL losses	yes
`transformers` ≥ 4.40	SPLADE tokenizer/model	yes
`numpy` ≥ 1.24	linear algebra	yes
`pyarrow` ≥ 15	cassette indexes	yes
`fsspec` ≥ 2024.1	local + S3 paths	yes
`fastcoref` ≥ 2.1	coreference	yes
`fastapi` ≥ 0.100	HTTP service	yes
`uvicorn` ≥ 0.20	ASGI server	yes
`httpx`	client transport	yes (via fastapi)
`s3fs` ≥ 2024.1	S3 backend	optional (`[aws]`)
`strands-agents[bedrock]` ≥ 1.0	conversational layer	optional (`[agent]`)
`ddgs` ≥ 6.0	live web search	optional (`[demo]`)

The default anchor encoder downloads from Hugging Face on first use and caches under ~/.cache/huggingface. There is no bundled fallback: a first run without network access fails loudly.

Public API

The supported surface is everything in infon.__all__:

from infon import (
    # Core
    AnchorSchema, ServiceConfig, InfonConfig,
    Infon, Edge, Constraint, Span, QueryResult,

    # Streaming kernel + transport
    Service, FlushReport, IngestReport, ResearchPlan,
    create_app,
    Client, AsyncClient,
    FlushReceipt, CandidateRow, Mass, InfonRow, Verdict,
    ResearchPlanResult,

    # Typed queries
    CandidateQuery,
    NumericPredicate, TemporalPredicate, PlacePredicate,
    SequencePredicate, FrequencyPredicate,
    ConnectQuery, AnyOfQuery, AggregateQuery,

    # Persistence
    InfonStore, MemoryStore, Query,

    # Encoder (lazy — torch loads on first attribute access)
    Encoder,

    # Errors
    InfonError, ConfigError, SchemaError, ExtractionError,
    RoutingError, SnapshotError, DimensionError, LogicalExprError,
)

Heavy deps (torch, transformers, fastcoref) load lazily — import infon is cheap until you reach for an operator that needs the real models.

References

Bodnar et al. 2022 (Neural Sheaf Diffusion) · Schlichtkrull et al. 2018 (R-GCN) · Shafer 1976 (Dempster–Shafer) · Barwise & Perry 1983 (situation semantics) · Kan 1958 (adjoint functors for schema migration) · Formal, Piwowarski & Clinchant 2021 (SPLADE).

Encoder attribution

Infon's default encoder is the multilingual SPLADE checkpoint published by the OpenSearch Project:

@misc{opensearch2024multilingualsparse,
  author = {OpenSearch Project},
  title  = {Neural Sparse Encoding Multilingual v1},
  year   = {2024},
  url    = {https://huggingface.co/opensearch-project/opensearch-neural-sparse-encoding-multilingual-v1}
}

Released under Apache-2.0. Users embedding Infon in commercial products should still verify the upstream model card for their deployment context.

License

Apache 2.0 — see LICENSE.

Project details


Release history

This version

0.1.1

May 27, 2026

0.1.0

May 27, 2026

Download files


Source Distribution

infon-0.1.1.tar.gz (2.0 MB)

Uploaded May 27, 2026 Source

Built Distribution


infon-0.1.1-py3-none-any.whl (2.0 MB)

Uploaded May 27, 2026 Python 3

File details

Details for the file infon-0.1.1.tar.gz.

File metadata

Download URL: infon-0.1.1.tar.gz
Upload date: May 27, 2026
Size: 2.0 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for infon-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`c37aa4864da17d9ee0db30568ec030dd836a6377ccc8a6c1e6f9ee2b59afab23`
MD5	`a425191cce4e471ffab1ac8e9868f983`
BLAKE2b-256	`8d92e6a5ed18e95a370486aa5ace7165bf35d43972c200ff84ef00f1e2e009f3`


File details

Details for the file infon-0.1.1-py3-none-any.whl.

File metadata

Download URL: infon-0.1.1-py3-none-any.whl
Upload date: May 27, 2026
Size: 2.0 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for infon-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e0b806f0453de90afca92ed99c28fc04048040c92722ba6b7c12bc3d5a1eb57c`
MD5	`00c569302741e0dbc77ed755aadc2b84`
BLAKE2b-256	`f1ecf3342c0a2aee475c985a03c6c1349817661b2621aea8274bf89c1fe1ac20`

