TRUGS web research tool — crawl web sources and build passive TRUGS knowledge graphs. One-way and passive: it builds graphs, it never closes the self-developing loop. Single `trugs-web` binary.

trugs-web

Web-to-TRUG builder. One binary, trugs-web — crawl web sources, extract entities and relations with LLM assistance (model-agnostic), resolve and score credibility, and build a passive TRUGS knowledge graph for querying. It builds graphs; it never closes the self-developing loop (reserved patent mechanism, US app 19/575,491).

What & Why

trugs-web is a one-way, passive research tool. You point it at seed URLs; it discovers sources, extracts structured knowledge (entities, relations, citations) via LLM-backed natural language processing, resolves entity identity across sources, scores credibility from topology and metadata, and emits a graph in TRUGS 1.0 format. That graph is a passive data structure: queryable, traversable, and reportable, but inert. The tool does not modify the graph in place, and no agent closes the feedback loop. The license is scoped to this boundary: the reserved patent (US 19/575,491) protects self-modifying graph substrates, so a downstream user who wires this tool's passive output into a self-modifying agent operates outside this grant.

As a T2 reference application (sibling of trugs-folder), trugs-web depends downward on the T1 commons: trugs-tools>=2.0.0 (language, validator) and trugs-store>=2.0.0 (graph persistence). v1 ships the ingestion pipeline (crawler, extractor, resolver, credibility scorer, graph builder), query subsystem (loader, traverse, synthesize), and weight computation (topology-based importance ranking). Hub federation and refresh are deferred to Phase 2 (see deferred_phase2/).
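The README names "weight computation (topology-based importance ranking)" but does not define the algorithm. As a rough illustration of one common topology-based ranking, the sketch below runs PageRank-style power iteration over a plain `{"nodes": [...], "edges": [...]}` dict. PageRank is a swapped-in stand-in, not necessarily what trugs-web computes, and the `id`, `from_id`, and `to_id` field names are assumptions rather than the TRUGS 1.0 schema.

```python
def topology_weights(graph, damping=0.85, iterations=50):
    """PageRank-style importance scores (hypothetical stand-in for trugs-web's weights).

    Assumes nodes carry an "id" and edges carry "from_id"/"to_id"; these field
    names are illustrative, not the TRUGS 1.0 schema.
    """
    node_ids = [n["id"] for n in graph["nodes"]]
    n = len(node_ids)
    if n == 0:
        return {}
    out_links = {nid: [] for nid in node_ids}
    for edge in graph["edges"]:
        out_links[edge["from_id"]].append(edge["to_id"])

    rank = {nid: 1.0 / n for nid in node_ids}
    for _ in range(iterations):
        new_rank = {nid: (1.0 - damping) / n for nid in node_ids}
        for src, targets in out_links.items():
            if targets:
                # Split this node's rank evenly across the nodes it links to
                share = damping * rank[src] / len(targets)
                for dst in targets:
                    new_rank[dst] += share
            else:
                # Dangling node: spread its rank over all nodes so scores still sum to 1
                for dst in node_ids:
                    new_rank[dst] += damping * rank[src] / n
        rank = new_rank
    return rank


graph = {
    "nodes": [{"id": "a"}, {"id": "b"}, {"id": "c"}],
    "edges": [{"from_id": "a", "to_id": "c"}, {"from_id": "b", "to_id": "c"}],
}
weights = topology_weights(graph)
print(max(weights, key=weights.get))  # c (linked to by both other nodes)
```

Nodes that many others point to end up with higher scores, which is the intuition behind weighting sources by graph position rather than by content alone.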

Key Features

- Source discovery: crawl seed URLs, discover linked pages, respect robots.txt, apply rate limits and exponential backoff
- Entity & relation extraction: LLM-backed NLP (model-agnostic; use Anthropic Claude, OpenAI GPT, or mock for testing)
- Cross-reference resolution: deduplicate entities across sources by name, description, and context
- Credibility scoring: topology-aware confidence weights on nodes and edges
- TRUGS 1.0 graph output: validated against trugs_tools.validator.validate_trug; query-ready JSON
- Query & traverse: load a graph, traverse by relation type, filter by weight threshold, synthesize findings into markdown reports
- Safety rails: secrets from environment (never inline), LLM cost guard, inter-request delays, structured logging
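The README does not show how trugs-web implements its crawl politeness. A minimal standard-library sketch of two of the rails listed above: checking robots.txt rules and computing exponential backoff delays. It assumes the robots.txt text has already been fetched; the helper names and the defaults (1 s base, 30 s cap) are hypothetical, not trugs-web's settings.

```python
import random
from urllib.robotparser import RobotFileParser


def robots_checker(robots_txt: str) -> RobotFileParser:
    """Build a robots.txt checker from already-fetched file contents (hypothetical helper)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  jitter: bool = False) -> float:
    """Exponential backoff: base * 2**attempt seconds, never more than `cap`."""
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        # "Full jitter": pick uniformly in [0, delay] so retries do not synchronize
        delay = random.uniform(0, delay)
    return delay


rules = robots_checker("User-agent: *\nDisallow: /private/\n")
print(rules.can_fetch("trugs-web", "https://example.com/research"))   # True
print(rules.can_fetch("trugs-web", "https://example.com/private/x"))  # False
print([backoff_delay(a) for a in range(6)])  # [1.0, 2.0, 4.0, 8.0, 16.0, 30.0]
```

A crawler would call `can_fetch` before each request and sleep for `backoff_delay(attempt)` after a failed or rate-limited response.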

Quick Example

```python
import asyncio
from trugs_web import build_graph, load_graph, query_graph, generate_report


async def main():
    # 1. Build a passive graph from seed URLs (mock LLM provider for testing)
    builder = await build_graph(
        topic="acupuncture evidence",
        seed_urls=["https://example.com/research"],
        llm_provider="mock",  # or "anthropic", "openai"
        output_path="acupuncture.trug.json",
    )
    print(f"Built: {len(builder.graph['nodes'])} nodes, {len(builder.graph['edges'])} edges")

    # 2. Load and query the graph
    graph = load_graph("acupuncture.trug.json")
    results = query_graph(graph, "sources for acupuncture efficacy", min_weight=0.5)
    print(results)

    # 3. Synthesize a markdown report
    report = await generate_report(graph, "acupuncture efficacy evidence")
    print(report.to_markdown())


asyncio.run(main())
```

Installation

Requirements: Python ≥ 3.11

```bash
# Minimal (graph building + querying, no crawling or LLM)
pip install trugs-web

# With web crawling (httpx, beautifulsoup4, lxml)
pip install "trugs-web[web]"

# With LLM support (anthropic, openai)
pip install "trugs-web[llm]"

# Both
pip install "trugs-web[web,llm]"

# Verify
trugs-web --version
```

Usage

The trugs-web binary exposes four verbs:

`crawl` — Discover sources from seed URLs (no LLM)

```bash
trugs-web crawl https://example.com/research --topic "acupuncture" --max-sources 30
```

Outputs a list of discovered sources (title, URL, source type). No LLM calls.
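The `crawl` verb discovers linked pages starting from the seeds. The `web` extra pulls in beautifulsoup4 and lxml, but the sketch below uses the standard-library `html.parser` instead, so it illustrates link discovery in general rather than reproducing the package's crawler. It resolves relative hrefs against the page URL, drops `#fragments`, and keeps unique same-host HTTP(S) links; the same-host restriction is an assumption made for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute, fragment-free hrefs from <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if href:
            absolute, _fragment = urldefrag(urljoin(self.base_url, href))
            self.links.append(absolute)


def discover_links(page_url: str, html: str) -> list[str]:
    """Return unique same-host http(s) links in document order (hypothetical helper)."""
    collector = LinkCollector(page_url)
    collector.feed(html)
    collector.close()
    host = urlparse(page_url).netloc
    seen, result = set(), []
    for link in collector.links:
        parts = urlparse(link)
        if parts.scheme in ("http", "https") and parts.netloc == host and link not in seen:
            seen.add(link)
            result.append(link)
    return result


page = ('<a href="/papers/1">P1</a> <a href="papers/2#abstract">P2</a> '
        '<a href="https://other.org/x">X</a> <a href="mailto:a@b.c">M</a>')
print(discover_links("https://example.com/research/", page))
# ['https://example.com/papers/1', 'https://example.com/research/papers/2']
```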

`build` — Crawl, extract, resolve, score → TRUGS graph

```bash
trugs-web build https://example.com/research \
  --topic "acupuncture efficacy" \
  --provider anthropic \
  --out acupuncture.trug.json
```

Full pipeline: discovers sources, extracts entities/relations via LLM, resolves duplicates, scores credibility, writes a TRUGS 1.0 JSON graph. Requires $ANTHROPIC_API_KEY or $OPENAI_API_KEY in the environment (provider-dependent).
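The README says resolution deduplicates entities by name, description, and context, but it does not give the algorithm. The sketch below is a deliberately simple, name-only stand-in: it normalizes names and greedily merges entities whose normalized names reach a `difflib.SequenceMatcher` ratio of at least 0.9. The threshold, the `name`/`source` fields, and the first-match merge are all assumptions for illustration.

```python
import re
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def resolve_entities(entities, threshold=0.9):
    """Greedily merge entities with near-identical normalized names.

    Each entity is a dict with hypothetical "name" and "source" keys. Returns
    one cluster per resolved entity, listing every source that mentioned it.
    """
    clusters = []
    for ent in entities:
        key = normalize(ent["name"])
        for cluster in clusters:
            if SequenceMatcher(None, key, cluster["key"]).ratio() >= threshold:
                cluster["sources"].append(ent["source"])
                break
        else:
            clusters.append({"key": key, "name": ent["name"], "sources": [ent["source"]]})
    return clusters


mentions = [
    {"name": "World Health Organization", "source": "a.html"},
    {"name": "world health organisation", "source": "b.html"},
    {"name": "W.H.O.", "source": "c.html"},
]
for cluster in resolve_entities(mentions):
    print(cluster["name"], cluster["sources"])
# World Health Organization ['a.html', 'b.html']
# W.H.O. ['c.html']
```

The unmerged acronym shows why name matching alone is not enough and why the README also lists description and context as resolution signals.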

`query` — Traverse and find within a built graph

```bash
trugs-web query acupuncture.trug.json --q "sources for efficacy" --min-weight 0.6
```

Traverses the graph, returns matching nodes and paths as JSON.
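How `query` matches the `--q` text and walks the graph is not documented here. To illustrate "traverse by relation type, filter by weight threshold", the sketch below runs a breadth-first search from a start node, following only edges of the requested relation types whose weight is at least `min_weight`, and returns the path to each reachable node. The edge fields (`from_id`, `to_id`, `relation`, `weight`) and the relation names are hypothetical, not the TRUGS 1.0 schema.

```python
from collections import deque


def traverse(graph, start_id, relations=None, min_weight=0.0, max_depth=3):
    """BFS from start_id; returns {node_id: path_of_ids} for every reachable node."""
    adjacency = {}
    for edge in graph["edges"]:
        if relations is not None and edge["relation"] not in relations:
            continue  # wrong relation type
        if edge.get("weight", 0.0) < min_weight:
            continue  # below the weight threshold
        adjacency.setdefault(edge["from_id"], []).append(edge["to_id"])

    paths = {start_id: [start_id]}
    queue = deque([start_id])
    while queue:
        node = queue.popleft()
        if len(paths[node]) - 1 >= max_depth:
            continue  # depth limit reached on this branch
        for nxt in adjacency.get(node, []):
            if nxt not in paths:  # BFS: first visit is a shortest path
                paths[nxt] = paths[node] + [nxt]
                queue.append(nxt)
    return paths


# Hypothetical edges; only "edges" is read by traverse()
graph = {"edges": [
    {"from_id": "claim", "to_id": "study1", "relation": "SUPPORTED_BY", "weight": 0.8},
    {"from_id": "claim", "to_id": "blog", "relation": "SUPPORTED_BY", "weight": 0.3},
    {"from_id": "study1", "to_id": "journal", "relation": "PUBLISHED_IN", "weight": 0.9},
]}
print(list(traverse(graph, "claim", min_weight=0.6)))  # ['claim', 'study1', 'journal']
```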

`synthesize` — Render a markdown report from a graph

```bash
trugs-web synthesize acupuncture.trug.json \
  --q "acupuncture evidence summary" \
  --out report.md
```

Queries the graph and synthesizes findings into a human-readable markdown document.
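The README does not describe how `synthesize` structures its report, and the library's `generate_report` is async, which suggests LLM involvement. The sketch below covers only a deterministic final rendering step, without any LLM: it takes already-ranked findings and emits markdown with a heading, one bullet per finding (strongest first), and its sources. The `findings` shape and its placeholder contents are hypothetical.

```python
def render_markdown(question, findings, min_weight=0.0):
    """Render findings as a markdown report, strongest first.

    Each finding is a dict with hypothetical "claim", "weight", and "sources" keys.
    """
    lines = [f"# {question}", ""]
    kept = sorted(
        (f for f in findings if f["weight"] >= min_weight),
        key=lambda f: f["weight"],
        reverse=True,
    )
    if not kept:
        lines.append("_No findings met the weight threshold._")
    for finding in kept:
        lines.append(f"- {finding['claim']} (weight {finding['weight']:.2f})")
        for url in finding["sources"]:
            lines.append(f"  - <{url}>")
    return "\n".join(lines) + "\n"


findings = [  # placeholder findings, not real results
    {"claim": "Finding A", "weight": 0.55, "sources": ["https://example.com/a"]},
    {"claim": "Finding B", "weight": 0.82, "sources": ["https://example.com/b"]},
]
print(render_markdown("example question", findings, min_weight=0.5))
```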

Run `trugs-web <verb> --help` to see each verb's options and examples.

Library Use

```python
import asyncio

from trugs_tools.validator import validate_trug
from trugs_web import (
    build_graph,            # async; orchestrates the full pipeline
    load_graph,             # load a .trug.json file
    query_graph,            # traverse and filter
    generate_report,        # async; synthesize markdown
    TRUGSWebGraphBuilder,   # low-level graph construction
    EntityExtractor,        # LLM-backed entity extraction
    EntityResolver,         # cross-reference deduplication
    CredibilityScorer,      # topology-aware scoring
)


async def main():
    # Programmatic graph building (build() is awaited, so it runs inside a coroutine)
    builder = TRUGSWebGraphBuilder(name="myresearch", topic="my topic")
    await builder.build("https://example.com", llm_provider="mock")
    builder.save("output.trug.json", validate=True)

    # Validation (graph_builder uses trugs_tools.validator internally)
    graph = load_graph("output.trug.json")
    validate_trug(graph)  # raises if invalid


asyncio.run(main())
```

Documentation

- ARCHITECTURE.md: v1 subsystem design, data flow, validation contract
- AGENT.md: agent/multi-turn patterns (if used with LLM orchestration)
- CHANGELOG.md: v1.0 → v2.0 migration, breaking changes

Status

Beta. v1 implements the Phase 1 ingestion and query pipeline. Hub federation, refresh, and the self-developing loop reservation are deferred to Phase 2. Test suite: 270 tests passing (HTTP mocked with respx, LLM calls mocked with MockLLMClient). Pytest markers: `robots`, `rate_limit`, `cost`, `secret`, `logging`, `graph_validation_e2e`.

License

Apache-2.0. See LICENSE and NOTICE.

This license covers a one-way, passive tool. The reserved patent mechanism (US app 19/575,491) protects self-modifying graph substrates. See NOTICE for boundary and commercial licensing.

Contributing

PRs welcome. Issues: https://github.com/TRUGS-LLC/TRUGS-WEB/issues

Project details

This version: 2.0.0 (released Jun 21, 2026)

Download files

Source Distribution

trugs_web-2.0.0.tar.gz (57.9 kB)

Uploaded Jun 21, 2026 Source

Built Distribution

trugs_web-2.0.0-py3-none-any.whl (43.6 kB)

Uploaded Jun 21, 2026 Python 3

File details

Details for the file trugs_web-2.0.0.tar.gz.

File metadata

Download URL: trugs_web-2.0.0.tar.gz
Upload date: Jun 21, 2026
Size: 57.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for trugs_web-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`5595ce0185982a20ca611d8283e14fc2e75e13838a5adf4bbe143007baa03a2c`
MD5	`5bab8c77257119115c263a42acc4c9bd`
BLAKE2b-256	`17dab8616a2bcacaa2fe4f1bb818fe36ea6b6a205dbccb25bdca1f8abbfde000`

File details

Details for the file trugs_web-2.0.0-py3-none-any.whl.

File metadata

Download URL: trugs_web-2.0.0-py3-none-any.whl
Upload date: Jun 21, 2026
Size: 43.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for trugs_web-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`009c55182809f192b4ce3fae7f45de762f04674c48baf5dc6e55ca600c7f0938`
MD5	`072db2d2758c0de2943469f87b3979c0`
BLAKE2b-256	`79cb04e4411a5ebffe6234da2774816c95c516f563b940c31aa0f4f277d45099`
