TRUGS web research tool — crawl web sources and build passive TRUGS knowledge graphs. One-way and passive: it builds graphs, it never closes the self-developing loop. Single `trugs-web` binary.
trugs-web
Web-to-TRUG builder. One binary, trugs-web — crawl web sources, extract entities and relations with LLM assistance (model-agnostic), resolve and score credibility, and build a passive TRUGS knowledge graph for querying. It builds graphs; it never closes the self-developing loop (reserved patent mechanism, US app 19/575,491).
What & Why
trugs-web is a one-way, passive research tool. You point it at seed URLs; it discovers sources, extracts structured knowledge (entities, relations, citations) via LLM-backed natural language processing, resolves entity identity across sources, scores credibility from topology and metadata, and emits a TRUGS 1.0 format graph. That graph is a passive data structure—queryable, traversable, reportable—but inert. The tool does not modify it in place; no agent closes the feedback loop. This boundary is licensed: the reserved patent (US 19/575,491) protects self-modifying graph substrates. A downstream user who wires this tool's passive output into a self-modifying agent operates outside this grant.
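To make the "resolves entity identity across sources" step concrete, here is a minimal sketch of name-based deduplication. It is not the tool's EntityResolver: the helper names (`normalize_name`, `merge_by_name`) and the entity fields (`name`, `source`) are assumptions chosen for illustration, and a real resolver would also compare descriptions and context.

```python
import re
from collections import defaultdict


def normalize_name(name: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(cleaned.split())


def merge_by_name(entities: list[dict]) -> list[dict]:
    """Group entities whose normalized names match; keep every source URL."""
    groups: dict[str, list[dict]] = defaultdict(list)
    for entity in entities:
        groups[normalize_name(entity["name"])].append(entity)

    merged = []
    for key, members in groups.items():
        merged.append({
            "id": key.replace(" ", "_"),       # hypothetical id scheme
            "name": members[0]["name"],        # first spelling seen wins
            "sources": sorted({m["source"] for m in members}),
        })
    return merged


# Hypothetical extraction output from two pages.
entities = [
    {"name": "World Health Organization", "source": "https://example.com/a"},
    {"name": "world health organization.", "source": "https://example.com/b"},
    {"name": "Cochrane", "source": "https://example.com/a"},
]
merged = merge_by_name(entities)
print(len(merged))
```

Exact-match grouping on a normalized key is the simplest possible strategy; it misses abbreviations and aliases, which is why the feature list mentions description and context as additional signals.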
As a T2 reference application (sibling of trugs-folder), trugs-web depends downward on the T1 commons: trugs-tools>=2.0.0 (language, validator) and trugs-store>=2.0.0 (graph persistence). v1 ships the ingestion pipeline (crawler, extractor, resolver, credibility scorer, graph builder), query subsystem (loader, traverse, synthesize), and weight computation (topology-based importance ranking). Hub federation and refresh are deferred to Phase 2 (see deferred_phase2/).
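As a rough illustration of what "topology-based importance ranking" can mean, the sketch below weights each node by its normalized in-degree. This is one common heuristic, not necessarily the computation trugs-web performs. The `nodes` and `edges` keys match the Quick Example; the `id`, `from_id`, and `to_id` field names are assumptions.

```python
from collections import Counter


def in_degree_weights(graph: dict) -> dict[str, float]:
    """Return a 0..1 weight per node: its in-degree divided by the largest in-degree."""
    counts = Counter(edge["to_id"] for edge in graph["edges"])
    top = max(counts.values(), default=0)
    return {
        node["id"]: (counts[node["id"]] / top if top else 0.0)
        for node in graph["nodes"]
    }


# Hypothetical three-node graph: two nodes cite "c", one cites "b".
graph = {
    "nodes": [{"id": "a"}, {"id": "b"}, {"id": "c"}],
    "edges": [
        {"from_id": "a", "to_id": "c"},
        {"from_id": "b", "to_id": "c"},
        {"from_id": "a", "to_id": "b"},
    ],
}
weights = in_degree_weights(graph)
print(weights)
```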
Key Features
- Source discovery: crawl seed URLs, discover linked pages, respect robots.txt, apply rate limits and exponential backoff
- Entity & relation extraction: LLM-backed NLP (model-agnostic; use Anthropic Claude, OpenAI GPT, or mock for testing)
- Cross-reference resolution: deduplicate entities across sources by name, description, and context
- Credibility scoring: topology-aware confidence weights on nodes and edges
- TRUGS 1.0 graph output: validated against trugs_tools.validator.validate_trug; query-ready JSON
- Query & traverse: load a graph, traverse by relation type, filter by weight threshold, synthesize findings into markdown reports
- Safety rails: secrets from environment (never inline), LLM cost guard, inter-request delays, structured logging
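The crawler-side rules in the list above (robots.txt, exponential backoff) can be sketched with the standard library alone. This shows the general technique, not trugs-web's internals: the function names, the user agent string, and the base and cap values are all assumptions.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check a URL against already-fetched robots.txt content."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Seconds to wait before retry number `attempt` (0-based): base * 2^attempt, capped."""
    return min(cap, base * (2 ** attempt))


# Hypothetical robots.txt body and user agent.
ROBOTS = """\
User-agent: *
Disallow: /private/
"""
print(allowed_by_robots(ROBOTS, "example-crawler", "https://example.com/research"))
print(allowed_by_robots(ROBOTS, "example-crawler", "https://example.com/private/page"))
print([backoff_delay(n) for n in range(4)])
```

Production crawlers usually add random jitter to the delay so that many clients do not retry in lockstep; it is left out here to keep the output deterministic.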
Quick Example
import asyncio

from trugs_web import build_graph, load_graph, query_graph, generate_report


# 1. Build a passive graph from seed URLs (mock LLM provider for testing)
async def main():
    builder = await build_graph(
        topic="acupuncture evidence",
        seed_urls=["https://example.com/research"],
        llm_provider="mock",  # or "anthropic", "openai"
        output_path="acupuncture.trug.json"
    )
    print(f"Built: {len(builder.graph['nodes'])} nodes, {len(builder.graph['edges'])} edges")

    # 2. Load and query the graph
    graph = load_graph("acupuncture.trug.json")
    results = query_graph(graph, "sources for acupuncture efficacy", min_weight=0.5)
    print(results)

    # 3. Synthesize a markdown report
    report = await generate_report(graph, "acupuncture efficacy evidence")
    print(report.to_markdown())


asyncio.run(main())
Installation
Requirements: Python ≥ 3.11
# Minimal (graph building + querying, no crawling or LLM)
pip install trugs-web
# With web crawling (httpx, beautifulsoup4, lxml)
pip install "trugs-web[web]"
# With LLM support (anthropic, openai)
pip install "trugs-web[llm]"
# Both
pip install "trugs-web[web,llm]"
# Verify
trugs-web --version
Usage
The trugs-web binary exposes four verbs:
crawl — Discover sources from seed URLs (no LLM)
trugs-web crawl https://example.com/research --topic "acupuncture" --max-sources 30
Outputs a list of discovered sources (title, URL, source type). No LLM calls.
build — Crawl, extract, resolve, score → TRUGS graph
trugs-web build https://example.com/research \
--topic "acupuncture efficacy" \
--provider anthropic \
--out acupuncture.trug.json
Full pipeline: discovers sources, extracts entities/relations via LLM, resolves duplicates, scores credibility, writes a TRUGS 1.0 JSON graph. Requires $ANTHROPIC_API_KEY or $OPENAI_API_KEY in the environment (provider-dependent).
query — Traverse and find within a built graph
trugs-web query acupuncture.trug.json --q "sources for efficacy" --min-weight 0.6
Traverses the graph, returns matching nodes and paths as JSON.
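To show roughly what a weight-threshold traversal does, here is a sketch over a plain dict graph. It is not the implementation behind `query`: the edge fields (`from_id`, `to_id`, `relation`, `weight`) and the relation name `CITES` are assumptions for illustration.

```python
def neighbors(graph: dict, node_id: str, relation: str, min_weight: float = 0.0) -> list[str]:
    """Follow outgoing edges of one relation type, skipping edges below the threshold."""
    return [
        edge["to_id"]
        for edge in graph["edges"]
        if edge["from_id"] == node_id
        and edge["relation"] == relation
        and edge.get("weight", 0.0) >= min_weight
    ]


# Hypothetical graph fragment.
graph = {
    "nodes": [{"id": "claim"}, {"id": "trial"}, {"id": "blog"}],
    "edges": [
        {"from_id": "claim", "to_id": "trial", "relation": "CITES", "weight": 0.9},
        {"from_id": "claim", "to_id": "blog", "relation": "CITES", "weight": 0.3},
    ],
}
print(neighbors(graph, "claim", "CITES", min_weight=0.6))
```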
synthesize — Render a markdown report from a graph
trugs-web synthesize acupuncture.trug.json \
--q "acupuncture evidence summary" \
--out report.md
Queries the graph and synthesizes findings into a human-readable markdown document.
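The rendering half of this step can be pictured as turning weighted findings into a markdown list. The sketch below is only that; the `render_report` helper and the finding fields (`claim`, `weight`, `source`) are hypothetical and say nothing about how trugs-web actually synthesizes text.

```python
def render_report(title: str, findings: list[dict]) -> str:
    """Render findings as a markdown bullet list, highest weight first."""
    lines = [f"# {title}", ""]
    for item in sorted(findings, key=lambda f: f["weight"], reverse=True):
        lines.append(
            f"- {item['claim']} (weight {item['weight']:.2f}, source: {item['source']})"
        )
    return "\n".join(lines) + "\n"


# Hypothetical query results.
findings = [
    {"claim": "Finding B", "weight": 0.4, "source": "https://example.com/b"},
    {"claim": "Finding A", "weight": 0.8, "source": "https://example.com/a"},
]
report = render_report("Example summary", findings)
print(report)
```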
Every verb documents examples and options: trugs-web <verb> --help.
Library Use
import asyncio

from trugs_tools.validator import validate_trug
from trugs_web import (
    build_graph,            # async; orchestrates the full pipeline
    load_graph,             # load a .trug.json file
    query_graph,            # traverse and filter
    generate_report,        # async; synthesize markdown
    TRUGSWebGraphBuilder,   # low-level graph construction
    EntityExtractor,        # LLM-backed entity extraction
    EntityResolver,         # cross-reference deduplication
    CredibilityScorer,      # topology-aware scoring
)


async def main():
    # Programmatic graph building (build() is async, so it needs an event loop)
    builder = TRUGSWebGraphBuilder(name="myresearch", topic="my topic")
    await builder.build("https://example.com", llm_provider="mock")
    builder.save("output.trug.json", validate=True)

    # Validation (graph_builder uses trugs_tools.validator internally)
    graph = load_graph("output.trug.json")
    validate_trug(graph)  # raises if invalid


asyncio.run(main())
Documentation
- ARCHITECTURE.md — v1 subsystem design, data flow, validation contract
- AGENT.md — agent/multi-turn patterns (if used with LLM orchestration)
- CHANGELOG.md — v1.0 → v2.0 migration, breaking changes
Status
Beta. v1 implements the Phase 1 ingestion and query pipeline. Hub federation, refresh, and self-developing loop reservation are deferred to Phase 2. Test coverage: 270 passing (respx-mocked HTTP, MockLLMClient). Pytest markers: robots, rate_limit, cost, secret, logging, graph_validation_e2e.
License
Apache-2.0. See LICENSE and NOTICE.
This license covers a one-way, passive tool. The reserved patent mechanism (US app 19/575,491) protects self-modifying graph substrates. See NOTICE for boundary and commercial licensing.
Contributing
PRs welcome. Issues: https://github.com/TRUGS-LLC/TRUGS-WEB/issues
File details
Details for the file trugs_web-2.0.0.tar.gz.
File metadata
- Download URL: trugs_web-2.0.0.tar.gz
- Upload date:
- Size: 57.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5595ce0185982a20ca611d8283e14fc2e75e13838a5adf4bbe143007baa03a2c |
| MD5 | 5bab8c77257119115c263a42acc4c9bd |
| BLAKE2b-256 | 17dab8616a2bcacaa2fe4f1bb818fe36ea6b6a205dbccb25bdca1f8abbfde000 |
File details
Details for the file trugs_web-2.0.0-py3-none-any.whl.
File metadata
- Download URL: trugs_web-2.0.0-py3-none-any.whl
- Upload date:
- Size: 43.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 009c55182809f192b4ce3fae7f45de762f04674c48baf5dc6e55ca600c7f0938 |
| MD5 | 072db2d2758c0de2943469f87b3979c0 |
| BLAKE2b-256 | 79cb04e4411a5ebffe6234da2774816c95c516f563b940c31aa0f4f277d45099 |