TRUGS web research tool — crawl web sources and build passive TRUGS knowledge graphs. One-way and passive: it builds graphs, it never closes the self-developing loop. Single `trugs-web` binary.

trugs-web

Web-to-TRUG builder. One binary, trugs-web: crawl web sources, extract entities and relations with LLM assistance (model-agnostic), resolve entity identity across sources, score credibility, and build a passive TRUGS knowledge graph for querying. It builds graphs; it never closes the self-developing loop (reserved patent mechanism, US app 19/575,491).

What & Why

trugs-web is a one-way, passive research tool. You point it at seed URLs; it discovers sources, extracts structured knowledge (entities, relations, citations) via LLM-backed natural language processing, resolves entity identity across sources, scores credibility from topology and metadata, and emits a graph in TRUGS 1.0 format. That graph is a passive data structure: queryable, traversable, and reportable, but inert. The tool does not modify it in place, and no agent closes the feedback loop. The license draws this boundary: the reserved patent (US app 19/575,491) protects self-modifying graph substrates. A downstream user who wires this tool's passive output into a self-modifying agent operates outside this grant.

As a T2 reference application (sibling of trugs-folder), trugs-web depends downward on the T1 commons: trugs-tools>=2.0.0 (language, validator) and trugs-store>=2.0.0 (graph persistence). v1 ships the ingestion pipeline (crawler, extractor, resolver, credibility scorer, graph builder), query subsystem (loader, traverse, synthesize), and weight computation (topology-based importance ranking). Hub federation and refresh are deferred to Phase 2 (see deferred_phase2/).
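To make "topology-based importance ranking" concrete, here is a minimal sketch that weights nodes by inbound edge count. It is an illustration only: the function name `in_degree_weights` and the edge-tuple representation are assumptions, and the actual weight computation in trugs-web may use a different measure.

```python
from collections import Counter


def in_degree_weights(node_ids, edges):
    """Weight each node by inbound edge count, scaled so the top node is 1.0.

    `edges` is a list of (source, target) pairs (a hypothetical layout).
    """
    counts = Counter(target for _, target in edges)
    top = max(counts.values(), default=0)
    return {n: (counts[n] / top if top else 0.0) for n in node_ids}


edges = [("a", "c"), ("b", "c"), ("c", "d")]
print(in_degree_weights(["a", "b", "c", "d"], edges))
# {'a': 0.0, 'b': 0.0, 'c': 1.0, 'd': 0.5}
```

Scaling by the maximum keeps weights in the 0 to 1 range, which lines up with thresholds such as `min_weight=0.5` in the examples.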

Key Features

  • Source discovery: crawl seed URLs, discover linked pages, respect robots.txt, apply rate limits and exponential backoff
  • Entity & relation extraction: LLM-backed NLP (model-agnostic; use Anthropic Claude, OpenAI GPT, or mock for testing)
  • Cross-reference resolution: deduplicate entities across sources by name, description, and context
  • Credibility scoring: topology-aware confidence weights on nodes and edges
  • TRUGS 1.0 graph output: validated against trugs_tools.validator.validate_trug; query-ready JSON
  • Query & traverse: load a graph, traverse by relation type, filter by weight threshold, synthesize findings into markdown reports
  • Safety rails: secrets from environment (never inline), LLM cost guard, inter-request delays, structured logging
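The exponential backoff mentioned under source discovery can be sketched as follows. This is a generic illustration, not the crawler's actual code: `backoff_delays` and its parameters (`base`, `factor`, `cap`, `jitter`) are hypothetical names.

```python
import random


def backoff_delays(retries, base=1.0, factor=2.0, cap=30.0, jitter=0.0, rng=None):
    """Return the wait in seconds before each retry: base * factor**attempt, capped."""
    rng = rng or random.Random()
    delays = []
    for attempt in range(retries):
        delay = min(cap, base * factor ** attempt)
        # Optional jitter spreads retries out so clients do not retry in lockstep.
        delay += rng.uniform(0, jitter * delay)
        delays.append(delay)
    return delays


print(backoff_delays(5))  # [1.0, 2.0, 4.0, 8.0, 16.0]
```

A crawler would typically sleep for each delay in turn after a failed or throttled request, then give up once the list is exhausted.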

Quick Example

import asyncio
from trugs_web import build_graph, load_graph, query_graph, generate_report

# 1. Build a passive graph from seed URLs (mock LLM provider for testing)
async def main():
    builder = await build_graph(
        topic="acupuncture evidence",
        seed_urls=["https://example.com/research"],
        llm_provider="mock",  # or "anthropic", "openai"
        output_path="acupuncture.trug.json"
    )
    print(f"Built: {len(builder.graph['nodes'])} nodes, {len(builder.graph['edges'])} edges")

    # 2. Load and query the graph
    graph = load_graph("acupuncture.trug.json")
    results = query_graph(graph, "sources for acupuncture efficacy", min_weight=0.5)
    print(results)

    # 3. Synthesize a markdown report
    report = await generate_report(graph, "acupuncture efficacy evidence")
    print(report.to_markdown())

asyncio.run(main())

Installation

Requirements: Python ≥ 3.11

# Minimal (graph building + querying, no crawling or LLM)
pip install trugs-web

# With web crawling (httpx, beautifulsoup4, lxml)
pip install "trugs-web[web]"

# With LLM support (anthropic, openai)
pip install "trugs-web[llm]"

# Both
pip install "trugs-web[web,llm]"

# Verify
trugs-web --version

Usage

The trugs-web binary exposes four verbs:

crawl — Discover sources from seed URLs (no LLM)

trugs-web crawl https://example.com/research --topic "acupuncture" --max-sources 30

Outputs a list of discovered sources (title, URL, source type). No LLM calls.
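For readers curious how a robots.txt check works in principle, here is a small sketch using Python's standard `urllib.robotparser`. It parses an inline policy rather than fetching one, and the user agent string `example-crawler` is made up; how trugs-web itself performs the check is not shown here.

```python
from urllib.robotparser import RobotFileParser

# Inline policy so the example runs without network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""


def make_policy(robots_txt):
    """Parse robots.txt text into a reusable policy object."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


policy = make_policy(ROBOTS_TXT)
print(policy.can_fetch("example-crawler", "https://example.com/research"))   # True
print(policy.can_fetch("example-crawler", "https://example.com/private/x"))  # False
print(policy.crawl_delay("example-crawler"))  # 2
```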

build — Crawl, extract, resolve, score → TRUGS graph

trugs-web build https://example.com/research \
  --topic "acupuncture efficacy" \
  --provider anthropic \
  --out acupuncture.trug.json

Full pipeline: discovers sources, extracts entities/relations via LLM, resolves duplicates, scores credibility, writes a TRUGS 1.0 JSON graph. Requires $ANTHROPIC_API_KEY or $OPENAI_API_KEY in the environment (provider-dependent).
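If you script around the CLI, a pre-flight check like the one below can fail fast when the key is missing. It is a sketch under assumptions: `require_api_key` is a hypothetical helper, and treating the mock provider as needing no key is a guess; only the two environment variable names come from the text above.

```python
import os

# Variable names come from the paragraph above; the "mock" entry is an assumption.
PROVIDER_ENV_VARS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "mock": None,
}


def require_api_key(provider, env=None):
    """Return the provider's API key from the environment, or raise if it is unset."""
    env = os.environ if env is None else env
    if provider not in PROVIDER_ENV_VARS:
        raise ValueError(f"unknown provider: {provider!r}")
    var = PROVIDER_ENV_VARS[provider]
    if var is None:
        return None  # no secret needed for this provider
    key = env.get(var)
    if not key:
        raise RuntimeError(f"set ${var} before running with --provider {provider}")
    return key


print(require_api_key("mock"))  # None
```

Reading the key at call time rather than hard-coding it keeps secrets out of source files and shell history.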

query — Traverse and find within a built graph

trugs-web query acupuncture.trug.json --q "sources for efficacy" --min-weight 0.6

Traverses the graph, returns matching nodes and paths as JSON.
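The idea behind traversing by relation type with a weight threshold can be sketched on a plain dictionary. The field names (`id`, `weight`, `from_id`, `to_id`, `relation`) and the `traverse` function are assumptions for illustration; check the TRUGS 1.0 format and the query subsystem for the real schema and behavior.

```python
from collections import deque

# Toy graph with a hypothetical node and edge layout.
graph = {
    "nodes": [
        {"id": "claim", "weight": 0.9},
        {"id": "trial", "weight": 0.7},
        {"id": "blog", "weight": 0.3},
    ],
    "edges": [
        {"from_id": "trial", "to_id": "claim", "relation": "SUPPORTS"},
        {"from_id": "blog", "to_id": "claim", "relation": "SUPPORTS"},
    ],
}


def traverse(graph, start, relation, min_weight):
    """Breadth-first walk against edge direction, keeping nodes at or above min_weight."""
    weights = {n["id"]: n["weight"] for n in graph["nodes"]}
    seen, queue, found = {start}, deque([start]), []
    while queue:
        current = queue.popleft()
        for edge in graph["edges"]:
            source = edge["from_id"]
            if edge["to_id"] != current or edge["relation"] != relation:
                continue
            if source in seen or weights[source] < min_weight:
                continue
            seen.add(source)
            found.append(source)
            queue.append(source)
    return found


print(traverse(graph, "claim", "SUPPORTS", min_weight=0.6))  # ['trial']
```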

synthesize — Render a markdown report from a graph

trugs-web synthesize acupuncture.trug.json \
  --q "acupuncture evidence summary" \
  --out report.md

Queries the graph and synthesizes findings into a human-readable markdown document.

Every verb documents examples and options: trugs-web <verb> --help.

Library Use

from trugs_web import (
    build_graph,            # async; orchestrates the full pipeline
    load_graph,             # load a .trug.json file
    query_graph,            # traverse and filter
    generate_report,        # async; synthesize markdown
    TRUGSWebGraphBuilder,   # low-level graph construction
    EntityExtractor,        # LLM-backed entity extraction
    EntityResolver,         # cross-reference deduplication
    CredibilityScorer,      # topology-aware scoring
)

# Programmatic graph building (build() is async, so run it inside an event loop)
import asyncio
from trugs_tools.validator import validate_trug

async def main():
    builder = TRUGSWebGraphBuilder(name="myresearch", topic="my topic")
    await builder.build("https://example.com", llm_provider="mock")
    builder.save("output.trug.json", validate=True)

    # Validation (graph_builder uses trugs_tools.validator internally)
    graph = load_graph("output.trug.json")
    validate_trug(graph)  # raises if invalid

asyncio.run(main())

Documentation

  • ARCHITECTURE.md — v1 subsystem design, data flow, validation contract
  • AGENT.md — agent/multi-turn patterns (if used with LLM orchestration)
  • CHANGELOG.md — v1.0 → v2.0 migration, breaking changes

Status

Beta. v1 implements the Phase 1 ingestion and query pipeline. Hub federation, refresh, and self-developing loop reservation are deferred to Phase 2. Test suite: 270 tests passing (respx-mocked HTTP, MockLLMClient). Pytest markers: robots, rate_limit, cost, secret, logging, graph_validation_e2e.

License

Apache-2.0. See LICENSE and NOTICE.

This license covers a one-way, passive tool. The reserved patent mechanism (US app 19/575,491) protects self-modifying graph substrates. See NOTICE for boundary and commercial licensing.

Contributing

PRs welcome. Issues: https://github.com/TRUGS-LLC/TRUGS-WEB/issues
