Deterministic universal runtime extraction infrastructure with replay-safe graphs and Kaalka persistence

WebWeaveX v2.0.0

Deterministic runtime extraction and replay-safe operational cognition infrastructure

PyPI version · Python 3.10+ · Apache 2.0 · Tests passing · Coverage 90%+ · Build passing · Deterministic runtime · Replay-safe · Kaalka verified · Production ready · Open Source

What is WebWeaveX?

WebWeaveX is deterministic runtime extraction and operational cognition infrastructure. It captures how software actually runs—browser DOM, authenticated sessions, Electron state, native UI, workflows, and connector surfaces—and compiles that into replay-safe runtime graphs with Kaalka-encrypted persistence.

Why it exists

Modern systems are authenticated, stateful, runtime-driven, SPA-based, Electron-based, synchronized, and operationally dynamic. Operators need continuity across runs, not another HTML snapshot.

Traditional extraction fails in the following ways:

| Failure mode | Consequence |
| --- | --- |
| HTML-only parsing | Misses hydration, storage, IPC, native UI |
| Stateless requests | Loses session and workflow continuity |
| No authenticated persistence | Re-login and drift between runs |
| No replay contract | Cannot prove equivalence after rebuild |
| No reconstruction | Cannot rebuild operational topology from IR |
| Weak SPA/Electron support | Unstable IDs, routes, and storage break diffs |

WebWeaveX exists to deliver deterministic runtime extraction and replay-safe operational reconstruction through one canonical pipeline.


What WebWeaveX is NOT

WebWeaveX is not:

| Category | Clarification |
| --- | --- |
| Auth bypass tooling | Does not defeat MFA, CAPTCHA, or login controls |
| Malware or exploit infrastructure | Not designed for unauthorized access |
| Credential theft tooling | Does not harvest secrets you do not already hold |
| CAPTCHA bypass software | No circumvention of bot defenses |
| Browser exploitation tooling | Not a vulnerability framework |
| AGI or “autonomous hacking” | No probabilistic agent that “figures out” sites |
| Hacking infrastructure | No unauthorized intrusion features |
| An LLM wrapper | Core path is deterministic; optional plugins fail safe |
| A chatbot | Infrastructure library, not conversational AI |

WebWeaveX only operates on authorized authenticated runtimes and data you explicitly provide.


Why existing systems fail

| System | Strength | Limitation for operational runtime |
| --- | --- | --- |
| BeautifulSoup | Fast static HTML parse | No live session, storage, or runtime graph |
| Selenium | Browser automation | No unified IR, Kaalka fabric, or replay equivalence layer |
| Playwright | Reliable browser control | Automation driver, not extraction + memory + reconstruction |
| Puppeteer | Chromium scripting | Same gap: no federated sync or deterministic checkpoints |
| Traditional crawlers | Scale on public pages | Stateless; poor on authenticated SPAs |
| Generic AI agents | Flexible tasks | Probabilistic; weak replay and audit guarantees |

Common gaps WebWeaveX addresses:

  • Lack of runtime continuity across processes
  • Lack of replay and fingerprint equivalence
  • Lack of authenticated persistence (encrypted, deterministic)
  • Lack of reconstruction from structured IR
  • Lack of synchronization between browser, semantic, workflow, and memory layers

Core capabilities

| Capability | Description |
| --- | --- |
| Browser runtime extraction | Bounded Playwright capture, network/session envelopes |
| SPA stabilization | DOM and route stabilization for framework noise |
| Electron extraction | Routes, IPC, storage metadata, deterministic Electron hash |
| Native runtime cognition | Desktop, terminal, VM, remote (graceful OS fallbacks) |
| Terminal runtime | Shell-oriented cognition fixtures |
| Distributed extraction | Autonomous workers + Kaalka checkpoints |
| Runtime causality | Event chains and propagation in extraction fabrics |
| Semantic cognition | Entities, ontology, semantic graphs |
| Workflow runtime | Plans, objectives, workflow memory |
| Synchronization runtime | Multi-source runtime alignment |
| Reconstruction engine | Replay-safe rebuild from IR |
| Federated memory | Deterministic merge and stable hashes |
| Execution sandbox | Allowlisted actions only |
| Runtime replay | validate_replay_equivalence() |
| Runtime graph | Normalized universal runtime graph |
| Deterministic fingerprints | Global and pipeline hashes |
| Authenticated runtime continuation | Encrypted session reload |
| Kaalka deterministic encryption | Stable ciphertext; cross-language vectors |
| Connector runtime fabric | Database, API, container, K8s, telemetry (bounded) |

Authenticated runtime continuation

Modern applications authenticate with cookies, localStorage, sessionStorage, tokens, runtime identity, and cross-navigation continuity. Electron adds IndexedDB metadata, IPC, and route state. Multi-tab products add synchronization state across surfaces.

WebWeaveX supports:

  • Encrypted authenticated session persistence (save_encrypted_session, session paths on extract_web)
  • Runtime continuation across extractions when you supply the same Kaalka key and session file
  • Deterministic replay-safe reconstruction of operational graphs from IR

Persistence uses Kaalka deterministic encryption (algorithm: kaalka)—not plaintext JSON checkpoints on disk.

| Stored surface | Mechanism |
| --- | --- |
| Cookies / headers | Encrypted session store |
| Browser snapshot | Session + identity engines |
| Electron storage | Native/Electron cognition (bounded) |
| Workflow / sync state | Kaalka checkpoint engines |

WebWeaveX does not bypass authentication, defeat MFA, bypass security controls, or access systems without authorization.

WebWeaveX only operates on authorized authenticated runtimes explicitly provided by the user.

from webweavex import extract_web

result = extract_web(
    "https://app.example.com/dashboard",
    authenticated=True,
    session_path="./session.kaalka",
    encryption_key="your-kaalka-master-key",
)

Architecture

                              ┌──────────────────┐
                              │      Input       │
                              │  UniversalInput  │
                              └────────┬─────────┘
                                       │
                                       ▼
                              ┌──────────────────┐
                              │ Canonical Pipeline│
                              │ run_canonical_    │
                              │   pipeline()      │
                              └────────┬─────────┘
                                       │
                                       ▼
                              ┌──────────────────┐
                              │ Runtime Cognition │
                              │ web·native·repo   │
                              └────────┬─────────┘
                                       │
           ┌───────────────────────────┼───────────────────────────┐
           ▼                           ▼                           ▼
    ┌─────────────┐            ┌─────────────┐            ┌─────────────┐
    │  Semantic   │            │  Causality  │            │  Workflow   │
    │   Layer     │            │   Layer     │            │  Runtime    │
    └──────┬──────┘            └──────┬──────┘            └──────┬──────┘
           │                          │                          │
           └──────────────────────────┼──────────────────────────┘
                                      ▼
                             ┌─────────────────┐
                             │ Synchronization │
                             │    Runtime      │
                             └────────┬────────┘
                                      ▼
                             ┌─────────────────┐
                             │ Federated Memory│
                             └────────┬────────┘
                                      ▼
                             ┌─────────────────┐
                             │ Execution Fabric│
                             └────────┬────────┘
                                      ▼
                             ┌─────────────────┐
                             │ Reconstruction  │
                             │    Engine       │
                             └────────┬────────┘
                                      ▼
                             ┌─────────────────┐
                             │ Universal Runtime│
                             │     Graph        │
                             └─────────────────┘

Source: core/kernel/runtime_pipeline.py


Canonical pipeline

The canonical pipeline is the single production execution path; there are no shadow orchestrators.

from webweavex import UniversalInput, run_canonical_pipeline

result = run_canonical_pipeline(
    UniversalInput(source="https://example.com", source_type="web"),
)

print(result["pipeline_hash"])
print(len(result["unified_runtime_graph"].get("nodes", [])))

| Property | Detail |
| --- | --- |
| Single execution path | run_canonical_pipeline() only |
| Deterministic normalization | RuntimeGraphContract.normalize() |
| Replay-safe runtime | Fingerprint at pipeline boundary |
| Canonical IR generation | Per-kind extraction → kernel phases |
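
As a rough illustration of what deterministic normalization can mean in practice, the sketch below puts nodes and edges into a canonical order and hashes the canonical JSON form. It is not the implementation of RuntimeGraphContract.normalize(); the helper names normalize_graph and graph_hash and the field names id, source, and target are assumptions made for this example.

```python
import hashlib
import json


def normalize_graph(graph: dict) -> dict:
    """Return a copy of the graph with nodes and edges in canonical order."""
    nodes = sorted(graph.get("nodes", []), key=lambda n: str(n["id"]))
    edges = sorted(
        graph.get("edges", []),
        key=lambda e: (str(e["source"]), str(e["target"])),
    )
    return {"nodes": nodes, "edges": edges}


def graph_hash(graph: dict) -> str:
    """Hash the canonical JSON form of the normalized graph."""
    canonical = json.dumps(
        normalize_graph(graph), sort_keys=True, separators=(",", ":")
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two captures of the same graph, discovered in a different order.
a = {"nodes": [{"id": "b"}, {"id": "a"}], "edges": [{"source": "a", "target": "b"}]}
b = {"nodes": [{"id": "a"}, {"id": "b"}], "edges": [{"source": "a", "target": "b"}]}
print(graph_hash(a) == graph_hash(b))  # True
```

Sorting before hashing is what makes the digest independent of discovery order, which is the property a fingerprint at the pipeline boundary would need.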

Quick start

pip install webweavex
pip install "webweavex[browser]"
pip install "webweavex[full]"
python -c "import webweavex; print(webweavex.__version__)"
# 2.0.0

Real code examples

Browser, auth, replay, semantic, reconstruction, distributed, native

Browser extraction

from webweavex import extract_web, compute_global_runtime_fingerprint

out = extract_web("https://example.com")
print(out.get("bounded"), compute_global_runtime_fingerprint(out))

Authenticated runtime persistence

from webweavex import save_encrypted_session, extract_web

save_encrypted_session(
    "./session.kaalka",
    {"cookies": [], "headers": {}, "auth_tokens": []},
    "your-kaalka-master-key",
)

out = extract_web(
    "https://app.example.com",
    authenticated=True,
    session_path="./session.kaalka",
    encryption_key="your-kaalka-master-key",
)

Runnable: examples/authenticated_extraction.py

Replay equivalence

from webweavex import validate_replay_equivalence

# original and replayed are two extraction results captured for the same input.
assert validate_replay_equivalence(original, replayed)["equivalent"]
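
To make the idea of an equivalence check concrete, here is a minimal standalone sketch that compares two graph dicts by node ids and edge pairs while ignoring order. It does not reproduce validate_replay_equivalence(); the function check_equivalence, the nodes_match and edges_match keys, and the graph field names are hypothetical. Only the "equivalent" key mirrors the usage shown above.

```python
def check_equivalence(original: dict, replayed: dict) -> dict:
    """Compare two graph dicts by node ids and edge pairs, ignoring order."""

    def node_ids(graph: dict) -> set:
        return {n["id"] for n in graph.get("nodes", [])}

    def edge_pairs(graph: dict) -> set:
        return {(e["source"], e["target"]) for e in graph.get("edges", [])}

    nodes_match = node_ids(original) == node_ids(replayed)
    edges_match = edge_pairs(original) == edge_pairs(replayed)
    return {
        "equivalent": nodes_match and edges_match,
        "nodes_match": nodes_match,
        "edges_match": edges_match,
    }


original = {
    "nodes": [{"id": "login"}, {"id": "dashboard"}],
    "edges": [{"source": "login", "target": "dashboard"}],
}
replayed = {
    "nodes": [{"id": "dashboard"}, {"id": "login"}],
    "edges": [{"source": "login", "target": "dashboard"}],
}
drifted = {"nodes": [{"id": "login"}], "edges": []}

print(check_equivalence(original, replayed)["equivalent"])  # True
print(check_equivalence(original, drifted)["equivalent"])   # False
```

Returning the individual checks alongside the overall verdict makes it easier to see which part of a replay drifted.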

Semantic runtime

from webweavex import extract_web
out = extract_web("https://example.com", semantic_runtime=True)

Reconstruction

from webweavex import run_reconstruction_runtime

# prior is assumed to be the result of an earlier extraction, e.g. extract_web(...).
rebuilt = run_reconstruction_runtime(
    sources={"extraction": prior},
    runtime_type="browser",
)

Distributed extraction

from webweavex import run_autonomous_extraction

out = run_autonomous_extraction(
    tasks=[{"task_id": "t1", "url": "https://example.com", "priority": 0}],
)
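
For distributed runs to stay reproducible, the order in which tasks are handed to workers has to be fixed as well. The sketch below shows one way to do that, under the assumption that a lower priority number runs first and that task_id breaks ties. The schedule helper is hypothetical and not part of the WebWeaveX API.

```python
def schedule(tasks: list) -> list:
    """Order tasks by (priority, task_id) so every run sees the same sequence."""
    return sorted(tasks, key=lambda t: (t["priority"], t["task_id"]))


tasks = [
    {"task_id": "t3", "url": "https://example.com/c", "priority": 1},
    {"task_id": "t1", "url": "https://example.com/a", "priority": 0},
    {"task_id": "t2", "url": "https://example.com/b", "priority": 0},
]

print([t["task_id"] for t in schedule(tasks)])  # ['t1', 't2', 't3']
```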

Native extraction

from webweavex import extract_native

out = extract_native(runtime="desktop", application="notepad")

Determinism

| Mechanism | Role |
| --- | --- |
| compute_global_runtime_fingerprint() | Cross-run runtime digest |
| validate_replay_equivalence() | Graph + fingerprint + topology checks |
| compute_stable_dom_hash() | DOM meaning stable under attribute noise |
| SPA stabilizer | Framework route/state freeze |
| stable_memory_hash() | Ordered federated memory merge |
| Kaalka encrypt_value | Identical plaintext + key → identical ciphertext |

Python ↔ JS consistency: the reference vectors in validation/kaalka_cross_language/ check that hashing and encryption produce the same output across runtimes.
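
The reference-vector pattern can be sketched in a few lines: one runtime records inputs with the digests it computes, and any other runtime recomputes and compares. The example uses SHA-256 over canonical JSON purely as a stand-in; it does not implement Kaalka, and the vector layout (input, sha256) and the helper names are assumptions rather than the format used in validation/kaalka_cross_language/.

```python
import hashlib
import json


def canonical_bytes(payload) -> bytes:
    """Serialize with sorted keys and fixed separators so the bytes are stable."""
    text = json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def make_vector(payload) -> dict:
    """Record a payload together with the digest this runtime computes for it."""
    digest = hashlib.sha256(canonical_bytes(payload)).hexdigest()
    return {"input": payload, "sha256": digest}


def verify_vectors(vectors: list) -> list:
    """Return the indexes of vectors whose recomputed digest does not match."""
    return [
        i
        for i, v in enumerate(vectors)
        if hashlib.sha256(canonical_bytes(v["input"])).hexdigest() != v["sha256"]
    ]


vectors = [make_vector({"b": 1, "a": [1, 2]}), make_vector({"route": "/dashboard"})]

# Serialize and reload, as a second runtime would when reading the vector file.
reloaded = json.loads(json.dumps(vectors))
print(verify_vectors(reloaded))  # []
```

A JavaScript counterpart would have to sort keys and emit the same separators before hashing for the digests to agree.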

Limitation: two live fetches of a dynamic SPA may differ. Determinism applies to captured input: identical captured bytes yield identical stabilized hashes.
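
One way a DOM hash can stay stable under attribute noise is to hash only tags, text, and a small set of meaningful attributes. The sketch below does this with the standard library's html.parser. It is an illustration and not compute_stable_dom_hash(); in particular, the STABLE_ATTRS set is an assumption chosen for this example.

```python
import hashlib
from html.parser import HTMLParser

# Assumption for this sketch: only these attributes carry meaning.
STABLE_ATTRS = {"href", "name", "type", "role"}


class _Collector(HTMLParser):
    """Collect a token stream of tags, kept attributes, and normalized text."""

    def __init__(self):
        super().__init__()
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        kept = sorted((k, v or "") for k, v in attrs if k in STABLE_ATTRS)
        self.tokens.append(("open", tag, tuple(kept)))

    def handle_endtag(self, tag):
        self.tokens.append(("close", tag))

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace runs
        if text:
            self.tokens.append(("text", text))


def dom_hash(html: str) -> str:
    """Hash the token stream, ignoring attributes outside STABLE_ATTRS."""
    parser = _Collector()
    parser.feed(html)
    parser.close()
    return hashlib.sha256(repr(parser.tokens).encode("utf-8")).hexdigest()


run_1 = '<div id="r-81f2" class="css-a1"><a href="/home">Home</a></div>'
run_2 = '<div id="r-c09d" class="css-zz"><a href="/home">Home</a></div>'
changed = '<div id="r-81f2" class="css-a1"><a href="/away">Home</a></div>'

print(dom_hash(run_1) == dom_hash(run_2))    # True
print(dom_hash(run_1) == dom_hash(changed))  # False
```

Generated ids and class names differ between the first two runs without affecting the hash, while a changed link target does.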


Reconstruction engine

WebWeaveX reconstructs operational structure from runtime IR:

  • Runtime topology and unified graphs
  • Workflow and application memory views
  • Browser/application state envelopes
  • Semantic operational graphs

| Property | Meaning |
| --- | --- |
| Runtime reconstruction | IR → bounded runtime view |
| Operational graph rebuilding | Normalized nodes/edges |
| Replay-safe reconstruction | Tested equivalence paths |
| Deterministic recreation | Sorted, canonical structures |

This is not full machine cloning or sci-fi simulation—it is auditable operational recreation for engineering workflows.
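
A minimal sketch of deterministic recreation, assuming a flat IR made of node and edge records: duplicates are dropped, edges that point at unknown nodes are skipped, and the output is sorted so record order does not matter. The record layout (kind, id, source, target) and the rebuild_graph helper are hypothetical and do not describe the actual IR schema.

```python
def rebuild_graph(ir_records: list) -> dict:
    """Rebuild a node/edge view from flat IR records, independent of record order."""
    node_ids = {r["id"] for r in ir_records if r["kind"] == "node"}
    edges = {
        (r["source"], r["target"])
        for r in ir_records
        if r["kind"] == "edge"
        and r["source"] in node_ids
        and r["target"] in node_ids  # skip edges that point at unknown nodes
    }
    return {
        "nodes": [{"id": i} for i in sorted(node_ids)],
        "edges": [{"source": s, "target": t} for s, t in sorted(edges)],
    }


ir = [
    {"kind": "edge", "source": "login", "target": "dashboard"},
    {"kind": "node", "id": "login"},
    {"kind": "node", "id": "dashboard"},
    {"kind": "node", "id": "login"},  # duplicate record
    {"kind": "edge", "source": "dashboard", "target": "x"},  # dangling edge
]

graph = rebuild_graph(ir)
print(graph == rebuild_graph(list(reversed(ir))))  # True
print(graph["nodes"])  # [{'id': 'dashboard'}, {'id': 'login'}]
```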


Real validation

Validation commands and CI gates

| Metric | Value |
| --- | --- |
| Tests | 760+ passing (pytest -q) |
| Scoped coverage | ≥ 90% (production packages in pyproject.toml) |
| Wheel | webweavex-2.0.0-py3-none-any.whl |
| Replay | validate_replay_equivalence suite |
| Determinism | Kaalka cross-language + fingerprint tests |
| Playwright | Browser extraction paths (optional extra) |
| Native | Orchestrator + platform fallbacks |
| Distributed | Autonomous extraction tests |

pytest -q
python -m build
python validation/final_production_master.py

Security model

| Control | Implementation |
| --- | --- |
| Allowlisted execution | core/execution/ sandbox |
| No arbitrary eval/exec | Forbidden in production paths |
| Sandboxed runtime | Bounded simulate/rollback |
| Deterministic persistence | Kaalka-only checkpoints |
| Encrypted memory/session | encrypt_value, session wrappers |
| Replay-safe recovery | Deterministic reload envelopes |

See SECURITY.md. Report issues responsibly.
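
The allowlist idea can be illustrated with a plain dispatch table: an action runs only if its name is a key in the table, and anything else is rejected without being evaluated. This is a sketch of the general technique, not the core/execution/ sandbox; the action names and the run_action helper are hypothetical.

```python
from typing import Callable, Dict

# Hypothetical action names; only entries in this table can ever run.
ALLOWED_ACTIONS: Dict[str, Callable[[dict], dict]] = {
    "read_title": lambda state: {"title": state.get("title", "")},
    "count_links": lambda state: {"links": len(state.get("links", []))},
}


def run_action(name: str, state: dict) -> dict:
    """Dispatch by table lookup; unknown names are rejected, never evaluated."""
    handler = ALLOWED_ACTIONS.get(name)
    if handler is None:
        return {"ok": False, "error": f"action not allowlisted: {name}"}
    return {"ok": True, "result": handler(state)}


state = {"title": "Dashboard", "links": ["/a", "/b"]}
print(run_action("count_links", state))  # {'ok': True, 'result': {'links': 2}}
print(run_action("__import__('os')", state)["ok"])  # False
```

Because the requested name is only ever used as a dictionary key, there is no path from caller input to eval or exec.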


Architecture guarantees

| Guarantee | How |
| --- | --- |
| Deterministic outputs | Canonical ordering, stable hashes |
| Replay-safe persistence | Kaalka + equivalence validation |
| Bounded execution | Explicit bounded: True contracts |
| Graceful degradation | Playwright/native/connectors fail soft |
| Canonical normalization | Graph and DOM contracts |
| Stable graph generation | build_runtime_graph + normalize |
| Cross-language consistency | Kaalka reference vectors |

Contract document: WEBWEAVEX_v2_ARCHITECTURE_LOCK_REPORT.md
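
Failing soft when an optional backend is missing might look like the following sketch: the code checks for the dependency without importing it and still returns a well-formed envelope. Apart from the bounded key, which appears in the extraction examples, every name here (capture, backend_available, degraded, nodes) is an assumption made for illustration.

```python
import importlib.util


def backend_available(module_name: str) -> bool:
    """Check for an optional dependency without importing it."""
    return importlib.util.find_spec(module_name) is not None


def capture(url: str, backend_module: str = "playwright") -> dict:
    """Return a result envelope even when the optional backend is missing."""
    if not backend_available(backend_module):
        return {"bounded": True, "degraded": True, "url": url, "nodes": []}
    # A real implementation would drive the backend here; the sketch stops short.
    return {"bounded": True, "degraded": False, "url": url, "nodes": []}


# Use a module name that does not exist to exercise the fallback path.
result = capture("https://example.com", backend_module="no_such_backend_for_demo")
print(result["degraded"], result["bounded"])  # True True
```

Callers can branch on the degraded flag instead of catching an ImportError deep inside the pipeline.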


Repository structure

WebWeaveX/
├── core/           # Runtime infrastructure (kernel, browser, memory, sync, …)
├── webweavex/      # Public Python package
├── tests/          # 760+ tests
├── docs/           # Architecture, API, security, Kaalka, replay, validation
├── examples/       # Runnable scripts
├── validation/     # Production and real-world validators
└── .github/        # CI, templates, code of conduct, funding

| Package | Role |
| --- | --- |
| core/kernel/ | Canonical pipeline, RuntimeKernel |
| core/browser/ | Web extraction, DOM/SPA stabilization |
| core/crypto/ | Kaalka engines |
| core/memory/ | Federated memory fabric |
| core/synchronization/ | Sync runtime |
| core/reconstruction/ | Reconstruction orchestrator |
| core/replay/ | Replay equivalence |
| webweavex/ | Stable public API |

Contributing

See CONTRIBUTING.md and .github/CODE_OF_CONDUCT.md.

| Rule | Requirement |
| --- | --- |
| Determinism | No random / uuid4 in runtime paths |
| Replay safety | Preserve graph normalization semantics |
| Canonical pipeline | No parallel mega-orchestrators |
| Persistence | Kaalka for new checkpoints |
| Tests | pytest -q must pass; coverage gate ≥ 90% scoped |
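
For contributors wondering what to use instead of random or uuid4, two standard-library options give repeatable identifiers: a truncated content hash, or the name-based uuid5. The helper names and the ID format below are illustrative assumptions, not project conventions.

```python
import hashlib
import uuid


def node_id(runtime: str, path: str) -> str:
    """Derive an ID from content, so the same input always gets the same ID."""
    digest = hashlib.sha256(f"{runtime}\x00{path}".encode("utf-8")).hexdigest()
    return f"node-{digest[:16]}"


def node_uuid(url: str) -> uuid.UUID:
    """uuid5 is name-based and repeatable, unlike uuid4."""
    return uuid.uuid5(uuid.NAMESPACE_URL, url)


print(node_id("browser", "/dashboard") == node_id("browser", "/dashboard"))  # True
print(node_uuid("https://example.com") == node_uuid("https://example.com"))  # True
print(node_uuid("https://example.com").version)  # 5
```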

Roadmap

See ROADMAP.md.

v2.1 focus:

  • Deeper native bindings (UIA, AX, AT-SPI)
  • Distributed runtime infrastructure hardening
  • Stronger SPA normalization
  • Real connector runtimes (live Postgres, Redis, K8s validation)
  • Native OS integrations behind optional extras

License

Apache 2.0 — see LICENSE.


Final positioning

WebWeaveX is an attempt to build deterministic runtime cognition infrastructure for the authenticated operational web—where extraction means encrypted continuity, structured graphs, replay equivalence, and reconstruction, not disposable HTML dumps.

If this work helps your team, consider supporting it:

Buy Me a Coffee


Documentation · docs/ · examples/ · release report
