Deterministic universal runtime extraction infrastructure with replay-safe graphs and Kaalka persistence
Deterministic runtime extraction and replay-safe operational cognition infrastructure
Contents
- What is WebWeaveX?
- What WebWeaveX is NOT
- Why existing systems fail
- Core capabilities
- Authenticated runtime continuation
- Architecture
- Canonical pipeline
- Quick start
- Code examples
- Determinism
- Validation
- Security
- Architecture guarantees
- Contributing
What is WebWeaveX?
WebWeaveX is deterministic runtime extraction and operational cognition infrastructure. It captures how software actually runs—browser DOM, authenticated sessions, Electron state, native UI, workflows, and connector surfaces—and compiles that into replay-safe runtime graphs with Kaalka-encrypted persistence.
Why it exists
Modern systems are authenticated, stateful, runtime-driven, SPA-based, Electron-based, synchronized, and operationally dynamic. Operators need continuity across runs, not another HTML snapshot.
Traditional extraction fails in the following ways:
| Failure mode | Consequence |
|---|---|
| HTML-only parsing | Misses hydration, storage, IPC, native UI |
| Stateless requests | Loses session and workflow continuity |
| No authenticated persistence | Re-login and drift between runs |
| No replay contract | Cannot prove equivalence after rebuild |
| No reconstruction | Cannot rebuild operational topology from IR |
| Weak SPA/Electron support | Unstable IDs, routes, and storage break diffs |
WebWeaveX exists to deliver deterministic runtime extraction and replay-safe operational reconstruction through one canonical pipeline.
What WebWeaveX is NOT
WebWeaveX is not:
| Category | Clarification |
|---|---|
| Auth bypass tooling | Does not defeat MFA, CAPTCHA, or login controls |
| Malware or exploit infrastructure | Not designed for unauthorized access |
| Credential theft tooling | Does not harvest secrets you do not already hold |
| CAPTCHA bypass software | No circumvention of bot defenses |
| Browser exploitation tooling | Not a vulnerability framework |
| AGI or “autonomous hacking” | No probabilistic agent that “figures out” sites |
| Hacking infrastructure | No unauthorized intrusion features |
| An LLM wrapper | Core path is deterministic; optional plugins fail safe |
| A chatbot | Infrastructure library, not conversational AI |
WebWeaveX only operates on authorized authenticated runtimes and data you explicitly provide.
Why existing systems fail
| System | Strength | Limitation for operational runtime |
|---|---|---|
| BeautifulSoup | Fast static HTML parse | No live session, storage, or runtime graph |
| Selenium | Browser automation | No unified IR, Kaalka fabric, or replay equivalence layer |
| Playwright | Reliable browser control | Automation driver—not extraction + memory + reconstruction |
| Puppeteer | Chromium scripting | Same gap: no federated sync or deterministic checkpoints |
| Traditional crawlers | Scale on public pages | Stateless; poor on authenticated SPAs |
| Generic AI agents | Flexible tasks | Probabilistic; weak replay and audit guarantees |
Common gaps WebWeaveX addresses:
- Lack of runtime continuity across processes
- Lack of replay and fingerprint equivalence
- Lack of authenticated persistence (encrypted, deterministic)
- Lack of reconstruction from structured IR
- Lack of synchronization between browser, semantic, workflow, and memory layers
Core capabilities
| Capability | Description |
|---|---|
| Browser runtime extraction | Bounded Playwright capture, network/session envelopes |
| SPA stabilization | DOM and route stabilization for framework noise |
| Electron extraction | Routes, IPC, storage metadata, deterministic Electron hash |
| Native runtime cognition | Desktop, terminal, VM, remote (graceful OS fallbacks) |
| Terminal runtime | Shell-oriented cognition fixtures |
| Distributed extraction | Autonomous workers + Kaalka checkpoints |
| Runtime causality | Event chains and propagation in extraction fabrics |
| Semantic cognition | Entities, ontology, semantic graphs |
| Workflow runtime | Plans, objectives, workflow memory |
| Synchronization runtime | Multi-source runtime alignment |
| Reconstruction engine | Replay-safe rebuild from IR |
| Federated memory | Deterministic merge and stable hashes |
| Execution sandbox | Allowlisted actions only |
| Runtime replay | validate_replay_equivalence() |
| Runtime graph | Normalized universal runtime graph |
| Deterministic fingerprints | Global and pipeline hashes |
| Authenticated runtime continuation | Encrypted session reload |
| Kaalka deterministic encryption | Stable ciphertext; cross-language vectors |
| Connector runtime fabric | Database, API, container, K8s, telemetry (bounded) |
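To make the "Federated memory" row concrete, here is a minimal sketch of a deterministic merge followed by a stable hash. It is not the WebWeaveX implementation: `merge_memory`, `memory_hash`, the conflict rule (the lexicographically last source name wins), and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json
from typing import Any, Dict, Mapping


def merge_memory(sources: Mapping[str, Mapping[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Merge per-source key/value maps in sorted source-name order.

    Iterating in sorted order makes the result independent of the order in
    which sources were registered. On a key conflict, the source whose name
    sorts last wins (an assumed rule for this sketch).
    """
    merged: Dict[str, Dict[str, Any]] = {}
    for source_name in sorted(sources):
        for key in sorted(sources[source_name]):
            merged[key] = {"source": source_name, "value": sources[source_name][key]}
    return merged


def memory_hash(merged: Mapping[str, Any]) -> str:
    """Hash a canonical JSON encoding so equal content gives an equal digest."""
    canonical = json.dumps(merged, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


first = {"browser": {"route": "/a", "user": "u1"}, "workflow": {"route": "/b"}}
second = {"workflow": {"route": "/b"}, "browser": {"user": "u1", "route": "/a"}}
assert memory_hash(merge_memory(first)) == memory_hash(merge_memory(second))
```

The point of the sketch is that both the merge order and the serialization are fixed, so the digest depends only on content.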
Authenticated runtime continuation
Modern applications authenticate with cookies, localStorage, sessionStorage, tokens, runtime identity, and cross-navigation continuity. Electron adds IndexedDB metadata, IPC, and route state. Multi-tab products add synchronization state across surfaces.
WebWeaveX supports:
- Encrypted authenticated session persistence (save_encrypted_session, session paths on extract_web)
- Runtime continuation across extractions when you supply the same Kaalka key and session file
- Deterministic replay-safe reconstruction of operational graphs from IR
Persistence uses Kaalka deterministic encryption (algorithm: kaalka)—not plaintext JSON checkpoints on disk.
| Stored surface | Mechanism |
|---|---|
| Cookies / headers | Encrypted session store |
| Browser snapshot | Session + identity engines |
| Electron storage | Native/Electron cognition (bounded) |
| Workflow / sync state | Kaalka checkpoint engines |
WebWeaveX does not: bypass auth, defeat MFA, bypass security controls, or access systems without authorization.
WebWeaveX only operates on authorized authenticated runtimes explicitly provided by the user.
from webweavex import extract_web
result = extract_web(
    "https://app.example.com/dashboard",
    authenticated=True,
    session_path="./session.kaalka",
    encryption_key="your-kaalka-master-key",
)
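The call above hard-codes the key for brevity. A minimal sketch of keeping the key out of source follows; it only assembles the keyword arguments already shown above. The helper name `session_kwargs` and the environment variable `WEBWEAVEX_KAALKA_KEY` are hypothetical and not defined by WebWeaveX.

```python
import os
from typing import Mapping, Optional


def session_kwargs(
    session_path: str,
    env: Optional[Mapping[str, str]] = None,
    key_var: str = "WEBWEAVEX_KAALKA_KEY",  # hypothetical variable name
) -> dict:
    """Build the authenticated-session keyword arguments from the environment.

    Raises KeyError if the variable is unset and ValueError if it is empty,
    so a missing key fails loudly instead of silently running unauthenticated.
    """
    env = os.environ if env is None else env
    key = env[key_var]
    if not key:
        raise ValueError(f"{key_var} is set but empty")
    return {
        "authenticated": True,
        "session_path": session_path,
        "encryption_key": key,
    }
```

Under these assumptions, usage would look like `extract_web("https://app.example.com/dashboard", **session_kwargs("./session.kaalka"))`.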
Architecture
                         ┌────────────────────┐
                         │       Input        │
                         │   UniversalInput   │
                         └─────────┬──────────┘
                                   │
                                   ▼
                         ┌────────────────────┐
                         │ Canonical Pipeline │
                         │   run_canonical_   │
                         │     pipeline()     │
                         └─────────┬──────────┘
                                   │
                                   ▼
                         ┌────────────────────┐
                         │ Runtime Cognition  │
                         │  web·native·repo   │
                         └─────────┬──────────┘
                                   │
       ┌───────────────────────────┼───────────────────────────┐
       ▼                           ▼                           ▼
┌─────────────┐             ┌─────────────┐             ┌─────────────┐
│  Semantic   │             │  Causality  │             │  Workflow   │
│    Layer    │             │    Layer    │             │   Runtime   │
└──────┬──────┘             └──────┬──────┘             └──────┬──────┘
       │                           │                           │
       └───────────────────────────┼───────────────────────────┘
                                   ▼
                         ┌────────────────────┐
                         │  Synchronization   │
                         │      Runtime       │
                         └─────────┬──────────┘
                                   ▼
                         ┌────────────────────┐
                         │  Federated Memory  │
                         └─────────┬──────────┘
                                   ▼
                         ┌────────────────────┐
                         │  Execution Fabric  │
                         └─────────┬──────────┘
                                   ▼
                         ┌────────────────────┐
                         │   Reconstruction   │
                         │       Engine       │
                         └─────────┬──────────┘
                                   ▼
                         ┌────────────────────┐
                         │ Universal Runtime  │
                         │       Graph        │
                         └────────────────────┘
Source: core/kernel/runtime_pipeline.py
Canonical pipeline
Single production execution path—no shadow orchestrators.
from webweavex import UniversalInput, run_canonical_pipeline
result = run_canonical_pipeline(
    UniversalInput(source="https://example.com", source_type="web"),
)
print(result["pipeline_hash"])
print(len(result["unified_runtime_graph"].get("nodes", [])))
| Property | Detail |
|---|---|
| Single execution path | run_canonical_pipeline() only |
| Deterministic normalization | RuntimeGraphContract.normalize() |
| Replay-safe runtime | Fingerprint at pipeline boundary |
| Canonical IR generation | Per-kind extraction → kernel phases |
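The "Deterministic normalization" and "Replay-safe runtime" rows can be illustrated with a small stand-alone sketch: put nodes and edges into a canonical order, then hash a canonical JSON encoding. This is not `RuntimeGraphContract.normalize()`; the function names, the `id`/`src`/`dst` field names, and the choice of SHA-256 are assumptions for illustration only.

```python
import hashlib
import json


def _sort_key(item: dict) -> str:
    # A key-sorted JSON encoding gives a total order independent of dict key order.
    return json.dumps(item, sort_keys=True)


def normalize_graph(graph: dict) -> dict:
    """Return a copy of the graph with nodes and edges in canonical order."""
    return {
        "nodes": sorted(graph.get("nodes", []), key=_sort_key),
        "edges": sorted(graph.get("edges", []), key=_sort_key),
    }


def graph_hash(graph: dict) -> str:
    """SHA-256 over a compact, key-sorted JSON encoding of the normalized graph."""
    canonical = json.dumps(normalize_graph(graph), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Same content, different list and key order: the digest is identical.
a = {"nodes": [{"id": "b"}, {"id": "a"}], "edges": [{"src": "a", "dst": "b"}]}
b = {"edges": [{"dst": "b", "src": "a"}], "nodes": [{"id": "a"}, {"id": "b"}]}
assert graph_hash(a) == graph_hash(b)
```

Sorting before serializing is what lets the hash act as a fingerprint at the pipeline boundary: ordering noise is removed, and any change in content changes the digest.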
Quick start
pip install webweavex
pip install "webweavex[browser]"
pip install "webweavex[full]"
python -c "import webweavex; print(webweavex.__version__)"
# 2.0.0
Real code examples
Browser, auth, replay, semantic, reconstruction, distributed, native
Browser extraction
from webweavex import extract_web, compute_global_runtime_fingerprint
out = extract_web("https://example.com")
print(out.get("bounded"), compute_global_runtime_fingerprint(out))
Authenticated runtime persistence
from webweavex import save_encrypted_session, extract_web
save_encrypted_session(
    "./session.kaalka",
    {"cookies": [], "headers": {}, "auth_tokens": []},
    "your-kaalka-master-key",
)
out = extract_web(
    "https://app.example.com",
    authenticated=True,
    session_path="./session.kaalka",
    encryption_key="your-kaalka-master-key",
)
Runnable: examples/authenticated_extraction.py
Replay equivalence
from webweavex import validate_replay_equivalence
# `original` and `replayed` are two results produced elsewhere, e.g. a first run and its replay.
assert validate_replay_equivalence(original, replayed)["equivalent"]
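The call above returns a mapping with an `"equivalent"` flag. As a rough picture of what such a check could involve, the sketch below compares node identities, edge topology, and a content fingerprint of two graph-shaped results. It is not the library's `validate_replay_equivalence`; the function name, the `id`/`src`/`dst` fields, and the `"checks"` key are hypothetical.

```python
import hashlib
import json


def _canonical(graph: dict) -> dict:
    def key(item: dict) -> str:
        return json.dumps(item, sort_keys=True)
    return {
        "nodes": sorted(graph.get("nodes", []), key=key),
        "edges": sorted(graph.get("edges", []), key=key),
    }


def _fingerprint(graph: dict) -> str:
    encoded = json.dumps(_canonical(graph), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()


def sketch_replay_equivalence(original: dict, replayed: dict) -> dict:
    """Compare two graphs and report which checks passed."""
    def node_ids(g: dict) -> list:
        return sorted(str(n.get("id")) for n in g.get("nodes", []))

    def edge_pairs(g: dict) -> list:
        return sorted((str(e.get("src")), str(e.get("dst"))) for e in g.get("edges", []))

    checks = {
        "nodes": node_ids(original) == node_ids(replayed),
        "topology": edge_pairs(original) == edge_pairs(replayed),
        # The fingerprint also covers node and edge attributes, not just identity.
        "fingerprint": _fingerprint(original) == _fingerprint(replayed),
    }
    return {"equivalent": all(checks.values()), "checks": checks}
```

Reporting the individual checks alongside the overall flag makes a failed replay easier to diagnose: a graph can keep its topology and still drift in attribute content.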
Semantic runtime
from webweavex import extract_web
out = extract_web("https://example.com", semantic_runtime=True)
Reconstruction
from webweavex import run_reconstruction_runtime
# `prior` is assumed to be the result of an earlier extraction (for example, from extract_web()).
rebuilt = run_reconstruction_runtime(
    sources={"extraction": prior},
    runtime_type="browser",
)
Distributed extraction
from webweavex import run_autonomous_extraction
out = run_autonomous_extraction(
    tasks=[{"task_id": "t1", "url": "https://example.com", "priority": 0}],
)
Native extraction
from webweavex import extract_native
out = extract_native(runtime="desktop", application="notepad")
Determinism
| Mechanism | Role |
|---|---|
| compute_global_runtime_fingerprint() | Cross-run runtime digest |
| validate_replay_equivalence() | Graph + fingerprint + topology checks |
| compute_stable_dom_hash() | DOM meaning stable under attribute noise |
| SPA stabilizer | Framework route/state freeze |
| stable_memory_hash() | Ordered federated memory merge |
| Kaalka encrypt_value | Identical plaintext + key → identical ciphertext |
Python ↔ JS consistency: reference vectors in validation/kaalka_cross_language/ validate hash and encrypt stability across runtimes.
Limitation: two live fetches of a dynamic SPA may differ; identical captured bytes → identical stabilized hashes.
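To illustrate what "stable under attribute noise" can mean, the sketch below hashes a DOM's tag structure, text, and non-volatile attributes while ignoring attributes that frameworks tend to regenerate. It is not `compute_stable_dom_hash()`; the function name and the list of attributes treated as noise are assumptions for illustration.

```python
import hashlib
from html.parser import HTMLParser

# Attributes treated as noise in this sketch; the real rules are not documented here.
VOLATILE_ATTRS = {"id", "class", "style", "nonce"}
VOLATILE_PREFIXES = ("data-",)


class _StableDomCollector(HTMLParser):
    """Collect a token stream that ignores volatile attributes and whitespace runs."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.tokens = []

    def handle_starttag(self, tag, attrs):
        kept = sorted(
            (name, value or "")
            for name, value in attrs
            if name not in VOLATILE_ATTRS and not name.startswith(VOLATILE_PREFIXES)
        )
        self.tokens.append(("open", tag, tuple(kept)))

    def handle_endtag(self, tag):
        self.tokens.append(("close", tag))

    def handle_data(self, data):
        text = " ".join(data.split())  # collapse whitespace runs
        if text:
            self.tokens.append(("text", text))


def stable_dom_hash(html: str) -> str:
    collector = _StableDomCollector()
    collector.feed(html)
    collector.close()
    return hashlib.sha256(repr(collector.tokens).encode("utf-8")).hexdigest()


a = '<div id="r1" class="css-1a2b" data-reactid="7"><a href="/x">Go</a></div>'
b = '<div id="r9" class="css-9z8y" data-reactid="42"><a href="/x">  Go </a></div>'
assert stable_dom_hash(a) == stable_dom_hash(b)
```

The two inputs differ only in generated identifiers and whitespace, so they hash identically; changing the link target or the text would change the digest.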
Reconstruction engine
WebWeaveX reconstructs operational structure from runtime IR:
- Runtime topology and unified graphs
- Workflow and application memory views
- Browser/application state envelopes
- Semantic operational graphs
| Property | Meaning |
|---|---|
| Runtime reconstruction | IR → bounded runtime view |
| Operational graph rebuilding | Normalized nodes/edges |
| Replay-safe reconstruction | Tested equivalence paths |
| Deterministic recreation | Sorted, canonical structures |
This is not full machine cloning or sci-fi simulation—it is auditable operational recreation for engineering workflows.
Real validation
Validation commands and CI gates
| Metric | Value |
|---|---|
| Tests | 760+ passing (pytest -q) |
| Scoped coverage | ≥ 90% (production packages in pyproject.toml) |
| Wheel | webweavex-2.0.0-py3-none-any.whl |
| Replay | validate_replay_equivalence suite |
| Determinism | Kaalka cross-language + fingerprint tests |
| Playwright | Browser extraction paths (optional extra) |
| Native | Orchestrator + platform fallbacks |
| Distributed | Autonomous extraction tests |
pytest -q
python -m build
python validation/final_production_master.py
Security model
| Control | Implementation |
|---|---|
| Allowlisted execution | core/execution/ sandbox |
| No arbitrary eval/exec | Forbidden in production paths |
| Sandboxed runtime | Bounded simulate/rollback |
| Deterministic persistence | Kaalka-only checkpoints |
| Encrypted memory/session | encrypt_value, session wrappers |
| Replay-safe recovery | Deterministic reload envelopes |
See SECURITY.md. Report issues responsibly.
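The "Allowlisted execution" and "No arbitrary eval/exec" controls can be pictured with a short dispatch sketch: an action runs only if its name is on a fixed allowlist and a handler was registered for it explicitly. This is not the code in core/execution/; the action names, `run_action`, and `ActionNotAllowed` are hypothetical.

```python
from typing import Any, Callable, Mapping

# Hypothetical action names for illustration only.
ALLOWED_ACTIONS = frozenset({"navigate", "read_dom", "snapshot_storage"})


class ActionNotAllowed(Exception):
    """Raised when a requested action is not permitted."""


def run_action(name: str, handlers: Mapping[str, Callable[..., Any]], **params: Any) -> Any:
    """Dispatch an action by name without eval/exec.

    The allowlist is checked before the handler table, so registering a
    handler is never enough on its own to make an action executable.
    """
    if name not in ALLOWED_ACTIONS:
        raise ActionNotAllowed(name)
    try:
        handler = handlers[name]
    except KeyError:
        raise ActionNotAllowed(f"{name} is allowlisted but has no handler") from None
    return handler(**params)


handlers = {
    "navigate": lambda url: f"navigated:{url}",
    "delete_all": lambda: "should never run",  # registered but not allowlisted
}
assert run_action("navigate", handlers, url="https://example.com") == "navigated:https://example.com"
```

Keeping the allowlist immutable (`frozenset`) and separate from the handler table is a deliberate choice in this sketch: widening what can run then requires an explicit change to one reviewed constant.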
Architecture guarantees
| Guarantee | How |
|---|---|
| Deterministic outputs | Canonical ordering, stable hashes |
| Replay-safe persistence | Kaalka + equivalence validation |
| Bounded execution | Explicit bounded: True contracts |
| Graceful degradation | Playwright/native/connectors fail soft |
| Canonical normalization | Graph and DOM contracts |
| Stable graph generation | build_runtime_graph + normalize |
| Cross-language consistency | Kaalka reference vectors |
Contract document: WEBWEAVEX_v2_ARCHITECTURE_LOCK_REPORT.md
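One common way to get the "fail soft" behavior named under graceful degradation is to probe for an optional dependency before using it, instead of letting an ImportError escape. The sketch below shows that pattern with the standard library only; the helper names and the mode strings are illustrative and are not WebWeaveX settings.

```python
import importlib.util
from typing import Dict


def probe_backends(*module_names: str) -> Dict[str, bool]:
    """Report which optional top-level modules are importable, without importing them."""
    return {
        name: importlib.util.find_spec(name) is not None
        for name in sorted(module_names)
    }


def choose_capture_mode(available: Dict[str, bool]) -> str:
    """Pick a capture mode from what is installed (mode names are illustrative)."""
    return "browser" if available.get("playwright") else "static"


# "json" ships with Python; the second name is deliberately nonexistent.
status = probe_backends("json", "no_such_module_for_this_sketch")
assert status == {"json": True, "no_such_module_for_this_sketch": False}
```

Because the decision is a pure function of the probe result, the fallback path is easy to test on machines where the optional extra is not installed.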
Repository structure
WebWeaveX/
├── core/ # Runtime infrastructure (kernel, browser, memory, sync, …)
├── webweavex/ # Public Python package
├── tests/ # 760+ tests
├── docs/ # Architecture, API, security, Kaalka, replay, validation
├── examples/ # Runnable scripts
├── validation/ # Production and real-world validators
└── .github/ # CI, templates, code of conduct, funding
| Package | Role |
|---|---|
| core/kernel/ | Canonical pipeline, RuntimeKernel |
| core/browser/ | Web extraction, DOM/SPA stabilization |
| core/crypto/ | Kaalka engines |
| core/memory/ | Federated memory fabric |
| core/synchronization/ | Sync runtime |
| core/reconstruction/ | Reconstruction orchestrator |
| core/replay/ | Replay equivalence |
| webweavex/ | Stable public API |
Contributing
See CONTRIBUTING.md and .github/CODE_OF_CONDUCT.md.
| Rule | Requirement |
|---|---|
| Determinism | No random / uuid4 in runtime paths |
| Replay safety | Preserve graph normalization semantics |
| Canonical pipeline | No parallel mega-orchestrators |
| Persistence | Kaalka for new checkpoints |
| Tests | pytest -q must pass; coverage gate ≥ 90% scoped |
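For contributors wondering what to use in place of `uuid4`, one option is to derive identifiers from content, so the same input yields the same ID on every run. A minimal sketch using the standard library's `uuid.uuid5` follows; the namespace URL and the helper name are hypothetical, and the project may use a different scheme.

```python
import uuid

# Hypothetical namespace for this sketch; any fixed UUID would do.
NODE_NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.com/webweavex-sketch")


def deterministic_node_id(kind: str, locator: str) -> str:
    """Derive a stable ID from what the node is, not from when it was seen.

    The unit separator (\\x1f) keeps ("ab", "c") distinct from ("a", "bc").
    """
    return str(uuid.uuid5(NODE_NAMESPACE, f"{kind}\x1f{locator}"))


first = deterministic_node_id("route", "/dashboard")
second = deterministic_node_id("route", "/dashboard")
assert first == second
assert first != deterministic_node_id("route", "/settings")
```

Unlike `uuid4`, this produces no fresh randomness per run, so diffs and replay comparisons are not disturbed by identifier churn.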
Roadmap
See ROADMAP.md.
v2.1 focus:
- Deeper native bindings (UIA, AX, AT-SPI)
- Distributed runtime infrastructure hardening
- Stronger SPA normalization
- Real connector runtimes (live Postgres, Redis, K8s validation)
- Native OS integrations behind optional extras
License
Apache 2.0 — see LICENSE.
Final positioning
WebWeaveX is an attempt to build deterministic runtime cognition infrastructure for the authenticated operational web—where extraction means encrypted continuity, structured graphs, replay equivalence, and reconstruction, not disposable HTML dumps.
If this work helps your team, consider supporting it.
Documentation · docs/ · examples/ · release report