FaultRay — Zero-risk infrastructure and AI agent chaos engineering. Simulate agent resilience failures and prove your availability ceiling mathematically.

Reason this release was yanked:

Relicensed to Apache 2.0. Please upgrade to v11.2.0.

Project description

FaultRay

DORA-aligned Resilience Research Prototype — Without Touching Production

☁️ Try FaultRay Cloud — No setup required  |  Live Demo


FaultRay simulates hundreds to thousands of failure scenarios entirely in memory — mathematically proving your availability ceiling before anything breaks. Built for financial institutions that need to prove DORA compliance without risking production systems.

Screenshots

  • Dashboard: Resilience Dashboard
  • Heatmap: Failure Heatmap
  • Topology: Dependency Topology
  • Cost Analysis: Financial Impact Analysis
  • Compliance: DORA Compliance Dashboard

Demo

pip install faultray
faultray demo
Building demo infrastructure...
╭────────────────────────────────────────────────────╮
│ Metric           │ Value                           │
│ Components       │ 9                               │
│ Dependencies     │ 12                              │
│ Resilience Score │ 50.0/100                        │
╰────────────────────────────────────────────────────╯

Running chaos simulation...

╭────────── FaultRay Chaos Simulation Report ──────────╮
│ Resilience Score: 50/100                             │
│ Scenarios tested: 255                                │
│ Critical: 21  Warning: 84  Passed: 150               │
╰──────────────────────────────────────────────────────╯

  Generate HTML report: faultray simulate --html report.html
  Generate DORA evidence: faultray dora evidence infra.yaml

Why Financial Institutions Choose FaultRay

Traditional chaos engineering tools (Gremlin, Steadybit, AWS FIS) inject real failures into production. For banks, insurers, and payment processors operating under DORA, that approach creates unacceptable risk.

FaultRay takes a fundamentally different approach: mathematical simulation. Your trading systems stay online. Your payment rails keep running. You still get the evidence regulators need.

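|                  | Gremlin             | Steadybit                          | AWS FIS         | FaultRay               |
|------------------|---------------------|------------------------------------|-----------------|------------------------|
| Approach         | Fault injection     | Fault injection (with safety)      | Fault injection | Math simulation        |
| Production risk  | Medium-High         | Low-Medium (blast radius controls) | Medium          | Zero                   |
| Setup            | Agent per host      | Agent per host                     | AWS only        | pip install            |
| DORA evidence    | Reporting available | Reporting available                | CloudWatch logs | Audit-ready reports    |
| AI agent testing | No                  | No                                 | No              | Yes                    |
| Cost             | $$$$                | $$$                                | $$              | Free tier / Enterprise |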

DORA Compliance — All 5 Pillars

FaultRay maps directly to the EU Digital Operational Resilience Act (Regulation (EU) 2022/2554), which has applied since January 17, 2025. DORA leaves administrative penalties for non-compliance to national law (Articles 50-51), so the maximum fine depends on the member state.

Full DORA Command Suite

# Pillar 1: ICT Risk Management (Articles 5-16)
faultray dora assess model.json              # 52-control compliance check
faultray dora risk-assessment model.json     # Comprehensive risk evaluation
faultray dora gap-analysis model.json        # Control gaps + remediation

# Pillar 2: Incident Management (Articles 17-23)
faultray dora incident-assess model.json     # Incident readiness evaluation

# Pillar 3: Resilience Testing (Articles 24-27)
faultray simulate --model model.json --json  # chaos scenario simulation
faultray dora test-plan model.json           # Generate resilience test plan
faultray dora tlpt-readiness model.json      # TLPT preparation assessment

# Pillar 4: Third-Party Risk (Articles 28-30)
faultray dora concentration-risk model.json  # ICT concentration risk (HHI)
faultray dora register model.json            # ITS 2024/2956 register of information

# Pillar 5: Information Sharing (Article 45)
# Integrated threat intelligence from CVE/CISA advisories

# Evidence & Reporting
faultray dora evidence model.json            # Audit-ready evidence package
faultray dora report model.json              # HTML report for regulators
faultray dora rts-export model.json --format csv  # Machine-readable export
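
The concentration-risk command above reports a Herfindahl-Hirschman Index (HHI). As a rough illustration of what that number means, here is a minimal sketch of the standard HHI formula (sum of squared percentage shares, on a 0 to 10,000 scale). The function name, the provider names, and the choice of "critical services per provider" as the share basis are assumptions for this example, not FaultRay's actual implementation.

```python
def hhi(units_by_provider: dict[str, float]) -> float:
    """Standard Herfindahl-Hirschman Index on the 0-10,000 scale.

    Each provider's share is expressed in percent, squared, and summed.
    A single provider holding everything yields 10,000.
    """
    total = sum(units_by_provider.values())
    if total <= 0:
        raise ValueError("total must be positive")
    return sum((units / total * 100) ** 2 for units in units_by_provider.values())


# Hypothetical input: number of critical services hosted by each ICT provider.
providers = {"cloud-a": 6, "cloud-b": 3, "saas-c": 1}
print(round(hhi(providers)))  # shares of 60%, 30%, 10% -> 3600 + 900 + 100 = 4600
```

A higher value means dependencies are concentrated in fewer providers, which is the situation DORA's third-party risk pillar asks institutions to monitor.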

What Regulators See

FaultRay generates timestamped, signed evidence packages that map every finding to specific DORA articles and RTS requirements:

  • RTS 2024/1774 — ICT Risk Management Framework details
  • ITS 2024/2956 — Register of Information templates
  • RTS 2025/301 — Incident reporting content and timelines

Quick Start

1. Terraform Safety Net (CI/CD Integration)

terraform plan -out=plan.out
terraform show -json plan.out > plan.json
faultray tf-check plan.json --fail-on-regression --min-score 60

# .github/workflows/terraform.yml
- name: Resilience Gate
  run: |
    pip install faultray
    terraform show -json plan.out > plan.json
    faultray tf-check plan.json --fail-on-regression --min-score 60

2. GitHub Action (Marketplace)

Add FaultRay to any CI/CD pipeline with our official GitHub Action:

# .github/workflows/resilience.yml
name: Resilience Check
on: [pull_request]
jobs:
  check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: mattyopon/faultray@v1
        with:
          plan-file: plan.json
          min-score: 60
          fail-on-regression: true
          financial: true

Or use it with a YAML infrastructure definition:

      - uses: mattyopon/faultray@v1
        with:
          yaml-file: infra.yaml
          financial: true
          cost-per-hour: 25000

Available inputs:

| Input              | Description                                       | Default |
|--------------------|---------------------------------------------------|---------|
| plan-file          | Path to Terraform plan JSON file                  | ''      |
| yaml-file          | Path to infrastructure YAML file                  | ''      |
| min-score          | Minimum resilience score (0-100). Fails if below. | 0       |
| fail-on-regression | Fail if resilience score drops from baseline      | false   |
| financial          | Include financial impact analysis                 | false   |
| cost-per-hour      | Default cost per hour of downtime (USD)           | 10000   |
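
To make the gate semantics of min-score and fail-on-regression concrete, the following sketch shows one way the two checks could combine. It only mirrors the descriptions in the inputs table; the function name `resilience_gate` and the way a baseline score is supplied are hypothetical and are not taken from FaultRay's code.

```python
from typing import Optional


def resilience_gate(
    score: float,
    min_score: float = 0,
    baseline: Optional[float] = None,
    fail_on_regression: bool = False,
) -> list[str]:
    """Return the reasons the gate fails; an empty list means the check passes."""
    failures = []
    # min-score: fail when the score is below the configured floor.
    if score < min_score:
        failures.append(f"score {score} is below min-score {min_score}")
    # fail-on-regression: fail when the score dropped relative to a known baseline.
    if fail_on_regression and baseline is not None and score < baseline:
        failures.append(f"score {score} regressed from baseline {baseline}")
    return failures


# A plan scoring 72 passes a floor of 60 but fails if the baseline was 75.
print(resilience_gate(72, min_score=60, baseline=75, fail_on_regression=True))
```

Returning the list of reasons rather than a boolean lets a CI step print every violated condition before exiting non-zero.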

3. Define Your Infrastructure

# infra.yaml
components:
  - id: api-gateway
    type: load_balancer
    replicas: 2
  - id: trading-engine
    type: app_server
    replicas: 3
  - id: market-data
    type: database
    replicas: 1   # ← FaultRay flags this as SPOF

dependencies:
  - source: api-gateway
    target: trading-engine
    type: requires
  - source: trading-engine
    target: market-data
    type: requires

faultray load infra.yaml
faultray simulate --html report.html
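
The SPOF comment in infra.yaml can be illustrated with a few lines of Python. This is a simplified sketch under the assumption that a single point of failure is a component with one replica that at least one other component requires; FaultRay's own detection rules may be richer. The dictionary below is the same infra.yaml content written as a Python literal, and `find_spofs` is a hypothetical helper.

```python
# The infra.yaml example above, expressed as a plain Python dictionary.
infra = {
    "components": [
        {"id": "api-gateway", "type": "load_balancer", "replicas": 2},
        {"id": "trading-engine", "type": "app_server", "replicas": 3},
        {"id": "market-data", "type": "database", "replicas": 1},
    ],
    "dependencies": [
        {"source": "api-gateway", "target": "trading-engine", "type": "requires"},
        {"source": "trading-engine", "target": "market-data", "type": "requires"},
    ],
}


def find_spofs(infra: dict) -> list[str]:
    """Return ids of single-replica components that something else requires."""
    required = {
        dep["target"] for dep in infra["dependencies"] if dep["type"] == "requires"
    }
    return [
        comp["id"]
        for comp in infra["components"]
        if comp.get("replicas", 1) <= 1 and comp["id"] in required
    ]


print(find_spofs(infra))  # ['market-data']
```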

4. AI Agent Testing

faultray agent assess ai-workflow.yaml     # Risk assessment
faultray agent scenarios ai-workflow.yaml  # What could go wrong?

Simulates AI-specific failures: hallucination cascades, context overflow, LLM rate limiting, token exhaustion, tool failures, agent loops, prompt injection.

Sensitivity Ratchet Simulation

Measure how much damage the sensitivity ratchet prevents. The ratchet is a security mechanism where an agent's outbound permissions narrow irreversibly once it accesses data above a certain sensitivity threshold (PUBLIC < INTERNAL < CONFIDENTIAL < RESTRICTED < TOP_SECRET).

faultray agent ratchet                        # Run all built-in scenarios
faultray agent ratchet --scenario exfiltration  # Single scenario
faultray agent ratchet --json                 # Machine-readable output

Built-in scenarios:

  • exfiltration — Agent reads classified data then tries to send externally
  • cross-agent — Agent A passes classified data to Agent B who attempts external send
  • escalation — Agent gradually accesses higher-sensitivity data

Each scenario runs twice (with and without the ratchet) and reports an effectiveness score showing how much data-leak damage the ratchet prevents.
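
The ratchet idea itself fits in a short sketch: track the highest sensitivity level an agent has read, never lower it, and refuse outbound sends once that level is above the threshold. The class name, method names, and the default threshold of INTERNAL are assumptions for illustration only; the level ordering is the one listed above.

```python
from enum import IntEnum


class Sensitivity(IntEnum):
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3
    TOP_SECRET = 4


class RatchetedAgent:
    """Hypothetical agent whose outbound permission narrows irreversibly."""

    def __init__(self, threshold: Sensitivity = Sensitivity.INTERNAL):
        self.threshold = threshold            # assumed default, not from FaultRay
        self.high_water = Sensitivity.PUBLIC  # highest level read so far

    def read(self, level: Sensitivity) -> None:
        # The ratchet only moves up: reading lower-level data later never resets it.
        self.high_water = max(self.high_water, level)

    def can_send_external(self) -> bool:
        # Outbound sends are blocked once data above the threshold has been read.
        return self.high_water <= self.threshold


agent = RatchetedAgent()
agent.read(Sensitivity.RESTRICTED)
agent.read(Sensitivity.PUBLIC)
print(agent.can_send_external())  # False: the earlier RESTRICTED read still counts
```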

5. Continuous Compliance Monitoring

faultray compliance-monitor model.json --framework dora  # DORA
faultray compliance-monitor model.json --framework soc2  # SOC 2
faultray compliance-monitor model.json --framework pci   # PCI DSS

Tracks compliance trends over 90 days with automated drift detection.

APM — Application Performance Monitoring

FaultRay includes a lightweight APM agent that collects real-time host metrics and feeds them to the FaultRay collector for anomaly detection, alerting, and topology-aware analysis.

# One-command interactive setup
faultray apm setup

# Or manual setup
faultray apm install --collector http://localhost:8080
faultray apm start
faultray apm status

Architecture

Your Hosts                          FaultRay Server
┌────────────────────────────┐      ┌──────────────────────────────┐
│  APM Agent  (each host)    │      │  Collector  faultray serve   │
│  ─────────────────────     │      │  ─────────────────────────── │
│  Collects every 15s:       │─────▶│  Time-Series DB              │
│  • CPU utilization         │ HTTP │  Anomaly Detection (Z-score) │
│  • Memory usage            │      │  Alert Rules Engine          │
│  • Disk usage              │      │  Web Dashboard  :8080/apm    │
│  • Network I/O             │      └──────────────────────────────┘
│  • Process count           │
│  • TCP connections         │
└────────────────────────────┘

Metrics Collected

| Metric          | Description                      |
|-----------------|----------------------------------|
| cpu_percent     | CPU utilization across all cores |
| memory_percent  | RAM usage (used / total)         |
| disk_percent    | Root disk usage                  |
| net_bytes_sent  | Network bytes sent               |
| net_bytes_recv  | Network bytes received           |
| process_count   | Number of running processes      |
| tcp_connections | Active TCP connections           |
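
The architecture diagram names Z-score anomaly detection. As a minimal sketch of that general technique (not the collector's actual code), the function below flags samples that lie more than a chosen number of standard deviations from the mean. The function name, the threshold of 3.0, and the sample values are assumptions.

```python
from statistics import mean, pstdev


def zscore_anomalies(samples: list[float], threshold: float = 3.0) -> list[int]:
    """Return indexes of samples whose absolute Z-score exceeds the threshold."""
    mu = mean(samples)
    sigma = pstdev(samples)  # population standard deviation
    if sigma == 0:
        return []  # a perfectly flat series has no outliers
    return [i for i, x in enumerate(samples) if abs(x - mu) / sigma > threshold]


# Hypothetical cpu_percent readings: twenty quiet samples, then one spike.
cpu_percent = [20.0] * 20 + [95.0]
print(zscore_anomalies(cpu_percent))  # [20]
```

A production detector would typically compute the mean and deviation over a rolling window so that one spike does not distort the baseline it is compared against.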

Integration with Simulation

Real baseline data collected by the APM agent feeds directly into chaos simulations:

# Capture baseline metrics
faultray apm metrics <agent-id> --json > baseline.json

# Run simulation using real topology
faultray simulate infra.yaml

# Correlate simulation results with APM alerts
faultray apm alerts --severity critical

Resilience Badge

Show your infrastructure resilience score in your README:

faultray badge infra.yaml

Output:

[![Resilience Score](https://img.shields.io/badge/resilience-72%2F100-green)](https://github.com/mattyopon/faultray)

The badge color adjusts automatically based on your score:

| Score  | Color        |
|--------|--------------|
| 80-100 | Bright green |
| 60-79  | Green        |
| 40-59  | Yellow       |
| 20-39  | Orange       |
| 0-19   | Red          |
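
The score-to-color mapping and the badge URL shape shown above can be reproduced in a few lines. This is a sketch rather than the code behind `faultray badge`: the function names are hypothetical, and the color identifiers are the shields.io named colors that correspond to the table.

```python
from urllib.parse import quote


def badge_color(score: int) -> str:
    """Map a 0-100 resilience score to a shields.io color name."""
    if score >= 80:
        return "brightgreen"
    if score >= 60:
        return "green"
    if score >= 40:
        return "yellow"
    if score >= 20:
        return "orange"
    return "red"


def badge_url(score: int) -> str:
    """Build a static shields.io badge URL; '/' must be percent-encoded."""
    message = quote(f"{score}/100", safe="")
    return f"https://img.shields.io/badge/resilience-{message}-{badge_color(score)}"


print(badge_url(72))  # https://img.shields.io/badge/resilience-72%2F100-green
```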

For raw URL output (no markdown wrapping):

faultray badge infra.yaml --url

Key Features

| Feature                    | Description                                                                                              |
|----------------------------|----------------------------------------------------------------------------------------------------------|
| 5-Layer Availability Model | Mathematical proof of your uptime ceiling ("your 99.99% SLA is physically impossible given this topology") |
| 5 Simulation Engines       | Cascade, Dynamic, Ops, What-If, Capacity                                                                 |
| DORA Compliance Suite      | 52 controls, 5 pillars, audit-ready evidence packages                                                    |
| Cascade Failure Analysis   | Graph-based blast radius mapping with containment scoring                                                |
| SPOF Detection             | Automatic identification of single points of failure                                                     |
| AI Agent Testing           | 7 agent-specific fault types (hallucination, loops, etc.)                                                |
| Terraform Integration      | Pre-apply impact analysis as a CI/CD gate                                                                |
| Third-Party Risk           | ICT concentration risk analysis (Herfindahl-Hirschman Index)                                             |
| Multi-Framework Compliance | SOC 2, ISO 27001, PCI DSS 4.0, NIST CSF, DORA, HIPAA, GDPR                                               |
| APM Agent                  | Install once, monitor forever: real-time metrics, anomaly detection, topology auto-discovery            |
| 100+ CLI Commands          | From faultray demo to faultray war-room                                                                  |
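
Graph-based blast radius mapping, listed in the table above, can be pictured as a traversal of the dependency graph in reverse: starting from a failed component, collect everything that directly or transitively requires it. The sketch below is a generic breadth-first search under that assumption; the function name and the edge representation are hypothetical, and FaultRay's containment scoring is not modeled here.

```python
from collections import deque


def blast_radius(dependencies: list[tuple[str, str]], failed: str) -> set[str]:
    """Return every component that transitively depends on the failed one.

    Each dependency is a (source, target) pair meaning "source requires target".
    """
    # Invert the edges: for each target, which sources depend on it?
    dependents: dict[str, set[str]] = {}
    for source, target in dependencies:
        dependents.setdefault(target, set()).add(source)

    affected: set[str] = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for dependent in dependents.get(node, ()):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    affected.discard(failed)  # guard against cycles leading back to the origin
    return affected


deps = [("api-gateway", "trading-engine"), ("trading-engine", "market-data")]
print(sorted(blast_radius(deps, "market-data")))  # ['api-gateway', 'trading-engine']
```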

The 5-Layer Availability Model

Most SLA claims are aspirational. FaultRay proves what's actually achievable:

| Layer           | What It Measures                                | Financial Impact               |
|-----------------|-------------------------------------------------|--------------------------------|
| L1: Software    | Deploy downtime, human error, config drift      | Operational uptime ceiling     |
| L2: Hardware    | MTBF/MTTR × redundancy × failover               | Physical infrastructure limits |
| L3: Theoretical | Network loss, GC pauses, jitter                 | Unreachable upper bound        |
| L4: Operational | Incident rate × response time, on-call coverage | Team capacity constraints      |
| L5: External SLA | ∏(third-party SLAs)                            | Vendor dependency floor        |

Result: A mathematically provable availability ceiling. If your infrastructure graph says 99.95% max but you're promising 99.99%, FaultRay catches it — before the regulator does.
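
To see why a ceiling can sit below a promised SLA, consider the arithmetic behind two of the layers. The sketch below uses textbook reliability formulas: steady-state availability MTBF / (MTBF + MTTR), parallel redundancy 1 - (1 - a)^n, and a serial product for chained third-party SLAs as in the L5 row. Whether FaultRay combines its layers exactly this way is not stated here, and all function names and numbers are hypothetical.

```python
from math import prod


def component_availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability of a single instance."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def with_redundancy(availability: float, replicas: int) -> float:
    """Availability of n independent replicas where any one suffices."""
    return 1 - (1 - availability) ** replicas


def serial(availabilities: list[float]) -> float:
    """Availability of a chain in which every link must be up."""
    return prod(availabilities)


# Hypothetical third-party SLAs: DNS, cloud provider, payment API.
external_ceiling = serial([0.9999, 0.9995, 0.9999])
print(f"external SLA ceiling: {external_ceiling:.4%}")  # roughly 99.93%

# Two replicas of a 99% instance reach about 99.99% on the hardware side,
# but the external chain above still caps the end-to-end figure.
print(f"redundant pair: {with_redundancy(0.99, 2):.4%}")
```

With these example numbers the vendor chain alone tops out near 99.93%, so a 99.99% end-to-end promise would not be reachable no matter how much internal redundancy is added.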

Research & Patent

FaultRay's core algorithms are described in a paper published on Zenodo and are the subject of a US provisional patent application.

Paper:

Maeda, Y. (2026). FaultRay: In-Memory Infrastructure Resilience Simulation with Graph-Based Cascade Analysis, Multi-Layer Availability Limits, and AI Agent Failure Modeling. Zenodo. DOI: 10.5281/zenodo.19139911

Patent:

US Provisional Patent Application No. 64/010,200 (filed March 19, 2026)

@misc{maeda2026faultray,
  author    = {Maeda, Yutaro},
  title     = {FaultRay: In-Memory Infrastructure Resilience Simulation},
  year      = {2026},
  doi       = {10.5281/zenodo.19139911},
  publisher = {Zenodo}
}

Development

pip install -e ".[dev]"
pytest tests/ -v
ruff check src/ tests/

License

Apache License 2.0 — see LICENSE.

License Transition (2026-04-11): FaultRay was relicensed from BSL 1.1 to Apache 2.0.

  • v11.1.0 and earlier: BSL 1.1 (yanked on PyPI)
  • v11.2.0 and later: Apache 2.0 (recommended)
