Skip to main content

Verification and evaluation framework for autonomous agents across agentic protocols

Project description

Arc-Verifier

Open source verification infrastructure for autonomous agents across agentic protocols. The industry standard for comprehensive security validation, behavioral verification, and performance certification of AI agents that manage real assets.

Overview

Arc-Verifier provides automated evaluation of autonomous agents deployed on agentic protocols including multichain intent systems, and TEE-based infrastructures. As autonomous agents increasingly manage billions in capital and execute decisions without human intervention, verification becomes critical for protocol safety and user trust.

Key Features

  • Security Validation: Container scanning, TEE attestation, key management verification
  • Strategy Verification: Validates agents do what they claim with real market data
  • Verification Score™: Trustworthiness metric (0-180)
  • Production Scale: Verify 100+ agents in parallel
  • LLM-as-a-Judge: Agent behavioral assessment and risk detection
  • Protocol Agnostic: Works with any containerized agent

Architecture Diagram

graph TB
    subgraph "Input"
        A[Docker Image]
    end
    
    subgraph "Verification Pipeline"
        B[Security Scanner<br/>CVE Detection]
        C[TEE Validator<br/>Hardware Attestation]
        D[Performance Tester<br/>Load & Stress]
        E[Strategy Verifier<br/>Backtesting]
        F[LLM Judge<br/>Behavioral Analysis]
    end
    
    subgraph "Scoring Engine"
        G[Verification Calculator<br/>0-180 Points]
    end
    
    subgraph "Outputs"
        H[Verification Report]
        I[Web Dashboard]
        J[CI/CD Integration]
    end
    
    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
    G --> H
    G --> I
    G --> J
    
    style A fill:#e1f5fe
    style G fill:#fff9c4
    style H fill:#c8e6c9
    style I fill:#c8e6c9
    style J fill:#c8e6c9

Component Architecture

arc-verifier/
├── core/                 # Core verification engine
│   ├── verifier.py      # Main orchestrator
│   └── pipeline.py      # Pipeline coordination
├── security/            # Security components
│   ├── scanner.py       # Vulnerability scanning
│   └── tee_validator.py # TEE attestation
├── analysis/            # Analysis engines
│   ├── performance.py   # Load testing
│   ├── strategy.py      # Strategy verification
│   └── llm_judge/       # AI behavioral analysis
├── data/                # Market data management
│   ├── backtester.py    # Historical testing
│   └── fetcher.py       # Data collection
├── orchestration/       # Scaling infrastructure
│   └── parallel.py      # Concurrent verification
├── web/                 # Web UI dashboard
│   ├── templates/       # HTML templates
│   └── static/          # CSS/JS assets
└── cli/                 # Command-line interface
    └── commands/        # CLI commands

Architecture

Core Components

Component Purpose
scanner.py Container vulnerability detection using Trivy
validator.py TEE attestation validation (Intel SGX, AMD SEV)
benchmarker.py Performance testing and resource profiling
strategy_verifier.py Trading strategy analysis with real market data
real_backtester.py Historical performance simulation
simulator.py Agent behavior simulation under various conditions
llm_judge/ AI-based code analysis and behavioral assessment
tee/ Trusted Execution Environment validation suite
parallel_verifier.py Concurrent verification using Dagger orchestration
verification_pipeline.py End-to-end verification workflow coordination

LLM Judge Module

Modular AI analysis system for trust-focused evaluation:

llm_judge/
├── core.py              # Main orchestrator
├── models.py            # Pydantic data models
├── providers/           # LLM provider abstractions
│   ├── anthropic.py     # Anthropic Claude integration
│   ├── openai.py        # OpenAI GPT integration
│   └── factory.py       # Provider selection logic
├── security/            # Trust-focused analysis
│   ├── analyzers.py     # Security pattern detection
│   ├── prompts.py       # Security evaluation prompts
│   └── scoring.py       # Trust score calculation
└── evaluation/          # General assessment
    ├── ensemble.py      # Multi-provider evaluation
    └── prompts.py       # Behavioral analysis prompts

TEE Validation Suite

Hardware-based verification for trusted execution:

tee/
├── attestation_verifier.py    # Attestation validation
├── phala_validator.py         # Phala Network TEE support
├── code_hash_registry.py      # Verified code tracking
└── config.py                  # TEE configuration management

Verification Pipeline

Five-stage automated analysis:

Docker Image → Security Scan → TEE Validation → Performance Test → Strategy Analysis → AI Assessment → Score
  1. Security Analysis (scanner.py)

    • CVE detection with Trivy
    • Dependency vulnerability assessment
    • Container configuration analysis
  2. TEE Attestation (validator.py, tee/)

    • Hardware security validation
    • Enclave measurement verification
    • Code integrity confirmation
  3. Performance Evaluation (benchmarker.py)

    • Load testing and throughput measurement
    • Resource usage profiling
    • Latency analysis under stress
  4. Strategy Verification (strategy_verifier.py, real_backtester.py)

    • Historical performance backtesting
    • Market regime analysis
    • Risk-adjusted return calculation
  5. Behavioral Assessment (llm_judge/, simulator.py)

    • AI-powered code review
    • Intent classification and validation
    • Deception and malicious pattern detection

Quick Start

Installation

# Install from PyPI
pip install arc-verifier

# Install with all features
pip install 'arc-verifier[llm,web]'

# Initialize environment
arc-verifier init

Basic Usage

# Verify single agent
arc-verifier verify myagent:latest

# Verify with high security requirements
arc-verifier verify prod-agent:latest --tier high

# Batch verification from file
arc-verifier batch -f agents.txt --max-concurrent 20

# Launch web dashboard
arc-verifier export web

Programmatic API

from arc_verifier import api

# Simple verification
result = await api.verify_agent("myagent:latest")
print(f"Fort Score: {result.fort_score}/180")
print(f"Status: {result.status}")

# Batch verification with custom settings
results = await api.verify_batch(
    ["agent1:latest", "agent2:latest", "agent3:latest"],
    max_concurrent=10,
    enable_llm=True,
    tier="high"
)

# Access individual components
security_result = await api.scan_security("myagent:latest")
performance_result = await api.test_performance("myagent:latest", duration=120)
backtest_result = await api.backtest_strategy("trader:latest", start_date="2024-01-01")

Configuration

Environment Variables

# LLM Analysis
ANTHROPIC_API_KEY=your_key
OPENAI_API_KEY=your_key
LLM_PRIMARY_PROVIDER=anthropic
LLM_ENABLE_ENSEMBLE=true

# TEE Validation
TEE_INTEL_PCCS_ENDPOINT=https://api.trustedservices.intel.com/sgx/certification/v4
TEE_PHALA_ENDPOINT=https://api.phala.network/v1/verify

# Performance Testing
BENCHMARK_DURATION=60
DOCKER_TIMEOUT=30

TEE Configuration

# Initialize TEE configuration
python -m arc_verifier.tee.cli init-config

# Add agent to registry
python -m arc_verifier.tee.cli registry add myagent:latest \
  --risk-level medium --capabilities "trading,defi"

Output Formats

Terminal Output

┌──────────────────────────────┐
│     Verification Results     │
├──────────────────────────────┤
│ Security: ✓ 0 critical      │
│ TEE: ✓ Intel SGX verified   │
│ Performance: ✓ 2000 TPS     │
│ Strategy: ✓ 75% effective   │
│ AI Analysis: ✓ No risks     │
└──────────────────────────────┘

Fort Score: 145/180 (Deploy with confidence)

JSON Output

{
  "verification_id": "ver_a1b2c3d4",
  "image": "myagent:latest",
  "timestamp": "2024-01-15T10:30:00Z",
  "fort_score": 145,
  "components": {
    "docker_scan": {
      "vulnerabilities": {"critical": 0, "high": 0},
      "agent_detected": true
    },
    "tee_validation": {
      "valid": true,
      "platform": "Intel SGX",
      "measurements": {"mrenclave": "abc123..."}
    },
    "performance": {
      "throughput": 2000,
      "latency_p99": 45.7,
      "cpu_efficiency": 0.85
    },
    "strategy_analysis": {
      "detected_strategy": "arbitrage",
      "effectiveness": 75.2,
      "max_drawdown": 0.12
    },
    "llm_analysis": {
      "trust_recommendation": "DEPLOY",
      "confidence": 0.92,
      "risk_score": 0.15
    }
  }
}

CI/CD Integration

GitHub Actions

- name: Verify Agent
  run: |
    pip install arc-verifier
    arc-verifier verify ${{ github.repository }}:${{ github.sha }} \
      --tier high --output json > results.json
    
    # Enforce minimum score
    SCORE=$(jq -r '.fort_score' results.json)
    if [ $SCORE -lt 120 ]; then exit 1; fi

Integration with Agentic Protocols

Arc-Verifier integrates with various agentic protocol infrastructures:

  • Intent-based Systems: Validate agents executing cross-chain intents
  • TEE-based Protocols: Comprehensive attestation for Phala, Oasis, and other TEE networks
  • General Agent Frameworks: Protocol-agnostic verification for any containerized agent

Fort Score™

The industry-standard trustworthiness metric for autonomous agents (0-180 points):

Scoring Components

Component Range Evaluation Criteria
Security -30 to +30 • Vulnerability count and severity
• Secure coding practices
• Key management security
• TEE attestation validity
Performance -50 to +90 • Throughput and latency
• Resource efficiency
• Error handling
• Scalability under load
Strategy -30 to +30 • Backtesting performance
• Risk-adjusted returns
• Strategy consistency
• Market regime adaptability
Intelligence -30 to +30 • LLM behavioral analysis
• Code quality assessment
• Malicious pattern detection
• Trust recommendations

Deployment Guidelines

Score Range Status Recommendation
150-180 🟢 Excellent Deploy to production with confidence
120-149 🟡 Good Deploy with monitoring, minor improvements recommended
90-119 🟠 Fair Deploy to staging only, significant improvements needed
60-89 🔴 Poor Do not deploy, major issues present
0-59 ⛔ Critical High risk, fundamental redesign required

Export Options

Export verification results in various formats:

# Export as HTML report
arc-verifier export results --latest --format html

# Export as JSON
arc-verifier export results --latest --format json

# View in web dashboard
arc-verifier export web

Data Sources

Arc-Verifier automatically collects:

Source Data Components
Container Image Layers, dependencies, configuration scanner.py
Runtime Metrics Resource usage, performance data benchmarker.py
Market Data Historical prices, volatility data_fetcher.py, data_registry.py
TEE Attestations Hardware measurements, signatures tee/
Code Patterns Logic analysis, behavioral signatures llm_judge/

Installation

# Requirements: Python 3.11+, Docker
pip install arc-verifier

# With all features
pip install 'arc-verifier[llm,web]'

# Development installation
git clone https://github.com/arc-computer/arc-verifier
cd arc-verifier
pip install -e ".[dev,llm,web]"

Contributing

Open source infrastructure project. See CONTRIBUTING.md for development guidelines.

  • Issues: GitHub Issues for bug reports
  • Development: Follow conventional commits, maintain test coverage
  • Documentation: Update relevant component docs with changes

License

MIT License - Open source infrastructure for the agentic protocol ecosystem.


Verification infrastructure for autonomous agents across all agentic protocols

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

arc_verifier-0.1.2.tar.gz (141.5 kB view details)

Uploaded Source

Built Distribution

arc_verifier-0.1.2-py3-none-any.whl (161.0 kB view details)

Uploaded Python 3

File details

Details for the file arc_verifier-0.1.2.tar.gz.

File metadata

  • Download URL: arc_verifier-0.1.2.tar.gz
  • Upload date:
  • Size: 141.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for arc_verifier-0.1.2.tar.gz
Algorithm Hash digest
SHA256 852f48ced25229887d9207eebd73574c63f1a694cd8a9d4a1f36fcbe1a671af8
MD5 e0c7ee9608ebb08e6bbfa532bf30d5c4
BLAKE2b-256 ce9ec31655f992048e5bca5c7b9facb612388681fbf5f143230675bd5e945e99

See more details on using hashes here.

File details

Details for the file arc_verifier-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: arc_verifier-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 161.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for arc_verifier-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 bca25974ee3be01cb7991577aa89222ce1f638c759714dd5142e20e72a23dd73
MD5 97832e8bcec0602f442b084ba1f16939
BLAKE2b-256 31b3f6da42b0015a1a4f484b181c1c9ecbab362b048b9ece3dab5a7e26553a89

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page