Verification and evaluation framework for autonomous agents across agentic protocols
Arc-Verifier
Open source verification infrastructure for autonomous agents across agentic protocols. The industry standard for comprehensive security validation, behavioral verification, and performance certification of AI agents that manage real assets.
Overview
Arc-Verifier provides automated evaluation of autonomous agents deployed on agentic protocols, including multichain intent systems and TEE-based infrastructures. As autonomous agents increasingly manage billions in capital and execute decisions without human intervention, verification becomes critical for protocol safety and user trust.
Key Features
- Security Validation: Container scanning, TEE attestation, key management verification
- Strategy Verification: Validates agents do what they claim with real market data
- Fort Score™: Trustworthiness metric (0-180)
- Production Scale: Verify 100+ agents in parallel
- LLM-as-a-Judge: Agent behavioral assessment and risk detection
- Protocol Agnostic: Works with any containerized agent
Architecture Diagram
```mermaid
graph TB
    subgraph "Input"
        A[Docker Image]
    end
    subgraph "Verification Pipeline"
        B[Security Scanner<br/>CVE Detection]
        C[TEE Validator<br/>Hardware Attestation]
        D[Performance Tester<br/>Load & Stress]
        E[Strategy Verifier<br/>Backtesting]
        F[LLM Judge<br/>Behavioral Analysis]
    end
    subgraph "Scoring Engine"
        G[Verification Calculator<br/>0-180 Points]
    end
    subgraph "Outputs"
        H[Verification Report]
        I[Web Dashboard]
        J[CI/CD Integration]
    end
    A --> B
    B --> C
    C --> D
    D --> E
    E --> F
    F --> G
    G --> H
    G --> I
    G --> J
    style A fill:#e1f5fe
    style G fill:#fff9c4
    style H fill:#c8e6c9
    style I fill:#c8e6c9
    style J fill:#c8e6c9
```
Component Architecture
```
arc-verifier/
├── core/              # Core verification engine
│   ├── verifier.py    # Main orchestrator
│   └── pipeline.py    # Pipeline coordination
├── security/          # Security components
│   ├── scanner.py     # Vulnerability scanning
│   └── tee_validator.py  # TEE attestation
├── analysis/          # Analysis engines
│   ├── performance.py # Load testing
│   ├── strategy.py    # Strategy verification
│   └── llm_judge/     # AI behavioral analysis
├── data/              # Market data management
│   ├── backtester.py  # Historical testing
│   └── fetcher.py     # Data collection
├── orchestration/     # Scaling infrastructure
│   └── parallel.py    # Concurrent verification
├── web/               # Web UI dashboard
│   ├── templates/     # HTML templates
│   └── static/        # CSS/JS assets
└── cli/               # Command-line interface
    └── commands/      # CLI commands
```
Architecture
Core Components
| Component | Purpose |
|---|---|
| `scanner.py` | Container vulnerability detection using Trivy |
| `validator.py` | TEE attestation validation (Intel SGX, AMD SEV) |
| `benchmarker.py` | Performance testing and resource profiling |
| `strategy_verifier.py` | Trading strategy analysis with real market data |
| `real_backtester.py` | Historical performance simulation |
| `simulator.py` | Agent behavior simulation under various conditions |
| `llm_judge/` | AI-based code analysis and behavioral assessment |
| `tee/` | Trusted Execution Environment validation suite |
| `parallel_verifier.py` | Concurrent verification using Dagger orchestration |
| `verification_pipeline.py` | End-to-end verification workflow coordination |
LLM Judge Module
Modular AI analysis system for trust-focused evaluation:
```
llm_judge/
├── core.py            # Main orchestrator
├── models.py          # Pydantic data models
├── providers/         # LLM provider abstractions
│   ├── anthropic.py   # Anthropic Claude integration
│   ├── openai.py      # OpenAI GPT integration
│   └── factory.py     # Provider selection logic
├── security/          # Trust-focused analysis
│   ├── analyzers.py   # Security pattern detection
│   ├── prompts.py     # Security evaluation prompts
│   └── scoring.py     # Trust score calculation
└── evaluation/        # General assessment
    ├── ensemble.py    # Multi-provider evaluation
    └── prompts.py     # Behavioral analysis prompts
```
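The provider-selection pattern behind `factory.py` can be sketched as a simple registry lookup. Everything below is illustrative: the class names, the `evaluate` method, and the `JudgeResult` shape are hypothetical stand-ins, not the package's actual API.

```python
from dataclasses import dataclass


@dataclass
class JudgeResult:
    """Hypothetical result shape for an LLM judgment."""
    trust_recommendation: str
    confidence: float


class AnthropicProvider:
    name = "anthropic"

    def evaluate(self, code: str) -> JudgeResult:
        # A real provider would call the Claude API here.
        return JudgeResult("DEPLOY", 0.90)


class OpenAIProvider:
    name = "openai"

    def evaluate(self, code: str) -> JudgeResult:
        # A real provider would call the OpenAI API here.
        return JudgeResult("DEPLOY", 0.85)


_PROVIDERS = {"anthropic": AnthropicProvider, "openai": OpenAIProvider}


def make_provider(name: str):
    """Select a provider by name, mirroring the role of factory.py."""
    try:
        return _PROVIDERS[name]()
    except KeyError:
        raise ValueError(f"unknown LLM provider: {name}")
```

Keeping selection behind a factory is what lets `ensemble.py` fan the same evaluation out to multiple providers without hard-coding either one.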
TEE Validation Suite
Hardware-based verification for trusted execution:
```
tee/
├── attestation_verifier.py  # Attestation validation
├── phala_validator.py       # Phala Network TEE support
├── code_hash_registry.py    # Verified code tracking
└── config.py                # TEE configuration management
```
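The role of `code_hash_registry.py` (tracking verified code) reduces to comparing a measured hash against a stored expectation. This is a minimal sketch of that idea under assumed data structures; the registry format and function names here are hypothetical, not the module's real interface.

```python
import hashlib

# Hypothetical registry mapping image tags to expected SHA-256 code hashes.
registry = {
    "myagent:latest": hashlib.sha256(b"agent-code").hexdigest(),
}


def is_verified(image: str, code: bytes) -> bool:
    """Check an agent's measured code hash against the registry entry."""
    expected = registry.get(image)
    if expected is None:
        return False  # unknown image: fail closed
    return hashlib.sha256(code).hexdigest() == expected
```

In a real TEE flow the measured hash would come from the enclave attestation quote rather than from hashing bytes locally.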
Verification Pipeline
Five-stage automated analysis:
```
Docker Image → Security Scan → TEE Validation → Performance Test → Strategy Analysis → AI Assessment → Score
```

1. Security Analysis (`scanner.py`)
   - CVE detection with Trivy
   - Dependency vulnerability assessment
   - Container configuration analysis
2. TEE Attestation (`validator.py`, `tee/`)
   - Hardware security validation
   - Enclave measurement verification
   - Code integrity confirmation
3. Performance Evaluation (`benchmarker.py`)
   - Load testing and throughput measurement
   - Resource usage profiling
   - Latency analysis under stress
4. Strategy Verification (`strategy_verifier.py`, `real_backtester.py`)
   - Historical performance backtesting
   - Market regime analysis
   - Risk-adjusted return calculation
5. Behavioral Assessment (`llm_judge/`, `simulator.py`)
   - AI-powered code review
   - Intent classification and validation
   - Deception and malicious pattern detection
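The five stages above form a linear chain where each stage can see the results of the ones before it. A minimal sketch of that orchestration, with hypothetical stage functions standing in for the real components:

```python
from typing import Callable

Stage = Callable[[str, dict], dict]


def run_pipeline(image: str, stages: list[Stage]) -> dict:
    """Run stages in order, accumulating results keyed by stage name."""
    results: dict = {}
    for stage in stages:
        # Each stage receives the image and all prior results.
        results[stage.__name__] = stage(image, results)
    return results


# Placeholder stages; the real pipeline invokes scanner.py, tee/, etc.
def security_scan(image, prior):    return {"critical": 0}
def tee_validation(image, prior):   return {"valid": True}
def performance_test(image, prior): return {"tps": 2000}
def strategy_analysis(image, prior): return {"effectiveness": 75.2}
def ai_assessment(image, prior):    return {"risk_score": 0.15}


report = run_pipeline("myagent:latest", [
    security_scan, tee_validation, performance_test,
    strategy_analysis, ai_assessment,
])
```

Passing prior results forward is what lets a later stage (e.g. the AI assessment) take earlier findings such as CVE counts into account.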
Quick Start
Installation
```bash
# Install from PyPI
pip install arc-verifier

# Install with all features
pip install 'arc-verifier[llm,web]'

# Initialize environment
arc-verifier init
```
Basic Usage
```bash
# Verify single agent
arc-verifier verify myagent:latest

# Verify with high security requirements
arc-verifier verify prod-agent:latest --tier high

# Batch verification from file
arc-verifier batch -f agents.txt --max-concurrent 20

# Launch web dashboard
arc-verifier export web
```
Programmatic API
```python
import asyncio

from arc_verifier import api


async def main():
    # Simple verification
    result = await api.verify_agent("myagent:latest")
    print(f"Fort Score: {result.fort_score}/180")
    print(f"Status: {result.status}")

    # Batch verification with custom settings
    results = await api.verify_batch(
        ["agent1:latest", "agent2:latest", "agent3:latest"],
        max_concurrent=10,
        enable_llm=True,
        tier="high",
    )

    # Access individual components
    security_result = await api.scan_security("myagent:latest")
    performance_result = await api.test_performance("myagent:latest", duration=120)
    backtest_result = await api.backtest_strategy("trader:latest", start_date="2024-01-01")


asyncio.run(main())
```
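A common follow-on is gating deployment on the returned scores. The sketch below assumes result objects expose `fort_score` and an image identifier as the README's examples suggest; the `VerificationResult` dataclass is a stand-in, not the package's real return type.

```python
from dataclasses import dataclass


@dataclass
class VerificationResult:
    """Stand-in for the objects returned by api.verify_batch."""
    image: str
    fort_score: int
    status: str


def deployable(results, min_score: int = 120):
    """Return the images that meet the minimum Fort Score for deployment."""
    return [r.image for r in results if r.fort_score >= min_score]


batch = [
    VerificationResult("agent1:latest", 145, "passed"),
    VerificationResult("agent2:latest", 95, "passed"),
]
# deployable(batch) keeps only agent1:latest (145 >= 120; 95 < 120)
```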
Configuration
Environment Variables
```bash
# LLM Analysis
ANTHROPIC_API_KEY=your_key
OPENAI_API_KEY=your_key
LLM_PRIMARY_PROVIDER=anthropic
LLM_ENABLE_ENSEMBLE=true

# TEE Validation
TEE_INTEL_PCCS_ENDPOINT=https://api.trustedservices.intel.com/sgx/certification/v4
TEE_PHALA_ENDPOINT=https://api.phala.network/v1/verify

# Performance Testing
BENCHMARK_DURATION=60
DOCKER_TIMEOUT=30
```
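Reading these variables with sensible fallbacks might look like the snippet below. The variable names match the list above; the parsing logic and default values are illustrative, not the package's actual configuration code.

```python
import os

# Illustrative config loading; defaults assumed, not taken from the package.
config = {
    "primary_provider": os.environ.get("LLM_PRIMARY_PROVIDER", "anthropic"),
    # Env vars are strings, so boolean flags need explicit parsing.
    "ensemble": os.environ.get("LLM_ENABLE_ENSEMBLE", "false").lower() == "true",
    "benchmark_duration": int(os.environ.get("BENCHMARK_DURATION", "60")),
    "docker_timeout": int(os.environ.get("DOCKER_TIMEOUT", "30")),
}
```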
TEE Configuration
```bash
# Initialize TEE configuration
python -m arc_verifier.tee.cli init-config

# Add agent to registry
python -m arc_verifier.tee.cli registry add myagent:latest \
    --risk-level medium --capabilities "trading,defi"
```
Output Formats
Terminal Output
```
┌──────────────────────────────┐
│ Verification Results         │
├──────────────────────────────┤
│ Security:    ✓ 0 critical    │
│ TEE:         ✓ Intel SGX     │
│ Performance: ✓ 2000 TPS      │
│ Strategy:    ✓ 75% effective │
│ AI Analysis: ✓ No risks      │
└──────────────────────────────┘

Fort Score: 145/180 (Deploy with confidence)
```
JSON Output
```json
{
  "verification_id": "ver_a1b2c3d4",
  "image": "myagent:latest",
  "timestamp": "2024-01-15T10:30:00Z",
  "fort_score": 145,
  "components": {
    "docker_scan": {
      "vulnerabilities": {"critical": 0, "high": 0},
      "agent_detected": true
    },
    "tee_validation": {
      "valid": true,
      "platform": "Intel SGX",
      "measurements": {"mrenclave": "abc123..."}
    },
    "performance": {
      "throughput": 2000,
      "latency_p99": 45.7,
      "cpu_efficiency": 0.85
    },
    "strategy_analysis": {
      "detected_strategy": "arbitrage",
      "effectiveness": 75.2,
      "max_drawdown": 0.12
    },
    "llm_analysis": {
      "trust_recommendation": "DEPLOY",
      "confidence": 0.92,
      "risk_score": 0.15
    }
  }
}
```
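Consuming this report programmatically is plain JSON parsing. The keys below follow the sample above; note the report is a sample, not a formally documented schema.

```python
import json

# A trimmed report in the shape of the sample above.
report_json = """
{"verification_id": "ver_a1b2c3d4",
 "fort_score": 145,
 "components": {"llm_analysis": {"trust_recommendation": "DEPLOY"}}}
"""

report = json.loads(report_json)
score = report["fort_score"]
verdict = report["components"]["llm_analysis"]["trust_recommendation"]
# score == 145, verdict == "DEPLOY"
```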
CI/CD Integration
GitHub Actions
```yaml
- name: Verify Agent
  run: |
    pip install arc-verifier
    arc-verifier verify ${{ github.repository }}:${{ github.sha }} \
      --tier high --output json > results.json
    # Enforce minimum score
    SCORE=$(jq -r '.fort_score' results.json)
    if [ "$SCORE" -lt 120 ]; then exit 1; fi
```
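If `jq` is unavailable on the runner, the same gate can be written in Python. The `fort_score` key matches the JSON sample earlier in this README; the `gate` helper itself is a sketch, not part of the package.

```python
import json
import sys


def gate(report: dict, minimum: int = 120) -> int:
    """Return a shell-style exit code: 0 = pass, 1 = fail."""
    return 0 if report.get("fort_score", 0) >= minimum else 1


# In CI, something like:
#   with open("results.json") as f:
#       sys.exit(gate(json.load(f)))
```

Using `.get(..., 0)` means a malformed report without a `fort_score` key fails the gate rather than crashing the job.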
Integration with Agentic Protocols
Arc-Verifier integrates with various agentic protocol infrastructures:
- Intent-based Systems: Validate agents executing cross-chain intents
- TEE-based Protocols: Comprehensive attestation for Phala, Oasis, and other TEE networks
- General Agent Frameworks: Protocol-agnostic verification for any containerized agent
Fort Score™
The industry-standard trustworthiness metric for autonomous agents (0-180 points):
Scoring Components
| Component | Range | Evaluation Criteria |
|---|---|---|
| Security | -30 to +30 | • Vulnerability count and severity<br/>• Secure coding practices<br/>• Key management security<br/>• TEE attestation validity |
| Performance | -50 to +90 | • Throughput and latency<br/>• Resource efficiency<br/>• Error handling<br/>• Scalability under load |
| Strategy | -30 to +30 | • Backtesting performance<br/>• Risk-adjusted returns<br/>• Strategy consistency<br/>• Market regime adaptability |
| Intelligence | -30 to +30 | • LLM behavioral analysis<br/>• Code quality assessment<br/>• Malicious pattern detection<br/>• Trust recommendations |
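The component ranges sum to a maximum of 30 + 90 + 30 + 30 = 180. Since the published range is 0-180 while the component minimums sum below zero, the sketch below assumes the total is clamped into [0, 180]; that clamping is an inference from the published range, not confirmed behavior.

```python
def fort_score(security: int, performance: int,
               strategy: int, intelligence: int) -> int:
    """Aggregate component scores, clamped to the published 0-180 range."""
    total = security + performance + strategy + intelligence
    return max(0, min(180, total))


# Example: 25 + 80 + 20 + 20 = 145
```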
Deployment Guidelines
| Score Range | Status | Recommendation |
|---|---|---|
| 150-180 | 🟢 Excellent | Deploy to production with confidence |
| 120-149 | 🟡 Good | Deploy with monitoring, minor improvements recommended |
| 90-119 | 🟠 Fair | Deploy to staging only, significant improvements needed |
| 60-89 | 🔴 Poor | Do not deploy, major issues present |
| 0-59 | ⛔ Critical | High risk, fundamental redesign required |
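The tiers above translate directly into a threshold lookup; a minimal sketch (the function is illustrative, not part of the package):

```python
def deployment_status(score: int) -> str:
    """Map a Fort Score to the deployment guideline tier."""
    if score >= 150:
        return "Excellent"
    if score >= 120:
        return "Good"
    if score >= 90:
        return "Fair"
    if score >= 60:
        return "Poor"
    return "Critical"


# deployment_status(145) -> "Good"
```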
Export Options
Export verification results in various formats:
```bash
# Export as HTML report
arc-verifier export results --latest --format html

# Export as JSON
arc-verifier export results --latest --format json

# View in web dashboard
arc-verifier export web
```
Data Sources
Arc-Verifier automatically collects:
| Source | Data | Components |
|---|---|---|
| Container Image | Layers, dependencies, configuration | `scanner.py` |
| Runtime Metrics | Resource usage, performance data | `benchmarker.py` |
| Market Data | Historical prices, volatility | `data_fetcher.py`, `data_registry.py` |
| TEE Attestations | Hardware measurements, signatures | `tee/` |
| Code Patterns | Logic analysis, behavioral signatures | `llm_judge/` |
Installation
```bash
# Requirements: Python 3.11+, Docker
pip install arc-verifier

# With all features
pip install 'arc-verifier[llm,web]'

# Development installation
git clone https://github.com/arc-computer/arc-verifier
cd arc-verifier
pip install -e ".[dev,llm,web]"
```
Contributing
Open source infrastructure project. See CONTRIBUTING.md for development guidelines.
- Issues: GitHub Issues for bug reports
- Development: Follow conventional commits, maintain test coverage
- Documentation: Update relevant component docs with changes
License
MIT License - Open source infrastructure for the agentic protocol ecosystem.
Verification infrastructure for autonomous agents across all agentic protocols