Unified LLM abstraction layer for autonomous agents
╭──────────────────────────────────────────────────────╮
│ │
│ ▄▀█ █▀█ █▀▀ █ █ █▀▄▀█ │
│ █▀█ █▀▄ █▄▄ █▄▄ █▄▄ █ ▀ █ │
│ │
│ Unified LLM Abstraction Layer │
│ for Autonomous Agents at Scale │
│ │
├──────────────────────────────────────────────────────┤
│ 11 Providers · 8 Modules · 2 Dependencies · <1ms │
╰──────────────────────────────────────────────────────╯
A minimal, security-first LLM abstraction layer built for autonomous agents at scale.
ArcLLM normalizes communication across LLM providers into a single, clean interface designed for agentic tool-calling loops. One function to load a model, one method to invoke it, normalized responses every time — regardless of provider.
from arcllm import load_model, Message
model = load_model("anthropic")
response = await model.invoke([
Message(role="user", content="What is 2 + 2?")
])
print(response.content) # "4"
print(response.usage) # input_tokens=12 output_tokens=4 total_tokens=16
print(response.stop_reason) # "end_turn"
Switch providers by changing one string. Your agent code stays the same.
Why ArcLLM
Built for federal and enterprise production environments where thousands of autonomous agents run concurrently and security, auditability, and control are non-negotiable.
- Security first — API keys from environment variables or vault backends. PII redaction, HMAC request signing, and audit trails built in. No secrets in config files, ever.
- Agent-native — Purpose-built for agentic tool-calling loops, not chat interfaces. Stateless model objects. Your agent manages its own conversation history.
- Minimal core — Two runtime dependencies (`pydantic`, `httpx`). No provider SDKs. Direct HTTP to every provider. Import time under 100ms, abstraction overhead under 1ms per call.
- Budget enforcement — Per-scope spend tracking with calendar period resets. Pre-flight cost estimation. Per-call, daily, and monthly limits with configurable enforcement (`block`, `warn`, `log`).
- Classification-aware routing — Route LLM calls to specific providers based on data classification. CUI stays on cleared infrastructure. Unclassified goes to cost-optimized providers.
- Opt-in complexity — Need just Anthropic with no extras? That's all that loads. Need retry, fallback, telemetry, audit, routing, and budget controls? Enable them with a flag. Nothing runs that you didn't ask for.
- Config-driven — Model metadata, provider settings, and module toggles live in TOML files. Add a provider by dropping in one `.toml` file. Zero code changes.
Supported Providers
| Provider | Type | Adapter |
|---|---|---|
| Anthropic | Cloud | anthropic |
| OpenAI | Cloud | openai |
| DeepSeek | Cloud | deepseek |
| Mistral | Cloud | mistral |
| Groq | Cloud | groq |
| Together AI | Cloud | together |
| Fireworks AI | Cloud | fireworks |
| Hugging Face Inference | Cloud | huggingface |
| Hugging Face TGI | Self-hosted | huggingface_tgi |
| Ollama | Local | ollama |
| vLLM | Self-hosted | vllm |
Every adapter translates provider-specific quirks (role names, content formats, tool call schemas) so your agent code never has to.
Opt-In Modules
All disabled by default. Enable via config or at load time.
| Module | What It Does |
|---|---|
| Retry | Exponential backoff on transient errors (429, 500, 503). Respects Retry-After headers. |
| Fallback | Provider chain — if Anthropic fails, try OpenAI. Configurable order. |
| Rate Limit | Token-bucket throttling per provider. Prevents quota exhaustion across concurrent agents. |
| Telemetry | Timing, token counts, cost-per-call, and budget enforcement with automatic pricing from model metadata. |
| Audit | Structured call logging with metadata for compliance trails. PII-safe by default. |
| Security | PII redaction on requests/responses, HMAC request signing, vault-based key resolution. |
| Routing | Classification-aware provider/model selection. Route CUI to cleared providers, unclassified to cost-optimized. |
| OpenTelemetry | Distributed tracing export via OTLP (gRPC or HTTP). GenAI semantic conventions. |
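The Rate Limit row above describes token-bucket throttling. As a generic illustration of that mechanism (the class and parameter names here are not ArcLLM's), a bucket holds up to `capacity` tokens, refills at a steady rate, and a call proceeds only if it can take a token:

```python
import time

class TokenBucket:
    """Minimal token-bucket throttle: capacity tokens, steady refill rate."""

    def __init__(self, capacity: int, refill_per_second: float):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill_per_second = refill_per_second
        self.last = time.monotonic()

    def try_acquire(self, n: int = 1) -> bool:
        # Refill based on elapsed time, capped at capacity
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_second)
        self.last = now
        if self.tokens >= n:
            self.tokens -= n
            return True
        return False

bucket = TokenBucket(capacity=2, refill_per_second=1.0)
print(bucket.try_acquire(), bucket.try_acquire(), bucket.try_acquire())  # True True False
```

A per-provider bucket like this lets many concurrent agents share one quota without any of them exhausting it.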
Enable at load time:
model = load_model("anthropic", retry=True, telemetry=True, audit=True)
Or override with custom settings:
model = load_model("anthropic", retry={"max_retries": 5, "backoff_base_seconds": 2.0})
Or enable globally in config.toml:
[modules.retry]
enabled = true
max_retries = 3
backoff_base_seconds = 1.0
Budget Enforcement
Control LLM spend at every level — per-call, daily, and monthly — with configurable enforcement:
[modules.telemetry]
enabled = true
monthly_limit_usd = 500.00
daily_limit_usd = 50.00
per_call_max_usd = 5.00
alert_threshold_pct = 80
enforcement = "block" # block | warn | log
budget_scope = "my-agent"
Budget scopes are validated with NFKC Unicode normalization. Costs are clamped to prevent negative injection. Calendar period resets are automatic (UTC monthly/daily). Thread-safe for concurrent agents.
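Pre-flight cost estimation reduces to simple arithmetic over the per-million-token pricing stored in provider config (see the `cost_input_per_1m` / `cost_output_per_1m` fields in the Configuration section). A sketch of that calculation, with clamping against negative cost injection; the function name is illustrative, not ArcLLM's API:

```python
def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cost_input_per_1m: float, cost_output_per_1m: float) -> float:
    """Estimate call cost from token counts and per-1M-token pricing."""
    cost = (input_tokens / 1_000_000) * cost_input_per_1m \
         + (output_tokens / 1_000_000) * cost_output_per_1m
    return max(cost, 0.0)  # clamp: negative token counts cannot reduce spend

# 12 input + 4 output tokens at $3.00 / $15.00 per 1M tokens
print(estimate_cost_usd(12, 4, 3.00, 15.00))
```

The estimate is compared against `per_call_max_usd` before the request goes out, and accumulated against the daily and monthly limits afterward.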
Classification-Aware Routing
Route LLM calls based on data classification:
[modules.routing]
enabled = true
enforcement = "block"
default_classification = "unclassified"
[modules.routing.rules.cui]
provider = "anthropic"
model = "claude-sonnet-4-6"
[modules.routing.rules.unclassified]
provider = "openai"
model = "gpt-4o-mini"
CUI data stays on cleared providers. Unclassified data goes to cost-optimized providers. Enforcement modes: `block` (hard stop), `warn` (log + continue), `log` (silent).
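The routing decision itself can be sketched as a lookup over rules shaped like the `[modules.routing.rules.*]` tables above. This is an illustration of the concept, not ArcLLM's internal code:

```python
# Rules mirroring the TOML example above
ROUTING_RULES = {
    "cui": {"provider": "anthropic", "model": "claude-sonnet-4-6"},
    "unclassified": {"provider": "openai", "model": "gpt-4o-mini"},
}

def route(classification: str, enforcement: str = "block") -> dict:
    """Pick a provider/model for the given data classification."""
    rule = ROUTING_RULES.get(classification)
    if rule is None:
        if enforcement == "block":
            # Hard stop: never send unmatched data anywhere
            raise ValueError(f"no routing rule for classification {classification!r}")
        # "warn" / "log" modes fall through to the default classification
        rule = ROUTING_RULES["unclassified"]
    return rule

print(route("cui"))  # {'provider': 'anthropic', 'model': 'claude-sonnet-4-6'}
```

Note that `block` fails closed: data with an unrecognized classification is rejected rather than routed to a default provider.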
Installation
pip install -e "."
With dev tools:
pip install -e ".[dev]"
With OpenTelemetry export:
pip install -e ".[otel]"
With ECDSA request signing:
pip install -e ".[signing]"
Requirements: Python 3.12+
Setup
1. Set your API key
ArcLLM reads API keys from environment variables by default. Never from config files.
export ANTHROPIC_API_KEY=your-key-here
See .env.example for all supported providers.
Vault integration (optional)
For enterprise environments, ArcLLM resolves API keys from vault backends with TTL caching and automatic env var fallback. Configure in config.toml:
[vault]
backend = "my_vault_module:HashicorpVaultBackend"
cache_ttl_seconds = 300
Then set vault paths per provider in their TOML files:
[provider]
api_key_env = "ANTHROPIC_API_KEY"
vault_path = "secret/data/llm/anthropic"
Resolution order: vault (cached) -> environment variable -> error. The vault backend is a pluggable protocol — implement get_secret(path) and is_available() for any secrets manager (HashiCorp Vault, AWS Secrets Manager, Azure Key Vault, etc.).
2. Load and invoke
from arcllm import load_model, Message
model = load_model("anthropic")
response = await model.invoke([
Message(role="user", content="Summarize this document.")
])
print(response.content)
Use the async context manager to ensure clean connection shutdown:
async with load_model("anthropic") as model:
response = await model.invoke(messages)
3. Switch providers
model = load_model("openai") # OpenAI
model = load_model("groq") # Groq
model = load_model("ollama") # Local Ollama
model = load_model("together") # Together AI
Same Message types, same invoke() call, same LLMResponse back.
Tool-Calling (Agentic Loop)
This is what ArcLLM was built for. Define tools, send them with your messages, and handle the loop:
from arcllm import load_model, Message, Tool, TextBlock, ToolUseBlock, ToolResultBlock
model = load_model("anthropic")
# Define a tool
search_tool = Tool(
name="web_search",
description="Search the web for current information.",
parameters={
"type": "object",
"properties": {
"query": {"type": "string", "description": "The search query"}
},
"required": ["query"],
},
)
messages = [Message(role="user", content="Search for the latest Python release.")]
# Agentic loop
while True:
response = await model.invoke(messages, tools=[search_tool])
if response.stop_reason == "end_turn":
print(response.content)
break
if response.stop_reason == "tool_use":
# Pack the assistant's response back into messages
assistant_content = []
if response.content:
assistant_content.append(TextBlock(text=response.content))
for tc in response.tool_calls:
assistant_content.append(
ToolUseBlock(id=tc.id, name=tc.name, arguments=tc.arguments)
)
messages.append(Message(role="assistant", content=assistant_content))
# Execute tools and send results back
for tc in response.tool_calls:
result = execute_tool(tc.name, tc.arguments) # your implementation
messages.append(Message(
role="tool",
content=[ToolResultBlock(tool_use_id=tc.id, content=result)],
))
Every provider returns the same LLMResponse with the same ToolCall objects and the same stop_reason values. Your agentic loop works across all 11 providers without modification.
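The `execute_tool` call in the loop above is left to your implementation. One common shape is a name-to-callable dispatch table; the `web_search` stub below is a placeholder, not a real search client:

```python
import json

def web_search(query: str) -> str:
    # Placeholder: a real implementation would call a search API
    return f"Top result for {query!r} (stub)"

TOOL_HANDLERS = {"web_search": web_search}

def execute_tool(name: str, arguments: dict) -> str:
    """Dispatch a tool call to its handler; report unknown tools as errors."""
    handler = TOOL_HANDLERS.get(name)
    if handler is None:
        # Return the error to the model instead of crashing the loop
        return json.dumps({"error": f"unknown tool: {name}"})
    return handler(**arguments)

print(execute_tool("web_search", {"query": "latest Python release"}))
```

Returning errors as tool results (rather than raising) keeps the agentic loop alive and lets the model recover or retry.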
Core Types
ArcLLM's type system is the contract between your agent and any LLM provider.
| Type | Purpose |
|---|---|
| `Message` | Input message with role and content |
| `Tool` | Tool definition sent to the LLM |
| `LLMResponse` | Normalized response: content, tool_calls, usage, stop_reason |
| `ToolCall` | Parsed tool call: id, name, arguments (always a dict) |
| `Usage` | Token accounting: input, output, total, cache, reasoning |
| `ContentBlock` | Union of TextBlock, ImageBlock, ToolUseBlock, ToolResultBlock |
All types are Pydantic v2 models with full validation and serialization.
Architecture
Agent Code
|
load_model() ---- Public API
|
Modules ---------- opt-in: retry, fallback, telemetry, audit, security, routing, otel
|
Adapter ---------- provider-specific translation (one .py per provider)
|
Types ------------ pydantic models (the universal contract)
|
Config ----------- TOML files (global defaults + per-provider metadata)
Design principles:
- Library, not a framework — import what you need, nothing more
- No state in the LLM layer — model objects hold config, agents hold conversation
- Provider quirks stay in adapters — your code sees clean, normalized types
- Fail fast, fail loud — errors carry raw data, nothing is silently swallowed
- Config-driven — add a provider by dropping in a TOML file, not writing code
Simplicity by the Numbers
ArcLLM is radically smaller than alternatives. This is a design choice, not a limitation.
| Metric | ArcLLM | pi-ai | LiteLLM |
|---|---|---|---|
| Source LOC | ~2,900 | ~22,600 | ~475,000 |
| Source files | 33 | 38 | 1,558 |
| Runtime deps | 3 | 13 | 12+ |
| Providers | 11 | 9 | 100+ |
LOC per provider: ArcLLM averages ~60 lines per provider adapter. Most are 11-line thin aliases over the OpenAI-compatible base. pi-ai averages ~630 lines per provider. LiteLLM averages ~1,300 lines per provider.
Why this matters:
- Auditable — A security reviewer can read the entire LLM layer in an afternoon. Try that with 475K lines.
- Debuggable — When something breaks, the call stack is shallow. No framework magic, no middleware chains you can't trace.
- Maintainable — Fewer lines means fewer bugs. Every line in ArcLLM exists because it has to, not because a feature flag needed a feature flag.
- Fast — 3 runtime dependencies mean fast installs, small container images, and minimal attack surface. pi-ai requires `@anthropic-ai/sdk`, `openai`, `@google/genai`, `@aws-sdk/client-bedrock-runtime`, and 9 more packages. LiteLLM requires `openai`, `tiktoken`, `tokenizers`, `aiohttp`, `jinja2`, `jsonschema`, and more.
The tradeoff is provider count: LiteLLM supports 100+ providers because it wraps provider SDKs. pi-ai wraps 4 provider SDKs (Anthropic, OpenAI, Google, AWS) to cover 9 API backends. ArcLLM supports 11 providers via direct HTTP — no SDKs, no transitive dependency trees. Adding a new OpenAI-compatible provider is an 11-line file and a TOML config.
LOC measured with find src -name "*.py" | xargs wc -l (ArcLLM) and find src -name "*.ts" | xargs wc -l (pi-ai), excluding tests. Competitor numbers from public GitHub repos as of February 2026.
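As an illustration of the "TOML config" half of that claim, a new OpenAI-compatible provider file might look like the following, using the same schema as the Anthropic config shown in the Configuration section (the provider name, URL, model, and prices here are all invented):

```toml
# providers/exampleai.toml — hypothetical OpenAI-compatible provider
[provider]
base_url = "https://api.exampleai.dev/v1"
api_key_env = "EXAMPLEAI_API_KEY"
default_model = "exampleai-small"
vault_path = ""

[models.exampleai-small]
context_window = 128000
max_output_tokens = 4096
supports_tools = true
supports_vision = false
cost_input_per_1m = 0.15
cost_output_per_1m = 0.60
```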
Configuration
Global defaults (src/arcllm/config.toml):
[defaults]
provider = "anthropic"
temperature = 0.7
max_tokens = 4096
[vault]
backend = ""
cache_ttl_seconds = 300
[modules.retry]
enabled = false
max_retries = 3
backoff_base_seconds = 1.0
[modules.fallback]
enabled = false
chain = ["anthropic", "openai"]
[modules.routing]
enabled = false
enforcement = "warn"
default_classification = "unclassified"
[modules.telemetry]
enabled = false
log_level = "INFO"
# monthly_limit_usd = 500.00
# daily_limit_usd = 50.00
# per_call_max_usd = 5.00
# alert_threshold_pct = 80
# enforcement = "block"
[modules.security]
enabled = false
pii_enabled = true
signing_enabled = true
signing_algorithm = "hmac-sha256"
signing_key_env = "ARCLLM_SIGNING_KEY"
Provider config (src/arcllm/providers/anthropic.toml):
[provider]
base_url = "https://api.anthropic.com"
api_key_env = "ANTHROPIC_API_KEY"
default_model = "claude-sonnet-4-20250514"
vault_path = ""
[models.claude-sonnet-4-20250514]
context_window = 200000
max_output_tokens = 8192
supports_tools = true
supports_vision = true
cost_input_per_1m = 3.00
cost_output_per_1m = 15.00
Model metadata (context windows, capabilities, pricing) lives in config, not code. Update a model's pricing or add a new model variant without touching a single line of Python.
Running Tests
pytest -v # Unit + adapter tests (mocked)
pytest --cov=arcllm # With coverage
pytest tests/security/ # Security-specific tests
pytest tests/test_agentic_loop.py # Live API test (requires ANTHROPIC_API_KEY)
Security
ArcLLM is built security-first for federal production environments. See docs/security.md for the full security reference including NIST 800-53 and OWASP mapping.
Key capabilities:
- API key isolation — Keys from env vars or vault only. Never in config, logs, or responses.
- PII redaction — Automatic detection and redaction of SSN, credit cards, emails, phone numbers, and IPs on both inbound and outbound messages.
- Request signing — HMAC-SHA256 signatures on every request payload for tamper detection and non-repudiation.
- Vault integration — Pluggable secrets backend with TTL caching. Supports any vault (HashiCorp, AWS SM, Azure KV).
- Audit trails — Structured compliance logging. PII-safe metadata by default, raw content opt-in at DEBUG.
- Budget enforcement — Per-scope spend limits with negative cost injection prevention, Unicode homoglyph attack resistance, and thread-safe accumulation.
- Classification routing — Data classification-aware provider selection prevents CUI from reaching unauthorized providers.
- TLS enforced — All provider communication over HTTPS via httpx defaults.
- OpenTelemetry — Distributed tracing with mTLS support for secure telemetry export.
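The HMAC-SHA256 signing named above is available in the Python standard library. This is a sketch of the technique, not ArcLLM's exact wire format; the key here is hard-coded for illustration, where a real deployment would read it from `ARCLLM_SIGNING_KEY`:

```python
import hashlib
import hmac

def sign_payload(payload: bytes, key: bytes) -> str:
    """HMAC-SHA256 signature of a request payload, hex-encoded."""
    return hmac.new(key, payload, hashlib.sha256).hexdigest()

def verify_payload(payload: bytes, key: bytes, signature: str) -> bool:
    # compare_digest is constant-time, preventing timing side channels
    return hmac.compare_digest(sign_payload(payload, key), signature)

key = b"example-signing-key"  # illustration only; load from env in practice
sig = sign_payload(b'{"role":"user"}', key)
print(verify_payload(b'{"role":"user"}', key, sig))  # True
```

Any change to the payload invalidates the signature, which is what gives tamper detection; binding the key to a known sender is what supports non-repudiation.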
Project Status
ArcLLM is in active development. Core foundation, all provider adapters, the module system, budget enforcement, and classification routing are complete and tested.
| Phase | Status |
|---|---|
| Core Foundation (types, config, adapters, registry) | Complete |
| Module System (retry, fallback, rate limit, telemetry, audit, security, otel) | Complete |
| Enterprise (vault integration, request signing, PII redaction) | Complete |
| Budget enforcement (per-scope, daily, monthly, per-call) | Complete |
| Classification-aware routing | Complete |
License
This project is licensed under the Apache License, Version 2.0.
Copyright (c) 2025-2026 BlackArc Systems.
Acknowledgments
ArcLLM was inspired by pi-ai (from pi-mono by Mario Zechner) and LiteLLM. We studied their architectures and built something purpose-fit for Python, federal environments, and autonomous agent fleets.