AI proposes. You decide. — Adversarial AI workflow engine with built-in quality arbitration

AI Flow Architect

Two AI brains debate your task. One proposes, the other challenges. You make the final call. Built-in quality arbitration — not an afterthought.

Requires Python 3.9+. Status: alpha.

The Problem

You ask GPT-4 to design a user authentication system. It returns code that looks clean. You scan through it — APIs, database schema, middleware. Looks fine. You ship it.

What you didn't notice: the password hashing uses MD5 and there is no rate limiting on login endpoints.

You trusted a single AI and it hallucinated security. This happens constantly — not because AI is malicious, but because a single model has no mechanism to catch its own blind spots.

| Existing approach | The flaw |
|---|---|
| Single-model prompting | Same model reviews its own output. Blind spots persist. |
| LangChain / CrewAI | Flexible orchestration, but quality control is your responsibility. |
| "Trust me bro" | Hoping the model gets it right this time. |

How It Works

AI Flow Architect runs your task through two independent AI brains and makes their disagreement visible to you.

  You: "Design a user management system"
         |
         v
+--------------------+
| Brain #1 (Planner) |  Analyzes requirements, generates a step-by-step blueprint
| Model: GPT-4o      |  with risk annotations and alternative approaches
+--------+-----------+
         |
         v
+--------------------+
| Opponent Brain     |  Challenges the blueprint from adversarial perspectives:
| (5 review styles)  |  Security audit, cost analysis, user empathy, data rigor, minimalism
+--------+-----------+
         |
    [You review and approve the blueprint]
         |
         v
+--------------------+
| Expert Team        |  Each expert runs in an isolated session.
| Creative           |  No cross-contamination. Structured handoffs only.
| Evaluator          |
| Programmer         |
| Reviewer           |
+--------+-----------+
         |
         v
+--------------------+
| Brain #2 (Arbiter) |  Compares the blueprint against deliverables item-by-item.
| Model: Claude      |  Cross-model review. Different model = different blind spots.
+--------+-----------+
         |
         v
     You get: a quality report, not a gamble.

Key design decisions:

  • Single key works out of the box. Omit brain2 and it auto-selects a cheaper model from the same provider. One OpenAI key is enough to start.
  • Cross-provider is best. OpenAI + Anthropic gives the strongest arbitration — different training data, different failure modes.
  • Different models matter. Same-model self-review lets hallucinations through. The framework enforces model diversity for Brain #2.
  • Fixed workflow, not free orchestration. You trade flexibility for predictability. Every task follows the same quality-controlled pipeline.
  • Every expert is session-isolated. They don't know about each other. Data passes through structured fields only.
  • Opponent Brain before execution. Five adversarial perspectives challenge the plan before a single API call is wasted.
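
The first three decisions can be sketched as a small resolution function. This is an illustration under stated assumptions, not the framework's own code: `resolve_brain2` and the three lookup tables are hypothetical, and only the gpt-4o to gpt-4o-mini pairing and the environment variable names come from this README.

```python
import os
from typing import Mapping, Optional

# Hypothetical lookup tables for illustration; the project keeps its own mapping.
PROVIDER_OF = {
    "gpt-4o": "openai",
    "gpt-4o-mini": "openai",
    "claude-3-5-sonnet-20241022": "anthropic",
}
CHEAPER_SIBLING = {"gpt-4o": "gpt-4o-mini"}
CROSS_PROVIDER_DEFAULT = {
    "openai": ("ANTHROPIC_API_KEY", "claude-3-5-sonnet-20241022"),
}


def resolve_brain2(brain1: str, brain2: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Pick an arbiter model that is never the same model as brain1."""
    env = os.environ if env is None else env
    if brain2 is not None:
        if brain2 == brain1:
            raise ValueError("brain2 must differ from brain1")
        return brain2
    provider = PROVIDER_OF[brain1]
    # Prefer a different provider when its API key is configured.
    key_name, model = CROSS_PROVIDER_DEFAULT.get(provider, (None, None))
    if key_name and env.get(key_name):
        return model
    # Otherwise fall back to a cheaper model from the same provider.
    return CHEAPER_SIBLING[brain1]
```

With only an OpenAI key set, this sketch returns gpt-4o-mini; with both keys set, it returns the Claude model.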

Comparison

| | AI Flow Architect | LangChain | CrewAI |
|---|---|---|---|
| Philosophy | Adversarial quality control | Flexible pipeline composition | Role-based agent teams |
| Quality control | Built-in (dual-brain arbitration + opponent review) | Manual (your responsibility) | Optional |
| Single API key | Works out of the box | Works | Works |
| Model isolation | Auto-enforced (brain2 auto-resolves to different model) | Not enforced | Not enforced |
| Token saving | 4 mechanisms, zero-config | Manual optimization | Manual optimization |
| Flow control | Fixed 3-phase pipeline with user approval gate | Free-form chains/agents | Configurable process |
| Best for | Auditable quality. You need to trust the output. | Maximum flexibility. You own the pipeline. | Multi-agent simulations. |

If you need maximum flexibility, use LangChain or CrewAI. If you need auditable quality where AI hallucinations have consequences — this is it.

Model Support

Production-tested

| Provider | Models | API Key |
|---|---|---|
| OpenAI | gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo | OPENAI_API_KEY |
| Anthropic | claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022, claude-3-opus | ANTHROPIC_API_KEY |

Community-ready (OpenAI-compatible protocol — needs your verification)

These providers expose OpenAI-compatible APIs. The framework already supports them through models.yaml configuration. If you test one and it works, a PR moving it to "Production-tested" is more than welcome.

| Provider | Models | API Key | Status |
|---|---|---|---|
| DashScope (Alibaba) | qwen-max, qwen-plus, qwen-turbo | DASHSCOPE_API_KEY | Needs verification |
| Zhipu GLM | glm-4, glm-4-flash, glm-3-turbo | ZHIPU_API_KEY | Needs verification |
| Moonshot | moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k | MOONSHOT_API_KEY | Needs verification |
| DeepSeek | deepseek-chat, deepseek-coder | DEEPSEEK_API_KEY | Needs verification |
| Ollama (local) | llama3, qwen2.5-coder, ... | None required | Needs verification |
| Custom API | custom-model | CUSTOM_API_KEY + CUSTOM_BASE_URL | Needs verification |

Adding a new provider takes 3 steps in models.yaml — no Python code changes needed:

  1. Add provider config (base_url, api_key_env)
  2. Add model entries (name, context_window, pricing)
  3. Add fallback paths

See models.yaml for the full configuration structure.
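
As a rough illustration of those three steps, the sketch below expresses one provider entry as a Python dict and checks that each part is present. The field names base_url, api_key_env, name, context_window, and pricing come from the list above. The top-level keys (`providers`, `models`, `fallbacks`), the `provider` back-reference, the URL, and every number are placeholders, so the actual models.yaml layout may differ.

```python
from typing import Any, Dict, List

# Hypothetical structure mirroring the three steps; not the real models.yaml schema.
example_config: Dict[str, Any] = {
    "providers": {  # step 1: provider config
        "example": {
            "base_url": "https://api.example.com/v1",
            "api_key_env": "EXAMPLE_API_KEY",
        },
    },
    "models": {  # step 2: model entries
        "example-large": {"provider": "example", "name": "example-large",
                          "context_window": 32000,
                          "pricing": {"input": 1.0, "output": 2.0}},
        "example-small": {"provider": "example", "name": "example-small",
                          "context_window": 8000,
                          "pricing": {"input": 0.1, "output": 0.2}},
    },
    "fallbacks": {"example-large": ["example-small"]},  # step 3: fallback paths
}


def validate_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means all three steps are covered."""
    errors: List[str] = []
    providers = config.get("providers", {})
    models = config.get("models", {})
    for pid, provider in providers.items():
        for field in ("base_url", "api_key_env"):
            if not provider.get(field):
                errors.append(f"provider {pid}: missing {field}")
    for mid, model in models.items():
        for field in ("name", "context_window", "pricing"):
            if field not in model:
                errors.append(f"model {mid}: missing {field}")
        if model.get("provider") not in providers:
            errors.append(f"model {mid}: unknown provider")
    for source, targets in config.get("fallbacks", {}).items():
        for model_id in [source, *targets]:
            if model_id not in models:
                errors.append(f"fallback: unknown model {model_id}")
    return errors
```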

Token-Saving Mechanisms

All four work out of the box, zero configuration:

| Mechanism | How | Cost |
|---|---|---|
| Semantic cache | Same expert+task combo hits cache, skips API call | 0 API calls |
| Context compression | History exceeds threshold -> auto-compress | ~60% fewer input tokens |
| Local rule precheck | Empty task, unknown expert, or invalid complexity is rejected before any API call | 0 cost |
| Smart skip | Remaining steps are skipped when the current step fails, when all remaining steps are low complexity, or when the explicit skip_next flag is set | 0 API calls |
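
A minimal sketch of three of these mechanisms follows, to make the rules concrete. All names here (`precheck`, `should_skip_remaining`, `ExactMatchCache`) and the complexity levels are assumptions, not the scheduler's real API. The cache below matches on a normalized expert+task string, which is simpler than whatever similarity measure the project's semantic cache may use.

```python
import hashlib
import time
from typing import Any, Dict, List, Optional, Tuple

KNOWN_EXPERTS = {"creative", "evaluator", "programmer", "reviewer"}
VALID_COMPLEXITY = {"low", "medium", "high"}  # assumed levels


def precheck(step: Dict[str, Any]) -> Optional[str]:
    """Return a rejection reason, or None if the step may proceed to an API call."""
    if not str(step.get("task", "")).strip():
        return "empty task"
    if step.get("expert") not in KNOWN_EXPERTS:
        return "unknown expert"
    if step.get("complexity") not in VALID_COMPLEXITY:
        return "invalid complexity"
    return None


def should_skip_remaining(current_failed: bool, skip_next: bool,
                          remaining: List[Dict[str, Any]]) -> bool:
    """Apply the three smart-skip rules from the table above."""
    if current_failed or skip_next:
        return True
    return bool(remaining) and all(s.get("complexity") == "low" for s in remaining)


class ExactMatchCache:
    """TTL cache keyed on (expert, normalized task), with hit statistics."""

    def __init__(self, ttl_seconds: float = 3600.0) -> None:
        self.ttl = ttl_seconds
        self.hits = 0
        self.misses = 0
        self._store: Dict[str, Tuple[float, Any]] = {}

    @staticmethod
    def _key(expert: str, task: str) -> str:
        normalized = " ".join(task.lower().split())
        return hashlib.sha256(f"{expert}\n{normalized}".encode("utf-8")).hexdigest()

    def get(self, expert: str, task: str) -> Optional[Any]:
        entry = self._store.get(self._key(expert, task))
        if entry is None or time.monotonic() - entry[0] > self.ttl:
            self.misses += 1
            return None
        self.hits += 1
        return entry[1]

    def put(self, expert: str, task: str, value: Any) -> None:
        self._store[self._key(expert, task)] = (time.monotonic(), value)
```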

Quick Start

1. Install

git clone https://github.com/wdnmd1265/ai-flow-architect.git
cd ai-flow-architect
pip install -e .

2. Configure

cp .env.example .env

Single key (works out of the box):

OPENAI_API_KEY=sk-your-key
# brain2 auto-selects gpt-4o-mini — one API key is enough to start

Dual key (recommended for best quality):

OPENAI_API_KEY=sk-your-key
ANTHROPIC_API_KEY=sk-ant-your-key
# brain2 uses a Claude model — cross-provider arbitration is most effective

3. Run

import asyncio
from ai_flow_architect import FlowArchitect

async def main():
    # Single key: brain2 auto-resolves to gpt-4o-mini
    # Dual key: brain2 defaults to your Anthropic model
    architect = FlowArchitect(config={
        "brain1": "gpt-4o",
        # "brain2": optional — auto-selected if omitted
    })

    result = await architect.run("Design a user management system")

    if result["status"] == "success":
        print(f"Quality score: {result['audit_result'].get('score', 'N/A')}/100")
    else:
        for s in result.get("revision_suggestions", []):
            print(f"  - {s}")

asyncio.run(main())

4. What happens at runtime

============================================================
Task Blueprint
============================================================
Task ID: task_20260518_001
Description: Design a user management system
Estimated tokens: 5000

Steps:
  1. Requirements analysis [expert: evaluator]
     Task: Analyze functional and non-functional requirements...
  2. Architecture design [expert: creative]
     Task: Design system architecture and technical approach...
  3. Implementation [expert: programmer]
     Task: Implement core user management features...

Opponent Brain Review (Security perspective):
  - WARNING: Authentication flow lacks rate limiting
  - WARNING: Password hashing algorithm not specified — verify bcrypt/argon2

============================================================

[A]pprove / [R]eject + feedback / [C]ancel: A
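
The approval prompt above can be thought of as a small loop: build a blueprint, ask, and re-plan with the user's feedback on rejection. The sketch below is one possible shape for that loop, not the code in architect.py. `parse_decision`, `approval_loop`, and the `max_rounds` limit are hypothetical names and choices.

```python
from typing import Callable, Optional, Tuple


def parse_decision(raw: str) -> Tuple[str, Optional[str]]:
    """Map input such as "A" or "R add rate limiting" to (decision, feedback)."""
    choice, _, rest = raw.strip().partition(" ")
    decisions = {"A": "approve", "R": "reject", "C": "cancel"}
    letter = choice.upper()
    if letter not in decisions:
        raise ValueError(f"expected A, R, or C, got {raw!r}")
    feedback = (rest.strip() or None) if letter == "R" else None
    return decisions[letter], feedback


def approval_loop(make_blueprint: Callable[[Optional[str]], str],
                  ask: Callable[[str], str],
                  max_rounds: int = 3) -> Optional[str]:
    """Return the approved blueprint, or None if cancelled or rounds run out."""
    feedback: Optional[str] = None
    for _ in range(max_rounds):
        blueprint = make_blueprint(feedback)  # re-plan using the last feedback
        decision, feedback = parse_decision(ask(blueprint))
        if decision == "approve":
            return blueprint
        if decision == "cancel":
            return None
    return None
```

Passing `input` as `ask` would give an interactive prompt; a scripted callable makes the loop easy to test.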

5. Run tests

pip install pytest pytest-asyncio
pytest tests/unit/ -v    # 114 tests

Architecture Deep Dive

Three-Layer Prompt System

Each expert receives a three-layer prompt stack:

  1. Global base (hardcoded): "All output must be valid JSON, no fluff."
  2. Role preset (from ExpertConfig): Domain-specific instructions
  3. Brain #1 task directive (per-task): Specific instructions for the current task

This ensures output format consistency while allowing per-task customization. The scheduler checks for empty prompts at zero cost before any API call.
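
A minimal sketch of how the three layers might be assembled, including the empty-prompt check. The function name `build_prompt` and the blank-line separator are assumptions; only the global base text is quoted from the list above.

```python
GLOBAL_BASE = "All output must be valid JSON, no fluff."


def build_prompt(role_preset: str, task_directive: str) -> str:
    """Stack the three layers in order, rejecting empty layers before any API call."""
    layers = [
        ("global base", GLOBAL_BASE),
        ("role preset", role_preset),
        ("task directive", task_directive),
    ]
    for name, text in layers:
        if not text or not text.strip():
            raise ValueError(f"empty prompt layer: {name}")
    return "\n\n".join(text.strip() for _, text in layers)
```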

Three-Level Error Handling

| Level | Trigger | Response |
|---|---|---|
| 1 (Retry) | Timeout, rate limit (429), connection error | Exponential backoff, up to 3 retries |
| 2 (Fallback) | Model not found, auth error, quota exhausted | Switch to backup model (e.g. gpt-4o -> gpt-4o-mini), 1 retry |
| 3 (User decision) | All else fails | Prompt user: [R]etry / re[P]lan / [T]erminate |
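
The escalation can be sketched as follows. This is one reading of the table, not the project's implementation: the exception classes and `call_with_recovery` are hypothetical, the sketch also falls back when level 1 retries are exhausted, and "1 retry" is taken to mean up to two calls to the backup model.

```python
import time
from typing import Callable, Optional


class TransientError(Exception):
    """Level 1 trigger: timeout, rate limit (429), or connection error."""


class ModelUnavailableError(Exception):
    """Level 2 trigger: model not found, auth error, or quota exhausted."""


class NeedsUserDecision(Exception):
    """Level 3: automatic recovery failed, so the user must choose."""


def call_with_recovery(call: Callable[[str], str], model: str,
                       fallback_model: Optional[str],
                       max_retries: int = 3, base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    def attempt(name: str, retries: int) -> str:
        for i in range(retries + 1):
            try:
                return call(name)
            except TransientError:
                if i == retries:
                    raise
                sleep(base_delay * 2 ** i)  # exponential backoff: 1s, 2s, 4s, ...
        raise AssertionError("unreachable")

    try:
        return attempt(model, max_retries)        # level 1
    except (TransientError, ModelUnavailableError):
        pass
    if fallback_model is not None:
        try:
            return attempt(fallback_model, 1)     # level 2
        except (TransientError, ModelUnavailableError):
            pass
    raise NeedsUserDecision("[R]etry / re[P]lan / [T]erminate")  # level 3
```

The `sleep` parameter is injectable so the backoff schedule can be checked without waiting.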

Field Filtering

Each expert declares required_input_fields. The scheduler extracts only those fields from previous step results and passes them in — no full context dump, no information overload.

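# Import path inferred from the project structure (experts/base.py)
from ai_flow_architect.experts.base import BaseExpert

class ProgrammerExpert(BaseExpert):
    # Scheduler extracts only these 3 fields from prior steps
    required_input_fields = {"architecture", "api_spec", "data_model"}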
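
The extraction step on the scheduler side could look roughly like this. `extract_fields` is a hypothetical helper, and two behaviors are assumptions: later steps override earlier ones, and a missing field raises an error.

```python
from typing import Any, Dict, Iterable, List, Set


def extract_fields(prior_results: List[Dict[str, Any]],
                   required: Iterable[str]) -> Dict[str, Any]:
    """Collect only the required fields from earlier step results."""
    wanted: Set[str] = set(required)
    picked: Dict[str, Any] = {}
    for result in prior_results:          # later steps override earlier ones
        for key in wanted.intersection(result):
            picked[key] = result[key]
    missing = wanted - set(picked)
    if missing:
        raise KeyError(f"missing required input fields: {sorted(missing)}")
    return picked
```

Everything not named in `required` is dropped, so an expert never sees unrelated output from earlier steps.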

Built-in Expert Roles

| Role | Expert ID | Purpose |
|---|---|---|
| Creative | creative | Innovation, design solutions, brainstorming |
| Evaluator | evaluator | Requirement analysis, feasibility assessment |
| Programmer | programmer | Code implementation, technical solutions |
| Reviewer | reviewer | Code review, quality control |

Create custom experts by subclassing BaseExpert and declaring required_input_fields + output_format.
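
A custom expert might look like the sketch below. To keep the example runnable on its own, it defines a stand-in `BaseExpert`; the real base class in experts/base.py may have a different interface (for example an async `execute()` or a differently typed `output_format`). `SecurityAuditorExpert` and its check are purely illustrative.

```python
from typing import Any, ClassVar, Dict, Set


class BaseExpert:
    """Stand-in for the project's BaseExpert; the real interface may differ."""
    required_input_fields: ClassVar[Set[str]] = set()
    output_format: ClassVar[Dict[str, str]] = {}

    def execute(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        raise NotImplementedError


class SecurityAuditorExpert(BaseExpert):
    """Hypothetical custom expert that flags missing rate limiting."""
    required_input_fields = {"architecture", "api_spec"}
    output_format = {"findings": "list", "severity": "str"}  # assumed shape

    def execute(self, inputs: Dict[str, Any]) -> Dict[str, Any]:
        missing = self.required_input_fields - set(inputs)
        if missing:
            raise ValueError(f"missing input fields: {sorted(missing)}")
        findings = []
        if "rate limit" not in str(inputs["api_spec"]).lower():
            findings.append("login endpoints lack rate limiting")
        return {"findings": findings, "severity": "high" if findings else "none"}
```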

Project Structure

ai-flow-architect/
├── src/ai_flow_architect/
│   ├── __init__.py              # Exports FlowArchitect
│   ├── core/
│   │   ├── architect.py         # Three-phase orchestration + user approval loop
│   │   ├── scheduler.py         # Serial execution + 4 token-saving mechanisms
│   │   ├── context.py           # Session CRUD + history compression
│   │   └── cache.py             # CRUD + TTL + hit stats
│   ├── brains/
│   │   ├── brain_one.py         # Brain #1: requirement analysis + blueprint generation
│   │   ├── brain_two.py         # Brain #2: quality arbitration (cross-model)
│   │   └── brain_opponent.py    # Opponent Brain: 5 adversarial review styles
│   ├── experts/
│   │   ├── base.py              # BaseExpert + ExpertConfig + three-layer prompts
│   │   ├── creative.py          # CreativeExpert
│   │   ├── evaluator.py         # EvaluatorExpert
│   │   ├── programmer.py        # ProgrammerExpert
│   │   └── reviewer.py          # ReviewerExpert
│   └── utils/
│       ├── llm_client.py        # Unified LLM client (8 providers)
│       ├── token_counter.py     # Token counting + cost estimation
│       ├── compressor.py        # Context compression (4 strategies)
│       └── validator.py         # Input validation
├── tests/unit/                  # 114 unit tests
├── examples/
│   └── basic_usage.py
├── docs/
│   └── getting_started.md
├── .env.example
├── pyproject.toml
└── models.yaml                  # Provider + model configuration

Roadmap

  • PyPI package: pip install ai-flow-architect
  • CLI interface: ai-flow run "design a system"
  • Expert execution layer: real LLM calls in expert execute() (currently mock data)
  • Expert team templates: pre-configured teams for web dev, data analysis, content creation
  • Web UI: visual blueprint editor and execution monitor
  • DeepSeek verification: most cost-effective provider, high priority for community validation
  • Model providers: OpenAI + Anthropic production-tested, 6 more via OpenAI-compatible protocol
  • Parallel execution: independent steps run concurrently
  • Streaming output: real-time expert output streaming

Contributing

Contributions are welcome — especially provider verification PRs. See CONTRIBUTING.md for guidelines.

git clone https://github.com/wdnmd1265/ai-flow-architect.git
cd ai-flow-architect
pip install -e .
pytest tests/unit/ -v

License

Apache License 2.0 — Copyright 2026 盛鑫


AI generates. AI challenges. You decide. This is how we solve hallucination.
