AI proposes. You decide. — Adversarial AI workflow engine with built-in quality arbitration

These details have not been verified by PyPI

AI Flow Architect

AI proposes. You decide.

Two AI brains debate your task. One proposes, the other challenges. You make the final call. Built-in quality arbitration — not an afterthought.

The Problem

You ask GPT-4 to design a user authentication system. It returns code that looks clean. You scan through it — APIs, database schema, middleware. Looks fine. You ship it.

What you didn't notice: the password hashing uses MD5 and there is no rate limiting on login endpoints.

You trusted a single AI and it hallucinated security. This happens constantly — not because AI is malicious, but because a single model has no mechanism to catch its own blind spots.

Existing approach	The flaw
Single-model prompting	Same model reviews its own output. Blind spots persist.
LangChain / CrewAI	Flexible orchestration, but quality control is your responsibility.
"Trust me bro"	Hoping the model gets it right this time.

How It Works

AI Flow Architect runs your task through two independent AI brains and makes their disagreement visible to you.

  You: "Design a user management system"
         |
         v
+--------------------+
| Brain #1 (Planner) |  Analyzes requirements, generates a step-by-step blueprint
| Model: GPT-4o      |  with risk annotations and alternative approaches
+--------+-----------+
         |
         v
+--------------------+
| Opponent Brain     |  Challenges the blueprint from adversarial perspectives:
| (5 review styles)  |  Security audit, cost analysis, user empathy, data rigor, minimalism
+--------+-----------+
         |
    [You review and approve the blueprint]
         |
         v
+--------------------+
| Expert Team        |  Each expert runs in an isolated session.
| Creative           |  No cross-contamination. Structured handoffs only.
| Evaluator          |
| Programmer         |
| Reviewer           |
+--------+-----------+
         |
         v
+--------------------+
| Brain #2 (Arbiter) |  Compares the blueprint against deliverables item-by-item.
| Model: Claude      |  Cross-model review. Different model = different blind spots.
+--------+-----------+
         |
         v
     You get: a quality report, not a gamble.

Key design decisions:

Single key works out of the box. Omit brain2 and it auto-selects a cheaper model from the same provider. One OpenAI key is enough to start.
Cross-provider is best. OpenAI + Anthropic gives the strongest arbitration — different training data, different failure modes.
Different models matter. Same-model self-review lets hallucinations through. The framework enforces model diversity for Brain #2.
Fixed workflow, not free orchestration. You trade flexibility for predictability. Every task follows the same quality-controlled pipeline.
Every expert is session-isolated. They don't know about each other. Data passes through structured fields only.
Opponent Brain before execution. Five adversarial perspectives challenge the plan before a single API call is wasted.
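
The brain2 selection rules above can be sketched as a small resolution function. This is a minimal sketch under stated assumptions, not the framework's actual code: `resolve_brain2`, the `CHEAPER_SIBLING` table, and the choice of `claude-3-5-sonnet-20241022` as the cross-provider default are hypothetical names and values picked for illustration.

```python
import os
from typing import Mapping, Optional

# Hypothetical lookup: a cheaper model from the same provider (illustrative only).
CHEAPER_SIBLING = {
    "gpt-4o": "gpt-4o-mini",
    "gpt-4-turbo": "gpt-4o-mini",
}

# Assumed cross-provider default when an Anthropic key is also configured.
CROSS_PROVIDER_DEFAULT = "claude-3-5-sonnet-20241022"


def resolve_brain2(brain1: str,
                   brain2: Optional[str] = None,
                   env: Optional[Mapping[str, str]] = None) -> str:
    """Pick the arbiter model, never returning the same model as brain1."""
    env = os.environ if env is None else env
    if brain2 is not None:
        # An explicit choice is honored, but self-review is refused.
        if brain2 == brain1:
            raise ValueError("brain2 must be a different model than brain1")
        return brain2
    # Dual key: prefer a model from the other provider.
    if brain1.startswith("gpt-") and env.get("ANTHROPIC_API_KEY"):
        return CROSS_PROVIDER_DEFAULT
    # Single key: fall back to a cheaper model from the same provider.
    sibling = CHEAPER_SIBLING.get(brain1)
    if sibling is None:
        raise ValueError(f"no automatic brain2 known for {brain1!r}")
    return sibling
```

Passing `env` explicitly keeps the function testable without touching real environment variables.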

Comparison

	AI Flow Architect	LangChain	CrewAI
Philosophy	Adversarial quality control	Flexible pipeline composition	Role-based agent teams
Quality control	Built-in (dual-brain arbitration + opponent review)	Manual — your responsibility	Optional
Single API key	Works out of the box	Works	Works
Model isolation	Auto-enforced (brain2 auto-resolves to different model)	Not enforced	Not enforced
Token saving	4 mechanisms, zero-config	Manual optimization	Manual optimization
Flow control	Fixed 3-phase pipeline with user approval gate	Free-form chains/agents	Configurable process
Best for	Auditable quality. You need to trust the output.	Maximum flexibility. You own the pipeline.	Multi-agent simulations.

If you need maximum flexibility, use LangChain or CrewAI. If you need auditable quality where AI hallucinations have consequences — this is it.

Model Support

Production-tested

Provider	Models	API Key
OpenAI	gpt-4o, gpt-4o-mini, gpt-4-turbo, gpt-3.5-turbo	`OPENAI_API_KEY`
Anthropic	claude-3-5-sonnet-20241022, claude-3-5-haiku-20241022, claude-3-opus	`ANTHROPIC_API_KEY`

Community-ready (OpenAI-compatible protocol — needs your verification)

These providers expose OpenAI-compatible APIs. The framework already supports them through models.yaml configuration. If you test one and it works, a PR moving it to "Production-tested" is more than welcome.

Provider	Models	API Key	Status
DashScope (Alibaba)	qwen-max, qwen-plus, qwen-turbo	`DASHSCOPE_API_KEY`	Needs verification
Zhipu GLM	glm-4, glm-4-flash, glm-3-turbo	`ZHIPU_API_KEY`	Needs verification
Moonshot	moonshot-v1-8k, moonshot-v1-32k, moonshot-v1-128k	`MOONSHOT_API_KEY`	Needs verification
DeepSeek	deepseek-chat, deepseek-coder	`DEEPSEEK_API_KEY`	Needs verification
Ollama (local)	llama3, qwen2.5-coder, ...	None required	Needs verification
Custom API	custom-model	`CUSTOM_API_KEY` + `CUSTOM_BASE_URL`	Needs verification

Adding a new provider takes 3 steps in models.yaml — no Python code changes needed:

Add provider config (base_url, api_key_env)
Add model entries (name, context_window, pricing)
Add fallback paths

See models.yaml for the full configuration structure.
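
The three steps can be pictured as three sections of one configuration object. The sketch below mirrors them as a Python dictionary and checks that they agree with each other. The field names `base_url`, `api_key_env`, `context_window`, and `pricing` come from the steps above; the nesting, the `fallbacks` key, the `validate` helper, and every value are assumptions, and the real layout in models.yaml may differ.

```python
from typing import Any, Dict, List

# Hypothetical in-memory equivalent of a models.yaml entry. All values are placeholders.
config: Dict[str, Any] = {
    # Step 1: provider config
    "providers": {
        "example": {
            "base_url": "https://api.example.com/v1",
            "api_key_env": "EXAMPLE_API_KEY",
        },
    },
    # Step 2: model entries
    "models": {
        "example-chat": {"provider": "example", "context_window": 8192,
                         "pricing": {"input": 0.0, "output": 0.0}},
        "example-mini": {"provider": "example", "context_window": 8192,
                         "pricing": {"input": 0.0, "output": 0.0}},
    },
    # Step 3: fallback paths (model -> ordered list of backups)
    "fallbacks": {"example-chat": ["example-mini"]},
}


def validate(cfg: Dict[str, Any]) -> List[str]:
    """Return a list of consistency problems; an empty list means the config is coherent."""
    problems: List[str] = []
    for name, model in cfg["models"].items():
        if model["provider"] not in cfg["providers"]:
            problems.append(f"model {name!r} uses unknown provider {model['provider']!r}")
    for source, targets in cfg["fallbacks"].items():
        for model_name in [source, *targets]:
            if model_name not in cfg["models"]:
                problems.append(f"fallback path references unknown model {model_name!r}")
    return problems
```

A check like this catches the most likely mistake when adding a provider by hand: a fallback path that points at a model that was never declared.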

Token-Saving Mechanisms

All four work out of the box, zero configuration:

Mechanism	How it works	Effect
Semantic cache	A repeated expert + task combination hits the cache and skips the API call	0 API calls on a hit
Context compression	When history exceeds a threshold, it is compressed automatically	~60% fewer input tokens
Local rule precheck	An empty task, unknown expert, or invalid complexity is rejected before any API call	0 API calls
Smart skip	Remaining steps are skipped when the current step fails, when all remaining steps are `low` complexity, or when an explicit `skip_next` flag is set	0 API calls
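
To make the first and third mechanisms concrete, here is a minimal sketch of a local precheck and a TTL cache. It is not the framework's implementation: `precheck`, `ResultCache`, and the `medium` and `high` complexity levels are assumptions, and the cache uses plain exact-match keys rather than any semantic similarity.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple

KNOWN_EXPERTS = {"creative", "evaluator", "programmer", "reviewer"}
VALID_COMPLEXITY = {"low", "medium", "high"}  # assumed levels; the source names only `low`


def precheck(expert: str, task: str, complexity: str) -> Optional[str]:
    """Return a rejection reason, or None if the request may proceed to the API."""
    if not task.strip():
        return "empty task"
    if expert not in KNOWN_EXPERTS:
        return "unknown expert"
    if complexity not in VALID_COMPLEXITY:
        return "invalid complexity"
    return None


class ResultCache:
    """Exact-match cache keyed on (expert, task) with a time-to-live and hit stats."""

    def __init__(self, ttl_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def _key(expert: str, task: str) -> str:
        return hashlib.sha256(f"{expert}\n{task}".encode("utf-8")).hexdigest()

    def get(self, expert: str, task: str) -> Optional[str]:
        entry = self._entries.get(self._key(expert, task))
        if entry is not None and self._clock() - entry[0] < self._ttl:
            self.hits += 1
            return entry[1]
        self.misses += 1
        return None

    def put(self, expert: str, task: str, result: str) -> None:
        self._entries[self._key(expert, task)] = (self._clock(), result)
```

Both checks run locally, so a rejected or cached request never reaches a provider.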

Quick Start

1. Install

git clone https://github.com/wdnmd1265/ai-flow-architect.git
cd ai-flow-architect
pip install -e .

2. Configure

cp .env.example .env

Single key (works out of the box):

OPENAI_API_KEY=sk-your-key
# brain2 auto-selects gpt-4o-mini — one API key is enough to start

Dual key (recommended for best quality):

OPENAI_API_KEY=sk-your-key
ANTHROPIC_API_KEY=sk-ant-your-key
# brain2 uses a Claude model — cross-provider arbitration is most effective

3. Run

import asyncio
from ai_flow_architect import FlowArchitect

async def main():
    # Single key: brain2 auto-resolves to gpt-4o-mini
    # Dual key: brain2 defaults to your Anthropic model
    architect = FlowArchitect(config={
        "brain1": "gpt-4o",
        # "brain2": optional — auto-selected if omitted
    })

    result = await architect.run("Design a user management system")

    if result["status"] == "success":
        print(f"Quality score: {result['audit_result'].get('score', 'N/A')}/100")
    else:
        for s in result.get("revision_suggestions", []):
            print(f"  - {s}")

asyncio.run(main())

4. What happens at runtime

============================================================
Task Blueprint
============================================================
Task ID: task_20260518_001
Description: Design a user management system
Estimated tokens: 5000

Steps:
  1. Requirements analysis [expert: evaluator]
     Task: Analyze functional and non-functional requirements...
  2. Architecture design [expert: creative]
     Task: Design system architecture and technical approach...
  3. Implementation [expert: programmer]
     Task: Implement core user management features...

Opponent Brain Review (Security perspective):
  - WARNING: Authentication flow lacks rate limiting
  - WARNING: Password hashing algorithm not specified — verify bcrypt/argon2

============================================================

[A]pprove / [R]eject + feedback / [C]ancel: A

5. Run tests

pip install pytest pytest-asyncio
pytest tests/unit/ -v    # 114 tests

Architecture Deep Dive

Three-Layer Prompt System

Each expert receives a three-layer prompt stack:

Global base (hardcoded): "All output must be valid JSON, no fluff."
Role preset (from ExpertConfig): Domain-specific instructions
Brain #1 task directive (per-task): Specific instructions for the current task

This ensures output format consistency while allowing per-task customization. The scheduler checks for empty prompts at zero cost before any API call.
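
The stacking described above might look like the following. This is a sketch only: `build_prompt` is a hypothetical helper, and joining the layers with blank lines is an assumption about how they are combined.

```python
GLOBAL_BASE = "All output must be valid JSON, no fluff."


def build_prompt(role_preset: str, task_directive: str) -> str:
    """Stack the three layers in order: global base, role preset, task directive."""
    layers = [GLOBAL_BASE, role_preset.strip(), task_directive.strip()]
    # Local, zero-cost check: refuse to build a prompt with an empty layer.
    if not all(layers):
        raise ValueError("empty prompt layer")
    return "\n\n".join(layers)
```

Because the global base always comes first, every expert sees the same output-format rule regardless of its role or task.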

Three-Level Error Handling

Level	Trigger	Response
1 — Retry	Timeout, rate limit (429), connection error	Exponential backoff, up to 3 retries
2 — Fallback	Model not found, auth error, quota exhausted	Switch to backup model (e.g. gpt-4o -> gpt-4o-mini), 1 retry
3 — User decision	All else fails	Prompt user: [R]etry / re[P]lan / [T]erminate
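
A compact way to picture the three levels is a single function that retries, then falls back, then gives up and hands control to the user. The sketch below is illustrative and assumes several things not stated in the table: the exception classes, the `call_with_recovery` name, the one-second base delay, and the choice to move to the backup model once retries on the primary are exhausted.

```python
import time
from typing import Callable, Sequence


class TransientError(Exception):
    """Stands in for timeouts, HTTP 429 responses, and connection errors."""


class ModelUnavailableError(Exception):
    """Stands in for model-not-found, auth, and quota errors."""


def call_with_recovery(call: Callable[[str], str],
                       models: Sequence[str],
                       max_retries: int = 3,
                       base_delay: float = 1.0,
                       sleep: Callable[[float], None] = time.sleep) -> str:
    """Try models[0] first, then each backup in models[1:]."""
    for index, model in enumerate(models):
        # Level 1: the primary gets up to max_retries retries; a backup gets 1.
        retries = max_retries if index == 0 else 1
        for attempt in range(retries + 1):
            try:
                return call(model)
            except TransientError:
                if attempt == retries:
                    break  # retries exhausted, move to the next model
                sleep(base_delay * 2 ** attempt)  # exponential backoff
            except ModelUnavailableError:
                break  # Level 2: switch to the backup model immediately
    # Level 3: nothing worked, so the caller should prompt the user.
    raise RuntimeError("all models failed: [R]etry / re[P]lan / [T]erminate")
```

Injecting `sleep` lets tests record the backoff delays instead of actually waiting.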

Field Filtering

Each expert declares required_input_fields. The scheduler extracts only those fields from previous step results and passes them in — no full context dump, no information overload.

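from ai_flow_architect.experts.base import BaseExpert  # module path per Project Structure

class ProgrammerExpert(BaseExpert):
    # The scheduler extracts only these three fields from prior step results
    required_input_fields = {"architecture", "api_spec", "data_model"}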
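
On the scheduler side, the extraction step could be as simple as the function below. It is a sketch with assumed details: `extract_fields` is a hypothetical name, prior results are assumed to be plain dictionaries, and letting later steps override earlier ones is a design guess.

```python
from typing import Any, Dict, Iterable, List


def extract_fields(previous_results: List[Dict[str, Any]],
                   required: Iterable[str]) -> Dict[str, Any]:
    """Collect only the required fields from prior step results."""
    wanted = set(required)
    selected: Dict[str, Any] = {}
    for result in previous_results:  # later steps override earlier ones
        for key, value in result.items():
            if key in wanted:
                selected[key] = value
    missing = wanted - selected.keys()
    if missing:
        raise KeyError(f"missing required input fields: {sorted(missing)}")
    return selected
```

Failing loudly on a missing field surfaces a broken handoff before the expert spends any tokens on it.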

Built-in Expert Roles

Role	Expert ID	Purpose
Creative	`creative`	Innovation, design solutions, brainstorming
Evaluator	`evaluator`	Requirement analysis, feasibility assessment
Programmer	`programmer`	Code implementation, technical solutions
Reviewer	`reviewer`	Code review, quality control

Create custom experts by subclassing BaseExpert and declaring required_input_fields + output_format.
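
As an illustration of that pattern, the sketch below defines a custom expert against a stand-in base class so that it runs without the package installed. The stand-in `BaseExpert`, the `SecurityAuditExpert` example, the dictionary shape of `output_format`, and the `check_output` helper are all assumptions; the real base class may require other attributes or methods.

```python
from typing import Any, Dict, List, Set


class BaseExpert:
    """Stand-in for the package's BaseExpert, reduced to the two declared attributes."""

    required_input_fields: Set[str] = set()
    output_format: Dict[str, type] = {}  # assumed shape: field name -> expected type

    def check_output(self, output: Dict[str, Any]) -> List[str]:
        """Hypothetical helper: list the fields that are missing or of the wrong type."""
        return [name for name, expected in self.output_format.items()
                if not isinstance(output.get(name), expected)]


class SecurityAuditExpert(BaseExpert):
    """Example custom expert that reviews a design for security issues."""

    required_input_fields = {"architecture", "api_spec"}
    output_format = {"findings": list, "severity": str}
```

Declaring both attributes at class level keeps the contract visible: what the expert needs from earlier steps and what it promises to return.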

Project Structure

ai-flow-architect/
├── src/ai_flow_architect/
│   ├── __init__.py              # Exports FlowArchitect
│   ├── core/
│   │   ├── architect.py         # Three-phase orchestration + user approval loop
│   │   ├── scheduler.py         # Serial execution + 4 token-saving mechanisms
│   │   ├── context.py           # Session CRUD + history compression
│   │   └── cache.py             # CRUD + TTL + hit stats
│   ├── brains/
│   │   ├── brain_one.py         # Brain #1: requirement analysis + blueprint generation
│   │   ├── brain_two.py         # Brain #2: quality arbitration (cross-model)
│   │   └── brain_opponent.py    # Opponent Brain: 5 adversarial review styles
│   ├── experts/
│   │   ├── base.py              # BaseExpert + ExpertConfig + three-layer prompts
│   │   ├── creative.py          # CreativeExpert
│   │   ├── evaluator.py         # EvaluatorExpert
│   │   ├── programmer.py        # ProgrammerExpert
│   │   └── reviewer.py          # ReviewerExpert
│   └── utils/
│       ├── llm_client.py        # Unified LLM client (8 providers)
│       ├── token_counter.py     # Token counting + cost estimation
│       ├── compressor.py        # Context compression (4 strategies)
│       └── validator.py         # Input validation
├── tests/unit/                  # 114 unit tests
├── examples/
│   └── basic_usage.py
├── docs/
│   └── getting_started.md
├── .env.example
├── pyproject.toml
└── models.yaml                  # Provider + model configuration

Roadmap

PyPI package: pip install ai-flow-architect
CLI interface: ai-flow run "design a system"
Expert execution layer: real LLM calls in each expert's execute() (experts currently return mock data)
Expert team templates: pre-configured teams for web dev, data analysis, content creation
Web UI: visual blueprint editor and execution monitor
DeepSeek verification: most cost-effective provider, high priority for community validation
Model providers: OpenAI + Anthropic production-tested, 6 more via OpenAI-compatible protocol
Parallel execution: independent steps run concurrently
Streaming output: real-time expert output streaming

Contributing

Contributions are welcome — especially provider verification PRs. See CONTRIBUTING.md for guidelines.

git clone https://github.com/wdnmd1265/ai-flow-architect.git
cd ai-flow-architect
pip install -e .
pytest tests/unit/ -v

License

AI generates. AI challenges. You decide. This is how we solve hallucination.

Release history

0.1.1

May 25, 2026

This version

0.1.0

May 19, 2026

File details

Details for the file ai_flow_architect-0.1.0.tar.gz.

File metadata

Download URL: ai_flow_architect-0.1.0.tar.gz
Upload date: May 19, 2026
Size: 252.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for ai_flow_architect-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`a90da259384cb17c2c7eb52d4702df79c8e5c9781f391e13602b9fbf74077a4f`
MD5	`5939bf69fe3e193a9fe82e46d041be5f`
BLAKE2b-256	`ba2521618bda79456cf478bcb1febf26204d16528f7059e1b701c1323ca5fe91`

File details

Details for the file ai_flow_architect-0.1.0-py3-none-any.whl.

File metadata

Download URL: ai_flow_architect-0.1.0-py3-none-any.whl
Upload date: May 19, 2026
Size: 83.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.12

File hashes

Hashes for ai_flow_architect-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`882c029548f0074eadfc4a4066b2635ec952e2abaf403084b56db4828058ceac`
MD5	`0e0a61b950819572792bfa59c65be649`
BLAKE2b-256	`c8c411ec325619665b372d5bed8c91d1b9fb6c539234aabe3a057ae03c58609c`
