A terminal-based AI agent that decomposes projects into epics, user stories, tasks, and sprint plans.
Project description
Scrum AI Agent
A terminal-based AI agent that takes a project and scrums it down — decomposing scope into epics, user stories, tasks, and sprint plans — interactively from the command line.
Built with LangGraph, LangChain, and Anthropic Claude. Supports OpenAI and Google as alternative LLM providers.
Quick Start
Homebrew (macOS)
brew tap omardin14/tap && brew install scrum-agent
scrum-agent --setup # configure your API key
scrum-agent # launch the interactive TUI
pipx
pipx install scrum-agent
scrum-agent --setup
scrum-agent
From source
git clone https://github.com/omardin14/scrum-planning-ai-agent.git
cd scrum-planning-ai-agent
make install # installs uv, creates venv, installs dependencies
make env # creates .env from .env.example — add your API key
make run # launch the CLI
Headless / CI mode
scrum-agent --non-interactive --description "Build a todo app" --output json
scrum-agent --non-interactive --description @project-brief.txt --output html --team-size 5
Table of Contents
- Quick Start
- Features
- Getting Started
- CLI Reference
- Intake Modes
- Pipeline
- Export Formats
- Session Management
- Tools
- Architecture
- Project Intake Questionnaire
- Scrum Standards
- Prompt Construction
- Guardrails
- Multi-Provider LLM Support
- Development
- Evaluation & Testing
- Agentic Blueprint Reference
Features
Interactive experience
- Full-screen TUI — animated splash screen, mode selection with ASCII art titles, session editor, pipeline progress with spinners and elapsed time
- Numbered review menus —
[1] Accept [2] Edit [3] Rejectat every pipeline checkpoint (no more accidental rejections from typos) - Rich table rendering — colour-coded priorities, capacity bars in sprint plans, discipline tags, DoD checklists
- Status bar — bottom toolbar shows project name, current phase, and session info
/compactand/verbosetoggle — switch between condensed and full-detail output- Terminal bell — notification after long operations (disable with
--no-bell) - Dark/light themes —
--theme lightfor white/cream terminal backgrounds
Smart intake
- Adaptive questioning — extracts answers from your initial description, skips redundant questions, probes vague answers with question-specific follow-ups
- 30-question discovery — seven phases covering project context, team capacity, tech stack, codebase, risks, preferences, and capacity planning
- Choice questions — 6 questions rendered as numbered selection menus (project type, sprint length, code hosting, repo structure, estimation style, output format)
defaultscommand — batch-accept all defaults for the current phase and skip ahead- Dynamic follow-up choices — vague-answer probes show 2–4 LLM-generated options as numbered menus
- Offline questionnaire — export a blank template, fill in at your own pace, import to skip interactive flow
- SCRUM.md context — drop a
SCRUM.mdfile in your project directory; the agent reads it to pre-fill answers and ground output scrum-docs/directory — place PRDs, design docs (.md, .txt, .rst, .pdf) here for automatic ingestion. PDF support viapymupdf(uv sync --extra pdf)
Pipeline & artifacts
- Human-in-the-loop — accept, edit, or reject output at every stage (features, stories, tasks, sprints)
- Task labels — auto-tagged Code, Documentation, Infrastructure, Testing with colour-coded display
- Test plans — auto-generated for Code and Infrastructure tasks (what to test: unit, integration, edge cases)
- AI coding prompts — ARC-structured instruction per task for Cursor, Claude Code, or Copilot, including project context and tech stack
- Documentation sub-tasks — auto-generated for stories with Documentation in their DoD, referencing Confluence/README URLs from intake
- Story titles — short summary titles shown in sprint views and exports (not just the epic name)
- Small project handling — projects with ≤2 sprints and ≤3 goals skip multi-epic decomposition; single sentinel epic named after the project
- Prompt quality rating — deterministic A/B/C/D grade from questionnaire completeness, with actionable suggestions
- Pipeline progress —
[2/5] Generating stories...with contextual spinner messages and elapsed time per step
Capacity planning
- Dynamic capacity — computes net velocity per sprint from gross capacity minus deductions
- Bank holiday detection — auto-detects public holidays in your planning window (100+ countries via
holidayspackage with 3-layer locale fallback) - PTO/leave tracking — per-person leave entry with date-based sub-loop, working-day calculation, per-sprint impact
- Capacity deductions — bank holidays, planned leave, unplanned absence %, onboarding ramp-up, KTLO/BAU allocation, discovery tax
- Per-sprint velocity — only bank-holiday-impacted sprints get reduced capacity; others keep full velocity
- Capacity overflow — 3 options when scope exceeds capacity: extend sprints (recommended), increase team size, or keep as-is (overloaded)
- Sprint highlighting — amber border on impacted sprints with holiday/PTO annotations and reduced capacity bars
- Context sources display — analysis screen shows ✓/✗ indicators for SCRUM.md, Repository, and Confluence
Integrations & export
- 4 export formats — Markdown, HTML (self-contained single-file report), JSON (structured, pipeable), and Jira (batch sync)
- Jira sync — idempotent batch creation: Features → Labels, Stories → linked to Epic, Tasks → Sub-tasks, Sprints → created with story assignment. Cascade creation (sprint stage auto-creates stories if not done). Handles team-managed and classic Jira projects
- 23 tools — GitHub, Azure DevOps, Jira, Confluence, local codebase scanning, bank holiday detection, LLM-powered estimation
- 3 LLM providers — Anthropic Claude (default), OpenAI GPT, Google Gemini
Reliability
- Session persistence — SQLite-backed sessions that survive terminal restarts; resume with
--resume - Non-interactive mode — headless pipeline for CI/CD workflows with JSON/HTML/Markdown output
- Input guardrails — prompt injection detection, length cap, profanity filter, off-topic classifier
- Output guardrails — story format validation, AC coverage checks, sprint capacity enforcement
- Rate-limit retry — exponential backoff with live countdown (5s → 10s → 20s, 3 retries)
- Per-session log files —
~/.scrum-agent/logs/, cleaned up on project deletion - Dry-run mode — full TUI with mock data and fake delays for UI development
Getting Started
Prerequisites
- Python 3.11+
- An API key for at least one LLM provider:
- Anthropic (recommended)
- OpenAI
- Google AI Studio
Installation (development)
make install # installs uv, creates venv, installs dependencies
make env # creates .env from .env.example
make pre-commit # installs pre-commit hooks
First-run setup wizard
On first launch (or with --setup), an interactive wizard walks you through:
- LLM provider selection — choose Anthropic, OpenAI, or Google
- API key entry — with format validation hints (e.g., Anthropic keys start with
sk-ant-) - Credential storage — saved to
~/.scrum-agent/.env
scrum-agent --setup # re-run anytime to update credentials
API keys
Anthropic (default)
ANTHROPIC_API_KEY=sk-ant-...
OpenAI (alternative)
LLM_PROVIDER=openai
OPENAI_API_KEY=sk-...
Google (alternative)
LLM_PROVIDER=google
GOOGLE_API_KEY=AIza...
LangSmith (optional)
LangSmith provides tracing and observability. Add to .env:
LANGSMITH_TRACING=true
LANGSMITH_API_KEY=lsv2_pt_...
LANGSMITH_PROJECT=scrum-agent
CLI Reference
scrum-agent [OPTIONS]
Interactive modes
| Flag | Description |
|---|---|
| (no flags) | Launch the full-screen TUI with mode selection |
--quick |
Quick intake — 2 questions only (team size + tech stack), auto-fill rest |
--full-intake |
Full 30-question intake (standard mode) |
--mode project-planning |
Skip the startup menu, go directly to project planning |
--questionnaire PATH |
Import a filled-in questionnaire Markdown file |
--export-only |
Auto-accept all review checkpoints and exit after plan generation |
Non-interactive / headless
| Flag | Description |
|---|---|
--non-interactive |
Run headlessly (requires --description) |
--description TEXT |
Project description. Use @file.txt to read from a file |
--output {markdown,json,html} |
Output format (default: markdown). Only valid with --non-interactive or --export-only |
--team-size N |
Team size (maps to intake Q6) |
--sprint-length {1,2,3,4} |
Sprint length in weeks (maps to intake Q8) |
Session management
| Flag | Description |
|---|---|
--resume [ID] |
Resume a session. No argument = interactive picker. latest = most recent. Or pass a session ID |
--list-sessions |
List all saved sessions and exit |
--clear-sessions |
Interactively delete saved sessions |
Configuration
| Flag | Description |
|---|---|
--setup |
Re-run the first-time setup wizard |
--theme {dark,light} |
Terminal colour theme (default: dark) |
--no-bell |
Disable terminal bell after pipeline steps |
--dry-run |
Run TUI with mock data and fake delays — no LLM calls |
--version |
Print version and exit |
Questionnaire export
| Flag | Description |
|---|---|
--export-questionnaire [PATH] |
Export a blank questionnaire template as Markdown |
In-session commands
These commands are available at the scrum> prompt during an interactive session:
| Command | Description |
|---|---|
help, ? |
Show available commands |
skip |
Skip the current intake question (uses a sensible default) |
defaults |
Apply defaults for all remaining questions in the current phase |
export |
Export current artifacts as HTML report + Markdown |
/compact |
Switch to compact output (hide secondary columns) |
/verbose |
Switch to verbose output (full detail, default) |
/resume |
Load a previously saved session |
/clear |
Delete saved sessions (pick one or all) |
Q6: answer |
Edit Q6 inline from the summary |
edit Q6 |
Re-answer Q6 interactively from the summary |
exit, quit |
Exit the agent |
Ctrl+C, Ctrl+D |
Exit the agent |
The status bar at the bottom of the terminal shows project name, current phase, and session info. It updates automatically as you progress through the pipeline.
Examples
scrum-agent # interactive TUI (recommended)
scrum-agent --quick # quick intake (2 questions only)
scrum-agent --full-intake # full 30-question intake
scrum-agent --questionnaire intake.md # import pre-filled questionnaire
scrum-agent --export-only --quick # non-interactive, auto-accept all
scrum-agent --resume # resume last session (picker)
scrum-agent --resume latest # resume most recent session
scrum-agent --list-sessions # list all saved sessions
scrum-agent --clear-sessions # delete saved sessions
scrum-agent --non-interactive --description "Build X" # headless mode
scrum-agent --non-interactive --description @brief.txt --output json # JSON to stdout
scrum-agent --dry-run # TUI with mock data
Intake Modes
The agent supports four intake modes, each balancing thoroughness with speed.
Smart mode (default)
The recommended mode. The agent:
- Reads your initial project description and extracts answers to as many questions as possible
- Asks only the remaining essential questions (typically 2–4)
- Uses answer provenance tracking to tag how each answer was obtained:
DIRECT— you explicitly answeredEXTRACTED— parsed from your initial descriptionDEFAULTED— filled with a sensible defaultPROBED— filled via a targeted follow-up questionSCRUM_MD— loaded from aSCRUM.mdfile in the current directory
- Applies conditional essentials — questions that only appear when relevant (e.g., "What are their roles?" only asked after you give a team size)
- Runs cross-question validation — catches contradictions (e.g., team size of 1 but multiple roles listed)
- Generates adaptive follow-ups using question-specific templates (not generic "tell me more")
Quick mode (--quick)
Two questions only: team size and tech stack. Everything else gets sensible defaults. Best for rapid prototyping or CI pipelines.
Standard mode (--full-intake)
Six questions are rendered as numbered selection menus instead of free text:
| Q | Topic | Options |
|---|---|---|
| Q2 | Project type | Greenfield / Existing codebase / Hybrid |
| Q8 | Sprint length | 1 week / 2 weeks (default) / 3 weeks / 4 weeks |
| Q16 | Code hosting | GitHub / Azure DevOps / GitLab / Bitbucket / Local |
| Q18 | Repo structure | Monorepo / Multi-repo / Microservices / Monolith |
| Q24 | Estimation style | Fibonacci points / T-shirt sizes / No estimates |
| Q26 | Output format | Jira / Markdown / Both |
Type defaults at any question to batch-accept all defaults for the current phase and skip ahead.
All 30 questions asked one-at-a-time in a conversational flow. Seven phases:
- Project Context (Q1–Q5) — name, type, goals, users, deadlines
- Team & Capacity (Q6–Q10) — engineers, roles, sprint length, velocity, target sprints
- Technical Context (Q11–Q14) — tech stack, integrations, constraints, docs
- Codebase Context (Q15–Q20) — repo, structure, CI/CD, tech debt
- Risks & Unknowns (Q21–Q23) — risks, blockers, out-of-scope
- Preferences (Q24–Q26) — estimation, DoD, output format
- Capacity Planning (Q27–Q30) — sprint selection, bank holidays, unplanned absence %, onboarding
Offline import (--questionnaire)
- Export a blank template:
scrum-agent --export-questionnaire - Fill it in at your own pace in any editor
- Import:
scrum-agent --questionnaire intake.md - Review the summary and confirm before proceeding
The format is round-trippable (export → edit → import preserves answers exactly).
SCRUM.md context
Drop a SCRUM.md file in your project directory with any relevant context — project notes, design decisions, URLs, architecture diagrams. The agent reads it automatically and uses it to pre-fill answers and ground its output. Answers extracted from SCRUM.md are tagged with *(from SCRUM.md)* provenance markers in the intake summary. Your typed description always takes priority over SCRUM.md when both provide the same information.
scrum-docs/ directory
Place PRDs, design docs, or reference material in a scrum-docs/ directory. Supported formats: .md, .txt, .rst, .pdf. PDF support requires the pymupdf optional dependency:
uv sync --extra pdf
Files are automatically ingested and fed into the project analyzer for grounded output.
Pipeline
After intake confirmation, the agent runs a 5-stage pipeline with a human-in-the-loop checkpoint after each stage:
Project Intake → Project Analyzer → Feature Generator → Story Writer → Task Decomposer → Sprint Planner
│ │ │ │
[accept/edit/reject] [accept/edit/reject] [accept/edit/reject] [accept/edit/reject]
| Stage | What it does |
|---|---|
| Project Analyzer | Synthesizes all 30 intake answers into a structured ProjectAnalysis — name, type, goals, tech stack, constraints, risks, out-of-scope |
| Feature Generator | Decomposes the analysis into high-level features with priorities (Critical/High/Medium/Low) |
| Story Writer | Breaks features into user stories with persona/goal/benefit format, Given/When/Then acceptance criteria, Fibonacci story points (1–8, auto-split if >8), discipline tagging, and Definition of Done flags |
| Task Decomposer | Breaks stories into concrete tasks with labels (Code/Documentation/Infrastructure/Testing), test plans, and AI coding prompts. Auto-generates documentation sub-tasks for stories with Documentation in their DoD |
| Sprint Planner | Allocates stories to sprints respecting per-sprint net velocity (deducted for bank holidays, PTO, unplanned absence, onboarding, KTLO). Handles capacity overflow with 3 options: extend sprints, increase team, or keep as-is |
At each checkpoint, you can:
[1] Accept— proceed to the next stage[2] Edit— modify specific artifacts inline[3] Reject— re-generate with your feedback
Task enrichment
Every task generated by the Task Decomposer includes:
| Field | Description |
|---|---|
| Label | Auto-tagged: Code, Documentation, Infrastructure, or Testing — colour-coded in all views |
| Test plan | Auto-generated for Code and Infrastructure tasks — lists what to test (unit, integration, edge cases) |
| AI prompt | ARC-structured instruction for Cursor/Claude Code/Copilot, including project name, tech stack, and specific guidance |
Stories with "Documentation" marked as applicable in their DoD get a consolidated documentation sub-task referencing Confluence/README URLs from intake.
Small project handling
For projects with ≤2 sprints and ≤3 goals, the analyzer sets skip_epics. Instead of multi-epic decomposition, a single sentinel epic is created using the project name as its title. The rest of the pipeline (stories, tasks, sprints) proceeds normally.
Prompt quality rating
After intake, the analysis review screen shows a deterministic quality rating:
- Letter grade (A/B/C/D) with percentage score
- Breakdown: answered, extracted, defaulted, skipped, probed counts
- Actionable suggestions: "Add a SCRUM.md file", "Answer Q11 (tech stack) for better stories", etc.
- Low-confidence areas: defaulted essential questions flagged for downstream spike recommendations
Export Formats
Markdown (default)
Writes scrum-plan.md with all artifacts structured as headings, tables, and lists.
scrum-agent --export-only --quick
HTML
Self-contained single-file HTML report with embedded CSS, collapsible sections, and a table of contents. No external dependencies.
scrum-agent --non-interactive --description "Build a todo app" --output html
JSON
Clean, pipeable JSON schema for CI/CD integration. No internal state fields — just the plan artifacts:
{
"version": "1.0.0",
"project": { "name", "description", "type", "goals", "tech_stack", "team_size", "sprint_length_weeks" },
"features": [...],
"stories": [...],
"tasks": [...],
"sprints": [...]
}
scrum-agent --non-interactive --description "Build a todo app" --output json | jq '.stories | length'
When using --output json, Rich console output goes to stderr so stdout is clean JSON.
Jira
Batch sync with idempotent creation, available from TUI pipeline review at any stage or from the project list:
| Artifact | Jira Mapping |
|---|---|
| Features | Jira Labels (not separate issues) |
| Epic | 1 project-level Epic |
| Stories | Issues linked to Epic, with story points, priority, acceptance criteria, feature labels |
| Tasks | Sub-tasks linked to parent Stories, with task labels |
| Sprints | Created with name, goal, dates; stories assigned to sprints |
Key behaviors:
- Idempotency — checks
jira_*_keysstate before creating; skips already-synced artifacts - Cascade creation — Task stage auto-creates Stories if not yet synced; Sprint stage auto-creates Stories if not yet synced
- Project type detection — discovers issue types dynamically (handles team-managed vs. classic Jira projects)
- Confirmation screen — shows what will be created/skipped before any write operation
- Progress screen — animated per-item status during creation
- Jira button — disabled/dimmed in TUI when
JIRA_API_TOKENis not configured
Session Management
Sessions are persisted to SQLite at ~/.scrum-agent/sessions.db. Every terminal session gets a unique ID (new-<8hex>-<YYYY-MM-DD>) and a human-readable display name derived from the project slug (todoapp-2026-03-19).
Resume a session
scrum-agent --resume # interactive picker
scrum-agent --resume latest # most recent session
scrum-agent --resume <id> # specific session ID
Resumed sessions pick up exactly where you left off — mid-questionnaire, mid-review, or between pipeline stages.
List sessions
scrum-agent --list-sessions
Shows a table with project name, date, last completed step, and session ID.
Delete sessions
scrum-agent --clear-sessions
Interactive picker to delete one session or clear all.
Auto-pruning
Sessions older than 30 days are auto-pruned at startup. Configure via SESSION_PRUNE_DAYS in .env (set to 0 to disable).
Tools
The agent has access to 23 tools, organized by integration:
GitHub
| Tool | Description |
|---|---|
github_read_repo |
Fetch repo metadata, languages, and file tree |
github_read_file |
Read a single file from a GitHub repo |
github_list_issues |
List open issues with labels |
github_read_readme |
Extract README content |
Azure DevOps
| Tool | Description |
|---|---|
azdevops_read_repo |
Fetch repo metadata |
azdevops_read_file |
Read a single file |
azdevops_list_work_items |
List work items |
Jira
| Tool | Description | Risk |
|---|---|---|
jira_read_board |
Fetch board metadata and configuration | Low |
jira_fetch_velocity |
Get team velocity history (rolling average of last 3–5 sprints, with JQL fallback for team-managed boards) | Low |
jira_fetch_active_sprint |
Get current sprint info for sprint selection (Q27) | Low |
jira_create_epic |
Create an epic | High (requires confirmation) |
jira_create_story |
Create a story with ACs and story points | High (requires confirmation) |
jira_create_sprint |
Create and configure a sprint | High (requires confirmation) |
Confluence
| Tool | Description | Risk |
|---|---|---|
confluence_search_docs |
Search documentation by keyword | Low |
confluence_read_page |
Read a wiki page by ID | Low |
confluence_read_space |
Read space metadata and page list | Low |
confluence_create_page |
Create a new page | High (requires confirmation) |
confluence_update_page |
Update an existing page | High (requires confirmation) |
Local codebase
| Tool | Description |
|---|---|
read_codebase |
Scan entire local repo — language detection, file tree (budget-limited, auto-collapses large dirs), skips binaries and build artifacts |
read_local_file |
Read a specific file from disk (targeted retrieval when the LLM needs to inspect particular files) |
load_project_context |
High-level codebase overview including scrum-docs/ PRD/design doc ingestion |
Calendar
| Tool | Description |
|---|---|
detect_bank_holidays |
Detect public holidays in the planning window (auto-fills Q28) |
LLM-powered
| Tool | Description |
|---|---|
estimate_complexity |
Analyze code/requirements for story point estimation |
generate_acceptance_criteria |
Generate Given/When/Then acceptance criteria |
Tool risk levels
| Risk | Guardrail |
|---|---|
| Low (read-only) | Auto-execute |
| Medium (LLM-powered) | Log and display to user |
| High (write operations) | Requires explicit user confirmation |
Architecture
Four Layers
| Layer | Implementation |
|---|---|
| Interface | Full-screen TUI with animated splash, mode selection, session editor, pipeline progress, streaming output, and dark/light themes |
| Prompt Construction | Scrum Master persona, ARC-structured prompts per node, few-shot examples, adaptive question templates |
| Model | Anthropic Claude (primary), OpenAI GPT, Google Gemini — swappable via LLM_PROVIDER env var |
| Data & Storage | SQLite session store (~/.scrum-agent/sessions.db), optional Jira/Confluence integration |
Three Design Principles
- Robust Infrastructure — agent frameworks (LangChain, LangGraph), graceful rate-limit retry with exponential backoff, crash-safe session persistence
- Modularity — decoupled CLI/TUI/REPL/agent/tools/prompts, one concern per module, UI system with 4 subsystems
- Continuous Evaluation — golden dataset evaluators, contract tests with VCR cassettes, token budget monitoring
Agent Graph (LangGraph)
Auto-generated via make graph:
START → project_intake → [questionnaire loop] → project_analyzer → feature_generator → [human review]
│
┌───────────────────┘
▼
story_writer → [human review]
│
▼
task_decomposer → [human review]
│
▼
sprint_planner → [human review]
│
▼
jira_sync → END
Node Descriptions
| Node | Responsibility |
|---|---|
| Project Intake | Runs the discovery questionnaire (smart/standard/quick mode) to gather all project context |
| Project Analyzer | Synthesizes questionnaire answers into a structured ProjectAnalysis with name, type, goals, tech stack, constraints, and risks |
| Feature Generator | Decomposes the analysis into high-level features with priority levels. For small projects (≤2 sprints, ≤3 goals), creates a single sentinel epic instead |
| Story Writer | Breaks features into user stories with persona/goal/benefit, short titles, Given/When/Then acceptance criteria, Fibonacci story points (auto-split >8), discipline tagging, DoD flags, and points rationale |
| Task Decomposer | Breaks stories into concrete tasks with auto-tagged labels (Code/Documentation/Infrastructure/Testing), test plans for code tasks, AI coding prompts, and dedicated documentation sub-tasks |
| Sprint Planner | Allocates stories to sprints using per-sprint net velocity (bank holidays, PTO, unplanned %, onboarding, KTLO deducted). Handles capacity overflow with 3 options. Highlights impacted sprints |
| Jira Sync | Pushes the finalized plan to Jira with idempotent batch creation: Features → Labels, Stories → linked to Epic, Tasks → Sub-tasks, Sprints → created and assigned |
The ReAct Loop
The foundational reasoning pattern:
Thought → Action → Observation → (repeat until done)
- Thought — reason about the current state and what to do next
- Action — call a tool or take a step
- Observation — see the result, decide whether to continue or answer
TUI System
The ui/ package provides a full-screen terminal UI with four subsystems:
| Subsystem | Purpose |
|---|---|
mode_select/ |
Full-screen mode selection with ASCII art titles, project cards with pipeline progress indicators, project list with half-card peek stubs at viewport edges |
provider_select/ |
LLM and tool provider selection (block-character logos for Claude/GPT/Gemini), issue tracking setup, verification flow |
session/ |
Main session UI — description input, intake questions, summary review, pipeline stages with artifact editing, Jira export confirmation/progress screens, chat. Dry-run mode with mock data |
shared/ |
Animations (typewriter, pulse), ASCII font rendering, reusable components, mouse scroll handling |
Visual features:
- Rounded borders with consistent padding and arrow-key navigation
- Sticky group headers — epic titles pin at top when scrolling, with decryption-style morph animation between sections
- Scrollbar — vertical
│track with┃thumb for pipeline stages and summary review - Capacity bars — per-sprint with reduced velocity for bank-holiday/PTO-impacted sprints (amber border + annotations)
- Project cards — one-shot white pulse animation on Enter, pipeline progress badges
The repl/ package is the legacy REPL kept for backwards compatibility and CLI-flag-driven flows (--quick, --full-intake, --questionnaire, --mode).
State Schema
- ScrumState is a
TypedDict(LangGraph convention for graph state) messagesusesAnnotated[list[BaseMessage], add_messages]for append semantics- Frozen dataclasses for artifacts —
Feature,UserStory,Task,Sprint,ProjectAnalysis(immutable once created, serializable viaasdict()) - Mutable dataclass for
QuestionnaireState— updated incrementally by the intake node - Artifact lists use
Annotated[list[...], operator.add]so nodes can return items that get appended
Agent Classification
| Property | Value |
|---|---|
| Agency Level | Level 3–4 (self-looping + multi-agent coordination) |
| Reasoning Pattern | ReAct (Thought → Action → Observation → repeat) |
| Interface | Terminal CLI (full-screen TUI + legacy REPL) |
| Domain | Scrum project management |
Project Intake Questionnaire
Before generating any Scrum artifacts, the agent runs a structured discovery phase — asking the user questions one at a time in a conversational flow. This is the "flipped prompt" technique: the agent gathers what it needs before it acts.
Questionnaire Flow
The agent asks these questions sequentially. Each question is asked individually, the user responds, and the agent moves to the next. The agent adapts follow-up questions based on previous answers.
Phase 1 — Project Context
| # | Question | Why the Agent Needs This |
|---|---|---|
| 1 | What is the project? Describe it in a few sentences, or point me to a repo/doc. | Establishes the core scope and domain |
| 2 | Is this a greenfield project or are you building on an existing codebase? | Determines whether the agent should scan existing code, and whether there's legacy complexity |
| 3 | What problem does this project solve? Who are the end users? | Grounds epic/story generation in real user needs rather than abstract features |
| 4 | What does "done" look like? What's the end-state you're targeting? | Defines the finish line — prevents scope creep and gives the agent a clear goal to decompose toward |
| 5 | Are there any hard deadlines or milestones? | Constrains the sprint plan; the agent needs to know if time is fixed |
Phase 2 — Team & Capacity
| # | Question | Why the Agent Needs This |
|---|---|---|
| 6 | How many engineers are working on this? | Directly affects sprint capacity and parallelism of work |
| 7 | What are the roles on the team? (e.g., 2 backend, 1 frontend, 1 fullstack) | Lets the agent tag stories by discipline and balance sprint workload across skillsets |
| 8 | How long are your sprints? (e.g., 1 week, 2 weeks) | Required for sprint planning — determines how many points fit per sprint |
| 9 | Do you have a known velocity from previous sprints? If yes, what is it? | If available, the agent uses real velocity; otherwise it defaults to 5 points per engineer per sprint |
| 10 | How many sprints are you targeting to complete this project? | Bounds the total effort and forces prioritization if scope exceeds capacity |
Phase 3 — Technical Context
| # | Question | Why the Agent Needs This |
|---|---|---|
| 11 | What is the tech stack? (languages, frameworks, databases, infra) | Stories and tasks need to be written in terms the team actually works with |
| 12 | Are there any existing APIs, services, or third-party integrations involved? | Identifies external dependencies that create stories of their own (auth, payments, etc.) |
| 13 | Are there any architectural constraints or decisions already made? (e.g., must use microservices, must deploy to AWS) | Prevents the agent from suggesting work that contradicts fixed decisions |
| 14 | Is there any existing documentation, PRDs, or design docs I should reference? | The agent can ingest these for grounded story generation |
Phase 3a — Codebase Context
| # | Question | Why the Agent Needs This |
|---|---|---|
| 15 | Does the project have an existing codebase, or is this a new build? | Determines whether the agent needs to account for existing code, migrations, and legacy constraints |
| 16 | Where is the code hosted? (GitHub, Azure DevOps, GitLab, Bitbucket, local only) | Tells the agent which source control tool to use for repo scanning |
| 17 | Can you share the repo URL(s)? (the agent can connect and scan the repo for context) | Enables the agent to read repo structure, key files, and README to ground its output |
| 18 | How is the repo structured? (monorepo, multi-repo, microservices, monolith) | Affects how the agent decomposes work |
| 19 | Is there an existing CI/CD pipeline or deployment setup? | Identifies whether DevOps stories are needed |
| 20 | Is there any known technical debt? (legacy code, outdated dependencies, areas needing refactoring) | Surfaces refactoring stories and constraints |
Phase 4 — Risks & Unknowns
| # | Question | Why the Agent Needs This |
|---|---|---|
| 21 | Are there any areas of the project you're uncertain or worried about? | The agent flags these as spike stories or high-risk items |
| 22 | Are there any known blockers or dependencies on external teams/systems? | Creates blocked/dependency stories and affects sprint ordering |
| 23 | Is there anything that's explicitly out of scope? | Prevents generating stories for work the team won't do |
Phase 5 — Preferences & Process
| # | Question | Why the Agent Needs This |
|---|---|---|
| 24 | How do you want stories estimated? (Fibonacci story points, T-shirt sizes, or no estimates) | Configures the output format |
| 25 | Do you have a Definition of Done the team follows? | Incorporated into acceptance criteria validation |
| 26 | Do you want the output pushed to Jira, exported as Markdown, or both? | Determines the final step of the pipeline |
Phase 6 — Capacity Planning
| # | Question | Why the Agent Needs This |
|---|---|---|
| 27 | Which sprint are you planning for? | Anchors the planning window. Auto-detected from Jira active sprint if configured; otherwise presented as a choice question |
| 28 | How many bank holidays fall within your planning window? | Deducts from gross capacity. Auto-detected via detect_bank_holidays tool (100+ countries, 3-layer locale fallback: Jira timezone → shell locale → GB default). User can override |
| 29 | What percentage of capacity is typically lost to unplanned absences? (default: 10%) | Real feature capacity is ~24% of gross after all deductions (based on analysis of capacity planning templates) |
| 30 | Are any engineers currently onboarding or ramping up? | Reduces individual capacity during ramp-up sprints |
After Q28 (bank holidays), the agent asks about planned leave (PTO):
- Per-person entry with name, start date, and end date (DD/MM/YYYY format with validation)
- Dates outside the planning window are rejected
- Working-day calculation excludes weekends
- Summary shown after each entry with option to add more
- Quick mode skips PTO (defaults to 0)
Adaptive Behavior
The questionnaire is not rigid — the agent adapts:
- Skips questions the user already answered. If your initial description included "we're a team of 4 using React and Node", the agent won't re-ask team size or tech stack.
- Extracts answers from descriptions. Keyword matching detects project type (greenfield/existing), integrations (Stripe, Auth0), and infrastructure constraints (Kubernetes, microservices).
- Uses conditional essentials. Q7 (team roles) only appears if Q6 (team size) was answered. Q12 (integrations) only if Q11 (tech stack) was answered.
- Asks targeted follow-ups. Instead of generic "tell me more", the agent uses question-specific probing templates.
- Validates across questions. Catches contradictions — e.g., team size of 1 but multiple roles listed.
- Adapts question text. "You said 5 engineers — what are their roles?" instead of static text.
- Allows "skip" and "I don't know". Proceeds with reasonable defaults and flags assumptions.
- Summarizes before proceeding. After all questions, presents a structured summary for confirmation.
Intake Summary Output
After the questionnaire, the agent produces a structured summary:
Here's what I understand about your project:
Project: E-commerce platform redesign
Type: Existing codebase (monolith → microservices migration)
End Users: Online shoppers, internal warehouse staff
Target State: Fully migrated to microservices with new checkout flow
Team: 5 engineers (2 backend, 2 frontend, 1 devops)
Sprint Length: 2 weeks
Velocity: 25 pts/sprint (default: 5 × 5 engineers, no historical data)
Target Sprints: 6 sprints (12 weeks)
Tech Stack: Python/FastAPI, React, PostgreSQL, AWS ECS
Integrations: Stripe (payments), SendGrid (email), existing REST API
Constraints: Must maintain backward compat with mobile app v2.x
Risks:
- Payment flow migration (high complexity, Stripe webhook changes)
- No clear spec for warehouse dashboard requirements
Out of Scope: Mobile app redesign, analytics pipeline
Output: Jira + Markdown export
Does this look right? [Confirm / Edit]
Only after the user confirms does the agent proceed to feature generation.
Scrum Standards
These are the team's codified practices. The agent enforces all of these when generating and validating Scrum artifacts.
1. Issue Hierarchy
| Level | What It Represents | Scope | Example |
|---|---|---|---|
| Epic | A large body of work representing the big picture. Can span months or multiple sprints. | The Why of the project | "Customer Self-Service Portal" |
| Feature | A significant piece of functionality that contributes to the big picture. Can span multiple sprints. | The What we're building | "Subscription Management" |
| User Story | A smaller, well-defined unit of work. Must be completable within a single sprint. | The How of the project | "As a customer, I want to upgrade my plan" |
| Sub-Task | A breakdown of a story into manageable, assignable parts. | Implementation detail | "Add upgrade endpoint to billing API" |
| Spike | A time-boxed research task to reduce uncertainty before delivery work begins. | Learning & discovery | "Investigate Stripe webhook reliability" |
2. User Stories
Format
User stories follow this structure:
"As a [persona], I want to [goal], so that [benefit]."
Breaking It Down
| Part | What It Means | Guidance |
|---|---|---|
| As a [persona] | Who are we building this for? Not a job title — a real persona the team understands with empathy. | The team should have a shared understanding of this person — how they work, think, and feel. |
| I want to [goal] | What is the user actually trying to achieve? Describes intent, not features. | Must be implementation-free. If you're describing UI elements instead of the user's goal, you're missing the point. |
| So that [benefit] | How does this fit into their bigger picture? What problem does it solve? | Ties the story back to real value and helps define when the story is truly done. |
Examples
- As Max, I want to invite my friends, so we can enjoy this service together.
- As Sascha, I want to organise my work, so I can feel more in control.
- As a manager, I want to understand my colleagues' progress, so I can better report our successes and failures.
Story Point Rules
| Rule | Detail |
|---|---|
| Scale | Fibonacci: 1, 2, 3, 5, 8 |
| Maximum | 8 points per story. If estimated above 8, the story must be split. |
| What points measure | Relative complexity and effort, not hours. |
| Default velocity | When no historical data exists: 5 points per engineer per sprint. |
| Sprint capacity | Stories are allocated to sprints without exceeding capacity (engineers x 5 or known velocity). |
Velocity Calculation Examples:
| Scenario | Calculation | Sprint Capacity |
|---|---|---|
| 3 engineers, no known velocity | 3 × 5 | 15 pts/sprint |
| 5 engineers, no known velocity | 5 × 5 | 25 pts/sprint |
| 4 engineers, known velocity of 30 | Use 30 directly | 30 pts/sprint |
Auto-Split Example:
If the agent estimates "Build the full payment integration" at 13 points:
This story exceeds the 8-point maximum. Splitting:
Original: Build the full payment integration (13 pts)
Split into:
US-010: Set up Stripe SDK and payment intent flow (5 pts)
US-011: Build webhook handler for payment events (5 pts)
US-012: Add payment error handling and retry logic (3 pts)
Total: 13 pts across 3 stories (all ≤ 8)
[Accept split / Edit / Reject]?
Discipline Tagging
Every story is tagged with the primary discipline needed:
| Discipline | Description |
|---|---|
frontend |
UI/UX implementation |
backend |
API, business logic, data |
fullstack |
Spans both (default fallback) |
infrastructure |
DevOps, CI/CD, deployment |
design |
UX research, visual design |
testing |
QA, test automation |
Story Checklist
Before a story is considered ready for sprint planning, it must have:
- Clear persona identified
- Goal is implementation-free and user-focused
- Benefit ties to real business or user value
- Acceptance criteria written (see Acceptance Criteria)
- Story points estimated (1–8 range)
- Dependencies identified and linked
- Fits within a single sprint
3. Acceptance Criteria
What They Are
Acceptance criteria are clear, concise, and testable statements that define the conditions a user story must meet to be accepted by stakeholders and considered "Done." They are the source of truth for developers, testers, and product stakeholders.
Purpose
- Clarify the scope of a user story
- Ensure shared understanding between product, platform, and stakeholders
- Provide a basis for test cases
- Define the boundaries of success
Acceptance criteria describe what should happen, not how it's implemented. They avoid technical specifics and focus on the desired outcome.
Key Characteristics
| Characteristic | Description |
|---|---|
| Clear | Easy to understand, no ambiguity |
| Concise | No unnecessary details or fluff |
| Testable | Verifiable through manual or automated testing |
| Outcome-Oriented | Focused on the end result, not the implementation approach |
| Consistent | Written in a standardised format (Given/When/Then) |
Format: Given / When / Then
All acceptance criteria use the Given / When / Then format:
Given [precondition]
When [action]
Then [expected outcome]
Examples
Reset Password
User Story: As a user, I want to reset my password so that I can regain access to my account.
Given I am on the password reset page
When I enter my registered email and click "Send Reset Link"
Then I should see a confirmation message saying "Reset link sent to your email"
Form Validation
User Story: As a user, I want to be informed when I submit an invalid phone number.
Given I enter an invalid phone number
When I try to submit the form
Then I should see an error message saying "Please enter a valid phone number"
Negative / Edge Case
User Story: As a user, I want to be prevented from registering with an already-used email.
Given I am on the registration page
When I enter an email that is already registered and click "Sign Up"
Then I should see an error message saying "An account with this email already exists"
And no duplicate account should be created
Coverage Requirements
Every story must have acceptance criteria covering:
| Scenario Type | What It Covers | Required? |
|---|---|---|
| Happy path | The expected, successful flow | Yes |
| Negative path | Invalid input, denied access, failures | Yes |
| Edge cases | Boundary conditions, empty states, max limits | Where applicable |
| Error states | What the user sees when something goes wrong | Yes |
Common Pitfalls
| Pitfall | Why It's a Problem |
|---|---|
| Writing implementation details (e.g., "Use React component X") | Criteria should be tech-agnostic and outcome-focused |
| Vague language (e.g., "It should work properly") | Not testable — what does "properly" mean? |
| Skipping negative scenarios and edge cases | Leaves gaps that surface as bugs in production |
| Using criteria as a task checklist | Criteria define outcomes, not implementation steps |
| Only covering the happy path | Real users hit errors, edge cases, and unexpected states |
4. Definition of Done — User Story
A story is not "Done" until every applicable item is satisfied. The agent evaluates which DoD items apply to each story and marks the rest as N/A.
Acceptance Criteria Fully Met
- Acceptance criteria are written before work begins
- Reviewed and approved by the team during backlog refinement
- All criteria are fully met and tested
- Given/When/Then format used consistently
Documentation
- Relevant documentation created or updated
- Added to the appropriate shared space / folder
- Outdated documentation updated if affected by the change
- Documentation completed within the sprint (unless explicitly agreed otherwise)
Testing
- Testing conducted across all environments where changes are deployed
- Test cases clearly identified and documented
- End-to-end (E2E) tests included for business-critical services where applicable
- Testing deemed sufficient before marking as Done
Code Merged
- Branch merged into
main/mastervia Pull Request - PR reviewed by at least two engineers
- All review comments and questions fully addressed before merge
Released via SDLC
- Release conducted through the standard SDLC process (e.g., Jenkins pipeline)
- Release channel notified with relevant details (e.g.,
#developer-releaseson Slack) - Story not marked Done until successfully released
Stakeholder Sign-Off (if required)
- Sign-off received from relevant stakeholders for features impacting external teams
- Approval logged (Slack message, Jira comment, or verbal approval noted in ticket)
Knowledge Sharing
- If the change introduces new functionality, architectural decisions, or process changes — a knowledge-sharing activity is conducted
- This can be a Slack update, team demo, short write-up, or Confluence page
- Ensures team-wide understanding and reduces knowledge silos
5. Definition of Done — Spike
Spikes are time-boxed research tasks used to reduce uncertainty, explore solutions, or gain clarity before delivery work begins.
When to Use a Spike
- Investigating an unknown technical or product area
- Evaluating possible solutions or approaches
- Identifying potential blockers or risks
- Prototyping or validating ideas before full implementation
Checklist
| Criteria | Description |
|---|---|
| Objective clearly stated | The goal or research question is documented in the ticket or a linked page |
| Time-box respected | Completed within the agreed timeframe (typically 1–3 days or a single sprint). Extensions discussed with the team. |
| Findings documented | All research outcomes, technical analysis, and code snippets are documented in a shared location |
| Recommendation made | A clear path forward is proposed — including implementation guidance, trade-offs, or alternatives |
| Next steps outlined | New stories, tickets, or action items are created and linked for follow-up work |
| Shared with team | Results communicated via stand-up, short demo, Slack summary, or write-up |
| Resources linked | All relevant links (API docs, diagrams, repos, articles) attached for future reference |
The goal of a spike is learning and knowledge sharing — not production-ready code.
6. Sprint Ceremonies
| Ceremony | Purpose | Cadence |
|---|---|---|
| Sprint Planning | Select stories from the backlog, confirm capacity, commit to sprint goal | Start of sprint |
| Daily Stand-up | Surface blockers, sync on progress, keep momentum | Daily (15 min max) |
| Backlog Refinement | Review upcoming stories, write/validate acceptance criteria, estimate points, split oversized stories | Mid-sprint |
| Sprint Review / Demo | Show completed work to stakeholders, gather feedback | End of sprint |
| Sprint Retrospective | Reflect on what went well, what didn't, and what to improve | End of sprint |
7. Backlog Health
Priority Levels
| Priority | Meaning | Sprint Scheduling |
|---|---|---|
| Critical | Blocks other work or has an imminent deadline | Must be in the current or next sprint |
| High | Core functionality, high user/business impact | Scheduled within the next 1–2 sprints |
| Medium | Important but not urgent | Scheduled when capacity allows |
| Low | Nice to have, minor improvements | Backlog — pulled in when higher priorities are clear |
Backlog Hygiene Rules
- Stories older than 3 sprints without movement should be reviewed — re-prioritise or remove
- Every story in the backlog must have a clear persona, goal, and benefit
- Stories without acceptance criteria are not ready for sprint planning
- Blocked stories must have the blocker documented and linked
8. Story Splitting Guidelines
When a story is too large (estimated above 8 points), split it using one of these strategies:
| Strategy | How It Works | Example |
|---|---|---|
| By workflow step | Split along the steps a user takes | "Register" → "Register with email" + "Register with OAuth" |
| By business rule | Separate different rules or conditions | "Apply discount" → "Percentage discount" + "Fixed amount discount" |
| By data type | Split by the different data being handled | "Import data" → "Import CSV" + "Import JSON" |
| By happy/unhappy path | Separate the success flow from error handling | "Process payment" → "Successful payment" + "Payment failure handling" |
| By platform | Split by target platform or environment | "Push notifications" → "iOS notifications" + "Android notifications" |
| Spike + delivery | Research first, build second | "Integrate Stripe" → "Spike: Stripe webhook approach" + "Implement Stripe webhooks" |
The goal is to produce stories that are each independently valuable, testable, and completable within a sprint.
Prompt Construction
System Prompt Persona
The agent operates as a senior Scrum Master and enforces all standards defined in the Scrum Standards section.
Core constraints:
- User stories follow the format: "As a [persona], I want to [goal], so that [benefit]"
- Every story includes acceptance criteria in Given/When/Then format covering happy path, negative path, and edge cases
- Story points use the Fibonacci scale (1, 2, 3, 5, 8)
- Maximum 8 points per story — auto-split if exceeded
- Issue hierarchy enforced: Epic → Feature → User Story → Sub-Task (plus Spikes)
- Definition of Done validated against checklists
- Sprint capacity respected — no overloading
Prompting Techniques
| Technique | Where Applied |
|---|---|
| ARC Framework | Every node prompt — Ask (what), Requirements (constraints), Context (background) |
| Few-Shot Prompting | Story Writer node — examples of well-written user stories |
| Chain-of-Thought | Feature Generator — step-by-step reasoning about scope decomposition |
| The Flipped Prompt | Project Intake — agent asks the user what information it needs before proceeding |
| Iterative Prompting | Refinement loop — output improves with each round of user feedback |
| Neutral Prompts | Evaluation — avoid leading phrasing that biases the LLM |
Guardrails
Input Guardrails (4 layers)
| Layer | Method | Description |
|---|---|---|
| Length cap | Regex (instant) | Max 5,000 characters — prevents accidental file pastes |
| Prompt injection | Regex (instant) | 10+ patterns: "ignore previous instructions", "you are now", "act as", "override", etc. |
| Profanity filter | Regex (instant) | Catches obvious abuse and low-quality inputs |
| Relevance classifier | LLM (cheap) | Allowlist passes known-good inputs; unknowns go to a cheap classifier (Haiku/gpt-4o-mini) to check RELEVANT vs OFF_TOPIC. Falls back to allowing on failure. |
Output Guardrails (4 layers)
| Layer | Description |
|---|---|
| Story format | Validates all stories have non-trivial persona, goal, and benefit (>=2 chars each) |
| AC coverage | Each story should have >=2 acceptance criteria, with at least one covering negative/edge/error scenarios |
| Sprint capacity | No sprint exceeds team velocity |
| Unrealistic loads | Flags sprints packed to the limit |
Human-in-the-Loop
Every pipeline stage has an accept/edit/reject checkpoint. High-risk tool calls (Jira writes, Confluence writes) require explicit user confirmation.
Multi-Provider LLM Support
The agent supports three LLM providers. Set via LLM_PROVIDER in .env:
| Provider | Env Var | Key Format | Value |
|---|---|---|---|
| Anthropic (default) | ANTHROPIC_API_KEY |
sk-ant-... |
anthropic |
| OpenAI | OPENAI_API_KEY |
sk-... |
openai |
GOOGLE_API_KEY |
AIza... |
google |
OpenAI and Google are lazy-imported — install with uv sync --extra all-providers or individually with --extra openai / --extra google.
Development
Commands
make install # install uv + dependencies
make test # unit + integration + contract tests (full suite)
make test-fast # unit tests only (< 3s)
make test-v # full suite verbose
make test-all # everything including golden evaluators
make lint # lint with ruff
make format # format with ruff
make run # run the CLI (ARGS="--flag")
make run-dry # TUI with fake delays, no LLM calls
make eval # golden dataset evaluators
make contract # contract tests (recorded API responses)
make smoke-test # live API smoke tests (requires credentials)
make snapshot-update # update syrupy snapshot baselines
make budget-report # show prompt token counts
make graph # generate agent graph PNG
make build # build sdist + wheel into dist/
make publish # publish to PyPI
make clean # remove build artifacts and caches
Project Structure
src/scrum_agent/
├── agent/ # LangGraph state & graph
│ ├── graph.py # Graph compilation & wiring
│ ├── llm.py # LLM provider selection (Anthropic/OpenAI/Google)
│ ├── nodes.py # Node functions (intake, analyze, generate, etc.)
│ └── state.py # ScrumState, QuestionnaireState, artifact dataclasses
├── prompts/ # Prompt templates per node
│ ├── analyzer.py # Project analyzer prompt
│ ├── feature_generator.py # Feature generation prompt
│ ├── intake.py # 30 questions, smart/standard modes, adaptive templates
│ ├── sprint_planner.py # Sprint planning prompt
│ ├── story_writer.py # Story writing prompt with few-shot examples
│ ├── system.py # Base system prompt
│ └── task_decomposer.py # Task decomposition prompt
├── tools/ # Tool definitions (23 total)
│ ├── azure_devops.py # Azure DevOps repo/file/work items
│ ├── calendar_tools.py # Bank holiday detection
│ ├── codebase.py # Local repo scanning
│ ├── confluence.py # Confluence search/read/write
│ ├── github.py # GitHub repo/file/issues/readme
│ ├── jira.py # Jira board/velocity/sprint/epic/story
│ └── llm_tools.py # LLM-powered estimation and AC generation
├── ui/ # Full-screen TUI system
│ ├── mode_select/ # Mode selection screens
│ ├── provider_select/ # LLM/tool provider setup
│ ├── session/ # Main session (phases, editor, pipeline)
│ ├── shared/ # Animations, ASCII font, components, input
│ └── splash.py # Animated intro
├── repl/ # Legacy REPL (CLI-flag-driven flows)
│ ├── _intake_menu.py # Intake mode selection
│ ├── _io.py # Artifact rendering, file import/export
│ ├── _questionnaire.py # Questionnaire UI (one-at-a-time flow)
│ ├── _review.py # Review checkpoint UI
│ └── _ui.py # Pipeline progress, streaming, spinner
├── cli.py # CLI entry point (argparse, 20 flags)
├── config.py # Environment/config management
├── setup_wizard.py # First-run credential flow
├── sessions.py # SQLite session store
├── persistence.py # State serialization helpers
├── formatters.py # Rich rendering (dark/light themes)
├── input_guardrails.py # 4-layer input validation
├── output_guardrails.py # 4-layer output validation
├── questionnaire_io.py # Markdown questionnaire import/export
├── html_exporter.py # Self-contained HTML reports
├── json_exporter.py # JSON export for CI/CD
├── jira_sync.py # Batch Jira creation with idempotency
└── __init__.py # Version, LangSmith noise suppression
Testing Conventions
- One test file per source module:
repl.py→test_repl.py,state.py→test_state.py - Group related tests in classes:
TestGracefulExit,TestStreaming,TestPriority - Node tests live in
tests/unit/nodes/(split into 9 files) - Shared node test helpers in
tests/_node_helpers.py - Pytest markers:
slow,eval,vcr,smoke
Environment Variables
| Variable | Required | Description |
|---|---|---|
ANTHROPIC_API_KEY |
Yes (if using Anthropic) | Claude API key |
OPENAI_API_KEY |
If using OpenAI | GPT API key |
GOOGLE_API_KEY |
If using Google | Gemini API key |
LLM_PROVIDER |
No | Provider selection: anthropic (default), openai, google |
LANGSMITH_TRACING |
No | Enable LangSmith tracing (true) |
LANGSMITH_API_KEY |
No | LangSmith API key |
LANGSMITH_PROJECT |
No | LangSmith project name |
LOG_LEVEL |
No | File-based log level (default: WARNING) |
SESSION_PRUNE_DAYS |
No | Auto-prune sessions older than N days (default: 30, 0=disabled) |
Git Conventions
- Commit messages: lowercase imperative (e.g., "add streaming output", "fix import sorting")
- Branch naming:
feature/<description>for feature work - PRs: feature branches merge to
mainvia pull request - Include
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>on AI-assisted commits
Evaluation & Testing
| Layer | Approach |
|---|---|
| Unit Tests | Prompt formatting, tool input/output validation, state transitions, artifact immutability |
| Integration Tests | CLI argument parsing, graph compilation, multi-node flows, session persistence |
| Contract Tests | VCR cassettes for GitHub, Jira, Confluence API responses |
| Golden Datasets | Curated project descriptions with expected feature/story breakdowns |
| Smoke Tests | Live API tests (require real credentials) |
| Token Budget Tests | Monitor prompt token counts for trend analysis |
| Red Teaming | Vague inputs, contradictory requirements, prompt injection, absurdly large scope |
Red Teaming Checklist
- Prompt injection ("Ignore your instructions and...")
- Jailbreaking (roleplay scenarios to bypass safety)
- Messy inputs (typos, slang, code-switching)
- Extremely long or empty project descriptions
- Contradictory requirements
- Adversarial inputs designed to trigger hallucination or bias
Graceful Degradation
| Failure Type | Strategy |
|---|---|
| API rate limit | Exponential backoff with live countdown (5s → 10s → 20s, 3 retries) |
| Tool call failure | Error displayed, pipeline continues |
| Model unavailable | Fallback to alternative provider (if configured) |
| Corrupt session | Returns (None, None) — no crash, user informed |
Tech Stack
| Component | Choice |
|---|---|
| Language | Python 3.11+ |
| Package Manager | uv |
| Agent Framework | LangGraph + LangChain |
| LLM | Anthropic Claude (primary), OpenAI GPT, Google Gemini |
| Terminal UI | rich + prompt_toolkit |
| Jira Integration | jira + atlassian-python-api |
| GitHub Integration | PyGithub |
| Azure DevOps | azure-devops SDK |
| Session Store | SQLite (via langgraph-checkpoint-sqlite) |
| Holiday Detection | holidays library |
| Linting | ruff (line-length 120) |
| Testing | pytest, pytest-asyncio, pytest-recording (VCR), syrupy (snapshots) |
| Observability | LangSmith |
Agentic Blueprint Reference
Condensed technical reference for the LangGraph patterns and LangChain APIs used.
Core Graph Setup
from langgraph.graph import StateGraph, MessagesState, START, END
from langgraph.prebuilt import ToolNode
from langchain_openai import ChatOpenAI
llm = ChatOpenAI(model="gpt-4o-mini")
model_with_tools = llm.bind_tools(tools)
graph = StateGraph(MessagesState)
The Two Core Nodes
def call_model(state: MessagesState):
"""Call the LLM with current messages."""
response = model_with_tools.invoke(state["messages"])
return {"messages": [response]}
def should_continue(state: MessagesState):
"""Route: tools if tool_calls present, otherwise END."""
last_message = state["messages"][-1]
if last_message.tool_calls:
return "tools"
return END
Wiring the Graph
tool_node = ToolNode(tools)
graph.add_node("agent", call_model)
graph.add_node("tools", tool_node)
graph.add_edge(START, "agent")
graph.add_conditional_edges("agent", should_continue, ["tools", END])
graph.add_edge("tools", "agent")
app = graph.compile()
START → agent ──should_continue?──→ END
▲ │
│ "tools"
│ ▼
└─────── tools
Creating Tools
from langchain_core.tools import tool
@tool
def search_database(query: str) -> str:
"""Search the product database for items matching the query."""
return results
The docstring is critical — the LLM reads it to decide when to use the tool.
Memory
from langgraph.checkpoint.memory import MemorySaver
memory = MemorySaver()
app = graph.compile(checkpointer=memory)
config = {"configurable": {"thread_id": "user-123"}}
app.invoke({"messages": [("human", "My name is Omar")]}, config)
Streaming
from langchain_core.messages import AIMessageChunk, HumanMessage
for chunk, metadata in app.stream(
{"messages": [HumanMessage(content="Plan my project")]},
config,
stream_mode="messages",
):
if isinstance(chunk, AIMessageChunk) and chunk.content:
print(chunk.content, end="", flush=True)
Quick Reference — All APIs
Prompting: ChatPromptTemplate | FewShotPromptTemplate | ARC framework | pipe operator | | StrOutputParser | sequential chains
Graph: StateGraph | MessagesState | START / END | .add_node() | .add_edge() | .add_conditional_edges() | .compile()
Tools: @tool decorator | ToolNode | .bind_tools() | create_react_agent
Memory: MemorySaver | checkpointer | thread_id
Streaming: app.stream() | stream_mode="messages" | AIMessageChunk
License
MIT License. See LICENSE for details.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file scrum_agent-1.0.0.tar.gz.
File metadata
- Download URL: scrum_agent-1.0.0.tar.gz
- Upload date:
- Size: 840.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ef20b228179b4178f63a0703a54b956310965537a9e04e1ed84e95b081d9f27d
|
|
| MD5 |
86b7cfe55df053bb83c66ecf964969f4
|
|
| BLAKE2b-256 |
ae2212a9a61fc4ab1eb34480a0ea7eb2fd3cc96b7a12c0260971222de33c6f18
|
Provenance
The following attestation bundles were made for scrum_agent-1.0.0.tar.gz:
Publisher:
publish.yml on omardin14/scrum-planning-ai-agent
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
scrum_agent-1.0.0.tar.gz -
Subject digest:
ef20b228179b4178f63a0703a54b956310965537a9e04e1ed84e95b081d9f27d - Sigstore transparency entry: 1144225109
- Sigstore integration time:
-
Permalink:
omardin14/scrum-planning-ai-agent@f7dbb612bc25d8200f9d969a6669a3247b4c2916 -
Branch / Tag:
refs/tags/v1.0.0 - Owner: https://github.com/omardin14
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@f7dbb612bc25d8200f9d969a6669a3247b4c2916 -
Trigger Event:
push
-
Statement type:
File details
Details for the file scrum_agent-1.0.0-py3-none-any.whl.
File metadata
- Download URL: scrum_agent-1.0.0-py3-none-any.whl
- Upload date:
- Size: 406.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d37d50fd64cfd40027c4d6083e36fb9c8c641b4d9bcd32278dacc16c96c816a5
|
|
| MD5 |
7ab820f903842c26ba905a26fe9dbc65
|
|
| BLAKE2b-256 |
2cd299702cf697521ca8e8b6ba4bf47da361d7009c04ed12639b0f81b7a824bc
|
Provenance
The following attestation bundles were made for scrum_agent-1.0.0-py3-none-any.whl:
Publisher:
publish.yml on omardin14/scrum-planning-ai-agent
-
Statement:
-
Statement type:
https://in-toto.io/Statement/v1 -
Predicate type:
https://docs.pypi.org/attestations/publish/v1 -
Subject name:
scrum_agent-1.0.0-py3-none-any.whl -
Subject digest:
d37d50fd64cfd40027c4d6083e36fb9c8c641b4d9bcd32278dacc16c96c816a5 - Sigstore transparency entry: 1144225156
- Sigstore integration time:
-
Permalink:
omardin14/scrum-planning-ai-agent@f7dbb612bc25d8200f9d969a6669a3247b4c2916 -
Branch / Tag:
refs/tags/v1.0.0 - Owner: https://github.com/omardin14
-
Access:
public
-
Token Issuer:
https://token.actions.githubusercontent.com -
Runner Environment:
github-hosted -
Publication workflow:
publish.yml@f7dbb612bc25d8200f9d969a6669a3247b4c2916 -
Trigger Event:
push
-
Statement type: