Open-source platform for understanding, documenting, and analyzing legacy COBOL codebases using static analysis and LLMs
cobol-intel
Open-source static analysis and LLM explanation platform for legacy COBOL codebases. Built for banking, fintech, and regulated modernization workflows.
Why This Exists
Legacy COBOL systems fail the same way: key maintainers retire, documentation
goes stale, impact analysis is manual, and regulators still need clear
explanations. cobol-intel fixes that with a structured pipeline:
COBOL source → parser & AST → call graph & business rules → LLM explanation
                                                          → impact analysis
                                                          → documentation
The LLM consumes clean, traceable artifacts — not raw COBOL.
Quickstart
pip install cobol-intel
# Optional extras
pip install "cobol-intel[api]" # REST API
pip install "cobol-intel[local]" # local HuggingFace inference
pip install "cobol-intel[train]" # fine-tuning scripts
# Analyze a COBOL directory
cobol-intel analyze samples/ --copybook-dir copybooks
# Explain with an LLM backend
cobol-intel explain samples/complex/payment.cbl --model claude --mode business
# Generate documentation
cobol-intel docs artifacts/samples/run_xxx --format html
# Analyze change impact
cobol-intel impact artifacts/samples/run_xxx --changed-program PAYMENT --changed-field WS-BALANCE
Output:
[cobol-intel] analyze: samples/
Run ID: run_20260401_001
Status: completed
Artifacts: artifacts/samples/run_20260401_001
Features
Static Analysis
- ANTLR4-based parser (fixed + free format COBOL)
- COPYBOOK resolution with circular dependency detection
- Call graph builder and business rules extractor
- Control flow graph (CFG) with branch, perform, and fallthrough edges
- Field-level data flow analysis (MOVE, COMPUTE, READ INTO, WRITE FROM, CALL)
- Dead code detection: unreachable paragraphs, unused data items, dead branches
- Field reference indexer with read/write/condition classification
- Data item hierarchy with PIC, COMP-3, REDEFINES, OCCURS, level-88
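As an illustration of the circular dependency detection listed above, here is a minimal sketch of cycle detection over COPY includes using depth-first search. It is not taken from the cobol-intel source: the function name `find_copy_cycle`, the `includes` mapping, and the copybook names are all hypothetical, and the real resolver may work differently.

```python
from typing import Dict, List, Optional, Set


def find_copy_cycle(includes: Dict[str, List[str]], root: str) -> Optional[List[str]]:
    """Return one circular COPY chain reachable from `root`, or None.

    `includes` maps a program or copybook name to the copybooks it COPYs
    (hypothetical input shape, chosen for this sketch).
    """
    visiting: List[str] = []  # current DFS path, in visit order
    done: Set[str] = set()    # names fully explored without finding a cycle

    def visit(name: str) -> Optional[List[str]]:
        if name in done:
            return None
        if name in visiting:
            # Back edge: the cycle is the path from the first occurrence onward.
            return visiting[visiting.index(name):] + [name]
        visiting.append(name)
        for child in includes.get(name, []):
            cycle = visit(child)
            if cycle:
                return cycle
        visiting.pop()
        done.add(name)
        return None

    return visit(root)


# Hypothetical include graph: CUSTREC and ADDRREC copy each other.
includes = {
    "PAYMENT": ["CUSTREC", "ACCTREC"],
    "CUSTREC": ["ADDRREC"],
    "ADDRREC": ["CUSTREC"],
    "ACCTREC": [],
}
cycle = find_copy_cycle(includes, "PAYMENT")
print(" -> ".join(cycle) if cycle else "no cycle")
```

Returning the full chain rather than a boolean makes it possible to report which COPY statements form the loop.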
LLM Explanation
- Multi-backend: Claude, OpenAI, Ollama
- Three modes: technical, business, audit
- Governance: audit logging, sensitivity classification, prompt redaction
- Policy enforcement, token budgets, retry/timeout
- Parallel processing with bounded concurrency
- File-based cache with composite keys
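To make the "file-based cache with composite keys" idea concrete, the sketch below hashes several inputs into one key and stores one JSON file per key. The choice of key components (source hash, model, mode, prompt version), the `FileCache` class, and the on-disk layout are assumptions made for illustration; the project's actual cache format is not documented here.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Optional


def cache_key(source_text: str, model: str, mode: str, prompt_version: str) -> str:
    """Combine the inputs that affect an explanation into one stable key.

    The four components are assumptions for this sketch.
    """
    payload = json.dumps(
        {
            "source_sha256": hashlib.sha256(source_text.encode("utf-8")).hexdigest(),
            "model": model,
            "mode": mode,
            "prompt_version": prompt_version,
        },
        sort_keys=True,  # stable field order gives a stable hash
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


class FileCache:
    """One JSON file per key under `root` (hypothetical layout)."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.root.mkdir(parents=True, exist_ok=True)

    def get(self, key: str) -> Optional[str]:
        path = self.root / f"{key}.json"
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))["explanation"]

    def put(self, key: str, explanation: str) -> None:
        path = self.root / f"{key}.json"
        path.write_text(json.dumps({"explanation": explanation}), encoding="utf-8")


# Usage: a miss followed by a hit, in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    cache = FileCache(Path(tmp))
    key = cache_key("IDENTIFICATION DIVISION.", "claude", "business", "v1")
    if cache.get(key) is None:
        cache.put(key, "placeholder explanation")
    print(cache.get(key))
```

Because the mode and model are part of the key, switching from business to audit mode produces a different key and does not return a stale explanation.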
Change Impact Analysis
- "If I change field X, what breaks?"
- BFS traversal on reverse call graph
- Field reference scanning across ASTs and business rules
- Configurable depth limit
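The traversal described above can be sketched as a breadth-first search over the reversed call graph with an optional depth cap. This covers only the program-level part of impact analysis (not field reference scanning), and the function names, the `calls` input shape, and the program names are hypothetical rather than cobol-intel's actual API.

```python
from collections import deque
from typing import Dict, List, Optional


def reverse_graph(calls: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Turn caller -> callees into callee -> callers."""
    callers: Dict[str, List[str]] = {}
    for caller, callees in calls.items():
        for callee in callees:
            callers.setdefault(callee, []).append(caller)
    return callers


def impacted_programs(
    calls: Dict[str, List[str]],
    changed: str,
    max_depth: Optional[int] = None,
) -> Dict[str, int]:
    """Return each program that transitively calls `changed`, with its distance."""
    callers = reverse_graph(calls)
    depth = {changed: 0}
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        if max_depth is not None and depth[current] >= max_depth:
            continue  # depth limit reached: do not expand further
        for caller in callers.get(current, []):
            if caller not in depth:  # first visit is the shortest distance in BFS
                depth[caller] = depth[current] + 1
                queue.append(caller)
    del depth[changed]  # the changed program itself is not "impacted"
    return depth


# Hypothetical call graph: caller -> programs it CALLs.
calls = {
    "NIGHTLY": ["BATCH"],
    "BATCH": ["PAYMENT", "REPORT"],
    "PAYMENT": ["LEDGER"],
    "REPORT": ["LEDGER"],
}
print(impacted_programs(calls, "LEDGER"))
print(impacted_programs(calls, "LEDGER", max_depth=2))
```

Breadth-first order records the shortest caller distance for each program, which gives the depth limit a well-defined meaning.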
Output & Documentation
- Versioned JSON artifact contracts (Pydantic v2)
- Markdown + HTML report generation
- Self-contained HTML with sidebar nav, search, and Mermaid graphs
- Structured error codes for operational monitoring
Fine-Tuning
- Dataset builder: generates instruction-tuning pairs from pipeline output
- LoRA/PEFT fine-tuning script for CodeLlama-7B or similar (QLoRA supported)
- Local fine-tuned model backend for fully offline inference
- Prompt comparison benchmark: raw source vs structured pipeline prompts
API & Distribution
- Versioned REST API (/api/v1/) with OpenAPI docs and typed error responses
- Docker image + docker-compose with optional Ollama sidecar
- Cross-platform CI (Linux + Windows, Python 3.11 + 3.12)
- PyPI-ready wheel with PEP 561 type stubs
CLI Commands
| Command | Description |
|---|---|
| analyze | Parse COBOL files, build AST, call graph, business rules |
| explain | Run analysis + LLM explanation |
| graph | Build dependency and call graph artifacts |
| impact | Analyze change impact from a completed run |
| docs | Generate documentation (Markdown or HTML) |
Global:
cobol-intel --version # Show version
Key flags:
--model claude|openai|ollama|local # LLM backend
--mode technical|business|audit # Explanation style
--parallel # Enable parallel LLM processing
--max-workers N # Override concurrency limit
--cache / --no-cache # Explanation cache toggle
--strict-policy # Hard block policy violations
--max-tokens-per-run N # Token budget cap
--format markdown|html # Documentation format
API Usage
pip install "cobol-intel[api]"
cobol-intel-api # starts on port 8000
curl http://localhost:8000/api/v1/health
curl "http://localhost:8000/api/v1/runs?output_dir=artifacts"
curl http://localhost:8000/api/v1/version
See docs/API_GUIDE.md for full endpoint reference.
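For calling the same endpoints from Python, a minimal standard-library sketch follows. It assumes the server is running locally on port 8000 as shown above and that responses are JSON; the helper names `endpoint` and `get_json` are hypothetical, and the response fields are not specified here, so the example does not inspect them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Base URL taken from the curl examples above.
BASE_URL = "http://localhost:8000/api/v1"


def endpoint(path: str, **params: str) -> str:
    """Build a full URL, URL-encoding any query parameters."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    return f"{url}?{urlencode(params)}" if params else url


def get_json(url: str, timeout: float = 5.0):
    """GET a URL and decode the body as JSON (assumes a JSON response)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Printing the URLs does not require a running server.
print(endpoint("health"))
print(endpoint("runs", output_dir="artifacts"))
print(endpoint("version"))
```

With the API running, `get_json(endpoint("health"))` performs the same request as the first curl command.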
Output Artifacts
Each run produces a stable artifact tree:
artifacts/<project>/<run_id>/
manifest.json # Run metadata, governance, errors
ast/ # Per-program AST JSON
graphs/ # Call graph JSON + Mermaid
rules/ # Business rules JSON + Markdown
analysis/ # CFG, data flow, dead code, references
docs/ # Explanations, documentation
logs/ # Audit event log
See docs/OUTPUT_GALLERY.md for sample artifacts.
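A quick way to sanity-check a run directory against the tree above is to count files per subdirectory. The sketch below relies only on the directory names listed in this section; `summarize_run` is a hypothetical helper, and it deliberately does not parse manifest.json because the manifest schema is not described here.

```python
from pathlib import Path
from typing import Dict, Optional, Union

# Subdirectories named in the artifact tree above.
EXPECTED_DIRS = ("ast", "graphs", "rules", "analysis", "docs", "logs")


def summarize_run(run_dir: Path) -> Dict[str, Union[bool, Optional[int]]]:
    """Report whether manifest.json exists and how many files each subdirectory holds.

    A value of None means the subdirectory is missing.
    """
    summary: Dict[str, Union[bool, Optional[int]]] = {
        "manifest": (run_dir / "manifest.json").is_file()
    }
    for name in EXPECTED_DIRS:
        sub = run_dir / name
        if sub.is_dir():
            summary[name] = sum(1 for p in sub.rglob("*") if p.is_file())
        else:
            summary[name] = None
    return summary


# Path taken from the sample output earlier in this README.
print(summarize_run(Path("artifacts/samples/run_20260401_001")))
```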
COBOL Subset Coverage
- Fixed-format and free-format COBOL
- COPY, circular copy detection, COPY ... REPLACING
- WORKING-STORAGE, FILE, LINKAGE sections
- PROCEDURE DIVISION USING
- PIC, COMP-3, REDEFINES, OCCURS, level-88 conditions
- IF, EVALUATE, PERFORM, PERFORM THRU, CALL, STRING, UNSTRING, INSPECT
- File I/O: OPEN, READ, WRITE, REWRITE, CLOSE
- EXEC SQL subset and basic EXEC CICS block extraction for static-analysis context
Development
git clone https://github.com/WwzFwz/cobol-intel.git
cd cobol-intel
pip install -e ".[dev]"
make lint # ruff + tach
make test # pytest
make bench # benchmark suite
make build # build wheel
Offline inference and training extras:
pip install -e ".[local]" # local HuggingFace backend
pip install -e ".[train]" # dataset + fine-tuning tooling
See CONTRIBUTING.md for full dev setup and guidelines.
Documentation
- Architecture
- Architecture Decisions
- API Guide
- Output Gallery
- Fintech Readiness
- Parser Evaluation
- Project Plan
- Progress
- Changelog
- Security Policy
- Support
- Code of Conduct
License
MIT