
Open-source platform for understanding, documenting, and analyzing legacy COBOL codebases using static analysis and LLMs


cobol-intel

Python 3.11+ | License: MIT

Open-source static analysis and LLM explanation platform for legacy COBOL codebases. Built for banking, fintech, and regulated modernization workflows.

Why This Exists

Legacy COBOL systems tend to fail in the same ways: key maintainers retire, documentation goes stale, impact analysis is done by hand, and regulators still need clear explanations. cobol-intel addresses this with a structured pipeline:

COBOL source → parser & AST → call graph & business rules → LLM explanation
                                                           → impact analysis
                                                           → documentation

The LLM consumes clean, traceable artifacts — not raw COBOL.

Quickstart

pip install cobol-intel

# Optional extras
pip install "cobol-intel[api]"    # REST API
pip install "cobol-intel[local]"  # local HuggingFace inference
pip install "cobol-intel[train]"  # fine-tuning scripts

# Analyze a COBOL directory
cobol-intel analyze samples/ --copybook-dir copybooks

# Explain with an LLM backend
cobol-intel explain samples/complex/payment.cbl --model claude --mode business

# Generate documentation
cobol-intel docs artifacts/samples/run_xxx --format html

# Analyze change impact
cobol-intel impact artifacts/samples/run_xxx --changed-program PAYMENT --changed-field WS-BALANCE

Output:

[cobol-intel] analyze: samples/
Run ID: run_20260401_001
Status: completed
Artifacts: artifacts/samples/run_20260401_001

Features

Static Analysis

  • ANTLR4-based parser (fixed- and free-format COBOL)
  • Copybook resolution with circular dependency detection
  • Call graph builder and business rules extractor
  • Control flow graph (CFG) with branch, perform, and fallthrough edges
  • Field-level data flow analysis (MOVE, COMPUTE, READ INTO, WRITE FROM, CALL)
  • Dead code detection: unreachable paragraphs, unused data items, dead branches
  • Field reference indexer with read/write/condition classification
  • Data item hierarchy with PIC, COMP-3, REDEFINES, OCCURS, level-88
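Unreachable-paragraph detection can be framed as reachability from the first paragraph of the PROCEDURE DIVISION. This hypothetical sketch is not the project's CFG code, and it makes four simplifying choices:

- PERFORM targets are treated as call-and-return.
- Fallthrough counts as sequential flow and stops after STOP RUN or GOBACK.
- GO TO targets are treated conservatively as sequential flow.
- Paragraph bodies are scanned with regexes, and only the first name of a PERFORM ... THRU range is followed, so THRU ranges would need extra handling.

```python
import re
from collections import deque

PERFORM_RE = re.compile(r"\bPERFORM\s+([A-Z0-9-]+)", re.IGNORECASE)
GOTO_RE = re.compile(r"\bGO\s+TO\s+([A-Z0-9-]+)", re.IGNORECASE)
TERMINAL_RE = re.compile(r"\b(?:STOP\s+RUN|GOBACK)\b", re.IGNORECASE)


def unreachable_paragraphs(paragraphs: list[tuple[str, str]]) -> list[str]:
    """Return paragraph names never reached from the first paragraph.

    `paragraphs` is a hypothetical ordered list of (name, body_text) pairs.
    """
    if not paragraphs:
        return []
    names = [name.upper() for name, _ in paragraphs]
    bodies = dict(zip(names, (body for _, body in paragraphs)))
    successor = dict(zip(names, names[1:]))  # next paragraph in source order

    # State = (paragraph, entered_by_sequential_flow). A PERFORMed paragraph
    # returns to its caller, so it does not fall through to the next one.
    start = (names[0], True)
    seen = {start}
    queue = deque([start])
    while queue:
        name, in_flow = queue.popleft()
        body = bodies[name]
        targets = [(t.upper(), False) for t in PERFORM_RE.findall(body)]
        # Conservative: GO TO counts as sequential flow (may miss some dead code).
        targets += [(t.upper(), True) for t in GOTO_RE.findall(body)]
        if in_flow and name in successor and not TERMINAL_RE.search(body):
            targets.append((successor[name], True))  # fallthrough edge
        for target in targets:
            if target[0] in bodies and target not in seen:
                seen.add(target)
                queue.append(target)

    reached = {name for name, _ in seen}
    return [name for name in names if name not in reached]
```

Tracking how a paragraph was entered matters. Without it, every PERFORMed paragraph would appear to fall through into the next one, hiding dead code that sits after the last performed paragraph.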

LLM Explanation

  • Multi-backend: Claude, OpenAI, Ollama
  • Three modes: technical, business, audit
  • Governance: audit logging, sensitivity classification, prompt redaction
  • Policy enforcement, token budgets, retry/timeout
  • Parallel processing with bounded concurrency
  • File-based cache with composite keys
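Bounded concurrency for LLM calls is commonly implemented with a semaphore. In the sketch below, `explain_one` is a hypothetical async callable standing in for one backend request. A `max_workers` limit, like the one `--max-workers N` overrides, caps the number of in-flight requests, while `asyncio.gather` returns results in input order.

```python
import asyncio
from collections.abc import Awaitable, Callable, Iterable
from typing import TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def explain_all(
    programs: Iterable[T],
    explain_one: Callable[[T], Awaitable[R]],
    max_workers: int = 4,  # assumed default, for illustration only
) -> list[R]:
    """Run explain_one on every program with at most max_workers in flight."""
    semaphore = asyncio.Semaphore(max_workers)

    async def bounded(program: T) -> R:
        async with semaphore:  # waits here while max_workers calls are running
            return await explain_one(program)

    # gather preserves input order regardless of completion order
    return list(await asyncio.gather(*(bounded(p) for p in programs)))
```

Usage: `asyncio.run(explain_all(program_names, call_backend, max_workers=8))`, where `call_backend` is whatever coroutine wraps the chosen model.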

Change Impact Analysis

  • "If I change field X, what breaks?"
  • BFS traversal of the reverse call graph
  • Field reference scanning across ASTs and business rules
  • Configurable depth limit
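The reverse-call-graph traversal can be sketched as a breadth-first search that walks from the changed program toward its callers and stops at a depth limit. The dict-of-sets input is a hypothetical in-memory shape, not cobol-intel's call graph artifact schema. The default depth of 3 is also an assumption, and field-level scanning is left out.

```python
from collections import deque


def impacted_programs(
    call_graph: dict[str, set[str]], changed: str, max_depth: int = 3
) -> dict[str, int]:
    """Map each program that transitively calls `changed` to its distance.

    `call_graph` maps caller -> callees; depth 1 means a direct caller.
    """
    # Invert caller -> callees into callee -> callers
    reverse: dict[str, set[str]] = {}
    for caller, callees in call_graph.items():
        for callee in callees:
            reverse.setdefault(callee, set()).add(caller)

    depth = {changed: 0}
    queue = deque([changed])
    while queue:
        node = queue.popleft()
        if depth[node] == max_depth:
            continue  # depth limit reached: record the node but do not expand it
        for caller in sorted(reverse.get(node, ())):
            if caller not in depth:  # first visit in BFS = shortest distance
                depth[caller] = depth[node] + 1
                queue.append(caller)

    depth.pop(changed)
    return depth
```

With `max_depth=1`, only direct callers are returned. Because the search is breadth-first, each program is recorded at its shortest distance from the change.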

Output & Documentation

  • Versioned JSON artifact contracts (Pydantic v2)
  • Markdown + HTML report generation
  • Self-contained HTML with sidebar nav, search, and Mermaid graphs
  • Structured error codes for operational monitoring

Fine-Tuning

  • Dataset builder: generates instruction-tuning pairs from pipeline output
  • LoRA/PEFT fine-tuning script for CodeLlama-7B or similar (QLoRA supported)
  • Local fine-tuned model backend for fully offline inference
  • Prompt comparison benchmark: raw source vs structured pipeline prompts
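A dataset builder of this kind typically pairs structured pipeline context with a target explanation and writes the result as JSON Lines. The field names (`id`, `program`, `paragraph`, `condition`, `action`), the instruction text, and both helper functions are assumptions for illustration; the project's actual dataset format may differ.

```python
import json


def build_pairs(rules: list[dict], explanations: dict[str, str]) -> list[dict]:
    """Pair structured rule context with a reviewed explanation.

    Rules without an explanation are skipped so every record has a target.
    """
    pairs = []
    for rule in rules:
        target = explanations.get(rule["id"])
        if not target:
            continue
        # The model sees structured context, not raw COBOL source
        context = json.dumps(
            {k: rule[k] for k in ("program", "paragraph", "condition", "action")},
            sort_keys=True,
        )
        pairs.append({
            "instruction": "Explain this COBOL business rule for a business analyst.",
            "input": context,
            "output": target,
        })
    return pairs


def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines: one JSON object per line."""
    return "".join(json.dumps(r, sort_keys=True) + "\n" for r in records)
```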

API & Distribution

  • Versioned REST API (/api/v1/) with OpenAPI docs and typed error responses
  • Docker image + docker-compose with optional Ollama sidecar
  • Cross-platform CI (Linux + Windows, Python 3.11 + 3.12)
  • PyPI-ready wheel with PEP 561 type stubs

CLI Commands

Command   Description
analyze   Parse COBOL files, build AST, call graph, business rules
explain   Run analysis + LLM explanation
graph     Build dependency and call graph artifacts
impact    Analyze change impact from a completed run
docs      Generate documentation (Markdown or HTML)

Global:

cobol-intel --version           # Show version

Key flags:

--model claude|openai|ollama|local  # LLM backend
--mode technical|business|audit     # Explanation style
--parallel                          # Enable parallel LLM processing
--max-workers N                     # Override concurrency limit
--cache / --no-cache                # Explanation cache toggle
--strict-policy                     # Hard-block policy violations
--max-tokens-per-run N              # Token budget cap
--format markdown|html              # Documentation format

API Usage

pip install "cobol-intel[api]"
cobol-intel-api  # starts on port 8000

curl http://localhost:8000/api/v1/health
curl "http://localhost:8000/api/v1/runs?output_dir=artifacts"
curl http://localhost:8000/api/v1/version

See docs/API_GUIDE.md for full endpoint reference.
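For scripting against the REST API from Python, the standard library is enough. This sketch assumes the default `http://localhost:8000` address used by `cobol-intel-api`, and the helper names are hypothetical. Response bodies are not described in this README, so `get_json` simply returns the decoded JSON.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"  # default address of `cobol-intel-api`


def api_url(path: str, **params: str) -> str:
    """Build a versioned endpoint URL with a properly encoded query string."""
    url = f"{BASE_URL}/{path.lstrip('/')}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    return url


def get_json(path: str, timeout: float = 10.0, **params: str):
    """GET an endpoint and decode its JSON body.

    urllib raises urllib.error.HTTPError for 4xx/5xx responses.
    """
    with urllib.request.urlopen(api_url(path, **params), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage, with the server running: `get_json("health")` or `get_json("runs", output_dir="artifacts")`.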

Output Artifacts

Each run produces a stable artifact tree:

artifacts/<project>/<run_id>/
  manifest.json          # Run metadata, governance, errors
  ast/                   # Per-program AST JSON
  graphs/                # Call graph JSON + Mermaid
  rules/                 # Business rules JSON + Markdown
  analysis/              # CFG, data flow, dead code, references
  docs/                  # Explanations, documentation
  logs/                  # Audit event log

See docs/OUTPUT_GALLERY.md for sample artifacts.

COBOL Subset Coverage

  • Fixed-format and free-format COBOL
  • COPY, circular copy detection, COPY ... REPLACING
  • WORKING-STORAGE, FILE, LINKAGE sections
  • PROCEDURE DIVISION USING
  • PIC, COMP-3, REDEFINES, OCCURS, level-88 conditions
  • IF, EVALUATE, PERFORM, PERFORM THRU, CALL, STRING, UNSTRING, INSPECT
  • File I/O: OPEN, READ, WRITE, REWRITE, CLOSE
  • EXEC SQL subset and basic EXEC CICS block extraction for static-analysis context
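COMP-3 fields are stored as IBM packed decimal: two BCD digits per byte, with the sign in the low nibble of the last byte. C, A, or E mark a positive value, F marks an unsigned one, and D or B mark a negative one. This standalone decoder sketches that encoding for reading raw record bytes; it is not part of cobol-intel's API. Here `scale` stands for the number of digits after the implied decimal point (the `V` in the PIC clause).

```python
from decimal import Decimal

POSITIVE_SIGNS = {0xA, 0xC, 0xE, 0xF}  # 0xF = unsigned field
NEGATIVE_SIGNS = {0xB, 0xD}


def unpack_comp3(data: bytes, scale: int = 0) -> Decimal:
    """Decode packed-decimal bytes; `scale` = digits after the implied point (V)."""
    if not data:
        raise ValueError("empty COMP-3 field")
    nibbles = []
    for byte in data:
        nibbles.extend((byte >> 4, byte & 0x0F))  # high nibble first
    sign = nibbles.pop()  # low nibble of the last byte
    if sign not in POSITIVE_SIGNS | NEGATIVE_SIGNS or any(d > 9 for d in nibbles):
        raise ValueError(f"invalid packed decimal: {data.hex()}")
    # Build the Decimal from (sign, digits, exponent) so no rounding can occur
    return Decimal((int(sign in NEGATIVE_SIGNS), tuple(nibbles), -scale))


def comp3_length(total_digits: int) -> int:
    """Bytes used by PIC S9(n) COMP-3: n digit nibbles + 1 sign nibble, rounded up."""
    return total_digits // 2 + 1
```

For example, a `PIC S9(5)V99 COMP-3` field holding -12345.67 occupies four bytes, `12 34 56 7D`. Decoding it is `unpack_comp3(bytes.fromhex("1234567D"), scale=2)`.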

Development

git clone https://github.com/WwzFwz/cobol-intel.git
cd cobol-intel
pip install -e ".[dev]"

make lint    # ruff + tach
make test    # pytest
make bench   # benchmark suite
make build   # build wheel

Offline inference and training extras:

pip install -e ".[local]"  # local HuggingFace backend
pip install -e ".[train]"  # dataset + fine-tuning tooling

See CONTRIBUTING.md for full dev setup and guidelines.

Documentation

License

MIT



Download files


Source Distribution

cobol_intel-0.3.1.tar.gz (2.4 MB)

Uploaded Source

Built Distribution


cobol_intel-0.3.1-py3-none-any.whl (153.2 kB)

Uploaded Python 3

File details

Details for the file cobol_intel-0.3.1.tar.gz.

File metadata

  • Download URL: cobol_intel-0.3.1.tar.gz
  • Size: 2.4 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cobol_intel-0.3.1.tar.gz
Algorithm    Hash digest
SHA256       9062f49ed3bdc0b867fd8b0baa042f568dd2100f9d778b768877c3eb26750a88
MD5          2a94e6212a67e4965bb0d946aff288ea
BLAKE2b-256  832e258aa6fbd7d9f45984fe796a0d45d107c10fbc463d03be627f4184d85749


Provenance

The following attestation bundles were made for cobol_intel-0.3.1.tar.gz:

Publisher: release.yml on WwzFwz/finoss-cobol-intel


File details

Details for the file cobol_intel-0.3.1-py3-none-any.whl.

File metadata

  • Download URL: cobol_intel-0.3.1-py3-none-any.whl
  • Size: 153.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for cobol_intel-0.3.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       4462fe20cdab412cfb213e6bb43d7beca44fda74989e461d91cb395003262a58
MD5          ed8bd0a49920ccf972dc91b4b8501283
BLAKE2b-256  18c748dc34e463c1da0fb27ceb6f806b826d05a4fff28fe3ea3c2bce28cb1b3d


Provenance

The following attestation bundles were made for cobol_intel-0.3.1-py3-none-any.whl:

Publisher: release.yml on WwzFwz/finoss-cobol-intel

