MCP server that parses COBOL programs — extracts divisions, SQL, CICS commands, and generates AI-powered business logic summaries via any LLM

cobol-parser-mcp

MCP server that parses COBOL mainframe programs and extracts structured data plus AI-powered business logic summaries — works with any LLM provider.

pip install cobol-parser-mcp               # pure parsing, no LLM needed
pip install "cobol-parser-mcp[anthropic]"  # + Claude
pip install "cobol-parser-mcp[openai]"     # + GPT-4o
pip install "cobol-parser-mcp[groq]"       # + Groq (fast + cheap)
pip install "cobol-parser-mcp[all]"        # + all providers
# The quotes keep shells such as zsh from treating [..] as a glob pattern.

Part of the mainframe modernization pipeline — takes the manifest.json from mainframe-ingest-mcp and produces per-program JSON files ready for code generation.


What it extracts

For every COBOL program:

What                     Details
-----------------------  ----------------------------------------------------------------
IDENTIFICATION DIVISION  Program ID, author, date written
DATA DIVISION            All working storage variables with level numbers and PIC clauses
PROCEDURE DIVISION       Every paragraph with line ranges and PERFORM references
Embedded SQL             Every SELECT/INSERT/UPDATE/DELETE with tables and columns
Embedded CICS            Every SEND/RECEIVE MAP, LINK, XCTL, READ, WRITE
AI summary               Plain-English description of what the program does (any LLM)
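
To make the "Embedded SQL" row concrete, here is a minimal sketch of how EXEC SQL ... END-EXEC blocks could be located in COBOL source. It is an illustration only and not cobol-parser-mcp's actual parser: the function name find_embedded_sql and the regex are assumptions, and the sketch ignores comment lines, sequence-number columns, and table or column extraction.

```python
import re

# Illustrative only: a regex pass, NOT the parser used by cobol-parser-mcp.
EXEC_SQL = re.compile(r"EXEC\s+SQL\s+(.*?)\s*END-EXEC", re.IGNORECASE | re.DOTALL)


def find_embedded_sql(source: str) -> list[dict]:
    """Return statement type, text, and 1-based start line of each EXEC SQL block."""
    found = []
    for match in EXEC_SQL.finditer(source):
        # Collapse the multi-line statement body into single-spaced text.
        body = " ".join(match.group(1).split())
        found.append({
            "type": body.split(" ", 1)[0].upper(),
            "text": body,
            # Count the newlines before the match to get the starting line.
            "line": source.count("\n", 0, match.start()) + 1,
        })
    return found


sample = """\
       PROCEDURE DIVISION.
       2000-PROCESS.
           EXEC SQL
               SELECT ENTITY_NAME, STATUS_CD
                 INTO :WS-ENTITY-NAME, :WS-STATUS-CD
                 FROM BUSENTIT
                WHERE ENTITY_ID = :WS-ENTITY-ID
           END-EXEC.
"""

statements = find_embedded_sql(sample)
for stmt in statements:
    print(stmt["line"], stmt["type"], stmt["text"])
```

A real parser also has to handle fixed-format column rules and copybook expansion, which is why the package's structured output is preferable to ad hoc matching like this.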

Claude Desktop config

{
  "mcpServers": {
    "cobol-parser-mcp": {
      "command": "cobol-parser-mcp",
      "env": {
        "ANTHROPIC_API_KEY": "your-key-here"
      }
    }
  }
}

Usage from Python

Parse a single file (no API key needed)

from cobol_parser_mcp.tools.parser import parse_cobol_file
import json

result = parse_cobol_file("/path/to/SS6001XX.cob")
print(json.dumps(result, indent=2))

Parse entire codebase with AI summaries

import asyncio
from cobol_parser_mcp.tools.batch import parse_all_programs

# Using Anthropic (Claude)
result = asyncio.run(parse_all_programs(
    manifest_path="./manifest.json",  # from mainframe-ingest-mcp
    output_dir="./parsed",
    ai_summaries=True,
    provider="anthropic",             # or openai, groq, ollama, custom
    api_key="sk-ant-...",             # or set ANTHROPIC_API_KEY env var
    max_ai_programs=20,
))
print(f"Parsed {result['parsed_ok']} of {result['total_programs']} programs")

Using other LLM providers

import asyncio
from cobol_parser_mcp.tools.batch import parse_all_programs

# OpenAI
result = asyncio.run(parse_all_programs(
    manifest_path="./manifest.json",
    output_dir="./parsed",
    provider="openai",
    api_key="sk-...",
    model="gpt-4o",
))

# Groq — fast and cheap
result = asyncio.run(parse_all_programs(
    manifest_path="./manifest.json",
    output_dir="./parsed",
    provider="groq",
    api_key="gsk_...",
))

# Ollama — fully local, no API key, no internet
result = asyncio.run(parse_all_programs(
    manifest_path="./manifest.json",
    output_dir="./parsed",
    provider="ollama",
    model="llama3",
))

# No AI at all — pure parsing only
result = asyncio.run(parse_all_programs(
    manifest_path="./manifest.json",
    output_dir="./parsed",
    ai_summaries=False,
))

Output format

Each program produces a <PROGRAM_NAME>.json file:

{
  "program": "SS6001XX",
  "line_count": 1048,
  "identification": {
    "program_id": "SS6001XX",
    "author": "STATE OF MARYLAND SDAT",
    "date_written": "1995-06-12"
  },
  "data_division": {
    "working_storage": [
      { "level": "1", "name": "WS-ENTITY-RECORD", "type": "GROUP" },
      { "level": "5", "name": "WS-ENTITY-ID",     "type": "PIC X(10)" }
    ],
    "copybooks_expanded": ["BUSENTIT", "SSCC03EQ", "DFHAID"]
  },
  "procedure_division": {
    "paragraph_count": 7,
    "paragraphs": [
      {
        "name": "0000-MAIN",
        "lines": "28-45",
        "calls_paragraphs": ["1000-INIT", "2000-PROCESS", "9999-EXIT"]
      }
    ]
  },
  "sql_statements": [
    {
      "type": "SELECT",
      "tables": ["BUSENTIT"],
      "columns": ["ENTITY_ID", "ENTITY_NAME", "STATUS_CD"],
      "where": "ENTITY_ID = :WS-ENTITY-ID",
      "line": 167
    }
  ],
  "cics_commands": [
    { "command": "RECEIVE MAP", "map": "SS6TMAP", "mapset": "SS6TMAP", "line": 152 },
    { "command": "LINK PROGRAM", "program": "SS6009XX", "line": 334 }
  ],
  "business_logic_summary": {
    "db2_tables_read":    ["BUSENTIT"],
    "db2_tables_written": ["BUSENTIT", "TRNSACTN"],
    "screens_used":       ["SS6TMAP", "SS6XMAP"],
    "programs_called":    ["SS6009XX"],
    "estimated_complexity": "HIGH",
    "ai_summary": {
      "purpose": "Handles online business entity inquiry and status update for CICS terminal users",
      "business_domain": "Business Entity Registration",
      "user_facing": true,
      "key_operations": [
        "Receive user input from SS6TMAP screen",
        "Query BUSENTIT table by entity ID",
        "Update entity status in BUSENTIT",
        "Log transaction to TRNSACTN",
        "Link to SS6009XX for downstream processing"
      ],
      "modernization_notes": "Maps cleanly to GET /api/entity/{id} and PUT /api/entity/{id}/status REST endpoints"
    }
  }
}

A _index.json summary file is also written with stats across all programs.
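
As a sketch of how a downstream step might consume one of these per-program files, the helpers below build a PERFORM graph, group tables by SQL statement type, and split a paragraph's line range. The helper names are hypothetical and not part of the package. The sketch also assumes that "lines" always has the "start-end" form shown in the sample output.

```python
# Trimmed copy of the sample output above; only the fields used here are kept.
program = {
    "program": "SS6001XX",
    "procedure_division": {
        "paragraphs": [
            {
                "name": "0000-MAIN",
                "lines": "28-45",
                "calls_paragraphs": ["1000-INIT", "2000-PROCESS", "9999-EXIT"],
            }
        ]
    },
    "sql_statements": [
        {"type": "SELECT", "tables": ["BUSENTIT"], "line": 167},
    ],
}


def perform_graph(parsed: dict) -> dict[str, list[str]]:
    """Map each paragraph name to the paragraphs it PERFORMs."""
    return {
        p["name"]: list(p.get("calls_paragraphs", []))
        for p in parsed["procedure_division"]["paragraphs"]
    }


def tables_by_statement_type(parsed: dict) -> dict[str, set[str]]:
    """Group table names by SQL statement type (SELECT, UPDATE, ...)."""
    grouped: dict[str, set[str]] = {}
    for stmt in parsed.get("sql_statements", []):
        grouped.setdefault(stmt["type"], set()).update(stmt.get("tables", []))
    return grouped


def paragraph_span(paragraph: dict) -> tuple[int, int]:
    """Split a "start-end" line range into integers (format assumed from the sample)."""
    start, end = (int(n) for n in paragraph["lines"].split("-"))
    return start, end


print(perform_graph(program))
print(tables_by_statement_type(program))
print(paragraph_span(program["procedure_division"]["paragraphs"][0]))
```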


Supported LLM providers

Provider   Default model             API key env var    Notes
---------  ------------------------  -----------------  ---------------------------------------------
anthropic  claude-sonnet-4-20250514  ANTHROPIC_API_KEY  Default
openai     gpt-4o                    OPENAI_API_KEY
groq       llama3-70b-8192           GROQ_API_KEY       Fast and cheap
ollama     llama3                    none               Fully local, free
custom     gpt-4o                    OPENAI_API_KEY     Any OpenAI-compatible endpoint, pass base_url
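
The table can be read as a lookup from provider name to default model and API key variable. The sketch below shows one way a caller might validate settings before starting a batch run. resolve_provider is a hypothetical helper, not a function exported by cobol-parser-mcp, and letting an explicit api_key take precedence over the environment variable is an assumption.

```python
import os

# Values copied from the provider table above.
PROVIDERS = {
    "anthropic": {"default_model": "claude-sonnet-4-20250514", "env_var": "ANTHROPIC_API_KEY"},
    "openai": {"default_model": "gpt-4o", "env_var": "OPENAI_API_KEY"},
    "groq": {"default_model": "llama3-70b-8192", "env_var": "GROQ_API_KEY"},
    "ollama": {"default_model": "llama3", "env_var": None},  # local, no key
    "custom": {"default_model": "gpt-4o", "env_var": "OPENAI_API_KEY"},
}


def resolve_provider(provider, api_key=None, model=None, env=None):
    """Hypothetical helper: pick the model and API key for a provider.

    Assumption: an explicit api_key wins over the environment variable.
    """
    env = os.environ if env is None else env
    if provider not in PROVIDERS:
        raise ValueError(f"Unknown provider: {provider!r}")
    spec = PROVIDERS[provider]
    key = api_key
    if key is None and spec["env_var"]:
        key = env.get(spec["env_var"])
    if spec["env_var"] and not key:
        raise RuntimeError(f"Set {spec['env_var']} or pass api_key for {provider!r}")
    return {
        "provider": provider,
        "model": model or spec["default_model"],
        "api_key": key,
    }


# Ollama needs no key, so an empty environment is fine.
print(resolve_provider("ollama", env={}))
```

Failing early on a missing key avoids discovering the problem partway through a batch run.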

Part of the modernization pipeline

mainframe-ingest-mcp  →  manifest.json
        ↓
cobol-parser-mcp      →  per-program JSON files   ← you are here
        ↓
bms-to-angular-mcp    →  Angular components
db2-schema-mcp        →  PostgreSQL schema
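
A later pipeline stage will typically read the whole output directory rather than a single file. The sketch below aggregates DB2 table usage across all per-program JSON files. collect_tables is a hypothetical helper. Skipping files whose names start with an underscore is an assumption, made so that _index.json is not treated as a program.

```python
import json
import tempfile
from pathlib import Path


def collect_tables(parsed_dir):
    """Map each DB2 table to the programs that read and write it."""
    usage = {}
    for path in sorted(Path(parsed_dir).glob("*.json")):
        if path.name.startswith("_"):  # assumed: skips _index.json
            continue
        data = json.loads(path.read_text(encoding="utf-8"))
        summary = data.get("business_logic_summary", {})
        for key, mode in (("db2_tables_read", "read"), ("db2_tables_written", "write")):
            for table in summary.get(key, []):
                entry = usage.setdefault(table, {"read": [], "write": []})
                entry[mode].append(data["program"])
    return usage


# Demo against a temporary directory shaped like the output directory.
with tempfile.TemporaryDirectory() as tmp:
    sample = {
        "program": "SS6001XX",
        "business_logic_summary": {
            "db2_tables_read": ["BUSENTIT"],
            "db2_tables_written": ["BUSENTIT", "TRNSACTN"],
        },
    }
    Path(tmp, "SS6001XX.json").write_text(json.dumps(sample), encoding="utf-8")
    Path(tmp, "_index.json").write_text("{}", encoding="utf-8")
    usage = collect_tables(tmp)

print(usage)
```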

License

MIT
