
MCP server that parses COBOL programs — extracts divisions, SQL, CICS commands, and generates AI-powered business logic summaries via any LLM


cobol-parser-mcp

MCP server that parses COBOL mainframe programs and extracts structured data plus AI-powered business logic summaries. It works with Anthropic, OpenAI, Groq, a local Ollama model, or any OpenAI-compatible endpoint.

pip install cobol-parser-mcp              # pure parsing, no LLM needed
pip install "cobol-parser-mcp[anthropic]" # + Claude
pip install "cobol-parser-mcp[openai]"    # + GPT-4o
pip install "cobol-parser-mcp[groq]"      # + Groq (fast + cheap)
pip install "cobol-parser-mcp[all]"       # + all providers
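To check which provider extras are available in the current environment, a small sketch can probe for the SDK modules. The mapping from extra name to module name is an assumption about what each extra installs, and `installed_providers` is a hypothetical helper, not part of cobol-parser-mcp.

```python
import importlib.util

# Assumption: each pip extra installs an SDK importable under the same name.
EXTRA_MODULES = {
    "anthropic": "anthropic",
    "openai": "openai",
    "groq": "groq",
}


def installed_providers() -> dict:
    """Return {extra: True/False} depending on whether its SDK can be imported."""
    return {
        extra: importlib.util.find_spec(module) is not None
        for extra, module in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, available in installed_providers().items():
        print(f"{extra:10} {'installed' if available else 'missing'}")
```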

Part of a mainframe modernization pipeline: it takes the manifest.json produced by mainframe-ingest-mcp and writes one JSON file per program, ready for code generation.


What it extracts

For every COBOL program:

| What | Details |
|---|---|
| IDENTIFICATION DIVISION | Program ID, author, date written |
| DATA DIVISION | All working storage variables with level numbers and PIC clauses |
| PROCEDURE DIVISION | Every paragraph with line ranges and PERFORM references |
| Embedded SQL | Every SELECT/INSERT/UPDATE/DELETE with tables and columns |
| Embedded CICS | Every SEND/RECEIVE MAP, LINK, XCTL, READ, WRITE |
| AI summary | Plain-English description of what the program does (any LLM) |
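Embedded SQL appears in COBOL source as EXEC SQL / END-EXEC blocks. As a rough illustration of how such statements can be located and classified (a regex sketch, not this package's actual parser), the hypothetical `extract_sql` below returns each DML statement's type, tables, and starting line. It assumes free-format text or source whose sequence area (columns 1-6) has already been stripped, and it only recognizes unqualified table names.

```python
import re

# An EXEC SQL / END-EXEC block, possibly spanning several lines.
EXEC_SQL = re.compile(r"EXEC\s+SQL\s+(.*?)\s*END-EXEC", re.IGNORECASE | re.DOTALL)
# Unqualified table name after FROM, INTO, UPDATE or JOIN. Host variables
# such as ":WS-ENTITY-ID" never match because ":" is outside the class.
TABLE_REF = re.compile(r"\b(?:FROM|INTO|UPDATE|JOIN)\s+([A-Z0-9_#@$]+)", re.IGNORECASE)
DML_VERBS = {"SELECT", "INSERT", "UPDATE", "DELETE"}


def extract_sql(source: str) -> list:
    """Return one dict per embedded DML statement: type, tables, starting line."""
    statements = []
    for match in EXEC_SQL.finditer(source):
        body = " ".join(match.group(1).split())  # collapse the statement onto one line
        verb = body.split(" ", 1)[0].upper()
        if verb not in DML_VERBS:
            continue  # skip INCLUDE, DECLARE CURSOR, OPEN, FETCH, etc.
        # Deduplicate while keeping first-seen order.
        tables = list(dict.fromkeys(t.upper() for t in TABLE_REF.findall(body)))
        line = source.count("\n", 0, match.start()) + 1
        statements.append({"type": verb, "tables": tables, "line": line})
    return statements
```

Subqueries, schema-qualified names, and comma-separated FROM lists would need a real SQL tokenizer; the regex keeps the idea visible.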

Claude Desktop config

{
  "mcpServers": {
    "cobol-parser-mcp": {
      "command": "cobol-parser-mcp",
      "env": {
        "ANTHROPIC_API_KEY": "your-key-here"
      }
    }
  }
}
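If a claude_desktop_config.json already lists other MCP servers, the entry above has to be merged rather than pasted over. A minimal sketch of that merge follows; `add_cobol_parser_server` is a hypothetical helper, and treating the API key as optional is an assumption based on pure parsing not needing an LLM.

```python
import json
from pathlib import Path
from typing import Optional


def add_cobol_parser_server(config_path: Path, api_key: Optional[str] = None) -> dict:
    """Insert or replace the cobol-parser-mcp entry, keeping other servers intact."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    server = {"command": "cobol-parser-mcp"}
    if api_key:
        # Assumption: the key is only required for AI summaries, not pure parsing.
        server["env"] = {"ANTHROPIC_API_KEY": api_key}
    config.setdefault("mcpServers", {})["cobol-parser-mcp"] = server
    config_path.write_text(json.dumps(config, indent=2))
    return config
```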

Usage from Python

Parse a single file (no API key needed)

from cobol_parser_mcp.tools.parser import parse_cobol_file
import json

result = parse_cobol_file("/path/to/SS6001XX.cob")
print(json.dumps(result, indent=2))
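Rather than dumping the whole result, you may only want a quick overview. The sketch below assumes `parse_cobol_file` returns a dict shaped like the example under "Output format"; `describe_program` is a hypothetical helper that reads only those keys and tolerates missing ones.

```python
def describe_program(result: dict) -> str:
    """One-line overview built from the keys shown under "Output format"."""
    proc = result.get("procedure_division", {})
    tables = sorted({t for stmt in result.get("sql_statements", []) for t in stmt.get("tables", [])})
    cics = [cmd["command"] for cmd in result.get("cics_commands", [])]
    return (
        f"{result['program']}: {result.get('line_count', '?')} lines, "
        f"{proc.get('paragraph_count', 0)} paragraphs, "
        f"SQL tables: {', '.join(tables) or 'none'}, "
        f"CICS: {', '.join(cics) or 'none'}"
    )
```

Usage, under the same assumption: `print(describe_program(parse_cobol_file("/path/to/SS6001XX.cob")))`.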

Parse an entire codebase with AI summaries

import asyncio
from cobol_parser_mcp.tools.batch import parse_all_programs

# Using Anthropic (Claude)
result = asyncio.run(parse_all_programs(
    manifest_path="./manifest.json",  # from mainframe-ingest-mcp
    output_dir="./parsed",
    ai_summaries=True,
    provider="anthropic",             # or openai, groq, ollama, custom
    api_key="sk-ant-...",             # or set ANTHROPIC_API_KEY env var
    max_ai_programs=20,
))
print(f"Parsed {result['parsed_ok']} of {result['total_programs']} programs")
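After a batch run, downstream steps usually need the per-program files back in memory. A minimal sketch, assuming output_dir contains one <PROGRAM_NAME>.json per program plus the _index.json summary; `load_parsed_programs` is a hypothetical helper.

```python
import json
from pathlib import Path


def load_parsed_programs(output_dir: str) -> dict:
    """Load every <PROGRAM_NAME>.json in output_dir, keyed by program name."""
    programs = {}
    for path in sorted(Path(output_dir).glob("*.json")):
        if path.name == "_index.json":
            continue  # cross-program summary, not a program
        data = json.loads(path.read_text())
        programs[data.get("program", path.stem)] = data
    return programs
```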

Using other LLM providers

# OpenAI
result = asyncio.run(parse_all_programs(
    manifest_path="./manifest.json",
    output_dir="./parsed",
    provider="openai",
    api_key="sk-...",
    model="gpt-4o",
))

# Groq — fast and cheap
result = asyncio.run(parse_all_programs(
    manifest_path="./manifest.json",
    output_dir="./parsed",
    provider="groq",
    api_key="gsk_...",
))

# Ollama — fully local, no API key, no internet
result = asyncio.run(parse_all_programs(
    manifest_path="./manifest.json",
    output_dir="./parsed",
    provider="ollama",
    model="llama3",
))

# No AI at all — pure parsing only
result = asyncio.run(parse_all_programs(
    manifest_path="./manifest.json",
    output_dir="./parsed",
    ai_summaries=False,
))

Output format

Each program produces a <PROGRAM_NAME>.json file. An abridged example:

{
  "program": "SS6001XX",
  "line_count": 1048,
  "identification": {
    "program_id": "SS6001XX",
    "author": "STATE OF MARYLAND SDAT",
    "date_written": "1995-06-12"
  },
  "data_division": {
    "working_storage": [
      { "level": "1", "name": "WS-ENTITY-RECORD", "type": "GROUP" },
      { "level": "5", "name": "WS-ENTITY-ID",     "type": "PIC X(10)" }
    ],
    "copybooks_expanded": ["BUSENTIT", "SSCC03EQ", "DFHAID"]
  },
  "procedure_division": {
    "paragraph_count": 7,
    "paragraphs": [
      {
        "name": "0000-MAIN",
        "lines": "28-45",
        "calls_paragraphs": ["1000-INIT", "2000-PROCESS", "9999-EXIT"]
      }
    ]
  },
  "sql_statements": [
    {
      "type": "SELECT",
      "tables": ["BUSENTIT"],
      "columns": ["ENTITY_ID", "ENTITY_NAME", "STATUS_CD"],
      "where": "ENTITY_ID = :WS-ENTITY-ID",
      "line": 167
    }
  ],
  "cics_commands": [
    { "command": "RECEIVE MAP", "map": "SS6TMAP", "mapset": "SS6TMAP", "line": 152 },
    { "command": "LINK PROGRAM", "program": "SS6009XX", "line": 334 }
  ],
  "business_logic_summary": {
    "db2_tables_read":    ["BUSENTIT"],
    "db2_tables_written": ["BUSENTIT", "TRNSACTN"],
    "screens_used":       ["SS6TMAP", "SS6XMAP"],
    "programs_called":    ["SS6009XX"],
    "estimated_complexity": "HIGH",
    "ai_summary": {
      "purpose": "Handles online business entity inquiry and status update for CICS terminal users",
      "business_domain": "Business Entity Registration",
      "user_facing": true,
      "key_operations": [
        "Receive user input from SS6TMAP screen",
        "Query BUSENTIT table by entity ID",
        "Update entity status in BUSENTIT",
        "Log transaction to TRNSACTN",
        "Link to SS6009XX for downstream processing"
      ],
      "modernization_notes": "Maps cleanly to GET /api/entity/{id} and PUT /api/entity/{id}/status REST endpoints"
    }
  }
}
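Fields such as db2_tables_read and db2_tables_written can be understood as a roll-up of sql_statements. The sketch below shows one plausible derivation (SELECT counts as a read; INSERT, UPDATE, and DELETE count as writes). It is an assumption for illustration, not the package's documented rule, and it ignores tables read inside subqueries of write statements.

```python
WRITE_VERBS = {"INSERT", "UPDATE", "DELETE"}


def classify_tables(sql_statements: list) -> tuple:
    """Split tables into (read, written) lists based on each statement's type."""
    read, written = set(), set()
    for stmt in sql_statements:
        verb = stmt.get("type", "").upper()
        tables = stmt.get("tables", [])
        if verb == "SELECT":
            read.update(tables)
        elif verb in WRITE_VERBS:
            written.update(tables)
    return sorted(read), sorted(written)
```

With a SELECT on BUSENTIT, an UPDATE on BUSENTIT, and an INSERT into TRNSACTN, this yields the same read/written lists as the example above.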

An _index.json file with summary statistics across all programs is also written.
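The exact fields of _index.json are not listed here. If you need comparable figures for your own reporting, the hypothetical helper below recomputes a few from a dict of per-program results keyed by program name, using only keys shown in the output example.

```python
from collections import Counter


def corpus_stats(programs: dict) -> dict:
    """Aggregate program count, total lines, and complexity distribution."""
    complexity = Counter(
        data.get("business_logic_summary", {}).get("estimated_complexity", "UNKNOWN")
        for data in programs.values()
    )
    return {
        "programs": len(programs),
        "total_lines": sum(data.get("line_count", 0) for data in programs.values()),
        "by_complexity": dict(complexity),
    }
```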


Supported LLM providers

| Provider | Default model | API key env var | Notes |
|---|---|---|---|
| anthropic | claude-sonnet-4-20250514 | ANTHROPIC_API_KEY | Default |
| openai | gpt-4o | OPENAI_API_KEY | |
| groq | llama3-70b-8192 | GROQ_API_KEY | Fast and cheap |
| ollama | llama3 | none | Fully local, free |
| custom | gpt-4o | OPENAI_API_KEY | Any OpenAI-compatible endpoint; pass base_url |
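In CI or shared scripts it can help to choose the provider from whichever API key is present. A sketch based on the env var names in the table; `pick_provider` and its precedence order are hypothetical, and it falls back to ollama because that provider needs no key.

```python
import os
from typing import Mapping, Optional

# Provider -> API key env var, from the table above (checked in this order).
PROVIDER_KEYS = {
    "anthropic": "ANTHROPIC_API_KEY",
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
}


def pick_provider(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the first provider whose key is set, else the keyless local option."""
    env = os.environ if env is None else env
    for provider, var in PROVIDER_KEYS.items():
        if env.get(var):
            return provider
    return "ollama"
```

The result can be passed straight through as `provider=pick_provider()` in the `parse_all_programs` calls shown earlier.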

Part of the modernization pipeline

mainframe-ingest-mcp  →  manifest.json
        ↓
cobol-parser-mcp      →  per-program JSON files   ← you are here
        ↓
bms-to-angular-mcp    →  Angular components
db2-schema-mcp        →  PostgreSQL schema
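Before migrating DB2 tables downstream, it is useful to know which programs touch each table. A minimal sketch over per-program results keyed by program name, reading only the sql_statements key from the output format; `programs_by_table` is a hypothetical helper.

```python
from collections import defaultdict


def programs_by_table(programs: dict) -> dict:
    """Map each DB2 table to the sorted list of programs that reference it."""
    usage = defaultdict(set)
    for name, data in programs.items():
        for stmt in data.get("sql_statements", []):
            for table in stmt.get("tables", []):
                usage[table].add(name)
    return {table: sorted(names) for table, names in sorted(usage.items())}
```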

License

MIT


Download files


Source Distribution

cobol_parser_mcp-0.1.2.tar.gz (15.9 kB)

Built Distribution


cobol_parser_mcp-0.1.2-py3-none-any.whl (18.0 kB, Python 3)

File details

Details for the file cobol_parser_mcp-0.1.2.tar.gz.

File metadata

  • Download URL: cobol_parser_mcp-0.1.2.tar.gz
  • Size: 15.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for cobol_parser_mcp-0.1.2.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 92b6e6324743831d0387afc1001a0875e848531cc055ec3bb97994de69cdb2dd |
| MD5 | 6c626d63ca9ba5b9f59b6f2fb6dc67f0 |
| BLAKE2b-256 | 4472461aecffa29ba73efe3fb0929ec6f6663d383d1d1f7f8e124734cffac71a |


File details

Details for the file cobol_parser_mcp-0.1.2-py3-none-any.whl.


File hashes

Hashes for cobol_parser_mcp-0.1.2-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | b1dcff9f36fbccf0b700d89f4f7e5d66a0e6c2534fe63884a3f960a8feeb3446 |
| MD5 | cb1b1d3255cf52aa1333c841b73e00d7 |
| BLAKE2b-256 | 1faf6b4b1bf82778ff25759074ab97c94dc8b10d32734ad2b378c9608590fdc8 |
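To check a downloaded 0.1.2 archive against the SHA256 digests listed above, a short standard-library sketch is enough. `sha256_of` and `verify_download` are illustrative helpers; the digests are copied from the hash tables.

```python
import hashlib
from pathlib import Path

# SHA256 digests published above for release 0.1.2.
EXPECTED_SHA256 = {
    "cobol_parser_mcp-0.1.2.tar.gz": "92b6e6324743831d0387afc1001a0875e848531cc055ec3bb97994de69cdb2dd",
    "cobol_parser_mcp-0.1.2-py3-none-any.whl": "b1dcff9f36fbccf0b700d89f4f7e5d66a0e6c2534fe63884a3f960a8feeb3446",
}


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path) -> bool:
    """True if the file's SHA256 matches the published digest for its file name."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

pip can also enforce these digests itself in hash-checking mode, by listing the package with `--hash=sha256:...` entries in a requirements file and installing with `--require-hashes`.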

