
Cobol IBM Mainframe syntax lexer for Pygments

Project description

This package contains a Pygments lexer for mainframe COBOL.

The lexer parses the Enterprise COBOL (V3R4) for z/OS dialect, including embedded Db2/SQL, CICS and DL/I statements.

Mainframe COBOL coding form

Many early programming languages, including PL/1, Fortran, Cobol and the various IBM assembler languages, used only the first 72 columns of an 80-column card. A mainframe COBOL line is laid out as follows:

1 - 6 Tags, remarks or sequence numbers identifying pages or lines of a program
7 Indicator area:
  • * (asterisk) designates the entire line as a comment
  • / (slash) forces a page break when printing the source listing
  • - (dash) indicates the continuation of a nonnumeric literal
  • D indicates a debugging line
8 - 72 COBOL program statements, divided into two areas:
  • Area A: columns 8 to 11
  • Area B: columns 12 to 72
73 - 80 Tags, remarks or sequence numbers (often garbage…)

Division, section and paragraph-names must all begin in Area A and end with a period.

CBL/PROCESS directive statements can start in any of columns 1 through 70.
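The column layout above can be illustrated with a minimal Python 3 sketch. It is not part of this package: the names CobolLine, split_cobol_line and is_comment are hypothetical, and the sketch assumes that a line shorter than 80 columns is simply padded with spaces.

```python
from typing import NamedTuple


class CobolLine(NamedTuple):
    """One fixed-format source line, split by column area (hypothetical helper)."""
    sequence: str        # columns 1-6
    indicator: str       # column 7
    area_a: str          # columns 8-11
    area_b: str          # columns 12-72
    identification: str  # columns 73-80


def split_cobol_line(line: str) -> CobolLine:
    """Pad or truncate the line to 80 columns, then slice it into its areas."""
    padded = line.rstrip("\r\n").ljust(80)[:80]
    return CobolLine(
        sequence=padded[0:6],
        indicator=padded[6],
        area_a=padded[7:11],
        area_b=padded[11:72],
        identification=padded[72:80],
    )


def is_comment(line: str) -> bool:
    """Assumption: both '*' and '/' in column 7 mark the line as a comment."""
    return split_cobol_line(line).indicator in ("*", "/")
```

For example, split_cobol_line("000100 IDENTIFICATION DIVISION.") puts "000100" in the sequence area and starts the division header in Area A, as the rule for division names requires.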


Installation

The lexer is available as a pip package:

$ sudo pip install pygments_ibm_cobol_lexer

Or using easy_install:

$ sudo easy_install pygments_ibm_cobol_lexer


Usage

After installation, the ibmcobol lexer and the ibmcobol style register themselves automatically for files with the “.cbl” extension.

Command-line usage is therefore straightforward:
  • ASCII input:
pygmentize -O full,style=ibmcobol,encoding=latin1 -o HORREUR.html HORREUR.ascii.cbl
  • EBCDIC input (in this case the outencoding value must be specified):
pygmentize -O full,style=ibmcobol,encoding=cp1147,outencoding=latin1 -o COB001.html COB001.cp1147.cbl

Or use it as a library:

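from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments_ibm_cobol_lexer import IBMCOBOLLexer, IBMCOBOLStyle
# Read the EBCDIC source as raw bytes and let the lexer decode it.
my_code = open("cobol_ebcdic.cbl", 'rb').read()
# Assumption: the encoding value and the output file name are illustrative.
html = highlight(my_code, IBMCOBOLLexer(encoding='cp1147'),
                 HtmlFormatter(style=IBMCOBOLStyle, full=True))
open("cobol_ebcdic.html", 'w').write(html)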

Also see the pygments_ibm_cobol_lexer-1.1/pygments_tests/ directory

About cp1147

My files are encoded in IBM1147 (French EBCDIC plus the euro sign), so I had to write my own cp1147 codec. It is very close to cp500 (Canada, Belgium) and differs from it only on the characters “@°{}§ùµ£à[€`¨#]~éè¦ç”:

from pygments_ibm_cobol_lexer import cp1147
print "euro sign ?",chr(159).decode('cp1147')
print ''.join([ chr(i).decode('cp1147') for i in range(0,256)
          if chr(i).decode('cp1147') != chr(i).decode('cp500')])

I have added this import to the IBMCOBOLLexer init method.
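The same comparison can be sketched in Python 3 using only codecs that ship with the standard library. Since cp1147 is provided by this package and not by Python itself, the example below compares cp037 with cp1140 as stand-ins; the function name diverging_bytes is hypothetical.

```python
def diverging_bytes(codec_a, codec_b):
    """Return the byte values that two single-byte codecs decode differently."""
    result = []
    for value in range(256):
        raw = bytes([value])
        if raw.decode(codec_a) != raw.decode(codec_b):
            result.append(value)
    return result


if __name__ == "__main__":
    # Print each diverging byte with the character it yields under each codec.
    for value in diverging_bytes("cp037", "cp1140"):
        raw = bytes([value])
        print(hex(value), repr(raw.decode("cp037")), repr(raw.decode("cp1140")))
```

Once the package's cp1147 codec has been imported and registered, the same function should work with "cp500" and "cp1147" as arguments.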


Changelog

1.1 - (2012-11-19) Minor fixes + EBCDIC enhancements:

  • Fix: float regex detection now runs before integer detection
  • Add inline comments *> (not the IBM default)
  • Change the CICS/DL/I keyword colors…
  • Extend CICS_KEYWORDS, remove EJECT/SKIP from COBOL_KEYWORDS (treated as comments)
  • Each ASCII input line is padded to 80 columns
  • Add EBCDIC features:
    • add my own French codec, cp1147
    • if an EBCDIC encoding is passed (cp500, cp1140, …) or detected, convert the raw binary input into fixed 80-column lines
    • encoding=chardet (slow) does not detect EBCDIC charsets; it is overridden with encoding=guess
    • when EBCDIC is guessed, self.encoding defaults to 'cp500'
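The two line-shaping steps listed above (padding ASCII lines and cutting raw EBCDIC input into fixed 80-column lines) can be sketched as follows. This is a minimal Python 3 illustration under stated assumptions, not the package's actual code: the function names are hypothetical, and the EBCDIC input is assumed to be a stream of 80-byte records with no line terminators.

```python
def pad_ascii_lines(text, width=80):
    """Pad each line with spaces up to `width` columns (longer lines are kept as-is)."""
    return [line.ljust(width) for line in text.splitlines()]


def ebcdic_to_fixed_lines(raw, encoding="cp500", width=80):
    """Decode raw EBCDIC bytes and cut the result into fixed-width lines.

    Assumption: the input holds fixed-length records without newline characters,
    so every `width` characters form one source line.
    """
    text = raw.decode(encoding)
    return [text[i:i + width] for i in range(0, len(text), width)]
```

The default of cp500 mirrors the fallback mentioned in the last bullet; any other single-byte EBCDIC codec known to Python can be passed instead.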

1.0 - (2012-11-12) Initial release.

Online demo

This lexer can be tested online here (pygments).

Download files

Filename (size), file type, Python version, upload date:
  • pygments_ibm_cobol_lexer-1.1-py2.7.egg (32.0 kB), Egg, 2.7, Nov 20, 2012
  • pygments_ibm_cobol_lexer-1.1.tar.gz (78.2 kB), Source, None, Nov 20, 2012
