Cobol IBM Mainframe syntax lexer for Pygments

Project description

This package contains a Pygments Lexer for mainframe cobol.

The lexer parses the Enterprise Cobol (V3R4) for z/OS dialect, including embedded Db2/SQL, CICS and DL/I statements.

Mainframe Cobol coding form

Many early programming languages, including PL/1, Fortran, Cobol and the various IBM assembler languages, used only columns 7-72 of an 80-column card for program text.

Columns

1 - 6

Tags, Remarks or Sequence numbers identifying pages or lines of a program

7

  • * (asterisk) designates entire line as comment

  • / (slash) forces page break when printing source listing

  • - (dash) indicates the continuation of a nonnumeric literal

  • D marks the line as a debugging line

8 - 72

COBOL program statements, divided into two areas:
  • Area A : columns 8 to 11

  • Area B : columns 12 to 72

73 - 80

Tags, Remarks or Sequence numbers (often garbage…)

Division, section and paragraph-names must all begin in Area A and end with a period.

A CBL/PROCESS directive statement can start in any of columns 1 through 70.
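The column layout above can be sketched in a few lines of Python. This is only an illustration of the coding form, not the lexer's own code: the names CobolLine, INDICATORS, split_line and indicator_kind are hypothetical, and the sketch assumes each line is padded or truncated to 80 columns.

```python
from typing import NamedTuple


class CobolLine(NamedTuple):
    """One source line cut into the fixed-column fields of the coding form."""
    sequence: str    # columns 1-6
    indicator: str   # column 7
    area_a: str      # columns 8-11
    area_b: str      # columns 12-72
    ident: str       # columns 73-80


# Meaning of the column-7 indicator, as listed in the table above.
INDICATORS = {
    "*": "comment",
    "/": "page break",
    "-": "continuation",
    "D": "debug",
}


def split_line(line: str) -> CobolLine:
    """Pad or truncate the line to 80 columns, then slice it by column."""
    padded = line.rstrip("\r\n").ljust(80)[:80]
    return CobolLine(padded[0:6], padded[6], padded[7:11],
                     padded[11:72], padded[72:80])


def indicator_kind(line: str) -> str:
    """Return the role of column 7, or 'code' for any other character."""
    return INDICATORS.get(split_line(line).indicator, "code")
```

For a line such as "000100 IDENTIFICATION DIVISION.", split_line puts "000100" in the sequence field and the division header starts in Area A, as required for division, section and paragraph names.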

Installation

The lexer is available as a Pip package:

$ sudo pip install pygments_ibm_cobol_lexer

Or using easy_install:

$ sudo easy_install pygments_ibm_cobol_lexer

Usage

After installation, the ibmcobol lexer and the ibmcobol style register themselves automatically, and the lexer is used for files with the “.cbl” extension.

Command-line usage is therefore straightforward:
  • ASCII input:

pygmentize -O full,style=ibmcobol,encoding=latin1 -o HORREUR.html HORREUR.ascii.cbl

  • EBCDIC input (in this case the outencoding value must be specified):

pygmentize -O full,style=ibmcobol,encoding=cp1147,outencoding=latin1 -o COB001.html COB001.cp1147.cbl

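Or as a library:

from pygments import highlight
from pygments.formatters import HtmlFormatter
from pygments_ibm_cobol_lexer import IBMCOBOLLexer, IBMCOBOLStyle

# Read the EBCDIC source as raw bytes; the lexer decodes it with the given encoding.
with open("cobol_ebcdic.cbl", "rb") as source:
    my_code = source.read()

# Write a standalone HTML page that uses the bundled style.
with open("test.html", "w") as out:
    highlight(my_code, IBMCOBOLLexer(encoding="cp1140"),
              HtmlFormatter(style=IBMCOBOLStyle, full=True), out)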

Also see the pygments_ibm_cobol_lexer-1.1/pygments_tests/ directory

About cp1147

My files are encoded in IBM1147 (French EBCDIC plus the euro sign), so I had to write my own cp1147 codec. It is very close to cp500 (Canada, Belgium) and diverges only on the characters “@°{}§ùµ£à[€`¨#]~éè¦ç”:

from pygments_ibm_cobol_lexer import cp1147  # makes the cp1147 codec available

# Decode byte 159 with cp1147 to check the euro sign.
print("euro sign ? " + bytearray([159]).decode('cp1147'))
# Print every character that cp1147 and cp500 decode differently.
print(''.join(bytearray([i]).decode('cp1147') for i in range(256)
              if bytearray([i]).decode('cp1147') != bytearray([i]).decode('cp500')))

This import is also done in the IBMCOBOLLexer init method.

Changelog

1.1 - (2012-11-19) Minor Fix + EBCDIC enhancements:

  • Fix: float regex detection now runs before integer detection

  • Add inline comment *> (not the IBM default)

  • Change cics/dli keywords color…

  • Extend CICS_KEYWORDS, remove EJECT/SKIP from COBOL_KEYWORDS (treated as comments)

  • Each ASCII input line is padded to 80 columns

  • Add EBCDIC features:

    • add my own French codec cp1147

    • if an EBCDIC encoding is passed (cp500, cp1140, …) or detected, convert the raw binary input into fixed 80-column lines

    • encoding=chardet (slow) does not detect EBCDIC charsets; it is overridden with encoding=guess

    • “guess EBCDIC” defaults to self.encoding='cp500'

1.0 - (2012-11-12) Initial release.
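The 80-column handling mentioned in the 1.1 notes can be sketched as follows. This is a minimal illustration under stated assumptions, not the lexer's actual implementation: the function names ebcdic_to_lines and pad_ascii_lines are hypothetical, the EBCDIC input is assumed to be a stream of fixed 80-byte records without line terminators, and cp500 is used only because it is the documented default for guessed EBCDIC.

```python
def ebcdic_to_lines(raw: bytes, encoding: str = "cp500", width: int = 80) -> list:
    """Decode raw EBCDIC bytes and cut the text into fixed-width lines.

    EBCDIC code pages such as cp500 are single-byte, so slicing the
    decoded text every `width` characters matches the record boundaries.
    """
    text = raw.decode(encoding)
    return [text[i:i + width] for i in range(0, len(text), width)]


def pad_ascii_lines(text: str, width: int = 80) -> list:
    """Pad each line of newline-delimited text with spaces up to `width`."""
    return [line.ljust(width) for line in text.splitlines()]
```

Both helpers return a list of lines, so the same column-based rules can then be applied whether the input was ASCII or EBCDIC.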

Online demo

This lexer can be tested online here (pygments).

Download files

Source Distribution

pygments_ibm_cobol_lexer-1.1.tar.gz (78.2 kB)

Built Distribution

pygments_ibm_cobol_lexer-1.1-py2.7.egg (32.0 kB)
