Groovy 3.0.x parser based on Pygments and Lark

python-groovy-parser

Python package which implements a Groovy 3.0.X parser, using Pygments, Lark and the corresponding grammar.

The tokenizer, lexer and grammar have been tested, stressed and fine-tuned to properly parse Nextflow files (i.e. *.nf), nextflow.config-like files and real Groovy code.

Install

You can install the development version of this package with pip by running:

pip install git+https://github.com/inab/python-groovy-parser.git

Test programs

This repo contains three test programs, translated-groovy3-parser.py, cached-translated-groovy3-parser.py and parser-groovy-writer.py, which demonstrate how to use the parser and digest its output.

All the programs take one or more files as input.

git clone https://github.com/nf-core/rnaseq.git
translated-groovy3-parser.py $(find rnaseq -type f -name "*.nf")

If an input file is, for instance, rnaseq/modules/local/bedtools_genomecov.nf, the program generates a log file rnaseq/modules/local/bedtools_genomecov.nf.lark, where the parsing traces are stored (emitted tokens, parsing errors, etc.).

Also, when parsing succeeds, it condenses and serializes the parse tree into a file with extension .lark.json (for instance, rnaseq/modules/local/bedtools_genomecov.nf.lark.json).

The first two programs try, as a proof of concept, to identify features of Nextflow files, such as the process, include and workflow declarations, and write them in rough form to a file with extension .lark.result (for instance rnaseq/modules/local/bedtools_genomecov.nf.lark.result).
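The naming scheme described above can be summarized with a small sketch. The helper name expected_outputs is hypothetical and not part of the package; it only derives the three file names from the extensions mentioned in the text and does not run the parser.

```python
from pathlib import Path


def expected_outputs(source: str) -> dict:
    """Derive the output paths described in the text for one input file.

    Hypothetical helper: it only appends the documented extensions.
    """
    src = Path(source)
    log = src.with_name(src.name + ".lark")
    return {
        "log": log,                                      # parsing traces
        "tree": log.with_name(log.name + ".json"),       # condensed parse tree
        "result": log.with_name(log.name + ".result"),   # detected Nextflow features
    }


paths = expected_outputs("rnaseq/modules/local/bedtools_genomecov.nf")
for kind, path in paths.items():
    print(f"{kind}: {path.as_posix()}")
```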

As parsing is a heavy task, the parsing module also contains a method to cache the parsed tree in JSON format in a persistent store, such as a filesystem. So, the following operation is expensive the first time, but not on later runs:

GROOVY_CACHEDIR=/tmp/somecachedir cached-translated-groovy3-parser.py $(find rnaseq -type f -name "*.nf")
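To illustrate the idea behind the cache, here is a minimal sketch of a filesystem JSON cache keyed by the content of the parsed file. It is not the package's implementation: cached_parse and fake_parse are hypothetical names, and fake_parse stands in for the real, expensive parsing step. A temporary directory is used here in place of the directory given through GROOVY_CACHEDIR.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def cached_parse(content: str, cache_dir: Path, parse_fn: Callable[[str], Any]) -> Any:
    """Return parse_fn(content), reusing a JSON file in cache_dir when one exists."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    key = hashlib.sha256(content.encode("utf-8")).hexdigest()
    cache_file = cache_dir / f"{key}.json"
    if cache_file.is_file():
        with cache_file.open(encoding="utf-8") as fh:
            return json.load(fh)

    tree = parse_fn(content)  # the expensive step, done only on a cache miss
    # Write to a temporary name and rename, so a partial file is never read as a hit.
    tmp_file = cache_file.with_suffix(".tmp")
    with tmp_file.open("w", encoding="utf-8") as fh:
        json.dump(tree, fh)
    tmp_file.replace(cache_file)
    return tree


calls = []


def fake_parse(text: str) -> dict:
    """Stand-in for the real parser (assumption): records each call."""
    calls.append(text)
    return {"tokens": text.split()}


cache_root = Path(tempfile.mkdtemp())
first = cached_parse("process foo { }", cache_root, fake_parse)
second = cached_parse("process foo { }", cache_root, fake_parse)
print(first == second, len(calls))
```

The second call returns the same tree without invoking the stand-in parser again, which mirrors the "expensive the first time only" behaviour described above.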

The caching directory contents depend on the grammar and the implementation, as well as on the versions of the dependencies. So, if this software is updated (because the grammar is updated or a bug is fixed), cached contents from previous versions are not reused.
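One way such invalidation could work is to fold the grammar text and the dependency versions into a signature and keep each signature in its own subdirectory. The sketch below is an assumption about the general technique, not the package's actual scheme; the function names, the one-line grammar and the version strings are purely illustrative.

```python
import hashlib
from pathlib import Path


def cache_signature(grammar_text: str, versions: dict) -> str:
    """Hash the grammar and the dependency versions into a short signature."""
    digest = hashlib.sha256()
    digest.update(grammar_text.encode("utf-8"))
    for name in sorted(versions):  # sorted, so the dict order does not matter
        digest.update(f"\0{name}={versions[name]}".encode("utf-8"))
    return digest.hexdigest()[:16]


def versioned_cache_dir(root: Path, grammar_text: str, versions: dict) -> Path:
    """Place cached trees under a subdirectory named after the signature."""
    return root / cache_signature(grammar_text, versions)


root = Path("/tmp/somecachedir")
grammar = "start: statement*"  # illustrative placeholder, not the real grammar
# In a real setup the versions could come from importlib.metadata.version().
old_dir = versioned_cache_dir(root, grammar, {"lark": "1.1.9", "pygments": "2.17.2"})
new_dir = versioned_cache_dir(root, grammar, {"lark": "1.2.2", "pygments": "2.17.2"})
print(old_dir != new_dir)
```

Because a changed grammar or dependency version yields a different subdirectory, entries written by an older setup are simply never looked up.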

The third program, parser-groovy-writer.py, was written in response to a request in an issue, whose author wanted to write back the parsed tree after some processing. So, this program writes what survived the parsing to a new file with extension .mirrored. In the current implementation some elements, like comments and some combinations of whitespace, are not propagated from the tokenizer to the lexer and parser, so they are not reintegrated.
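The effect can be pictured with a toy example. The node layout below (dictionaries with "children" for rules and "value" for leaves) is a hypothetical shape chosen for illustration and may differ from the JSON the package actually emits; the sketch only shows why text that never reaches the tree cannot be written back.

```python
def leaves(node: dict):
    """Yield the text of every leaf in a nested tree, left to right."""
    if "children" in node:
        for child in node["children"]:
            yield from leaves(child)
    else:
        yield node["value"]


original = "x = 1  // set x"

# Hypothetical condensed tree for the statement above: the comment and the
# exact spacing were dropped before parsing, so no node carries them.
tree = {
    "rule": "assignment",
    "children": [
        {"leaf": "IDENTIFIER", "value": "x"},
        {"leaf": "ASSIGN", "value": "="},
        {"leaf": "INTEGER", "value": "1"},
    ],
}

mirrored = " ".join(leaves(tree))
print(mirrored)
```

Joining the leaves gives back the code tokens only, so the comment from the original line is absent from the mirrored text.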

Acknowledgements

The tokenizer is an evolution of the Pygments Groovy lexer https://github.com/pygments/pygments/blob/b7c8f35440f591c6687cb912aa223f5cf37b6704/pygments/lexers/jvm.py#L543-L618

The Lark grammar has been created from https://github.com/apache/groovy/blob/3b6909a3dbb574e66f5d0fb6aafb6e28316033a8/src/antlr/GroovyParser.g4 , converting it to EBNF using https://bottlecaps.de/convert/ and translating the EBNF representation to Lark format, partially by hand.

Some fixes were inspired by https://github.com/daniellansun/groovy-antlr4-grammar-optimized/tree/master/src/main/antlr4/org/codehaus/groovy/parser/antlr4
