A minimalist implementation of an Awk-like model in Python

μ-awk

This is a tiny Python implementation of a line processor with Awk-like semantics. You write a set of regex-based rules. The program loops through the lines of some input file, running the matching functions on lines that match.

This package is too small by any measure to deserve the name "package", but without it I keep copy-pasting this code, making small improvements every time.

Install

It is considered best practice to use a virtual environment. I recommend using Poetry. If you do use Poetry, you can add μ-awk to your project by running:

poetry add mawk

Otherwise, using pip:

pip install mawk

Tutorial

A μ-awk routine is a set of methods that are triggered on regexes. Each method receives the re.Match object and is expected to return one of three things:

  • None: ignore that I was ever called, continue to find another rule
  • []: the rule completed successfully, but didn't generate any output
  • ["any", "number of", "strings"]: replace the given input line with these lines
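The three return values can be illustrated with a minimal sketch in plain Python. This is not the code of μ-awk itself: the function `run_rules` and the `(pattern, handler)` pairs are hypothetical, and two details are assumptions on my part, namely that patterns are tested with `re.match` and that a line no rule handles is passed through unchanged.

```python
import re


def run_rules(rules, text):
    """Apply (pattern, handler) pairs to every line of text (illustrative only)."""
    out = []
    for line in text.splitlines():
        for pattern, handler in rules:
            m = re.match(pattern, line)
            if m is None:
                continue          # this rule does not apply to the line
            result = handler(m)
            if result is None:
                continue          # handler declined: keep looking for another rule
            out.extend(result)    # [] drops the line, a list of strings replaces it
            break                 # only the first rule that produces a result is used
        else:
            out.append(line)      # assumption: unhandled lines pass through unchanged
    return "\n".join(out)


rules = [
    (r"^#", lambda m: None),                   # declines, so the next rule is tried
    (r"^# (.*)", lambda m: [m[1].upper()]),    # replaces the line with one line
    (r"^$", lambda m: []),                     # succeeds without output: line is dropped
    (r"^- (.*)", lambda m: [m[1], "---"]),     # replaces the line with two lines
]

result = run_rules(rules, "# title\n\n- item\nplain")
print(result)
```

Here the heading becomes `TITLE`, the blank line disappears, the list item expands to two lines, and `plain` is kept because no rule handled it.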

Suppose we want to create an outline from a Markdown document. We can filter on lines starting with a # character: write a class that derives from mawk.RuleSet and decorate its methods with mawk.on_match.

from dataclasses import dataclass
import mawk
import re


@dataclass
class Outliner(mawk.RuleSet):
    ignore: bool = False

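    # Emit header lines, unless we are inside a code block.
    @mawk.on_match(r"^#.*$")
    def on_header(self, m: re.Match):
        if self.ignore:
            # Returning None falls through to the next matching rule.
            return
        return [m[0]]

    # A code fence toggles the ignore flag; the fence itself is dropped.
    @mawk.on_match(r"^```")
    def on_codeblock(self, _):
        self.ignore = not self.ignore
        return []

    # Every other line is dropped from the output.
    @mawk.always
    def otherwise(self, _):
        return []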


if __name__ == "__main__":
    # The README contains non-ASCII characters, so read it as UTF-8 explicitly.
    with open("README.md", "r", encoding="utf-8") as f:
        print(Outliner().run(f.read()))

This will output:

# μ-awk
## Install
## Tutorial
## License

Note that we had to ignore the content of code-blocks, so that the expected output above isn't included in the real output.

The mawk.always decorator always matches; the passed argument is therefore a str, not an re.Match. Rules are matched in order of definition; by default only the first match is used.
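To make the ordering and the argument types concrete, here is a small sketch of how decorators like these could be built. It is a hypothetical re-implementation for illustration, not the actual source of mawk: the names `rule_test`, `rules_in_order` and `first_match` are made up, the use of `re.match` is an assumption, and the sketch only looks at which rule fires first, ignoring the fall-through on a None return value.

```python
import re


def on_match(pattern):
    """Tag a method with a regex test (hypothetical stand-in for mawk.on_match)."""
    regex = re.compile(pattern)

    def decorate(func):
        func.rule_test = regex.match    # yields an re.Match, or None on no match
        return func

    return decorate


def always(func):
    """Tag a method with a test that accepts every line and passes the str itself."""
    func.rule_test = lambda line: line
    return func


def rules_in_order(cls):
    # A class namespace preserves insertion order, so this is definition order.
    # Only rules defined directly on cls are found, not inherited ones.
    return [f for f in vars(cls).values() if hasattr(f, "rule_test")]


def first_match(cls, line):
    """Return (rule name, argument) for the first rule whose test accepts line."""
    for func in rules_in_order(cls):
        arg = func.rule_test(line)
        if arg is not None:
            return func.__name__, arg
    return None


class Demo:
    @on_match(r"^#")
    def header(self, m): ...

    @always
    def otherwise(self, line): ...


name, arg = first_match(Demo, "# Title")        # the regex rule, with an re.Match
name2, arg2 = first_match(Demo, "plain text")   # the catch-all rule, with the str
print(name, type(arg).__name__, name2, type(arg2).__name__)
```

Because the catch-all rule is defined last, it only sees lines that no earlier rule claimed; defining it first would shadow every other rule.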

License

Copyright 2023, the Netherlands eScience Center. This package is distributed under the Apache 2 License, see LICENSE.
