A minimalist implementation of an Awk-like model in Python

μ-awk

This is a tiny Python implementation of a line processor with Awk-like semantics. You write a set of regex-based rules; the program loops over the lines of some input text and, for each line, runs the function attached to a matching rule.

By any measure this package is too small to deserve the name "package", but otherwise I keep finding myself copy-pasting this code and making small improvements every time.

Install

It is considered best practice to use a virtual environment (I recommend using Poetry).

pip install mawk

Tutorial

A μ-awk routine is a set of methods, each triggered by a regex. A triggered method receives the re.Match object and is expected to return one of three things:

  • None: act as if the rule was never called, and continue looking for another matching rule
  • []: the rule completed successfully but didn't generate any output (the input line is dropped)
  • ["any", "number of", "strings"]: replace the input line with these lines
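To make the three return values concrete, here is a minimal, standalone sketch of a dispatch loop that interprets them as the list above describes. It is not mawk's source code: `process`, `Handler` and `Rule` are hypothetical names, rules are plain (regex, function) pairs instead of decorated methods, and passing lines that no rule accepts through unchanged is an assumption (it would explain why the example below needs an `always` rule that returns []).

```python
import re
from typing import Callable, List, Optional, Tuple

# Hypothetical types: a handler gets the re.Match and returns None or a list of lines.
Handler = Callable[[re.Match], Optional[List[str]]]
Rule = Tuple[re.Pattern, Handler]


def process(lines: List[str], rules: List[Rule]) -> List[str]:
    """For each line, use the first matching rule that does not return None."""
    output: List[str] = []
    for line in lines:
        for pattern, handler in rules:
            m = pattern.search(line)
            if m is None:
                continue
            result = handler(m)
            if result is None:
                continue  # None: act as if the rule never fired; try the next one
            output.extend(result)  # []: emit nothing; [...]: emit these lines instead
            break
        else:
            output.append(line)  # assumption: lines no rule accepted pass through
    return output


rules: List[Rule] = [
    # Upper-cases level 1 and 2 headers; declines (returns None) for deeper ones.
    (re.compile(r"^(#+) (.*)$"),
     lambda m: [m[2].upper()] if len(m[1]) <= 2 else None),
    # Drops TODO lines entirely.
    (re.compile(r"^TODO"), lambda m: []),
    # Splits "key=value" into two output lines.
    (re.compile(r"^(\w+)=(\w+)$"), lambda m: [m[1], m[2]]),
]

print(process(["# intro", "### deep", "TODO fix", "a=b", "plain"], rules))
# prints ['INTRO', '### deep', 'a', 'b', 'plain']
```

The `### deep` line shows the None case: the first rule declines, no later rule matches, so the line falls through unchanged.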

Suppose we want to create an outline of a Markdown document: we can filter on lines starting with a # character. To do so, write a class that derives from mawk.RuleSet and decorate its methods with mawk.on_match.

from dataclasses import dataclass
import mawk
import re


@dataclass
class Outliner(mawk.RuleSet):
    ignore: bool = False

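    @mawk.on_match(r"^#.*$")
    def on_header(self, m: re.Match):
        if self.ignore:
            return None  # inside a code block: let the remaining rules handle it
        return [m[0]]  # keep the header line as-is

    @mawk.on_match(r"^```")
    def on_codeblock(self, _):
        self.ignore = not self.ignore  # toggle on every opening/closing fence
        return []  # drop the fence line itself

    @mawk.always
    def otherwise(self, _):
        return []  # drop every other line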


if __name__ == "__main__":
    with open("README.md", "r") as f:
        print(Outliner().run(f.read()))

This will output:

# μ-awk
## Install
## Tutorial
## License

Note that we had to ignore the content of code blocks, so that the header lines in the expected output above don't end up in the real output.

The mawk.always decorator matches every line, so the decorated method receives a plain str rather than an re.Match. Rules are matched in order of definition; by default only the first match is used.
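The ordering rule and the always-matching fallback can be illustrated with a toy re-implementation. This is only a sketch, not how mawk works internally: `rule`, `fallback` and `MiniRuleSet` are hypothetical stand-ins for `mawk.on_match`, `mawk.always` and `mawk.RuleSet`, and joining the output with newlines in `run` is an assumption. Definition order comes for free, because Python keeps a class body's namespace in the order it was written.

```python
import re
from typing import Callable, List


def rule(pattern: str) -> Callable:
    """Hypothetical decorator: tag a method with the regex that triggers it."""
    def tag(method: Callable) -> Callable:
        method._pattern = re.compile(pattern)
        return method
    return tag


def fallback(method: Callable) -> Callable:
    """Hypothetical decorator: the method fires on every line and gets the str."""
    method._pattern = None
    return method


class MiniRuleSet:
    """Toy rule set that tries its tagged methods on each line, in order."""

    _rules: List[Callable] = []

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # A class body's namespace keeps definition order, so this list does too.
        # (Only the subclass's own methods are collected; inheritance is ignored.)
        cls._rules = [f for f in vars(cls).values() if hasattr(f, "_pattern")]

    def run(self, text: str) -> str:
        out: List[str] = []
        for line in text.splitlines():
            for method in self._rules:
                if method._pattern is None:
                    result = method(self, line)  # fallback receives the plain str
                else:
                    m = method._pattern.search(line)
                    if m is None:
                        continue
                    result = method(self, m)  # regex rules receive the re.Match
                if result is not None:
                    out.extend(result)
                    break  # the first rule that does not return None wins
            else:
                out.append(line)  # assumption: unhandled lines pass through
        return "\n".join(out)


class Shout(MiniRuleSet):
    @rule(r"^#+ (.*)")
    def header(self, m):
        return [m[1].upper()]

    @rule(r"^# skip")
    def never(self, m):
        return ["unreachable"]  # shadowed: `header` is defined first and matches

    @fallback
    def otherwise(self, line):
        return [f"  {line}"] if line else []  # indent text, drop blank lines


print(Shout().run("# intro\n# skip\ntext\n\nmore"))
# prints:
# INTRO
# SKIP
#   text
#   more
```

Because `header` is defined before `never`, the `# skip` line is handled by `header` and `never` is never reached.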

License

Copyright 2023, the Netherlands eScience Center. This package is distributed under the Apache 2 License, see LICENSE.

Download files

Source Distribution

mawk-0.1.0.tar.gz (3.1 kB)

Uploaded Source

Built Distribution

mawk-0.1.0-py3-none-any.whl (3.5 kB)

Uploaded Python 3

File details

Details for the file mawk-0.1.0.tar.gz.

File metadata

  • Download URL: mawk-0.1.0.tar.gz
  • Upload date:
  • Size: 3.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for mawk-0.1.0.tar.gz
Algorithm Hash digest
SHA256 6b872a881cc51ff046c5dcfe85639df947ccbe09104c9b06e9c1c2445f2b44ef
MD5 df79112a1dfbcc343f5e7059079bef0e
BLAKE2b-256 8d41ab78f9270fb226f3ca9229bf7909757246dff7bcc9d4ebe69f515dbc768b

File details

Details for the file mawk-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: mawk-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 3.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.3

File hashes

Hashes for mawk-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 a362b00b69b19cbc58d7a4e02348e20dfe3a819af3bc7335a9ffaaa7d3045f41
MD5 b4af75a285bd8529011453d53da6c2cd
BLAKE2b-256 afd8313eecc7168b4f63260a56745ce188bf650415577d1ea7a5baf5ea4d67b4
