Make it easier to use Python as an AWK replacement

Development Status
- 5 - Production/Stable
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language
- Python :: 3
Topic
- Text Processing

Project description

awking

Make it easier to use Python as an AWK replacement.

Basic usage

Extracting groups of lines

from awking import RangeGrouper

lines = '''
text 1
text 2
group start 1
text 3
group end 1
text 4
group start 2
text 5
group end 2
text 6
'''.splitlines()

for group in RangeGrouper('start', 'end', lines):
    print(list(group))

This will output:

['group start 1', 'text 3', 'group end 1']
['group start 2', 'text 5', 'group end 2']

Extracting fixed-width fields

from awking import records

ps_aux = '''
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         1  0.0  0.0  51120  2796 ?        Ss   Dec22   0:09 /usr/lib/systemd/systemd --system --deserialize 22
root         2  0.0  0.0      0     0 ?        S    Dec22   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    Dec22   0:04 [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S<   Dec22   0:00 [kworker/0:0H]
root         7  0.0  0.0      0     0 ?        S    Dec22   0:15 [migration/0]
root         8  0.0  0.0      0     0 ?        S    Dec22   0:00 [rcu_bh]
root         9  0.0  0.0      0     0 ?        S    Dec22   2:47 [rcu_sched]
saml      3015  0.0  0.0 117756   596 pts/2    Ss   Dec22   0:00 bash
saml      3093  0.9  4.1 1539436 330796 ?      Sl   Dec22  70:16 /usr/lib64/thunderbird/thunderbird
saml      3873  0.0  0.1 1482432 8628 ?        Sl   Dec22   0:02 gvim -f
root      5675  0.0  0.0 124096   412 ?        Ss   Dec22   0:02 /usr/sbin/crond -n
root      5777  0.0  0.0  51132  1068 ?        Ss   Dec22   0:08 /usr/sbin/wpa_supplicant -u -f /var/log/wpa_supplica
saml      5987  0.7  1.5 1237740 119876 ?      Sl   Dec26  14:05 /opt/google/chrome/chrome --type=renderer --lang=en-
root      6115  0.0  0.0      0     0 ?        S    Dec27   0:06 [kworker/0:2]
'''

for user, _, command in records(ps_aux.splitlines(), widths=[7, 58, ...]):
    print(user, command)

This will output:

USER    COMMAND
root    /usr/lib/systemd/systemd --system --deserialize 22
root    [kthreadd]
root    [ksoftirqd/0]
root    [kworker/0:0H]
root    [migration/0]
root    [rcu_bh]
root    [rcu_sched]
saml    bash
saml    /usr/lib64/thunderbird/thunderbird
saml    gvim -f
root    /usr/sbin/crond -n
root    /usr/sbin/wpa_supplicant -u -f /var/log/wpa_supplica
saml    /opt/google/chrome/chrome --type=renderer --lang=en-
root    [kworker/0:2]
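
To make the width list easier to read, here is a minimal standard-library sketch of fixed-width splitting. It is not awking's implementation: the helper name `split_fixed` and the sample rows are hypothetical, and it assumes that a trailing `...` means "the rest of the line", which is consistent with the output shown above.

```python
def split_fixed(line, widths):
    """Split one line into fields of the given character widths.

    Assumption: an Ellipsis entry takes everything that is left of the line.
    """
    fields = []
    pos = 0
    for width in widths:
        if width is Ellipsis:
            fields.append(line[pos:])
            pos = len(line)
        else:
            fields.append(line[pos:pos + width])
            pos += width
    return fields


# Hypothetical sample: 7 columns for the user, 5 for the PID, then the command.
sample = [
    'USER    PID COMMAND',
    'root      1 /sbin/init',
    'saml   3015 bash',
]

for user, _, command in (split_fixed(line, [7, 5, ...]) for line in sample):
    print(user, command)
```

As in the example above, the fields are not stripped, so the user column keeps its padding when printed.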

The problem

Have you ever had to scan a log file for XML documents? How hard was it to extract a set of multi-line XML documents into separate files?

You can use re.findall or re.finditer, but then you need to read the entire log file into a string first. You can also use an AWK script like this one:

#!/usr/bin/awk -f

/^Payload: <([-_a-zA-Z0-9]+:)?Request/ {
    ofname = "request_" (++idx) ".xml"
    sub(/^Payload: /, "")
}

/<([-_a-zA-Z0-9]+:)?Request/, /<\/([-_a-zA-Z0-9]+:)?Request/ {
    print > ofname
}

/<\/([-_a-zA-Z0-9]+:)?Request/ {
    if (ofname) {
        close(ofname)
        ofname = ""
    }
}

This works, and quite well. (Despite this being a Python module I encourage you to learn AWK if you don't already know it.)
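
For comparison, the re.finditer approach mentioned earlier might look like the sketch below. The sample log text and the exact pattern are assumptions made for illustration. The point is that the whole input must already be in one string before matching starts.

```python
import re

# Hypothetical log excerpt; a real log would be loaded with f.read(),
# which is the step that pulls the whole file into memory.
log_text = '''\
INFO starting
Payload: <ns:Request id="1">
  <ns:Item>a</ns:Item>
</ns:Request>
INFO idle
Payload: <Request id="2">
</Request>
'''

# MULTILINE anchors "^" at each line start; DOTALL lets "." span newlines.
# The lazy ".*?" stops at the first closing Request tag.
request_re = re.compile(
    r'^Payload: (<(?:[-_a-zA-Z0-9]+:)?Request.*?</(?:[-_a-zA-Z0-9]+:)?Request>)',
    re.MULTILINE | re.DOTALL,
)

requests = [m.group(1) for m in request_re.finditer(log_text)]
for index, xml in enumerate(requests, 1):
    print(index, repr(xml))
```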

But what if you want to build this kind of processing into your Python application? What if your input is not lines in a file but objects of a different type?

Python equivalent using `awking`

The RangeGrouper class groups elements from the input iterable based on predicates for the start and end element. This is a bit like Perl's range operator or AWK's range pattern, except that your ranges get grouped into START..END iterables.
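
As a rough illustration of the grouping idea, and not awking's actual implementation, a range grouper can be sketched with the standard library alone. The name `group_ranges` is hypothetical, and this sketch collects each group into a list instead of yielding a lazy iterable.

```python
def group_ranges(is_start, is_end, items):
    """Yield one list per START..END run found in items.

    Hypothetical helper for illustration only. A range that never
    reaches its end element is dropped here; awking may behave differently.
    """
    group = None  # None means "currently outside a range"
    for item in items:
        if group is None and is_start(item):
            group = []
        if group is not None:
            group.append(item)
            if is_end(item):
                yield group
                group = None


lines = ['text 1', 'group start 1', 'text 3', 'group end 1', 'text 4',
         'group start 2', 'text 5', 'group end 2', 'text 6']

for group in group_ranges(lambda s: 'start' in s, lambda s: 'end' in s, lines):
    print(group)
```

Because the input can be any iterable and the predicates are plain callables, the same loop works for objects other than text lines.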

An equivalent of the above AWK script might look like this:

from awking import RangeGrouper
import re
import sys

g = RangeGrouper(r'^Payload: <([-_a-zA-Z0-9]+:)?Request',
                 r'</([-_a-zA-Z0-9]+:)?Request', sys.stdin)
for index, request in enumerate(g, 1):
    with open(f'request_{index}.xml', 'w') as f:
        for line in request:
            line = re.sub(r'^Payload: ', '', line)  # Not optimal: runs on every line, not just the first
            print(line, file=f, end='')

The predicates may be regular expressions, either as re.compile() objects or strings; or they may be any callables that accept a single argument and return a true/false value.
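
One way to accept all three kinds of predicate is to normalise them into callables up front. The helper below is a hypothetical sketch, not code from awking. It assumes search() semantics for regular expressions, which is consistent with the basic usage example, where 'start' matches in the middle of a line.

```python
import re


def to_predicate(spec):
    """Turn a pattern string, compiled regex, or callable into a callable.

    Hypothetical helper; the use of search() is an assumption of this sketch.
    """
    if isinstance(spec, str):
        spec = re.compile(spec)
    if isinstance(spec, re.Pattern):
        return lambda item: spec.search(item) is not None
    if callable(spec):
        return spec
    raise TypeError(f'unsupported predicate: {spec!r}')


is_start = to_predicate(r'^Payload: ')                # pattern string
is_end = to_predicate(re.compile(r'</Request>'))      # compiled regex
is_long = to_predicate(lambda line: len(line) > 20)   # arbitrary callable

print(is_start('Payload: <Request>'), is_end('<Item/>'), is_long('short'))
```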

Caveats

The grouping algorithm reads the input iterable lazily. You can still run out of memory if you keep references to previous groups without consuming them.

Release history

- 1.1.2 (this version): Dec 9, 2021
- 1.1.1: Aug 10, 2019
- 1.1.0: Jul 6, 2019
- 1.0.0: Jul 5, 2019

Download files

Source Distribution

awking-1.1.2.tar.gz (5.5 kB)

Uploaded Dec 9, 2021 Source

Built Distribution

awking-1.1.2-py3-none-any.whl (5.7 kB)

Uploaded Dec 9, 2021 Python 3

File details

Details for the file awking-1.1.2.tar.gz.

File metadata

Download URL: awking-1.1.2.tar.gz
Upload date: Dec 9, 2021
Size: 5.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for awking-1.1.2.tar.gz
Algorithm	Hash digest
SHA256	`1485141b8115c09e783d0fcb7c1910ca14bdc2a99332a84f64daceaef83aa576`
MD5	`574676cdbbd22d8df50f1bf1da1c1023`
BLAKE2b-256	`5431a2eb831d1ae73bc943425594140dea95653c12d07dbb3ad1da28e2075681`

File details

Details for the file awking-1.1.2-py3-none-any.whl.

File metadata

Download URL: awking-1.1.2-py3-none-any.whl
Upload date: Dec 9, 2021
Size: 5.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.8.2 pkginfo/1.8.2 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.10.0

File hashes

Hashes for awking-1.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`975cc336302144e8c61e0b5d2e609acb3281960151c059f6833e2b3a0b7562d2`
MD5	`153d55be82793fc3455d3c60c47b1127`
BLAKE2b-256	`6eb783c2aacfa2a72e756fb7d1950c43d84cdae138ebfea8f9ec18bbd3d556ac`
