Skip to main content

UNIX command-line tool for python line-based stream processing

Project description

Author: Pahaz Blinov

Repo: https://github.com/pahaz/py3line/

Pyline is a UNIX command-line tool for line-based processing in Python with regex and output transform features similar to grep, sed, and awk.

This project inspired by: piep, pysed, pyline, pyp and Jacob+Mark recipe

requirements: Python3

WHY I MAKE IT?

I sometimes have to use sed / awk. Not often, and so I always forget the necessary options and sed / awk DSL. But I now python, I like it, and I want use it for data processing. Default python -c is hard to write the kind of one-liner that works well.

Why not a pyline?
  • Don`t support python3

  • Have many options (I want as much simple as possible solution)

  • Bad performance

  • Don`t support command chaining

Why not a pysed?

Installation

py3line is on PyPI, so simply run:

pip install py3line

or

easy_install py3line

to have it installed in your environment.

For installing from source, clone the repo and run:

python setup.py install

Tutorial

Lets start with two simple examples:

$ echo -e "Here are\nsome\nwords for you." | ./py3line.py "x.split()" -a "len(x)"
2
1
3

$ echo -e "Here are\nsome\nwords for you." | ./py3line.py "x.split()" -a "len(x)" -a "sum(xx)"
6

How it works?

Py3line produces a transform over the input data stream. Py3line transform is constructed from a sequence of python actions. Each action can be an action over an element of stream or an action over the stream.

First example overview

   echo -e "Here are\nsome\nwords for you." | ./py3line.py "x.split()" -a "len(x)"

* **echo -e "Here are\nsome\nwords for you."** -- create an input stream data consists of three lines
* **|** -- pipeline input stream to py3line
* **"x.split()" -a "len(x)"** -- define two actions: "x.split()" and "len(x)". Each of them is element based action

Py3line expects to get at least one transformation action as positional argument. You also can define additional action by using -a arguments, as shown in the example above.

The example above can be represented as the following python pseudo-code:

import sys

for x in sys.stdin.readlines():

    # 1) action "x.split()"
    x = x.split()

    # 2) action "len(x)"
    x = len(x)

    print(x)

Second example overview

echo -e "Here are\nsome\nwords for you." | ./py3line.py "x.split()" -a "len(x)" -a "sum(xx)"

Here we have stream based action “sum(xx)”.

It can be represented as python pseudo-code:

import sys

xx = [x for x in sys.stdin.readlines()]

for x in xx:

    # 1) action "x.split()"
    x = x.split()

    # 2) action "len(x)"
    x = len(x)

# 3) action "sum(xx)"
print(sum(xx))

What is order actions?

This commands are equal:

./py3line.py "x.split()" -a "len(x)" -a "sum(xx)"
./py3line.py -a "x.split()" "len(x)" -a "sum(xx)"
./py3line.py -a "x.split()" -a "len(x)" "sum(xx)"

But we recommend use:

./py3line.py "x.split()" -a "len(x)" -a "sum(xx)"

as the right actions ordering.

Why it so? Because you must pass one action as positional argument.

Actions chaining

Let us define some terminology. py3line action1 -a action2 -a action3

We have actions: action1, action2 and action3. Each of them may be element based or stream based.

Element based action can be represented as python pseudo-code:

xx = ...
new_xx = []

for x in xx:
    # DO ELEMENT BASED ACTION ON `x`
    result = eval(compile(action_x, ..., 'eval'), {'x': x})
    new_xx.append(result)

xx = new_xx

Stream based action can be represented as python pseudo-code:

xx = ...

# DO STREAM BASED ACTION ON `xx`
xx = eval(compile(action_xx, ..., 'eval'), {'xx': xx})

Pre-actions

Sometimes you want prepare some variables or import some modules.

You can use -m options for import module:

./py3line.py -m shlex "shlex.split(x)[13]"

You also can use -p options for run exec some actions before processing:

./py3line.py -p "rgx = re.compile(r' is ([A-Z]\w*)')" "rgx.search(x).group(1)"

Pseudo code example ./py3line.py -m module1 -m module2 -p pre-action1 -p pre-action2 …

import module1
import module2

pre-action1
pre-action2

...

Options ordering

Regardless of the sequence definition. First be made all imports (-m option), then be made all pre-action (-p option), and then actions (-a option + 1st positional argument).

# Print every line (null transform)
$ cat ./testsuit/test.txt | ./py3line.py x
This is my cat,
 whose name is Betty.
This is my dog,
 whose name is Frank.
This is my fish,
 whose name is George.
This is my goat,
 whose name is Adam.

# Number every line
$ cat ./testsuit/test.txt | ./py3line.py "i, x"
0 This is my cat,
1  whose name is Betty.
2 This is my dog,
3  whose name is Frank.
4 This is my fish,
5  whose name is George.
6 This is my goat,
7  whose name is Adam.

# Print every first and last word
$ cat ./testsuit/test.txt | ./py3line.py "x.split()[0], x.split()[-1]"
This cat,
whose Betty.
This dog,
whose Frank.
This fish,
whose George.
This goat,
whose Adam.

# Split into words and print (strip al non word char like comma, dot, etc)
$ cat ./testsuit/test.txt | ./py3line.py "re.findall(r'\w+', x)"
This is my cat
whose name is Betty
This is my dog
whose name is Frank
This is my fish
whose name is George
This is my goat
whose name is Adam

# Regex matching with groups
$ cat ./testsuit/test.txt | ./py3line.py "re.findall(r' is ([A-Z]\w*)', x) or False"
Betty
Frank
George
Adam

# cat ./testsuit/test.txt | ./py3line.py "re.search(r' is ([A-Z]\w*)', x).group(1)"
$ cat ./testsuit/test.txt | ./py3line.py -p "rgx = re.compile(r' is ([A-Z]\w*)')" "rgx.search(x).group(1)"
Betty
Frank
George
Adam

## Original Examples
# Print out the first 20 characters of every line
# cat ./testsuit/test.txt | ./py3line.py "i < 2"
$ cat ./testsuit/test.txt | ./py3line.py "list(xx)[:2]"
This is my cat,
 whose name is Betty.

# Print just the URLs in the access log
$ cat ./testsuit/nginx.log | ./py3line.py -m shlex "shlex.split(x)[13]"
HEAD / HTTP/1.0
HEAD / HTTP/1.0
HEAD / HTTP/1.0
HEAD / HTTP/1.0
HEAD / HTTP/1.0
GET /admin/moktoring/session/add/ HTTP/1.1
GET /admin/jsi18n/ HTTP/1.1
GET /static/admin/img/icon-calendar.svg HTTP/1.1
GET /static/admin/img/icon-clock.svg HTTP/1.1
HEAD / HTTP/1.0
HEAD / HTTP/1.0
HEAD / HTTP/1.0
HEAD / HTTP/1.0
HEAD / HTTP/1.0
GET /logout/?reason=startApplication HTTP/1.1
GET / HTTP/1.1
GET /login/?next=/ HTTP/1.1
POST /admin/customauth/user/?q=%D0%9F%D0%B0%D1%81%D0%B5%D1%87%D0%BD%D0%B8%D0%BA HTTP/1.1

# Print most common accessed urls and filter accessed more then 5 times
$ cat ./testsuit/nginx.log | ./py3line.py -m shlex -m collections  -a "shlex.split(x)[13]" -a "collections.Counter(xx).most_common()" "x[1] > 5 and x[0]"
HEAD / HTTP/1.0

HELP

usage: py3line.py [-h] [-a action] [-p pre_action] [-o OUTPUT] [-i]
                  [--in-place-suffix IS_INPLACE_SUFFIX] [-m MODULES] [-v] [-q]
                  [--version]
                  action [file [file ...]]

Py3line is a UNIX command-line tool for line-based processing in Python with
regex and output transform features similar to grep, sed, and awk.

positional arguments:
  action                <python_expression>
  file                  Input file #default: stdin

optional arguments:
  -h, --help            show this help message and exit
  -a action, --action action
                        <python_expression>
  -p pre_action, --pre-action pre_action
                        <python_expression>
  -o OUTPUT, --out OUTPUT, --output-file OUTPUT
                        Output file #default: '-' for stdout
  -i, --in-place        Output to editable file
  --in-place-suffix IS_INPLACE_SUFFIX
                        Output to editable file and provide a backup suffix
                        for keeping a copy of the original file
  -m MODULES, --modules MODULES
                        for m in modules: import m #default: []
  -v, --verbose
  -q, --quiet
  --version             Print the version string

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

py3line-0.0.2.tar.gz (9.2 kB view hashes)

Uploaded source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page