temporython

Generate temporary Python scripts to quickly process lines of text or whole text files.

Synopsis

temporython is both a command line tool and a Python library. It creates boilerplate Python scripts to help you quickly solve text processing problems.

If you want to build quick, one-off Python scripts to process lines of text from files or standard input, this tool may save you time by setting you up with a boilerplate script that already handles command line arguments and input processing. All you need to do is edit the code that processes the lines of input, which may be as little as one line of code.

Quick example

Let's say your manager emails you two text files in an arbitrary format and you need to analyze/convert/transform/correct/import that data in some manner. The files are named bounced_email_logs_yesterday.log and bounced_email_logs_today.log. She asks if you can get your processing work done quickly because an important customer is waiting. If you find yourself writing custom Python scripts or working in an interactive shell, then temporython could help you get set up faster. Let's see how that is done.

You type the command...

temporython lines process_bounced_email_logs.py

...and the file process_bounced_email_logs.py is created. Here is what is inside of that file.

#! /usr/bin/env python3

# a quick script initially generated by `temporython` that:
#   * reads in a list of filenames provided on the command line, or defaults to stdin.
#   * processes each line

# nice reference for Python 3 text processing: https://docs.python.org/3/library/text.html
import string
import textwrap
import re

from temporython import main


### custom processing #########################################################

class LineProcessor:
    def __init__(self):
        """
        called before processing begins
        """
        pass

    def process_line(self, filename, line_number, line):
        # NOTE: filename can be '-' if processing stdin.
        line = line.strip()
        print(line)

    def post_process(self):
        """
        called once after all lines in all files have been processed.
        """
        pass


if __name__ == '__main__':
    main(LineProcessor)

Nice! We have a pretty clear boilerplate script. Now let's customize it.

You open up your new script in your favorite text editor and edit the process_line() function to your liking. You write code to parse the line and print out the digested results if certain conditions are met (as per your boss's email).
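As one possible illustration of this editing step, the sketch below fills in process_line() for a made-up bounce log. The log format, the BOUNCE_PATTERN regular expression, and the bounces attribute are assumptions invented for this example; they are not part of temporython. To keep the sketch runnable on its own, a short loop stands in for temporython's main(), and it assumes line numbers start at 1.

```python
import re

# Hypothetical log format, assumed purely for illustration:
#   2024-01-05 12:00:01 BOUNCED alice@example.com reason="mailbox full"
BOUNCE_PATTERN = re.compile(
    r'\bBOUNCED\s+(?P<address>\S+@\S+)\s+reason="(?P<reason>[^"]*)"'
)


class LineProcessor:
    def __init__(self):
        # called before processing begins
        self.bounces = []

    def process_line(self, filename, line_number, line):
        # NOTE: filename can be '-' if processing stdin.
        line = line.strip()
        match = BOUNCE_PATTERN.search(line)
        if match:
            address = match.group('address')
            reason = match.group('reason')
            self.bounces.append((filename, line_number, address, reason))
            print('{}:{} {} ({})'.format(filename, line_number, address, reason))

    def post_process(self):
        # called once after all lines have been processed
        pass


# Hand-written stand-in for temporython's main(), fed with made-up lines.
sample_lines = [
    '2024-01-05 12:00:01 BOUNCED alice@example.com reason="mailbox full"\n',
    '2024-01-05 12:00:02 DELIVERED bob@example.com\n',
]
processor = LineProcessor()
for number, text in enumerate(sample_lines, start=1):
    processor.process_line('-', number, text)
processor.post_process()
```

In the generated script you would keep the original `main(LineProcessor)` call at the bottom instead of the hand-written loop.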

Now let's process the files with the new script.

$ ./process_bounced_email_logs.py bounced_email_logs_yesterday.log bounced_email_logs_today.log

Or we can pipe the files in like this instead...

$ cat bounced_email_logs_yesterday.log bounced_email_logs_today.log | ./process_bounced_email_logs.py

And now the data is processed and your manager is happy because you completed the task, and completed it quickly. Yay! temporython set up a boilerplate script that got you processing the log data right away and let you focus on writing your custom logic.

Features

template types

temporython generates three main kinds of text processing scripts:

  • lines - boilerplate is set up so that you can process lines of text, and know which file and line number each line comes from. The lines can be piped in to stdin or filenames can be provided to the generated script via command line arguments.
  • pipe - boilerplate is set up so that you can process lines of text that only come from stdin.
  • files - boilerplate is set up so that you can process contents of whole files. Filenames are provided as command line arguments.

importing (default) vs. inlining

By default, the generated scripts will rely on the temporython library to provide functionality. This helps keep the generated scripts short. But if you do not want your scripts to depend on the temporython library, you can use the --inline option.
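To give a rough idea of what a dependency-free script has to carry with it, here is a minimal sketch of a driver built only on the standard library. It is not the code that temporython inlines; run_lines, _feed, and EchoProcessor are hypothetical names, and the behavior (treating an empty filename list as stdin, reporting stdin as '-', numbering lines from 1) is an assumption based on the description of the lines template.

```python
import io
import sys


def run_lines(processor_class, filenames, stdin=sys.stdin):
    """Hypothetical driver: feed each line of each input to a processor."""
    processor = processor_class()
    # Assumption: no filenames means "read from stdin", reported as '-'.
    for filename in (filenames or ['-']):
        if filename == '-':
            _feed(processor, filename, stdin)
        else:
            with open(filename, encoding='utf-8') as handle:
                _feed(processor, filename, handle)
    processor.post_process()
    return processor


def _feed(processor, filename, handle):
    # Assumption: line numbers restart at 1 for every input.
    for line_number, line in enumerate(handle, start=1):
        processor.process_line(filename, line_number, line)


class EchoProcessor:
    """Tiny processor used only to exercise the driver."""

    def __init__(self):
        self.seen = []

    def process_line(self, filename, line_number, line):
        self.seen.append((filename, line_number, line.strip()))

    def post_process(self):
        pass


# Simulate piped input with an in-memory stream instead of the real stdin.
result = run_lines(EchoProcessor, [], stdin=io.StringIO('first\nsecond\n'))
print(result.seen)
```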

Installation

pip3 install temporython

Requirements

This software requires Python 3.5 or above.

Usage

Display help

You can display help with temporython --help or temporython -h.

$ temporython --help
usage: temporython [-h] [-i] {files,lines,pipe} [FILENAME]

positional arguments:
  {files,lines,pipe}
  FILENAME            Name of file to generate

optional arguments:
  -h, --help          show this help message and exit
  -i, --inline        Inline the temporython library in the generated code instead
                      of including it.

Command line switches

  • -i, --inline - Inline the required pieces of the temporython library into the generated script instead of relying on temporython being available for import when the script is run. This option is useful if you want to create a script that has no external dependencies.

Generate a script to process lines of text

This command generates a script called 'my_cool_script.py':

$ temporython lines my_cool_script.py

Note: If you do not provide a name, then a default name of process_lines.tmp will be chosen.

Now edit the code to your liking.

  • LineProcessor.__init__() - add code in here that you wish to run only once when your script begins.
  • LineProcessor.process_line(filename, line_number, line) - edit the code here to process each line as you wish.
  • LineProcessor.post_process() - add code here that you wish to run once after all files & lines have been processed.
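The three hooks listed above can share state through the instance. The sketch below is one way that might look: it counts non-blank lines per file and prints a summary at the end. The non_blank attribute and the sample data are made up for illustration, and a hand-written loop stands in for temporython's main() so that the example runs by itself.

```python
from collections import Counter


class LineProcessor:
    def __init__(self):
        # runs once, before any line is seen: set up state here
        self.non_blank = Counter()

    def process_line(self, filename, line_number, line):
        # runs for every line: update the state
        if line.strip():
            self.non_blank[filename] += 1

    def post_process(self):
        # runs once at the end: report what was collected
        for filename, count in sorted(self.non_blank.items()):
            print('{}: {} non-blank lines'.format(filename, count))


# Hand-written stand-in for temporython's main(), using made-up input.
sample = {'a.txt': ['one\n', '\n', 'two\n'], 'b.txt': ['three\n']}
processor = LineProcessor()
for filename, lines in sorted(sample.items()):
    for line_number, line in enumerate(lines, start=1):
        processor.process_line(filename, line_number, line)
processor.post_process()
```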

You can process lines by running your new script like this...

$ ./my_cool_script.py file1.txt file2.text file3.text

Or you can process lines by piping them in like this...

$ cat file1.txt file2.text file3.text | ./my_cool_script.py

Generate a script to process text piped in via stdin

This command generates a script called 'my_cool_script.py':

$ temporython pipe my_cool_script.py

Note: If you do not provide a name, then a default name of process_pipe.tmp will be chosen.

Now edit the code to your liking.

  • LineProcessor.__init__() - add code in here that you wish to run only once when your script begins.
  • LineProcessor.process_line(filename, line_number, line) - edit the code here to process each line as you wish.
  • LineProcessor.post_process() - add code here that you wish to run once after all files & lines have been processed.

You can run your script by piping in content like this...

$ cat file1.txt file2.text file3.text | ./my_cool_script.py

Generate a script to process whole files

Performance warning: a script generated for whole-file processing loads entire files into memory at once. If you are processing very large files, your script may run slowly or run out of memory.
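If a file is too large to hold in memory, one alternative is to read it in fixed-size chunks. The sketch below is independent of temporython; count_newlines and chunk_size are hypothetical names chosen for this example, and counting newlines is only a placeholder for whatever work you need to do on each chunk.

```python
import os
import tempfile


def count_newlines(path, chunk_size=1024 * 1024):
    """Count newline bytes while holding at most chunk_size bytes in memory."""
    total = 0
    with open(path, 'rb') as handle:
        while True:
            chunk = handle.read(chunk_size)
            if not chunk:
                break  # end of file reached
            total += chunk.count(b'\n')
    return total


# Demonstrate on a small temporary file, with a tiny chunk size so that
# the loop runs more than once.
with tempfile.NamedTemporaryFile('wb', delete=False) as tmp:
    tmp.write(b'alpha\nbeta\ngamma\n')
    tmp_path = tmp.name
try:
    newline_count = count_newlines(tmp_path, chunk_size=4)
    print(newline_count)
finally:
    os.remove(tmp_path)
```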

This command generates a script called 'my_cool_script.py':

$ temporython files my_cool_script.py

Note: If you do not provide a name, then a default name of process_files.tmp will be chosen.

Now edit the code to your liking.

  • FileProcessor.__init__() - add code in here that you wish to run only once when your script begins.
  • FileProcessor.process_file(filename, contents) - edit the code here to process the contents of each file as you wish.
  • FileProcessor.post_process() - add code here that you wish to run once after all files have been processed.
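As a rough illustration of the hooks listed above, the following sketch counts the words in each file and prints a total at the end. The word_counts attribute and the sample contents are invented for this example, and the loop at the bottom is a hand-written stand-in for temporython's main().

```python
class FileProcessor:
    def __init__(self):
        # runs once, before any file is seen
        self.word_counts = {}

    def process_file(self, filename, contents):
        # contents holds the text of the whole file as a single string
        self.word_counts[filename] = len(contents.split())

    def post_process(self):
        # runs once after every file has been handled
        total = sum(self.word_counts.values())
        print('{} words in {} file(s)'.format(total, len(self.word_counts)))


# Hand-written stand-in for temporython's main(), using made-up contents.
sample_files = {
    'file1.txt': 'the quick brown fox\njumps over\n',
    'file2.text': 'the lazy dog\n',
}
processor = FileProcessor()
for filename, contents in sorted(sample_files.items()):
    processor.process_file(filename, contents)
processor.post_process()
```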

You can process files by running your new script like this...

$ ./my_cool_script.py file1.txt file2.text file3.text

You can also pipe data into your file processing script, and it will process all of the input as one large file named -.

$ cat file1.txt file2.text file3.text | ./my_cool_script.py

What if my temporary script isn't temporary any more?

Perhaps after creating your script you find yourself reusing or maintaining what was supposed to be a one-off script. This is not a problem.

You could give your script a descriptive name and then check it in to the code repo of the project it supports. If your script imports the temporython library, install temporython in the environment where the script runs; if you generated the script with the --inline option, no installation is needed.

It is also possible that your script may outgrow temporython. You can refactor your script to remove the dependency on temporython. This may include writing your own command line argument parser using Python's built-in argparse library.
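A replacement argument parser could start out as small as the sketch below. It uses only argparse from the standard library; build_parser, parse_filenames, the description text, and the choice to fall back to '-' when no files are given are assumptions made for this example, not a copy of temporython's own command line handling.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        description='Process lines of text from files or standard input.'
    )
    # Zero or more positional filenames; an empty list means "use stdin".
    parser.add_argument(
        'filenames',
        nargs='*',
        metavar='FILE',
        help="files to process (reads stdin when none are given)",
    )
    return parser


def parse_filenames(argv):
    """Return the filenames to process, or ['-'] to signal stdin."""
    args = build_parser().parse_args(argv)
    return args.filenames or ['-']


# Passing an explicit list avoids touching the real sys.argv in this demo.
print(parse_filenames(['yesterday.log', 'today.log']))
print(parse_filenames([]))
```

In a real script you would call `parse_filenames(sys.argv[1:])` and then open each name in turn, treating '-' as standard input.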

Alternatives

Text processing problems can be solved in a variety of ways other than using temporython.

Here is only a short list of possibilities...

  • shell scripts - using grep, sed, awk, etc.
  • spreadsheets
  • Jupyter notebooks
  • interactive Python shell
  • other text processing Python libraries
  • other code generators
  • text editor automation
  • manual editing

Each of these has advantages and disadvantages, but in the end the choice of tool(s) comes down to your personal preference, your comfort level, and the constraints and requirements of the environment that you work in.

If you are comfortable writing ad hoc code to slice and dice strings in Python, temporython may be a great tool to add to your toolbelt. You can use temporython instead of, or along with, the alternatives listed above, depending on the text processing problem you face.

License

MIT
