aggnf: Aggregate Nth Field. A small console utility to count/group text data.

Features

Generates aggregate counts of text data, using a specified field as a key.

Fields can be delimited by any string; the default is consecutive whitespace.

The key field can be any integer, with negative integers counting backwards from the end of the line. The default is the last field.
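
The field selection described above can be sketched in a few lines of Python. This is only an illustration, not aggnf's actual code: the function name pick_field is hypothetical, and it assumes that positive field numbers are 1-based (as in awk) and that 0 is rejected, neither of which is confirmed by this README.

```python
def pick_field(line, fieldnum=-1, sep=None):
    """Return the key field of one line of text.

    Assumptions (not taken from aggnf itself): positive numbers are
    1-based like awk's $1, negative numbers count from the end of the
    line, and 0 is rejected. sep=None splits on runs of whitespace.
    """
    if fieldnum == 0:
        raise ValueError("field numbers start at 1 (or -1 from the end)")
    fields = line.split(sep)
    # Convert a 1-based positive number to a Python index;
    # negative numbers already work as Python indexes.
    index = fieldnum - 1 if fieldnum > 0 else fieldnum
    return fields[index]  # raises IndexError if the field is out of range


print(pick_field("line:7\t3"))                    # 3
print(pick_field("a,b,c", fieldnum=1, sep=","))   # a
print(pick_field("a,b,c", fieldnum=-2, sep=","))  # b
```

An out-of-range field raises IndexError here; a tool with an option like --ignore-err would presumably catch that and skip the line instead.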

How-To

The --help option is descriptive:

~$ aggnf --help
Usage: aggnf [OPTIONS] [IN_DATA]

  Group text data based on a Nth field, and print the aggregate result.

  Works like SQL:
      `select field, count(*) from tbl group by field`

  Or shell:
      `cat file | awk '{print $NF}' | sort | uniq -c`

  Arguments:
      IN_DATA   Input file, if blank, STDIN will be used.

Options:
  -d, --sep TEXT          Field delimiter. Defaults to whitespace.
  -n, --fieldnum INTEGER  The field to use as the key, default: last field.
  -o, --sort              Sort result.
  -i, --ignore-err        Don't exit if field is specified and out of range.
  --help                  Show this message and exit.

Here we generate an example file of 1000 lines, each ending in a random digit (the first digit of bash's $RANDOM), and ask aggnf to group the lines by that last field, ordering the result from most to least common:

~$ seq 1 1000 | while read -r l; do echo -e "line:${l}\t${RANDOM:0:1}"; done > rand.txt
~$ aggnf -o rand.txt
       1: 340
       2: 336
       3: 120
       8: 42
       6: 37
       5: 35
       7: 35
       4: 33
       9: 22

This might look familiar, as it’s the same result one would get from SQL such as `select field, count(*) as count from table group by field order by count desc`, or from the following bash one-liner:

~$ cat rand.txt | awk '{print $NF}' | sort | uniq -c | sort -nr
340 1
336 2
120 3
 42 8
 37 6
 35 7
 35 5
 33 4
 22 9
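
For comparison, the same group-and-count idea can be written with Python's collections.Counter. This is a minimal sketch rather than aggnf's implementation: count_last_field is a hypothetical name, the sample data is made up, and the output formatting only approximates what aggnf prints.

```python
from collections import Counter


def count_last_field(lines):
    """Count how often each last whitespace-separated field occurs."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            counts[fields[-1]] += 1
    return counts


sample = ["line:1\t3", "line:2\t1", "line:3\t3",
          "line:4\t2", "line:5\t3", "line:6\t1"]

# most_common() orders keys from highest to lowest count,
# similar to the trailing `sort -nr` in the shell pipeline.
for key, count in count_last_field(sample).most_common():
    print(f"{key}: {count}")
# 3: 3
# 1: 2
# 2: 1
```

Unlike the awk pipeline, this sketch skips blank lines instead of counting them under an empty key.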

To-Do

  1. Output is mangled when using a delimiter other than the default; will fix.

  2. Add a --sum option, which will key on one field, and sum the contents of another.

  3. Speed optimizations.
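
As a rough idea of what the proposed --sum option might compute, here is a hypothetical Python sketch. Nothing in it exists in aggnf: the function name sum_by_key and the parameters key_field and value_field are illustrative only, and they are plain 0-based Python indexes rather than aggnf field numbers.

```python
from collections import defaultdict


def sum_by_key(lines, key_field=0, value_field=-1, sep=None):
    """Hypothetical sketch: key on one field and sum another.

    key_field and value_field are Python indexes (0-based, negatives
    count from the end). sep=None splits on runs of whitespace.
    """
    totals = defaultdict(float)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        fields = line.split(sep)
        # float() raises ValueError if the value field is not numeric
        totals[fields[key_field]] += float(fields[value_field])
    return dict(totals)


print(sum_by_key(["apples 3", "pears 2", "apples 4.5"]))
# {'apples': 7.5, 'pears': 2.0}
```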

Notes

The usefulness of this program is questionable. Its functionality is already covered by existing console commands that are much faster.

This project is merely a quick example to learn the basics of packages which are unfamiliar to me, namely: cookiecutter, tox, and click.

History

April 4th: Released
