parse2csv

parse2csv is a command-line tool that searches multiple files for named patterns, extracts structured data from each file, and writes the values in CSV format. Each input file produces one row in the generated CSV file, containing the field values extracted from that file. The CSV header consists of the field names used in the patterns.

The main motivation for this script is to parse the output of other tools, extract the required information, and put it in a CSV file for further analysis.

Installation

Using pip:

pip install parse2csv

Using setup script:

python setup.py install

Usage

The first step is to prepare a configuration file. The config file must be in YAML format and specifies the patterns to search for in the input files.

Patterns are listed in the config file under the patterns entry. parse2csv uses the parse package to extract data, so all patterns must comply with its format syntax (see format syntax). Because the output file is in CSV format, every field whose value should appear in the output must be named; otherwise it cannot be determined which parsed value belongs to which column in the CSV file.
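
The role of field names can be illustrated with a small sketch. It deliberately uses named groups from Python's standard re module as a stand-in for the parse package, so the regular expressions and the helper extract_named are assumptions made for illustration, not the tool's implementation.

```python
import re

# Hypothetical regex equivalents of two named patterns from a config file.
# parse2csv itself uses the parse package; `re` is swapped in here only to
# keep the sketch free of third-party dependencies.
PATTERNS = [
    re.compile(r"Age: (?P<age>\d+)"),
    re.compile(r"Name: (?P<first>\w+)\s+(?P<last>\w+)"),
]


def extract_named(text):
    """Collect every named value found in text, keyed by field name."""
    found = {}
    for pattern in PATTERNS:
        for match in pattern.finditer(text):
            for name, value in match.groupdict().items():
                found.setdefault(name, []).append(value)
    return found


sample = "Name: Sherlock Holmes\nAge: 38\n"
values = extract_named(sample)
print(values)
```

Because each value is stored under its field name, it can later be placed in the matching CSV column. A value captured without a name would have no column to go to.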

Besides the patterns, the config file must specify the order in which the fields appear in the CSV file, under the fields entry. Each field name must match a name used in the patterns. The fields entry does not need to include every named field from the patterns list.

The missing_value entry sets the value written to the output CSV when a field cannot be found in the input. It defaults to 'NA' if it is not provided in the config file.
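
How the fields order and missing_value interact can be sketched roughly as follows. The helper build_row and the per_file data are hypothetical, and the sketch assumes the extracted values are already available as a mapping from field name to value.

```python
import csv
import io


def build_row(extracted, fields, missing_value="NA"):
    """Order the extracted values by `fields`, filling gaps with missing_value."""
    return [extracted.get(name, missing_value) for name in fields]


# Hypothetical per-file results, as a field-name -> value mapping.
fields = ["first", "last", "address", "age"]
per_file = [
    {"first": "Sherlock", "last": "Holmes", "address": "221B Baker Street", "age": 38},
    {"first": "John", "last": "Watson", "age": 42},  # no address found
]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(fields)  # header: the field names, in the configured order
for extracted in per_file:
    writer.writerow(build_row(extracted, fields, missing_value="-"))

print(buffer.getvalue())
```

Named values that are extracted but not listed in fields are simply ignored by this sketch, which matches the statement that fields need not cover every named field.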

Here is a sample config file:

---
missing_value: '-'
fields:
  - 'date'
  - 'first'
  - 'last'
  - 'address'
  - 'age'
patterns:
  - 'Date: {date:tg}'
  - 'Age: {age:d}'
  - 'Name: {first:w}{:s}{last:w}'
  - 'Name: {last:w},{:s}{first:w}'
  - 'Address: "{address}"'
...

Assume there are two files:

$ cat file1
Date: 1/2/2011 11:00 PM
Name: Sherlock Holmes
Age: 38
Address: "221B Baker Street"

$ cat file2
Date: 6/1/2018 12:00 AM
Age: 42
Name: Watson, John

The output CSV file would be:

date,first,last,address,age
2011-02-01 23:00:00,Sherlock,Holmes,221B Baker Street,38
2018-01-06 12:00:00,John,Watson,-,42

In some cases, a field can occur multiple times in the same input. These values can be reduced to a single one by specifying a reduce function in the config file, under the reduce entry, as a mapping from field name to reduce function:

reduce:
  income: 'avg'
  children: 'count'

The above example maps the avg and count functions to the 'income' and 'children' fields, respectively. If 'income' occurs more than once in the input, the average of its values is reported; for 'children', the number of occurrences is written to the generated CSV file.

The reduce functions can be one of these:

'first': use the first value.
'last': use the last value.
'avg': use the average of values (the values should be numerical).
'avg_tp': the same as 'avg' but preserves the original type (the values should be numerical).
'count': use the count of occurrences.
'min': use the minimum value.
'max': use the maximum value.
'sum': use the sum of values.
'concat': use the concatenation of the values (field should be str).
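
The reduce functions listed above could be sketched as a simple lookup table. This is a minimal illustration, not the tool's code: the names REDUCERS and reduce_field are hypothetical, 'avg_tp' is assumed to cast the average back to the type of the first value, and 'concat' is assumed to join values without a separator.

```python
# Hypothetical implementations keyed by the reduce function names.
REDUCERS = {
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "avg": lambda values: sum(values) / len(values),
    # Assumed meaning of "preserves the original type": cast the average
    # back to the type of the first value (so int input yields an int).
    "avg_tp": lambda values: type(values[0])(sum(values) / len(values)),
    "count": len,
    "min": min,
    "max": max,
    "sum": sum,
    # Assumption: values are joined with no separator.
    "concat": lambda values: "".join(values),
}


def reduce_field(values, how):
    """Collapse all occurrences of one field into a single value."""
    return REDUCERS[how](values)


incomes = [1000, 2001]
children = ["Ann", "Bob", "Cy"]

print(reduce_field(incomes, "avg"))      # 1500.5
print(reduce_field(incomes, "avg_tp"))   # 1500
print(reduce_field(children, "count"))   # 3
print(reduce_field(children, "concat"))  # AnnBobCy
```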

Once the configuration file is ready, run parse2csv with the configuration file and the input files:

parse2csv -c config.yaml -o output.csv file1 file2...

The flag --help reveals more details about program usage:

$ python parse2csv.py --help
Usage: parse2csv.py [OPTIONS] [INPUTS]...

  Parse the input files for named patterns and dump their values to a file
  in CSV format.

Options:
  -o, --output FILENAME           Write to this file instead of stdout.
  -c, --configfile FILENAME       Use this configuration file.  [required]
  -d, --dialect [unix|excel|excel-tab]
                                  Use this CSV dialect.  [default: unix]
  --help                          Show this message and exit.
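
The dialect names accepted by --dialect match the dialects registered by Python's standard csv module. Assuming the option maps directly onto those dialects (the source does not confirm this), the sketch below shows how the same row is rendered under each one. The helper render is hypothetical.

```python
import csv
import io


def render(row, dialect):
    """Write a single row with the given csv dialect and return the raw text."""
    buffer = io.StringIO()
    csv.writer(buffer, dialect=dialect).writerow(row)
    return buffer.getvalue()


row = ["John", "Watson", "-", 42]
for name in ("unix", "excel", "excel-tab"):
    # repr() makes quoting, delimiters, and line terminators visible.
    print(name, repr(render(row, name)))
```

In the csv module, the unix dialect quotes every field and ends lines with '\n', while excel and excel-tab quote only when needed and end lines with '\r\n', using a comma and a tab as the delimiter, respectively.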

Release history

0.1.5 (this version): Sep 25, 2024
0.1.4: Jun 5, 2018
0.1.3: May 28, 2018
0.1.2: May 21, 2018
0.1.1: May 18, 2018
0.1.0: May 18, 2018
