A tool for searching & extracting information from multiple text files.

This package contains two tools: Raptor and Reptar.

Raptor

Raptor extracts and displays information from a set of files of the same type, and creates a single file with all the selected information.

The information in the files may be in multiple rows:

PC01.txt:
User=ms123
Name=Mayra Sanz
OS=GNU/Linux
IP=10.226.140.1

The information may also be laid out in several columns. Raptor can read data from multiple fields in a single line:

PC01.log:
User: ms123     Name: Mayra Sanz
OS: GNU/Linux   IP: 10.226.140.1
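
The two layouts above can be read with the same marker-based search. The following is a minimal sketch using only the standard library, not pysaurio's own code. The function name `extract_fields` and the rule that a value ends at two or more spaces (or at the end of the line) are assumptions made for illustration.

```python
import re

def extract_fields(text, markers):
    """Return {field: value} for each search string (marker) found in text.

    Assumption: a value ends at a run of two or more spaces/tabs
    or at the end of the line.
    """
    record = {}
    for field, marker in markers.items():
        pattern = re.escape(marker) + r"[ \t]*(.*?)(?:[ \t]{2,}|$)"
        match = re.search(pattern, text, flags=re.MULTILINE)
        record[field] = match.group(1) if match else ""
    return record

# One field per line (PC01.txt)
rows_text = "User=ms123\nName=Mayra Sanz\nOS=GNU/Linux\nIP=10.226.140.1\n"
# Several fields per line (PC01.log)
cols_text = "User: ms123     Name: Mayra Sanz\nOS: GNU/Linux   IP: 10.226.140.1\n"

row_record = extract_fields(
    rows_text, {"user": "User=", "name": "Name=", "os": "OS=", "ip": "IP="})
col_record = extract_fields(
    cols_text, {"user": "User:", "name": "Name:", "os": "OS:", "ip": "IP:"})

print(row_record)
print(col_record)
```

Both calls yield the same record, because only the search strings differ between the two layouts.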

Example: data from the following files:

PC01.txt:
User=ms123
Name=Mayra Sanz
OS=GNU/Linux
IP=10.226.140.1

PC02.txt:
User=lt001
Name=Luis Toribio
OS=GNU/Linux
IP=10.226.140.2

PC03.txt:
User=co205
Name=Clara Osto
OS=Win
IP=10.226.140.3

… You can create a CSV file with the following information:

users.csv:
User,Name,OS,IP
MS123,Mayra Sanz,GNU/Linux,10.226.140.1
LT001,Luis Toribio,GNU/Linux,10.226.140.2
CO205,Clara Osto,Win,10.226.140.3

To achieve this, create a template (.rap) with Raptor. The template is similar to an INI file and contains the following information:

users.rap:
[General]
description = Get list of users
extension = txt
prefix = PC
output_folder = txt
input_folder = txt
output_file = users.csv
delimiter = ,
quotechar = "
include_header = 1
include_file = 0
include_record_num = 0
include_empty_record = 0
search_multiple = 0
alternate_header =

[Fields]
user = User=
name = Name=
os = OS=
ip = IP=

[Rules]
rule1 = ('user', 'UPPER')
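
Because the template resembles an INI file, it can be inspected with Python's standard `configparser` module. This is only a sketch under that assumption: it does not show how Raptor itself loads a template, and the shortened `TEMPLATE` string is made up for the example.

```python
import ast
import configparser

# Shortened copy of users.rap, inlined so the example is self-contained.
TEMPLATE = """\
[General]
description = Get list of users
extension = txt
prefix = PC
delimiter = ,
quotechar = "
include_header = 1

[Fields]
user = User=
name = Name=
os = OS=
ip = IP=

[Rules]
rule1 = ('user', 'UPPER')
"""

# interpolation=None keeps any '%' characters in values literal.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(TEMPLATE)

general = dict(parser["General"])
fields = dict(parser["Fields"])
# Each rule is written as a Python tuple literal, so literal_eval can parse it.
rules = [ast.literal_eval(value) for value in parser["Rules"].values()]

print(general["delimiter"], general["quotechar"])
print(fields)
print(rules)
```

Note that `configparser` splits each line at the first `=`, so a search string such as `User=` survives as the value, and that it lower-cases option names by default.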

To create a .rap template (if the .rap template already exists, it is not saved). Caution: field names must be lowercase:

from pysaurio import Raptor
rap1 = Raptor()
rap1.description = 'Get list of users'
rap1.extension = 'txt'
rap1.prefix = 'PC'
rap1.input_folder = 'txt'
rap1.output_folder = 'txt'
rap1.output_file = 'users.csv'
rap1.delimiter = ','
rap1.quotechar = '"'
rap1.include_header = '1'
rap1.include_file = '0'
rap1.include_record_num = '0'
rap1.include_empty_record = '0'
rap1.search_multiple = '0'
rap1.alternate_header = ''
rap1.fields['user'] = 'User='
rap1.fields['name'] = 'Name='
rap1.fields['os'] = 'OS='
rap1.fields['ip'] = 'IP='
rap1.rules.append(('user', 'UPPER'))
rap1.rules.append(('name', 'REMOVEFROM', ' '))
rap1.Save("users.rap")
del rap1

Attribute List:

  • description: short description of the .rap template

  • extension: extension of the files to read

  • prefix: files must begin with this string

  • input_folder: folder of files to read

  • output_folder: output folder to save file with result

  • output_file: output filename

  • delimiter: delimiter character

  • quotechar: quote character

  • include_header: '0' or '1'

  • include_file: '0' or '1'

  • include_file_datetime: '0' or '1' (file creation & modification date/time)

  • include_record_num: '0' or '1'

  • include_empty_record: '0' or '1'

  • search_multiple: '0' or '1'

  • alternate_header: alternative text of the report header

  • fields: dictionary with fieldnames and search string (read template)

  • record: dictionary with fieldnames and values (read template)

  • rules: list of rules (read template)

  • list_files: list of filenames to read (auto)

  • record_counter: number of records (auto)

  • errors: list of errors (auto)

  • number_errors: number of errors after you open or save a template

Functions available for rules:

  • rule1 = (fieldname, 'SUBSTR', position_initial, length)

  • rule1 = (fieldname, 'REPLACE', 'search_string', 'replace_string')

  • rule1 = (fieldname, 'REPLACEALL', 'search_string', 'replace_string')

  • rule1 = (fieldname, 'UPPER')

  • rule1 = (fieldname, 'LOWER')

  • rule1 = (fieldname, 'REVERSE')

  • rule1 = (fieldname, 'REMOVE')

  • rule1 = (fieldname, 'FIELDISDATA')

  • rule1 = (fieldname, 'REMOVEFROM', 'string')

  • rule1 = (fieldname, 'REMOVETO', 'string')
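
Each rule is a tuple: a field name, a function name and optional arguments. The sketch below shows how such tuples could be applied to a record. It is not pysaurio's `ApplyRules`; the helper `apply_rules` is hypothetical and covers only the rules whose meaning is evident from their names. Treating 'REPLACEALL' as "replace every occurrence" is an assumption.

```python
def apply_rules(record, rules):
    """Apply (fieldname, function, *args) rule tuples to a copy of record."""
    result = dict(record)
    for fieldname, function, *args in rules:
        value = result.get(fieldname, "")
        if function == "UPPER":
            value = value.upper()
        elif function == "LOWER":
            value = value.lower()
        elif function == "REVERSE":
            value = value[::-1]
        elif function == "REPLACEALL":
            # Assumption: every occurrence of the search string is replaced.
            search, replacement = args
            value = value.replace(search, replacement)
        else:
            raise ValueError(f"rule not covered by this sketch: {function}")
        result[fieldname] = value
    return result

record = {"user": "ms123", "name": "Mayra Sanz",
          "os": "GNU/Linux", "ip": "10.226.140.1"}
rules = [("user", "UPPER"), ("os", "REPLACEALL", "GNU/", "")]

new_record = apply_rules(record, rules)
print(new_record)
```

Rules run in list order, so a later rule sees the result of an earlier rule on the same field.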

Open the template (.rap) and create the .csv file from the data read from multiple text files:

from pysaurio import Raptor
import csv

rap2 = Raptor()
rap2.Open('users.rap')
if rap2.number_errors == 0:
    file_csv = open(rap2.output_file, 'w', newline='')
    csv_output = csv.writer(file_csv,
                            delimiter=rap2.delimiter,
                            quotechar=rap2.quotechar,
                            quoting=csv.QUOTE_MINIMAL)
    if rap2.include_header == '1':
        fields_list = rap2.BuildHeader()
        print(fields_list)
        csv_output.writerow(fields_list)

    for row in rap2.list_files:
        valid_record, new_record = rap2.BuildRow(row)
        new_record = rap2.ApplyRules(new_record)
        if valid_record:
            new_record = list(new_record.values())
            print(new_record)
            csv_output.writerow(new_record)
    file_csv.close()
else:
    print(rap2.ShowError())
del rap2
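
To make the data flow concrete, the following sketch reproduces the users.csv example with the standard library only. It does not use pysaurio and is not how Raptor is implemented. The file contents are inlined in a dictionary, the output goes to an in-memory buffer, and the helper `build_row` is hypothetical.

```python
import csv
import io

# Inlined copies of the three example files.
FILES = {
    "PC01.txt": "User=ms123\nName=Mayra Sanz\nOS=GNU/Linux\nIP=10.226.140.1\n",
    "PC02.txt": "User=lt001\nName=Luis Toribio\nOS=GNU/Linux\nIP=10.226.140.2\n",
    "PC03.txt": "User=co205\nName=Clara Osto\nOS=Win\nIP=10.226.140.3\n",
}
FIELDS = {"user": "User=", "name": "Name=", "os": "OS=", "ip": "IP="}

def build_row(text):
    """Collect one value per field from the lines that start with its marker."""
    row = {}
    for field, marker in FIELDS.items():
        row[field] = ""
        for line in text.splitlines():
            if line.startswith(marker):
                row[field] = line[len(marker):].strip()
                break
    return row

buffer = io.StringIO(newline="")
writer = csv.writer(buffer, delimiter=",", quotechar='"',
                    quoting=csv.QUOTE_MINIMAL)
writer.writerow(["User", "Name", "OS", "IP"])
for filename in sorted(FILES):
    row = build_row(FILES[filename])
    row["user"] = row["user"].upper()  # same effect as the ('user', 'UPPER') rule
    writer.writerow(row.values())

csv_text = buffer.getvalue()
print(csv_text)
```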

Reptar

Reptar merges files, including only the lines you need.

Example: data from the following files:

PCS01.txt:
User,Name,OS,IP
ms123,Mayra Sanz,GNU/Linux,10.226.140.1
lt001,Luis Toribio,GNU/Linux,10.226.140.2
co205,Clara Osto,Win,10.226.140.3

PCS02.txt:
User,Name,OS,IP
nn345,Nadia Pacheco,Win,10.226.140.4
jm401,Juan Madrid,GNU/Linux,10.226.140.5

… You can create a file with the following information:

Linux.csv:
User,Name,OS,IP
MS123,MAYRA SANZ,GNU/LINUX,10.226.140.1
LT001,LUIS TORIBIO,GNU/LINUX,10.226.140.2
CO205,CLARA OSTO,WIN,10.226.140.3
JM401,JUAN MADRID,GNU/LINUX,10.226.140.5

In this example, lines that contain the text “Linux” or begin with the text “co205” are included:

from pysaurio import Reptar
rep1 = Reptar()
rep1.description = 'Get list of Linux users'
rep1.extension = 'txt'
rep1.prefix = 'PCS'
rep1.input_folder = 'txt'
rep1.output_folder = 'txt'
rep1.output_file = 'Linux.csv'
rep1.include_header = '1'
rep1.include_file = '0'
rep1.include_record_num = '0'
rep1.alternate_header = ''
rep1.lines.append(('INCLUDE', 'Linux'))
rep1.lines.append(('INCLUDRE', '^co205'))
rep1.rules.append(('line', 'UPPER'))
rep1.Save("linux.rep")
del rep1

# Open the .rep template and create the output file

rep2 = Reptar()
rep2.Open('linux.rep')
if rep2.number_errors == 0:
    file_csv = open(rep2.output_file, 'w')
    if rep2.include_header == '1':
        header = rep2.BuildHeader(rep2.list_files[0])
        print(header)
        file_csv.write(header + '\n')

    for row in rep2.list_files:
        current_file = open(rep2.input_folder + row, 'rb')
        while True:
            new_record = current_file.readline()
            new_record = new_record.decode("utf-8", "ignore")
            if not new_record: break
            valid_record, new_record = rep2.BuildRow(new_record, row)
            if valid_record:
                new_record = rep2.ApplyRules(new_record)
                print(new_record)
                file_csv.write(new_record + '\n')
        current_file.close()
    file_csv.close()
else:
    print(rep2.ShowError())
del rep2

Functions available for including and excluding lines:

  • line1 = ('EXCLUDE', 'string')

  • line1 = ('INCLUDE', 'string')

  • line1 = ('EXCLUDEND', 'string')

  • line1 = ('INCLUDEND', 'string')

  • line1 = ('EXCLUDRE', 'regex', '0'|'1') # '1': not case sensitive

  • line1 = ('INCLUDRE', 'regex', '0'|'1') # regex syntax: see the Python re module
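
The filtering idea can be sketched with the standard `re` module. This is an illustration, not Reptar's `BuildRow`. The helper `keep_line` is hypothetical and covers only 'INCLUDE', 'EXCLUDE', 'INCLUDRE' and 'EXCLUDRE'. Two assumptions are made: a line is kept if any include condition matches (as in the Linux.csv example), and an exclude match always wins.

```python
import re

def keep_line(line, conditions):
    """Decide whether a line passes a list of (kind, pattern[, flag]) conditions."""
    included = False
    has_include = False
    for kind, pattern, *flag in conditions:
        ignore_case = bool(flag) and flag[0] == "1"
        if kind in ("INCLUDE", "EXCLUDE"):
            hit = pattern in line  # plain substring test
        elif kind in ("INCLUDRE", "EXCLUDRE"):
            flags = re.IGNORECASE if ignore_case else 0
            hit = re.search(pattern, line, flags) is not None
        else:
            raise ValueError(f"condition not covered by this sketch: {kind}")
        if kind.startswith("EXCLUD"):
            if hit:
                return False  # assumption: an exclude match always wins
        else:
            has_include = True
            included = included or hit
    # With no include conditions at all, every non-excluded line is kept.
    return included or not has_include

lines = [
    "User,Name,OS,IP",
    "ms123,Mayra Sanz,GNU/Linux,10.226.140.1",
    "lt001,Luis Toribio,GNU/Linux,10.226.140.2",
    "co205,Clara Osto,Win,10.226.140.3",
    "User,Name,OS,IP",
    "nn345,Nadia Pacheco,Win,10.226.140.4",
    "jm401,Juan Madrid,GNU/Linux,10.226.140.5",
]
conditions = [("INCLUDE", "Linux"), ("INCLUDRE", "^co205")]

# Upper-casing mirrors the ('line', 'UPPER') rule from the example.
kept = [line.upper() for line in lines if keep_line(line, conditions)]
for line in kept:
    print(line)
```

The header lines are dropped by the filter because they match neither condition, which is why the Reptar example writes the header separately.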

The package contains more examples and data files to test.

Changelog

  • Pysaurio 0.2.6 - 2018-09-14 - Corrected error in function REMOVECOL

  • Pysaurio 0.2.5 - 2016-09-09 - new attribute: include_file_datetime = ‘0’ or ‘1’

  • Pysaurio 0.2.4 - Reptar includes a new rule: ‘REMOVECOL’, which removes a column

  • Pysaurio 0.2.3 - New functions: ‘INCLUDEND’ and ‘EXCLUDEND’

  • Pysaurio 0.2.2 - New argument in the ‘INCLUDRE’ and ‘EXCLUDRE’ functions

  • Pysaurio 0.2.1 - Reptar includes rules, and regular expressions can be used in the ‘Lines’ section.

  • Pysaurio 0.2.0 - Initial release (continued “Pyraptor”).
