A tool for searching & extracting information from multiple text files.

This package contains two tools: Raptor and Reptar.

Raptor

Raptor extracts and displays information from a set of files of the same type, and creates a single file with all the selected information.

The information in the files may be in multiple rows:

PC01.txt:
User=ms123
Name=Mayra Sanz
OS=GNU/Linux
IP=10.226.140.1

The information may also be arranged in columns, so data can be read from multiple fields on a single line:

PC01.log:
User: ms123     Name: Mayra Sanz
OS: GNU/Linux   IP: 10.226.140.1
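
As a rough illustration of how values could be pulled out of both layouts, the sketch below locates each search string in a line and takes the text up to the next search string or the end of the line. It is not pysaurio's implementation: the function name `extract_fields` and the "stop at the next search string" rule are assumptions made for this example.

```python
def extract_fields(lines, fields):
    """Return {field_name: value} for every search string found in lines.

    Hypothetical helper: a value runs from the end of its search string
    to the start of the next search string on the same line, or to the
    end of the line.
    """
    record = {}
    for line in lines:
        # Position of every search string that occurs in this line.
        hits = sorted(
            (line.find(label), name, label)
            for name, label in fields.items()
            if label in line
        )
        for i, (start, name, label) in enumerate(hits):
            end = hits[i + 1][0] if i + 1 < len(hits) else len(line)
            record[name] = line[start + len(label):end].strip()
    return record


rows_layout = ["User=ms123", "Name=Mayra Sanz", "OS=GNU/Linux", "IP=10.226.140.1"]
columns_layout = [
    "User: ms123     Name: Mayra Sanz",
    "OS: GNU/Linux   IP: 10.226.140.1",
]

by_rows = extract_fields(
    rows_layout, {"user": "User=", "name": "Name=", "os": "OS=", "ip": "IP="}
)
by_columns = extract_fields(
    columns_layout, {"user": "User:", "name": "Name:", "os": "OS:", "ip": "IP:"}
)
print(by_rows)
print(by_columns)
```

Both layouts yield the same record here, which is the property the two sample files are meant to show.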

Example: data from the following files:

PC01.txt:
User=ms123
Name=Mayra Sanz
OS=GNU/Linux
IP=10.226.140.1

PC02.txt:
User=lt001
Name=Luis Toribio
OS=GNU/Linux
IP=10.226.140.2

PC03.txt:
User=co205
Name=Clara Osto
OS=Win
IP=10.226.140.3

From these files, you can create a CSV file with the following information:

users.csv:
User,Name,OS,IP
MS123,Mayra Sanz,GNU/Linux,10.226.140.1
LT001,Luis Toribio,GNU/Linux,10.226.140.2
CO205,Clara Osto,Win,10.226.140.3

To achieve this, create a Raptor template (.rap), which is similar to an INI file, with the following contents:

users.rap:
[General]
description = Get list of users
extension = txt
prefix = PC
output_folder = txt
input_folder = txt
output_file = users.csv
delimiter = ,
quotechar = "
include_header = 1
include_file = 0
include_record_num = 0
include_empty_record = 0
search_multiple = 0
alternate_header =

[Fields]
user = User=
name = Name=
os = OS=
ip = IP=

[Rules]
rule1 = ('user', 'UPPER')
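
Because the template follows INI conventions, it can also be inspected with Python's standard library. The sketch below is only an illustration and does not show how Raptor itself reads the file; it assumes the template has no repeated keys and that every rule is written as a Python tuple literal, as in the example above. The shortened `TEMPLATE` string is made up for this example.

```python
import ast
import configparser

# Shortened copy of the template above, kept inline so the sketch is self-contained.
TEMPLATE = """\
[General]
description = Get list of users
extension = txt
prefix = PC
delimiter = ,
include_header = 1

[Fields]
user = User=
name = Name=
os = OS=
ip = IP=

[Rules]
rule1 = ('user', 'UPPER')
"""

parser = configparser.ConfigParser()
parser.read_string(TEMPLATE)

general = dict(parser["General"])
# Only the first "=" separates key and value, so "User=" survives intact.
fields = dict(parser["Fields"])
# literal_eval parses the tuple text without executing arbitrary code.
rules = [ast.literal_eval(value) for value in parser["Rules"].values()]

print(general["description"])
print(fields)
print(rules)
```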

To create a .rap template (if the .rap template already exists, it is not saved):

from pysaurio import Raptor
rap1 = Raptor()
rap1.description = 'Get list of users'
rap1.extension = 'txt'
rap1.prefix = 'PC'
rap1.input_folder = 'txt'
rap1.output_folder = 'txt'
rap1.output_file = 'users.csv'
rap1.delimiter = ','
rap1.quotechar = '"'
rap1.include_header = '1'
rap1.include_file = '0'
rap1.include_record_num = '0'
rap1.include_empty_record = '0'
rap1.search_multiple = '0'
rap1.alternate_header = ''
rap1.fields['user'] = 'User='
rap1.fields['name'] = 'Name='
rap1.fields['os'] = 'OS='
rap1.fields['ip'] = 'IP='
rap1.rules.append(('user', 'UPPER'))
rap1.Save("users.rap")
del rap1

Attribute List:

  • description: short description of the .rap template

  • extension: extension of the files to read

  • prefix: files must begin with this string

  • input_folder: folder of files to read

  • output_folder: output folder to save file with result

  • output_file: output filename

  • delimiter: delimiter character

  • quotechar: quote character

  • include_header: ‘0’ or ‘1’

  • include_file: ‘0’ or ‘1’

  • include_record_num: ‘0’ or ‘1’

  • include_empty_record: ‘0’ or ‘1’

  • search_multiple: ‘0’ or ‘1’

  • alternate_header: alternative text of the report header

  • fields: dictionary with fieldnames and search string (read template)

  • record: dictionary with fieldnames and values (read template)

  • rules: list of rules (read template)

  • list_files: list of filenames to read (auto)

  • record_counter: number of records (auto)

  • errors: list of errors (auto)

  • number_errors: number of errors after you open or save a template

Functions available for rules:

  • rule1 = (fieldname, ‘SUBSTR’, initial_position, length)

  • rule1 = (fieldname, ‘REPLACE’, search_string, replace_string)

  • rule1 = (fieldname, ‘REPLACEALL’, search_string, replace_string)

  • rule1 = (fieldname, ‘UPPER’)

  • rule1 = (fieldname, ‘LOWER’)

  • rule1 = (fieldname, ‘REVERSE’)

  • rule1 = (fieldname, ‘REMOVE’)

  • rule1 = (fieldname, ‘FIELDISDATA’)

  • rule1 = (fieldname, ‘REMOVEFROM’, ‘string’)

  • rule1 = (fieldname, ‘REMOVETO’, ‘string’)
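
To make the tuple format concrete, here is a minimal sketch of how such rules could be applied to a record. It is a hypothetical stand-in, not the library's rule engine: `apply_rules` is a name invented for this example, only rules whose meaning is evident from their name are covered, and the 0-based start position for SUBSTR is an assumption.

```python
def apply_rules(record, rules):
    """Return a copy of record with each rule applied in order."""
    result = dict(record)
    for fieldname, action, *args in rules:
        value = result[fieldname]
        if action == "UPPER":
            value = value.upper()
        elif action == "LOWER":
            value = value.lower()
        elif action == "REVERSE":
            value = value[::-1]
        elif action == "REPLACEALL":
            search, replacement = args
            value = value.replace(search, replacement)
        elif action == "SUBSTR":
            start, length = args  # assumption: start position is 0-based
            value = value[start:start + length]
        else:
            raise ValueError(f"rule not covered by this sketch: {action}")
        result[fieldname] = value
    return result


record = {"user": "ms123", "name": "Mayra Sanz", "os": "GNU/Linux", "ip": "10.226.140.1"}
rules = [
    ("user", "UPPER"),
    ("os", "REPLACEALL", "GNU/", ""),
    ("ip", "SUBSTR", 0, 6),
]
transformed = apply_rules(record, rules)
print(transformed)
```

Rules are applied in list order, so a later rule sees the output of an earlier rule on the same field.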

Open a template (.rap) and create a .csv file from the data read from multiple text files:

from pysaurio import Raptor
import csv

rap2 = Raptor()
rap2.Open('users.rap')
if rap2.number_errors == 0:
    file_csv = open(rap2.output_file, 'w', newline='')
    csv_output = csv.writer(file_csv,
                            delimiter=rap2.delimiter,
                            quotechar=rap2.quotechar,
                            quoting=csv.QUOTE_MINIMAL)
    if rap2.include_header == '1':
        fields_list = rap2.BuildHeader()
        print(fields_list)
        csv_output.writerow(fields_list)

    for row in rap2.list_files:
        valid_record, new_record = rap2.BuildRow(row)
        new_record = rap2.ApplyRules(new_record)
        if valid_record:
            new_record = list(new_record.values())
            print(new_record)
            csv_output.writerow(new_record)
    file_csv.close()
else:
    print(rap2.ShowError())
del rap2
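
The delimiter and quotechar settings matter once a value contains the delimiter itself. The short sketch below uses only the standard csv module with the same writer options as above; the value "Sanz, Mayra" is made up for this illustration and does not come from the sample files.

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter=",", quotechar='"', quoting=csv.QUOTE_MINIMAL)
writer.writerow(["User", "Name", "OS", "IP"])
# The name contains the delimiter, so QUOTE_MINIMAL wraps only that field.
writer.writerow(["MS123", "Sanz, Mayra", "GNU/Linux", "10.226.140.1"])

output_lines = buffer.getvalue().splitlines()
for line in output_lines:
    print(line)
```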

Reptar

Reptar merges files, including only the lines you need.

Example: data from the following files:

PCS01.txt:
User,Name,OS,IP
ms123,Mayra Sanz,GNU/Linux,10.226.140.1
lt001,Luis Toribio,GNU/Linux,10.226.140.2
co205,Clara Osto,Win,10.226.140.3

PCS02.txt:
User,Name,OS,IP
nn345,Nadia Pacheco,Win,10.226.140.4
jm401,Juan Madrid,GNU/Linux,10.226.140.5

From these files, you can create a file with the following information:

Linux.csv:
User,Name,OS,IP
MS123,MAYRA SANZ,GNU/LINUX,10.226.140.1
LT001,LUIS TORIBIO,GNU/LINUX,10.226.140.2
CO205,CLARA OSTO,WIN,10.226.140.3
JM401,JUAN MADRID,GNU/LINUX,10.226.140.5

In this example, lines that contain the text “Linux” or begin with the text “co205” are included:

from pysaurio import Reptar
rep1 = Reptar()
rep1.description = 'Get list of Linux users'
rep1.extension = 'txt'
rep1.prefix = 'PCS'
rep1.input_folder = 'txt'
rep1.output_folder = 'txt'
rep1.output_file = 'Linux.csv'
rep1.include_header = '1'
rep1.include_file = '0'
rep1.include_record_num = '0'
rep1.alternate_header = ''
rep1.lines.append(('INCLUDE', 'Linux'))
rep1.lines.append(('INCLUDRE', '^co205'))
rep1.rules.append(('line', 'UPPER'))
rep1.Save("linux.rep")
del rep1

# Open the .rep template and create the output file

rep2 = Reptar()
rep2.Open('linux.rep')
if rep2.number_errors == 0:
    file_csv = open(rep2.output_file, 'w')
    if rep2.include_header == '1':
        header = rep2.BuildHeader(rep2.list_files[0])
        print(header)
        file_csv.write(header + '\n')

    for row in rep2.list_files:
        current_file = open(rep2.input_folder + row, 'rb')
        while True:
            new_record = current_file.readline()
            new_record = new_record.decode("utf-8", "ignore")
            if not new_record: break
            valid_record, new_record = rep2.BuildRow(new_record, row)
            if valid_record:
                new_record = rep2.ApplyRules(new_record)
                print(new_record)
                file_csv.write(new_record + '\n')
        current_file.close()
    file_csv.close()
else:
    print(rep2.ShowError())
del rep2

Functions available for including and excluding lines:

  • line1 = (‘EXCLUDE’, ‘string’)

  • line1 = (‘INCLUDE’, ‘string’)

  • line1 = (‘EXCLUDRE’, ‘regex’) # See module re

  • line1 = (‘INCLUDRE’, ‘regex’) # See module re
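
The four conditions can be pictured as a simple line filter. The sketch below is an assumption about how they combine, based only on the example above (a line is kept if any include condition matches): it further assumes that any matching exclude condition drops the line and that regular expressions are tested with `re.search`. The function `keep_line` is hypothetical and not part of pysaurio.

```python
import re

INCLUDE_KINDS = {"INCLUDE", "INCLUDRE"}
EXCLUDE_KINDS = {"EXCLUDE", "EXCLUDRE"}
REGEX_KINDS = {"INCLUDRE", "EXCLUDRE"}


def keep_line(line, conditions):
    """Decide whether a line passes the (kind, pattern) conditions."""

    def matches(kind, pattern):
        if kind in REGEX_KINDS:
            return re.search(pattern, line) is not None
        return pattern in line  # plain substring test

    includes = [c for c in conditions if c[0] in INCLUDE_KINDS]
    excludes = [c for c in conditions if c[0] in EXCLUDE_KINDS]
    # Assumption: a single matching exclude condition rejects the line.
    if any(matches(kind, pattern) for kind, pattern in excludes):
        return False
    # With no include conditions every remaining line is kept.
    return not includes or any(matches(kind, pattern) for kind, pattern in includes)


conditions = [("INCLUDE", "Linux"), ("INCLUDRE", "^co205")]
data = [
    "ms123,Mayra Sanz,GNU/Linux,10.226.140.1",
    "lt001,Luis Toribio,GNU/Linux,10.226.140.2",
    "co205,Clara Osto,Win,10.226.140.3",
    "nn345,Nadia Pacheco,Win,10.226.140.4",
    "jm401,Juan Madrid,GNU/Linux,10.226.140.5",
]
kept = [line for line in data if keep_line(line, conditions)]
print(kept)
```

With the sample data, only the line for nn345 is dropped, which matches the users listed in Linux.csv.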

The package contains more examples and data files to test.

Changelog

  • Pysaurio 0.2.0 - Initial release (continuation of “Pyraptor”).

  • Pysaurio 0.2.1 - Reptar now supports rules, and regular expressions can be used in the ‘Lines’ section.
