Searchkit

Python library providing tools to search files in parallel.

Search Types

Different types of search are supported. Add one or more search definitions to a FileSearcher object, registering each against a file, directory or glob path. Results are collected and returned as a SearchResultsCollection, which provides several ways to retrieve them.

Simple Search

The SearchDef class supports matching one or more patterns against each line in a file. Patterns are executed until the first match is found.
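The first-match behaviour can be illustrated without searchkit itself. The following is a minimal sketch using only the standard-library re module; `first_match` is a hypothetical helper, not part of the searchkit API, and it assumes that patterns are tried in the order given and that matching may start anywhere in the line.

```python
import re


def first_match(patterns, line):
    """Return (pattern_index, match) for the first pattern that matches, else None."""
    for idx, pattern in enumerate(patterns):
        match = re.search(pattern, line)
        if match:
            # Stop at the first pattern that matches; later ones are not tried.
            return idx, match
    return None


patterns = [r'ERROR: (\S+)', r'WARNING: (\S+)']
# Both patterns could match this line, but the first one in the list wins.
hit = first_match(patterns, 'WARNING: disk ERROR: net')
```

Note that the order of the patterns decides the outcome here, not the position of the text within the line.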

When defining a search, you can optionally specify field names so that result values can be retrieved by name rather than by index. For example, given the following content:

    PID TTY          TIME CMD
 111024 pts/4    00:00:00 bash
 111031 pts/4    00:00:00 ps

You can define a search as follows:

SearchDef(r'\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)')

and retrieve results with:

for r in results:
    pid = r.get(1)
    tty = r.get(2)
    time = r.get(3)
    cmd = r.get(4)

or alternatively:

for r in results:
    pid, tty, time, cmd = r

or you can provide field names and types:

fields = ResultFieldInfo({'PID': int, 'TTY': str, 'TIME': str, 'CMD': str})
SearchDef(r'\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)', field_info=fields)

and retrieve results with:

for r in results:
    pid = r.PID
    tty = r.TTY
    time = r.TIME
    cmd = r.CMD
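To show what named, typed fields amount to, here is a rough standard-library equivalent. It is only a sketch of the idea and does not use or reproduce searchkit internals; `PsRow`, `PATTERN` and `parse_row` are hypothetical names, and skipping rows whose values fail conversion is an assumption made for this example.

```python
import re
from collections import namedtuple

# Field names mapped to the type each captured value is converted to.
FIELDS = {'PID': int, 'TTY': str, 'TIME': str, 'CMD': str}
PsRow = namedtuple('PsRow', list(FIELDS))
PATTERN = re.compile(r'\s*(\S+)\s+(\S+)\s+(\S+)\s+(\S+)')


def parse_row(line):
    """Return a PsRow with converted values, or None if the line does not fit."""
    match = PATTERN.match(line)
    if match is None:
        return None
    try:
        values = [conv(raw) for conv, raw in zip(FIELDS.values(), match.groups())]
    except ValueError:
        # For example the header row, where 'PID' is not an integer.
        return None
    return PsRow(*values)


row = parse_row(' 111024 pts/4    00:00:00 bash')
header = parse_row('    PID TTY          TIME CMD')
```

Here `row.PID` is the integer 111024 rather than a string, and the header line is rejected because its first column cannot be converted.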

Sequence Search

The SequenceSearchDef class supports matching string sequences ("sections") over multiple lines by matching a start, end and optional body in between. These section components are each defined with their own SearchDef object.
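The start/body/end idea can be sketched with the standard library alone. The code below is an illustration under stated assumptions, not searchkit's implementation: `find_sections` is a hypothetical helper, and it assumes that a section with no end match is closed by the start of the next section or by the end of input.

```python
import re


def find_sections(lines, start, body=None, end=None):
    """Group lines into sections delimited by start and (optionally) end patterns."""
    sections = []
    current = None
    for line in lines:
        if re.search(start, line):
            if current is not None:
                # A new start closes any section that is still open.
                sections.append(current)
            current = {'start': line, 'body': []}
        elif current is not None:
            if end is not None and re.search(end, line):
                current['end'] = line
                sections.append(current)
                current = None
            elif body is None or re.search(body, line):
                current['body'].append(line)
    if current is not None:
        # End of input closes the last open section.
        sections.append(current)
    return sections


lines = ['noise', 'BEGIN a', 'x=1', 'x=2', 'END', 'noise', 'BEGIN b', 'x=3']
sections = find_sections(lines, r'^BEGIN', body=r'^x=', end=r'^END')
```

Lines outside any section are ignored, and the second section has no 'end' entry because it was closed by the end of input.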

Search Constraints

When searching, for example, a log file in which each line starts with a timestamp, you can restrict results to lines logged after a specific time using searchkit.constraints.SearchConstraintSearchSince. The constraint can be applied either to the whole file or to each line in turn. The latter allows a constraint to be associated with a SearchDef, so that it applies only within the context of that search.
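Conceptually, a "since" constraint is a timestamp filter applied per line. The sketch below shows that idea with the standard library only; `TS_PATTERN` and `lines_since` are hypothetical names, the timestamp format is assumed to be `YYYY-MM-DD HH:MM:SS` at the start of the line, and dropping lines without a timestamp is a choice made for this example rather than documented searchkit behaviour.

```python
import re
from datetime import datetime

# Assumed timestamp format at the start of each line: YYYY-MM-DD HH:MM:SS
TS_PATTERN = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})')


def lines_since(lines, since):
    """Keep only lines whose leading timestamp is at or after `since`."""
    kept = []
    for line in lines:
        match = TS_PATTERN.match(line)
        if match is None:
            continue  # assumption: lines without a timestamp are dropped
        ts = datetime.strptime(match.group(1), '%Y-%m-%d %H:%M:%S')
        if ts >= since:
            kept.append(line)
    return kept


log = ['2023-01-01 12:34:24 feeling cold',
       'no timestamp here',
       '2023-06-01 12:34:24 feeling hot']
recent = lines_since(log, datetime(2023, 6, 1))
```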

Installation

searchkit is published on PyPI and can be installed as follows:

sudo apt install python3-pip
pip install searchkit

Example Usage

An example simple search is as follows:

from searchkit import FileSearcher, SearchDef

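fname = 'foo.txt'
# Write the sample content and close the file before searching it.
with open(fname, 'w') as fd:
    fd.write('the quick brown fox')

fs = FileSearcher()
# Capture the third word of the line.
fs.add(SearchDef(r'.+ \S+ (\S+) .+'), fname)
results = fs.run()
for r in results.find_by_path(fname):
    print(r.get(1))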

An example sequence search is as follows:

from searchkit import FileSearcher, SequenceSearchDef, SearchDef

content = """
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'foo'"""

fname = 'my.log'
# Write the sample content and close the file before searching it.
with open(fname, 'w') as fd:
    fd.write(content)

start = SearchDef(r'Traceback')
body = SearchDef(r'.+')
# terminate sequence with start of next or EOF so no end def needed.

fs = FileSearcher()
fs.add(SequenceSearchDef(start, tag='myseq', body=body), fname)
results = fs.run()
# Use a separate name for the per-sequence results to avoid shadowing `results`.
for seq, seq_results in results.find_sequence_by_tag('myseq').items():
    for r in seq_results:
        if 'body' in r.tag:
            print(r.get(0))

An example search with constraints is as follows:

from searchkit import FileSearcher, SearchDef
from searchkit.constraints import SearchConstraintSearchSince, TimestampMatcherBase

class MyDateTimeMatcher(TimestampMatcherBase):
    @property
    def patterns(self):
        return [r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2}) '
                r'(?P<hours>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})']

fname = 'foo.txt'
with open(fname, 'w') as fd:
    fd.write('2023-01-01 12:34:24 feeling cold\n')
    fd.write('2023-06-01 12:34:24 feeling hot')

today = '2023-06-02 12:34:24'
constraint = SearchConstraintSearchSince(today, None,
                                         ts_matcher_cls=MyDateTimeMatcher)
fs = FileSearcher(constraint=constraint)
fs.add(SearchDef(r'\S+ \S+ \S+ (\S+)'), fname)
results = fs.run()
for r in results.find_by_path(fname):
    print(r.get(1) == 'hot')
