Searchkit
Python library providing tools to search files in parallel.
Search Types
Several types of search are supported. Add one or more search definitions to a FileSearcher object, registering each against a file, directory, or glob path. Results are collected and returned as a SearchResultsCollection, which provides several ways to retrieve them.
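The register-then-run flow can be illustrated without searchkit itself. The following is a minimal standard-library sketch, not searchkit's implementation: the helper names `search_file` and `run_searches` are hypothetical, and the thread pool is only a stand-in for whatever parallelism the library uses internally.

```python
import glob
import os
import re
import tempfile
from concurrent.futures import ThreadPoolExecutor


def search_file(path, pattern):
    """Return (path, hits) where hits lists (line number, groups) per match."""
    regex = re.compile(pattern)
    hits = []
    with open(path) as fd:
        for lineno, line in enumerate(fd, start=1):
            match = regex.match(line)
            if match:
                hits.append((lineno, match.groups()))
    return path, hits


def run_searches(glob_path, pattern, max_workers=4):
    """Search every file matching glob_path concurrently, keyed by path."""
    paths = sorted(glob.glob(glob_path))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(lambda p: search_file(p, pattern), paths))


# Demo: two small log files searched through a single glob path.
with tempfile.TemporaryDirectory() as tmpdir:
    for name, text in (('a.log', 'error: disk full\nok\n'),
                       ('b.log', 'ok\nerror: timeout\n')):
        with open(os.path.join(tmpdir, name), 'w') as fd:
            fd.write(text)

    results = run_searches(os.path.join(tmpdir, '*.log'), r'error: (.+)')
    by_name = {os.path.basename(p): hits for p, hits in results.items()}
```

Keying the collected results by path mirrors the idea of looking results up per file after the run has finished.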
Simple Search
The SearchDef class supports matching one or more patterns against each line in a file. Patterns are executed until the first match is found.
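The "first match wins" behaviour can be sketched with plain `re`. This is an illustration only: `first_match` is a hypothetical helper, and matching from the start of the line (as `re.match` does) is an assumption rather than a statement about searchkit's internals.

```python
import re


def first_match(patterns, line):
    """Try each pattern in turn and stop at the first one that matches."""
    for index, pattern in enumerate(patterns):
        match = re.match(pattern, line)
        if match:
            return index, match
    return None


patterns = [r'(\d+) errors', r'(\d+) \S+']
hit = first_match(patterns, '3 errors found')
```

Because evaluation stops at the first hit, more specific patterns are best listed before more general ones.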
When defining a search, you can optionally specify field names so that result values can be retrieved by name rather than by index. For example, given the following content:
PID TTY TIME CMD
111024 pts/4 00:00:00 bash
111031 pts/4 00:00:00 ps
You can define a search as follows:
SearchDef(r'\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)')
and retrieve results with:
for r in results:
    pid = r.get(1)
    tty = r.get(2)
    time = r.get(3)
    cmd = r.get(4)
or alternatively:
for r in results:
    pid, tty, time, cmd = r
or you can provide field names and types:
fields = ResultFieldInfo({'PID': int, 'TTY': str, 'TIME': str, 'CMD': str})
SearchDef(r'\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)', field_info=fields)
and retrieve results with:
for r in results:
    pid = r.PID
    tty = r.TTY
    time = r.TIME
    cmd = r.CMD
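To show what named, typed fields buy you, here is a rough standard-library equivalent. It does not use searchkit: `FIELDS`, `Row`, and `parse` are hypothetical names, and the name-to-type mapping simply imitates the shape of the field definition.

```python
import re
from collections import namedtuple

PS_OUTPUT = """\
    PID TTY          TIME CMD
 111024 pts/4    00:00:00 bash
 111031 pts/4    00:00:00 ps
"""

# Field name -> converter, in capture-group order.
FIELDS = {'PID': int, 'TTY': str, 'TIME': str, 'CMD': str}
Row = namedtuple('Row', list(FIELDS))

# \d+ for the first column means the header line is skipped.
LINE_RE = re.compile(r'\s*(\d+)\s+(\S+)\s+(\S+)\s+(\S+)')


def parse(text):
    """Return one typed Row per matching line."""
    rows = []
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            values = (conv(value) for conv, value
                      in zip(FIELDS.values(), match.groups()))
            rows.append(Row(*values))
    return rows


rows = parse(PS_OUTPUT)
pid, tty, time, cmd = rows[0]  # positional unpacking still works
```

Each row can then be read by name (`rows[0].PID`), by index, or by unpacking, and the PID arrives as an `int` rather than a string.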
Sequence Search
The SequenceSearchDef class supports matching string sequences ("sections") over multiple lines by matching a start, an end, and an optional body in between. Each of these section components is defined with its own SearchDef object.
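The start/body/end idea can be pictured as a small state machine over lines. The sketch below uses only `re` and is not searchkit's implementation: `find_sequences` is a hypothetical helper, and closing an unterminated section at the next start or at end of file is a simplifying assumption.

```python
import re


def find_sequences(lines, start, body=None, end=None):
    """Group lines into sections delimited by start and (optional) end."""
    start_re = re.compile(start)
    body_re = re.compile(body) if body else None
    end_re = re.compile(end) if end else None

    sections, current = [], None
    for line in lines:
        if start_re.match(line):
            if current is not None:
                sections.append(current)  # a new start closes the previous one
            current = {'start': line, 'body': [], 'end': None}
        elif current is not None:
            if end_re and end_re.match(line):
                current['end'] = line
                sections.append(current)
                current = None
            elif body_re and body_re.match(line):
                current['body'].append(line)
    if current is not None:
        sections.append(current)  # end of input closes the last section
    return sections


log = [
    'Traceback (most recent call last):',
    '  File "a.py", line 1, in <module>',
    'ValueError: bad value',
    'Traceback (most recent call last):',
    '  File "b.py", line 2, in <module>',
    'KeyError: missing',
]
sections = find_sequences(log, start=r'Traceback', body=r'.+')
```

Here no end pattern is given, so each section runs until the next start line or the end of the input.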
Search Constraints
When searching, for example, a log file in which each line starts with a timestamp, you can restrict results to those after a specific time by using searchkit.constraints.SearchConstraintSearchSince. The constraint can be applied either to the whole file or to each line in turn. The latter allows constraints to be associated with a SearchDef so that they apply only within the context of that search.
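Conceptually, a "search since" constraint is a timestamp cutoff applied to each line. The following standard-library sketch illustrates that idea only: `lines_since` is a hypothetical helper, and both the 24-hour default window and the decision to drop lines without a timestamp are assumptions, not documented searchkit behaviour.

```python
import re
from datetime import datetime, timedelta

TS_RE = re.compile(r'^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})')


def lines_since(lines, now, hours=24):
    """Keep lines whose leading timestamp is within `hours` of `now`."""
    cutoff = now - timedelta(hours=hours)
    kept = []
    for line in lines:
        match = TS_RE.match(line)
        if not match:
            continue  # assumption: lines without a timestamp are dropped
        ts = datetime.strptime(match.group(1), '%Y-%m-%d %H:%M:%S')
        if ts >= cutoff:
            kept.append(line)
    return kept


log = [
    '2023-01-01 12:34:24 feeling cold',
    '2023-06-01 12:34:24 feeling hot',
]
now = datetime(2023, 6, 2, 12, 34, 24)
recent = lines_since(log, now)
```

Applying the cutoff per line, rather than once per file, is what makes it possible to attach the filter to a single search instead of to everything read from the file.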
Installation
searchkit is published on PyPI and can be installed as follows:
sudo apt install python3-pip
pip install searchkit
Example Usage
An example simple search is as follows:
from searchkit import FileSearcher, SearchDef

fname = 'foo.txt'
# Write the sample content, closing the file before it is searched.
with open(fname, 'w') as fd:
    fd.write('the quick brown fox')

fs = FileSearcher()
# Capture the third word of the line.
fs.add(SearchDef(r'.+ \S+ (\S+) .+'), fname)
results = fs.run()
for r in results.find_by_path(fname):
    print(r.get(1))
An example sequence search is as follows:
from searchkit import FileSearcher, SequenceSearchDef, SearchDef

content = """
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: invalid literal for int() with base 10: 'foo'"""

fname = 'my.log'
with open(fname, 'w') as fd:
    fd.write(content)

start = SearchDef(r'Traceback')
body = SearchDef(r'.+')
# terminate sequence with start of next or EOF so no end def needed.
fs = FileSearcher()
fs.add(SequenceSearchDef(start, tag='myseq', body=body), fname)
results = fs.run()
# Use a separate name for the per-sequence results to avoid shadowing.
for seq, seq_results in results.find_sequence_by_tag('myseq').items():
    for r in seq_results:
        if 'body' in r.tag:
            print(r.get(0))
An example search with constraints is as follows:
from searchkit import FileSearcher, SearchDef
from searchkit.constraints import SearchConstraintSearchSince, TimestampMatcherBase


class MyDateTimeMatcher(TimestampMatcherBase):
    @property
    def patterns(self):
        return [r'^(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2}) '
                r'(?P<hours>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})']


fname = 'foo.txt'
with open(fname, 'w') as fd:
    fd.write('2023-01-01 12:34:24 feeling cold\n')
    fd.write('2023-06-01 12:34:24 feeling hot')

today = '2023-06-02 12:34:24'
constraint = SearchConstraintSearchSince(today, None,
                                         ts_matcher_cls=MyDateTimeMatcher)
fs = FileSearcher(constraint=constraint)
fs.add(SearchDef(r'\S+ \S+ \S+ (\S+)'), fname)
results = fs.run()
for r in results.find_by_path(fname):
    print(r.get(1) == 'hot')