Binary Search in plaintext and Gzip files.
BREP: Binary Search in plaintext and gzip files
Search large sorted files in O(log n) time using binary search.
Both plaintext and gzip-compressed files are supported.
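The following minimal Python sketch shows the idea behind prefix search on a sorted plaintext file. It is not brep's actual implementation. The helper names `prefix_search` and `_next_line_start` are hypothetical, and the sketch assumes the file was sorted with `LC_ALL=C sort` and uses `\n` line endings. It bisects byte offsets, moves each probe forward to the next line start, and then scans forward through the contiguous block of matching lines.

```python
import os


def _next_line_start(f, pos):
    """Return the offset of the first line that starts at or after byte `pos`."""
    if pos == 0:
        return 0
    f.seek(pos - 1)
    f.readline()  # consume up to and including the next b"\n" (or EOF)
    return f.tell()


def prefix_search(path, prefix):
    """Yield every line of the sorted file at `path` that starts with `prefix`.

    Hypothetical helper: assumes byte-order sorting (LC_ALL=C sort)
    and b"\n" line endings.
    """
    prefix = prefix.encode()
    with open(path, "rb") as f:
        lo, hi = 0, os.path.getsize(path)
        # Find the smallest offset whose following line is >= prefix.
        # Each probe is one seek plus one readline, so ~log2(size) probes.
        while lo < hi:
            mid = (lo + hi) // 2
            f.seek(_next_line_start(f, mid))
            line = f.readline()
            if line and line.rstrip(b"\n") < prefix:
                lo = mid + 1
            else:
                hi = mid  # EOF counts as "greater than every prefix"
        # In sorted order, all lines sharing a prefix are contiguous.
        f.seek(_next_line_start(f, lo))
        for line in f:
            line = line.rstrip(b"\n")
            if not line.startswith(prefix):
                break
            yield line.decode()
```

For example, `list(prefix_search("test.txt", "777"))` returns every line that starts with 777. The file is read only around the roughly log2(file size) probe points and the matching block, not in full.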
Benchmark: 8x faster than grep on a 2 GB dataset!
brep is usually faster than grep on datasets larger than 1 GB.
Run tests/benchmark.py to reproduce the results.
- grep ^777 test.txt: 1.594 s (15 runs)
- brep 777 test.txt: 206.8 ms (15 runs)
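The headline speedup can be derived directly from the two timings above. The probe count makes two assumptions: that test.txt is the 2 GB dataset, and that the search bisects byte offsets. The second is an assumption about brep's internals, not something stated here.

```python
import math

grep_s = 1.594   # grep ^777 test.txt
brep_s = 0.2068  # brep 777 test.txt (206.8 ms)

speedup = grep_s / brep_s
print(f"speedup: {speedup:.1f}x")  # speedup: 7.7x, rounded to "8x" in the headline

# Bisecting ~2e9 byte offsets needs about log2(2e9) probes,
# whereas grep has to read every byte of the file.
probes = math.ceil(math.log2(2e9))
print(f"probes: {probes}")  # probes: 31
```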
Installation
pip install brep
or, from a clone of this repo:
pip install .
Index your file
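Binary search only works on sorted input, so your file must be sorted first.
We recommend GNU sort, as it's multithreaded and can sort files larger than available memory.
LC_ALL=C makes sort compare raw bytes instead of applying locale collation rules, and -u drops duplicate lines: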
LC_ALL=C sort -u -o output_file input_file
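Before you search a file, you might want to confirm it really is in byte order. The hypothetical `is_byte_sorted` helper below sketches that check. It compares lines as raw bytes, which matches the ordering `LC_ALL=C sort` produces. Many UTF-8 locales would instead interleave upper- and lowercase letters. The check reads the whole file, so run it once after sorting, not before every search.

```python
def is_byte_sorted(path):
    """Return True if the lines of `path` are in non-decreasing byte order.

    Hypothetical helper: mirrors the order produced by `LC_ALL=C sort`.
    """
    previous = None
    with open(path, "rb") as f:
        for line in f:
            line = line.rstrip(b"\n")
            if previous is not None and line < previous:
                return False
            previous = line
    return True


# Byte order puts uppercase before lowercase:
print(sorted([b"a", b"B"]))  # [b'B', b'a']
```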
BREP supports compressed files in the gzip format.
We recommend pigz for fast multicore compression:
pigz file
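Gzip streams do not support true random access, and that matters for binary search. brep's own gzip strategy is not described here, so treat the following as a rough illustration only. Python's standard `gzip` module lets you `seek()` within the uncompressed stream, but it emulates the seek by decompressing. Forward seeks decompress and discard data, and backward seeks restart from the beginning of the file. The hypothetical `line_after` helper reads the first complete line at or after an uncompressed offset.

```python
import gzip


def line_after(path, offset):
    """Return the first complete line starting at or after uncompressed byte `offset`.

    Hypothetical helper. gzip seeks are emulated, so this may decompress
    up to `offset` bytes on every call.
    """
    with gzip.open(path, "rb") as f:
        if offset > 0:
            f.seek(offset - 1)  # decompresses everything before this point
            f.readline()        # skip the rest of the line we landed in
        return f.readline().rstrip(b"\n").decode()
```

Each probe can cost work proportional to its offset. As a result, a naive binary search over a gzip file does far more total work than the same search over plaintext.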
Usage
Provide one prefix search term and one file path:
brep 77777 test/large.gz
You can also search from Python with the Search class:
from brep import Search
for result in Search("77777", "test/large.gz"):
    print(result)
Contribute
PRs are welcome!
Install dev dependencies: pip install -e '.[dev]'
Test and lint before submitting: pytest && flake8
Todo
- Reimplement in Rust
- Faster gz size estimation
- Search multiple strings at once
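For the "faster gz size estimation" item, one common trick reads the gzip trailer. It is not necessarily what brep does or plans to do. RFC 1952 ends each gzip member with a 4-byte little-endian ISIZE field: the uncompressed size modulo 2^32. The hypothetical `gzip_isize` helper reads it with a single seek. The value wraps for data over 4 GiB, and in a multi-member file it describes only the last member, so treat it as an estimate.

```python
import os
import struct


def gzip_isize(path):
    """Return the gzip ISIZE trailer field: uncompressed size mod 2**32.

    Hypothetical helper. Only the last member is described, and sizes
    of 4 GiB or more wrap around.
    """
    with open(path, "rb") as f:
        f.seek(-4, os.SEEK_END)                     # ISIZE is the final 4 bytes
        return struct.unpack("<I", f.read(4))[0]   # little-endian uint32
```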
Source Distribution
File details
Details for the file brep-1.0.1.tar.gz.
File metadata
- Download URL: brep-1.0.1.tar.gz
- Upload date:
- Size: 2.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.5.0 importlib_metadata/4.0.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.9.7
File hashes
Algorithm | Hash digest
---|---
SHA256 | 95fdea5f1dddfe328134fd9dc06a9593e3760dabb7b1b0c324ce4b6827a99a67
MD5 | f6e2b30929d3bd43535327802680db00
BLAKE2b-256 | 7d4314e85c52614e6601968969d2a7e590586982942d2c34e4e8757dbd1eec1a