High-performance file search and analysis tool powered by ripgrep
RX (Regex Tracer)
A high-performance tool for searching and analyzing large files, powered by ripgrep.
Designed for large files.
RX is optimized for processing multi-GB files efficiently through parallel chunking and streaming.
If you need to process many files repeatedly, use the API server (rx serve) instead of running CLI commands in a loop. The server mode avoids Python startup overhead on each invocation.
Key Features
- Byte-Offset Based: Returns precise byte offsets for efficient large file processing (line-based indexing available)
- Parallel Processing: Automatic chunking and parallel execution for large files
- Samples Output: Shows arbitrary parts of text files with surrounding context once you have found offsets of interest
- REST API Server: All CLI features available via async HTTP API
- File Analysis: Extract metadata, statistics, and metrics from files
- Regex Complexity Analysis: Detect ReDoS vulnerabilities before production use
- Security Sandbox: Restrict file access to specific directories in server mode
Prerequisites
ripgrep must be installed:
- macOS: brew install ripgrep
- Ubuntu/Debian: apt install ripgrep
- Windows: choco install ripgrep
Installation
Option 1: Install from PyPI (Recommended)
# Requires Python 3.13+
pip install rx-tool
# Now use the rx command
rx /var/log/app.log "error.*"
rx --version
Note: Requires ripgrep to be installed separately (see Prerequisites).
Option 2: Development with uv
uv sync
uv run rx /var/log/app.log "error.*"
Option 3: Standalone Binary
./build.sh
./dist/rx /var/log/app.log "error.*"
Quick Start
Basic Examples
# Search a file (returns byte offsets)
rx /var/log/app.log "error.*"
# Search a directory
rx /var/log/ "error.*"
# Show context lines
rx /var/log/app.log "error" --samples --context=3
# Analyze file metadata
rx analyse /var/log/app.log
# Check regex complexity
rx check "(a+)+"
# Start API server
rx serve --port=8000
Why Byte Offsets?
RX returns byte offsets instead of line numbers for efficiency. Seeking to a byte position is O(1), while finding a line by its number means scanning the file from the start, which is O(n) in file size. For multi-GB files the difference is significant.
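To make the cost difference concrete, here is a minimal Python sketch. It is not RX's implementation; the helper names `line_at_offset` and `line_by_number` and the demo file content are made up for illustration. Reaching a known byte offset takes one `seek` call, whereas reaching a line number without an index means reading every line before it.

```python
import os
import tempfile


def line_at_offset(path: str, offset: int) -> bytes:
    """Return the rest of the line starting at a byte offset (one seek, no scan)."""
    with open(path, "rb") as f:
        f.seek(offset)  # jump straight to the position
        return f.readline().rstrip(b"\n")


def line_by_number(path: str, lineno: int) -> bytes:
    """Return line `lineno` (1-based) by reading every line before it."""
    with open(path, "rb") as f:
        for current, line in enumerate(f, start=1):
            if current == lineno:
                return line.rstrip(b"\n")
    raise IndexError(f"file has fewer than {lineno} lines")


# Tiny demo file with hypothetical content.
with tempfile.NamedTemporaryFile("wb", suffix=".log", delete=False) as tmp:
    tmp.write(b"info: start\nerror: disk full\ninfo: done\n")
    demo_path = tmp.name

by_offset = line_at_offset(demo_path, 12)  # byte 12 is where line 2 starts
by_number = line_by_number(demo_path, 2)
os.unlink(demo_path)
```

Both calls return the same line; only the amount of data read to get there differs.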
Need line numbers? Use the indexing feature:
# Create index for a large file
rx index /var/log/huge.log
# Now you can use line-based operations
rx samples /var/log/huge.log -l 1000,2000,3000 --context=5
The index enables fast line-to-offset conversion for files >50MB.
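The idea behind a line-offset index can be sketched in a few lines of Python. This is an illustration only: RX's actual index format and storage are not described here, and the function names below are hypothetical. The sketch scans the data once, records where each line starts, and then converts in both directions without rescanning.

```python
import bisect
import io


def build_line_index(stream) -> list[int]:
    """Scan once and record the byte offset at which each line starts."""
    offsets = []
    pos = 0
    for line in stream:
        offsets.append(pos)
        pos += len(line)
    return offsets


def offset_of_line(index: list[int], lineno: int) -> int:
    """Line number (1-based) to byte offset: a plain list lookup."""
    return index[lineno - 1]


def line_of_offset(index: list[int], offset: int) -> int:
    """Byte offset to 1-based line number, via binary search."""
    return bisect.bisect_right(index, offset)


data = io.BytesIO(b"info: start\nerror: disk full\ninfo: done\n")
index = build_line_index(data)
second_line_offset = offset_of_line(index, 2)
line_containing_byte_20 = line_of_offset(index, 20)
```

A full index stores one entry per line; an index for very large files might instead store periodic checkpoints and scan forward from the nearest one, trading a little lookup time for a much smaller index.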
Server Mode (Recommended for Repeated Operations)
The CLI spawns a Python interpreter on each invocation. For processing multiple files or repeated operations, use the API server:
# Start server
rx serve --port=8000
# Use HTTP API (same endpoints as CLI)
curl "http://localhost:8000/v1/trace?path=/var/log/app.log&regexp=error"
curl "http://localhost:8000/v1/analyse?path=/var/log/"
Benefits:
- No Python startup overhead per request
- Async processing with configurable workers
- Webhook support for event notifications
- Security sandbox with --search-root
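When calling the server from a script rather than curl, the query parameters should be percent-encoded, because regex characters such as `*`, `+`, and `&` have special meaning in URLs. The following Python sketch uses only the standard library. The `/v1/trace` endpoint and the `path`, `regexp`, and `max_results` parameters come from the examples in this README; the helper names `trace_url` and `trace` are hypothetical, and the JSON response body is an assumption.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def trace_url(base: str, path: str, regexp: str, **extra) -> str:
    """Build a /v1/trace URL with every query parameter percent-encoded."""
    query = urlencode({"path": path, "regexp": regexp, **extra})
    return f"{base}/v1/trace?{query}"


def trace(base: str, path: str, regexp: str, **extra):
    """Call the endpoint and decode the body, assuming the server returns JSON."""
    with urlopen(trace_url(base, path, regexp, **extra)) as resp:
        return json.load(resp)


url = trace_url("http://localhost:8000", "/var/log/app.log", "error.*", max_results=10)
```

With a server running, `trace("http://localhost:8000", "/var/log/app.log", "error.*")` would issue the request; building the URL alone does not need a server.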
Security Sandbox
Restrict file access in server mode:
# Only allow access to /var/log
rx serve --search-root=/var/log
# Attempts to access other paths return 403 Forbidden
curl "http://localhost:8000/v1/trace?path=/etc/passwd&regexp=root"
# => 403 Forbidden
Prevents directory traversal (../) and symlink escape attacks.
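A common way to implement this kind of check is to resolve the requested path fully before comparing it with the allowed root. The Python sketch below shows the general technique under that assumption; it is not taken from RX's source, and `is_inside_root` is a hypothetical name.

```python
import os
import tempfile


def is_inside_root(search_root: str, requested: str) -> bool:
    """True if `requested` resolves to a location under `search_root`."""
    root = os.path.realpath(search_root)
    # realpath collapses ".." segments and follows symlinks before comparing
    target = os.path.realpath(os.path.join(root, requested))
    return os.path.commonpath([root, target]) == root


with tempfile.TemporaryDirectory() as base:
    root = os.path.join(base, "logs")
    os.mkdir(root)
    secret = os.path.join(base, "secret.txt")
    open(secret, "w").close()

    allowed = is_inside_root(root, "app.log")
    traversal = is_inside_root(root, "../secret.txt")
    absolute = is_inside_root(root, secret)

    # A symlink inside the root that points outside it
    os.symlink(secret, os.path.join(root, "link"))
    symlink_escape = is_inside_root(root, "link")
```

Comparing with `os.path.commonpath` rather than a string prefix avoids treating a sibling such as `/var/log2` as being inside `/var/log`.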
CLI Commands
rx (search)
Search files for regex patterns.
rx /var/log/app.log "error.*" # Basic search
rx /var/log/ "error.*" # Search directory
rx /var/log/app.log "error" --samples # Show context lines
rx /var/log/app.log "error" -i # Case-insensitive (ripgrep flags work)
rx /var/log/app.log "error" --json # JSON output
rx analyse
Extract file metadata and statistics.
rx analyse /var/log/app.log # Single file
rx analyse /var/log/ # Directory
rx analyse /var/log/ --max-workers=20 # Parallel processing
rx check
Analyze regex complexity and detect ReDoS vulnerabilities.
rx check "(a+)+" # Returns risk level and fixes
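The pattern `(a+)+` is the classic example of a nested quantifier, which can cause catastrophic backtracking in backtracking regex engines. How `rx check` analyzes patterns is not described here; the Python sketch below is a deliberately simple heuristic, with the hypothetical name `has_nested_quantifier`, that only flags a group containing `+` or `*` when the group itself is repeated with `+` or `*`.

```python
QUANTIFIERS = frozenset("+*")


def has_nested_quantifier(pattern: str) -> bool:
    """Heuristic: is a group that contains + or * itself repeated with + or *?"""
    stack = []  # one flag per open group: was an unbounded quantifier seen inside?
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\":  # escaped character: skip it together with the backslash
            i += 2
            continue
        if ch == "[":  # character class: skip to the closing bracket
            end = pattern.find("]", i + 1)
            if end == -1:
                break
            i = end + 1
            continue
        if ch == "(":
            stack.append(False)
        elif ch == ")" and stack:
            inner = stack.pop()
            following = pattern[i + 1 : i + 2]
            if inner and following in QUANTIFIERS:
                return True
            if inner and stack:  # propagate the flag to the enclosing group
                stack[-1] = True
        elif ch in QUANTIFIERS and stack:
            stack[-1] = True
        i += 1
    return False


risky = has_nested_quantifier("(a+)+")
safe = has_nested_quantifier("(ab)+")
```

This catches only one family of risky patterns and ignores others, such as overlapping alternations like `(a|a)+` or bounded repeats like `{1,100}`, so it should be read as an illustration of the idea rather than a replacement for a real analyzer.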
rx index
Create line-offset index for large files.
rx index /var/log/huge.log # Create index
rx index /var/log/huge.log --info # Show index info
rx samples
Extract context lines around byte offsets or line numbers.
rx samples /var/log/app.log -b 12345,67890 --context=3 # Byte offsets
rx samples /var/log/app.log -l 100,200 --context=5 # Line numbers (requires index)
rx serve
Start REST API server.
rx serve # Start on localhost:8000
rx serve --host=0.0.0.0 --port=8080 # Custom host/port
rx serve --search-root=/var/log # Restrict to directory
API Endpoints
Once the server is running, visit http://localhost:8000/docs for interactive API documentation.
Main Endpoints:
- GET /v1/trace - Search files for patterns
- GET /v1/analyse - File metadata and statistics
- GET /v1/complexity - Regex complexity analysis
- GET /v1/samples - Extract context lines
- GET /health - Server health and configuration
Example:
# Search
curl "http://localhost:8000/v1/trace?path=/var/log/app.log&regexp=error&max_results=10"
# Analyse
curl "http://localhost:8000/v1/analyse?path=/var/log/"
# With webhooks
curl "http://localhost:8000/v1/trace?path=/var/log/app.log&regexp=error&hook_on_complete=https://example.com/webhook"
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| RX_WORKERS | Worker processes for server | 1 |
| RX_LOG_LEVEL | Log level (DEBUG, INFO, WARNING, ERROR) | INFO |
| RX_MAX_SUBPROCESSES | Max parallel workers for file processing | 20 |
| RX_MIN_CHUNK_SIZE_MB | Min chunk size for splitting files | 20 |
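To show how a worker cap and a minimum chunk size could interact, here is a small Python sketch. The splitting policy is an assumption made for illustration and may differ from what RX actually does; `plan_chunks` is a hypothetical name. Only the variable names and default values are taken from the table above.

```python
import os

MB = 1024 * 1024


def plan_chunks(file_size: int, min_chunk: int, max_chunks: int) -> list[tuple[int, int]]:
    """Split [0, file_size) into at most max_chunks ranges of at least min_chunk bytes."""
    if file_size <= 0:
        return []
    # Never create more chunks than the cap, and never chunks below the minimum size
    count = max(1, min(max_chunks, file_size // min_chunk))
    step = -(-file_size // count)  # ceiling division
    return [(start, min(start + step, file_size)) for start in range(0, file_size, step)]


# Defaults from the table; the environment overrides them if set.
max_workers = int(os.environ.get("RX_MAX_SUBPROCESSES", "20"))
min_chunk = int(os.environ.get("RX_MIN_CHUNK_SIZE_MB", "20")) * MB

small = plan_chunks(10 * MB, 20 * MB, 20)    # below the minimum: one chunk
medium = plan_chunks(100 * MB, 20 * MB, 20)  # limited by the minimum chunk size
huge = plan_chunks(4096 * MB, 20 * MB, 20)   # limited by the worker cap
```

A real chunker would also move each boundary to the nearest newline so that no line is split across two workers; that step is omitted here for brevity.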
Server Configuration
# Production example (8-core, 16GB RAM)
RX_WORKERS=17 \
RX_LIMIT_CONCURRENCY=500 \
RX_LIMIT_MAX_REQUESTS=10000 \
rx serve --host=0.0.0.0 --port=8000 --search-root=/data
# Container/Kubernetes (1 worker per pod, scale with replicas)
RX_WORKERS=1 rx serve --host=0.0.0.0 --port=8000
Roadmap
- Gzip support: Process .gz files without manual decompression (planned)
- Additional formats: Support for more compressed formats
- Streaming API: WebSocket endpoint for real-time results
Development
# Run tests
uv run pytest -v
# Run with coverage
uv run pytest --cov=rx --cov-report=html
# Build binary
uv sync --group build
./build.sh
License
MIT