Interactive fuzzy search for JSON and CSV files

๐Ÿ” Fuzzygrep

Fuzzygrep is a command-line tool for interactive fuzzy searching, exploring, and inspecting JSON and CSV files. It is built with performance and user experience in mind.

Python 3.9+ License: MIT


✨ Features

🚀 Performance

  • Fast: Sub-second search on 10,000 records (45 ms indexed, 320 ms without an index; see Benchmarks)
  • Lazy Loading: Stream large files without loading everything into memory
  • Smart Indexing: Trigram-based indexing for 5-10x faster searches
  • Parallel Processing: Multi-core support for faster data processing
  • Intelligent Caching: TTL-based caching with automatic invalidation
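
To make the TTL-based caching idea concrete, here is a minimal sketch of a cache whose entries expire after a fixed number of seconds. It is not Fuzzygrep's actual cache module: the class name TTLCache and its methods are hypothetical, chosen only for illustration.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after `ttl` seconds (hypothetical helper)."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.ttl = ttl
        self._clock = clock  # injectable so expiry can be exercised without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at >= self.ttl:
            # Entry is stale: drop it so the caller recomputes the result.
            del self._store[key]
            return None
        return value

    def put(self, key: Hashable, value: Any) -> None:
        self._store[key] = (self._clock(), value)

    def invalidate(self) -> None:
        """Drop every entry, for example after the underlying file changes."""
        self._store.clear()
```

A search layer could call get() with the query string first and fall back to a real search only on a miss, then call invalidate() whenever the data file is reloaded.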

💎 User Experience

  • Interactive Interface: Beautiful, intuitive CLI with rich formatting
  • Fuzzy Search: Find what you need with typo-tolerant search
  • Syntax Highlighting: JSON visualization with color-coded output
  • Auto-completion: Smart suggestions as you type
  • Export Options: Save results as JSON, CSV, Markdown, or HTML

🎯 Functionality

  • Deep Search: Search through nested JSON structures
  • Dual Mode: Search keys, values, or both simultaneously
  • Key Filtering: Focus on specific data patterns
  • Visualizations: Tree charts and frequency histograms
  • Multi-format: JSON and CSV support with more formats coming
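
Deep search and dual mode can be pictured as flattening a nested document into (path, value) pairs and then matching the query against the paths, the values, or both. The sketch below assumes a dotted path notation and plain substring matching; the function names flatten and search are hypothetical, and the real tool uses fuzzy scoring rather than substring tests.

```python
from typing import Any, Iterator, List, Tuple


def flatten(node: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (path, leaf_value) pairs for every scalar in a nested structure."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from flatten(child, f"{prefix}.{key}" if prefix else str(key))
    elif isinstance(node, list):
        for i, child in enumerate(node):
            yield from flatten(child, f"{prefix}[{i}]")
    else:
        yield prefix, node


def search(data: Any, query: str, mode: str = "both") -> List[Tuple[str, Any]]:
    """Case-insensitive substring search over keys, values, or both."""
    q = query.lower()
    hits = []
    for path, value in flatten(data):
        in_key = mode in ("keys", "both") and q in path.lower()
        in_value = mode in ("values", "both") and q in str(value).lower()
        if in_key or in_value:
            hits.append((path, value))
    return hits
```

For example, search(data, "john", mode="values") only inspects leaf values, while mode="keys" only inspects the flattened paths.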

📦 Installation

Quick Install

git clone https://github.com/anggiAnand/fuzzygrep.git
cd fuzzygrep
pip install -e .

With Optional Dependencies

For enhanced features (streaming large files, CSV chunking):

pip install -e ".[enhanced]"

For development (testing, linting, formatting):

pip install -e ".[dev]"

Requirements

  • Python 3.9 or higher
  • 5 core dependencies (automatically installed)
  • Optional: ijson, pandas for large file handling

🚀 Quick Start

Basic Usage

# Interactive search
fuzzygrep data.json

# Show file structure
fuzzygrep data.json --chart

# View frequency analysis
fuzzygrep data.json --histogram

# Verbose output
fuzzygrep data.json --verbose

Interactive Commands

Once in interactive mode, you have access to powerful commands:

Search Commands:
  <query>               Search for keys and values
  
File Operations:
  /load <file>          Load a different file
  /reload               Reload current file
  
Results Management:
  /export <format>      Export results (json, csv, md, html)
  /save                 Quick save to results.json
  
Filtering & Configuration:
  /filter <patterns>    Filter keys by patterns (comma-separated)
  /clear                Clear active filters
  /stats                Show performance statistics
  
Navigation:
  /history              Show search history
  /help                 Show help message
  /exit, /quit          Exit the program

Keyboard Shortcuts

Shortcut   Action
Ctrl+T     Toggle autocompletion on/off
Ctrl+V     Switch between key/value completion
Ctrl+R     Reload data from file
Ctrl+S     Save last search results
Ctrl+H     Show help
Ctrl+C     Exit program

📚 Examples

Example 1: Basic Search

$ fuzzygrep people.json
[people.json] Search> john

Matches in Keys:
┌─────────┬────────────────┬───────┐
│ Key     │ Value          │ Score │
├─────────┼────────────────┼───────┤
│ name    │ John Doe       │  95.0 │
│ email   │ john@email.com │  82.0 │
└─────────┴────────────────┴───────┘

Matches in Values:
┌────────────────┬───────┬───────┐
│ Value          │ Keys  │ Score │
├────────────────┼───────┼───────┤
│ John Doe       │ name  │  100  │
│ john@email.com │ email │  88.0 │
└────────────────┴───────┴───────┘

Example 2: Export Results

[data.json] Search> alice

# Export as JSON
[data.json] Search> /export json results.json

# Export as CSV
[data.json] Search> /export csv results.csv

# Export as HTML with nice formatting
[data.json] Search> /export html report.html
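
As a rough illustration of what the export step involves, the sketch below serializes a list of result rows to JSON, CSV, or Markdown using only the standard library. The function name export_results and the row layout (one dict per match, all with the same keys) are assumptions, not Fuzzygrep's actual export code. HTML is left out to keep the example short.

```python
import csv
import io
import json
from typing import Dict, List


def export_results(rows: List[Dict[str, object]], fmt: str) -> str:
    """Serialize result rows to a string in the requested format (hypothetical helper)."""
    if fmt == "json":
        return json.dumps(rows, indent=2)
    # Column order is taken from the first row; rows are assumed to share keys.
    columns = list(rows[0].keys()) if rows else []
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=columns, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "md":
        lines = [
            "| " + " | ".join(columns) + " |",
            "| " + " | ".join("---" for _ in columns) + " |",
        ]
        for row in rows:
            lines.append("| " + " | ".join(str(row[c]) for c in columns) + " |")
        return "\n".join(lines) + "\n"
    raise ValueError(f"unsupported export format: {fmt!r}")
```

Writing the returned string to a path such as results.json or results.csv is then a single open(...).write(...) call.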

Example 3: Filter by Keys

[data.json] Search> /filter email,phone,address
Filter applied: email, phone, address

# Now searches are limited to these keys
[data.json] Search> john
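
The effect of a key filter can be sketched as follows. This README does not say how patterns are interpreted, so the example assumes case-insensitive glob patterns (so that email* would also match email_work); parse_patterns and filter_keys are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def parse_patterns(arg: str) -> List[str]:
    """Split a comma-separated pattern list, ignoring blanks and surrounding spaces."""
    return [p.strip().lower() for p in arg.split(",") if p.strip()]


def filter_keys(record: Dict[str, object], patterns: List[str]) -> Dict[str, object]:
    """Keep only the keys matching at least one pattern (assumed glob semantics)."""
    if not patterns:
        return dict(record)  # no active filter: keep everything
    return {
        key: value
        for key, value in record.items()
        if any(fnmatchcase(key.lower(), pattern) for pattern in patterns)
    }
```

Searching the filtered records instead of the full ones is what narrows later queries to the chosen keys.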

Example 4: Performance Options

# Disable caching for always-fresh data
fuzzygrep data.json --no-cache

# Disable indexing for small files
fuzzygrep small.json --no-index

# Control worker threads
fuzzygrep large.json --workers 8

# Combine options
fuzzygrep data.json --no-cache --workers 4 --verbose
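
The --workers option suggests that scoring is split across a pool of workers. A minimal sketch of that pattern, assuming a thread pool and using difflib from the standard library as a stand-in for the real scorer, looks like this. The names score_chunk and parallel_search and the default cutoff are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from difflib import SequenceMatcher
from typing import List, Tuple


def score_chunk(query: str, chunk: List[str]) -> List[Tuple[str, float]]:
    """Score every string in one chunk against the query on a 0-100 scale."""
    q = query.lower()
    return [(s, round(100 * SequenceMatcher(None, q, s.lower()).ratio(), 1)) for s in chunk]


def parallel_search(query: str, items: List[str], workers: int = 4,
                    cutoff: float = 60.0) -> List[Tuple[str, float]]:
    """Split items into roughly equal chunks, score them in a pool, merge and rank."""
    size = max(1, -(-len(items) // workers))  # ceiling division
    chunks = [items[i:i + size] for i in range(0, len(items), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        parts = pool.map(lambda chunk: score_chunk(query, chunk), chunks)
        scored = [pair for part in parts for pair in part]
    return sorted((p for p in scored if p[1] >= cutoff), key=lambda p: -p[1])
```

In CPython, threads only speed up scoring when the scorer releases the global interpreter lock; a pure-Python scorer like the difflib stand-in here would need a process pool to benefit from multiple cores.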

Example 5: Visualizations

# Tree view with depth limit
fuzzygrep data.json --chart --chart-limit 50

# Frequency analysis
fuzzygrep data.json --histogram
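
What exactly --histogram counts is not spelled out here. Assuming it reports how often each key occurs across records, a frequency histogram can be sketched with collections.Counter. Both function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def key_frequencies(records: Iterable[Dict[str, object]]) -> Counter:
    """Count how many records contain each key."""
    counts: Counter = Counter()
    for record in records:
        counts.update(record.keys())
    return counts


def render_histogram(counts: Counter, width: int = 20) -> List[str]:
    """Render one text bar per key, scaled so the most frequent key fills `width`."""
    if not counts:
        return []
    top = max(counts.values())
    label_w = max(len(key) for key in counts)
    return [
        f"{key.ljust(label_w)} {'#' * max(1, round(width * n / top))} {n}"
        for key, n in counts.most_common()
    ]
```

Printing "\n".join(render_histogram(key_frequencies(records))) gives a quick text view of which fields are populated most often.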

๐Ÿ—๏ธ Architecture

Fuzzygrep is built with a clean, modular architecture:

fuzzygrep/
├── core/               # Core functionality
│   ├── loaders.py      # Data loading with streaming support
│   ├── searcher.py     # Fuzzy search with parallel processing
│   ├── indexer.py      # Trigram-based indexing
│   └── cache.py        # Multi-layer caching system
├── ui/                 # User interface
│   ├── display.py      # Results visualization & export
│   └── interactive.py  # Interactive session management
├── utils/              # Utilities
│   ├── errors.py       # Custom exception hierarchy
│   └── logging.py      # Rich logging system
└── cli.py              # CLI entry point

Key Components

Loaders (core/loaders.py)

  • Automatic format detection (JSON/CSV)
  • Streaming for large files (>10MB)
  • Memory-optimized data structures
  • Graceful error handling
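
The loader behaviour listed above (format detection, a size threshold for streaming, row-at-a-time reading) can be sketched with the standard library alone. This is an illustration under assumptions, not the contents of core/loaders.py: the function names are hypothetical, detection by extension with a content sniff as fallback is a guess, and a streaming JSON parser such as ijson would take the place of json.load for large files.

```python
import csv
import json
import os
from typing import Any, Dict, Iterator

STREAM_THRESHOLD = 10 * 1024 * 1024  # 10 MB, the figure quoted in this README


def detect_format(path: str) -> str:
    """Guess 'json' or 'csv' from the extension, else from the first characters."""
    ext = os.path.splitext(path)[1].lower()
    if ext in (".json", ".csv"):
        return ext[1:]
    with open(path, "r", encoding="utf-8") as fh:
        head = fh.read(1024).lstrip()
    return "json" if head[:1] in ("{", "[") else "csv"


def should_stream(path: str) -> bool:
    """True when the file is large enough that streaming is worthwhile."""
    return os.path.getsize(path) > STREAM_THRESHOLD


def iter_csv_rows(path: str) -> Iterator[Dict[str, str]]:
    """Yield CSV rows one at a time so the whole file never sits in memory."""
    with open(path, newline="", encoding="utf-8") as fh:
        yield from csv.DictReader(fh)


def load(path: str) -> Any:
    """Load a whole file eagerly; suitable for files below the threshold."""
    if detect_format(path) == "json":
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    return list(iter_csv_rows(path))
```

A caller would check should_stream(path) first and iterate over iter_csv_rows(path) instead of calling load(path) when it returns True.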

Searcher (core/searcher.py)

  • Fuzzy matching with RapidFuzz
  • Trigram-based pre-filtering
  • Parallel processing support
  • Smart scorer selection
  • Multi-layer caching
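
To show what typo-tolerant scoring on a 0-100 scale looks like, the sketch below ranks candidates by similarity to the query. Fuzzygrep itself uses RapidFuzz; this example deliberately swaps in difflib.SequenceMatcher from the standard library so it runs without extra packages, which means its scores will differ from the real tool's. The names similarity and fuzzy_rank and the cutoff value are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable, List, Tuple


def similarity(query: str, candidate: str) -> float:
    """Return a case-insensitive similarity score between 0 and 100."""
    ratio = SequenceMatcher(None, query.lower(), candidate.lower()).ratio()
    return round(100 * ratio, 1)


def fuzzy_rank(query: str, candidates: Iterable[str],
               cutoff: float = 60.0, limit: int = 10) -> List[Tuple[str, float]]:
    """Score every candidate, drop weak matches, and return the best first."""
    scored = [(c, similarity(query, c)) for c in candidates]
    kept = [pair for pair in scored if pair[1] >= cutoff]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:limit]
```

With this stand-in, the transposed query "jhon" still ranks "john" first, while unrelated strings fall below the cutoff and are dropped.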

Indexer (core/indexer.py)

  • Trigram-based search index
  • Fast candidate filtering
  • Reduces search space by 50-90%
  • Persistent index caching
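
Trigram pre-filtering works by mapping every three-character slice of each string to the strings that contain it, so a query only needs to be scored against strings sharing some of its trigrams. The sketch below is a generic version of that technique, not the code in core/indexer.py; the class name, the padding scheme, and the min_shared threshold are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Set


def trigrams(text: str) -> Set[str]:
    """Return the set of 3-character slices of the lowercased, padded text."""
    padded = f"  {text.lower()} "  # padding gives short strings and word edges their own trigrams
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class TrigramIndex:
    """Inverted index from trigram to the positions of items containing it."""

    def __init__(self, items: List[str]) -> None:
        self.items = items
        self.postings: Dict[str, Set[int]] = defaultdict(set)
        for idx, item in enumerate(items):
            for gram in trigrams(item):
                self.postings[gram].add(idx)

    def candidates(self, query: str, min_shared: int = 2) -> List[str]:
        """Return items sharing at least `min_shared` trigrams with the query."""
        hits: Dict[int, int] = defaultdict(int)
        for gram in trigrams(query):
            for idx in self.postings.get(gram, ()):
                hits[idx] += 1
        return [self.items[i] for i in sorted(hits) if hits[i] >= min_shared]
```

Only the returned candidates are passed to the slower fuzzy scorer. Lowering min_shared keeps more typo-laden matches at the cost of a larger candidate set.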

Display (ui/display.py)

  • Rich table formatting
  • Syntax-highlighted JSON
  • Tree visualizations
  • Multiple export formats

⚡ Performance

Benchmarks

Tested on a dataset of 10,000 records:

Operation           Time      Memory
Load JSON           1.2 s     45 MB
Build Index         0.8 s     15 MB
Search (indexed)    45 ms     -
Search (no index)   320 ms    -
Export JSON         0.5 s     -
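
Numbers like these depend heavily on hardware and data shape, so it can help to measure on your own dataset. The harness below is a generic timing sketch, not the script used to produce the table above: time_ms is a hypothetical helper, and the synthetic records and the linear scan stand in for a real dataset and a real search call.

```python
import statistics
import time
from typing import Callable, List


def time_ms(fn: Callable[[], object], repeats: int = 5) -> float:
    """Return the median wall-clock time of fn() in milliseconds."""
    samples: List[float] = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


# Synthetic stand-in for a 10,000-record dataset.
records = [f"user{i}@example.com" for i in range(10_000)]
elapsed = time_ms(lambda: [r for r in records if "user99" in r])
print(f"linear scan over {len(records)} records: {elapsed:.2f} ms")
```

Taking the median of several runs reduces the influence of one-off pauses such as disk caching or garbage collection.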

Optimization Tips

  1. Enable indexing (default): Best for repeated searches
  2. Use streaming: Automatic for files >10MB
  3. Enable caching (default): Instant results for repeated queries
  4. Parallel processing (default): Faster on multi-core systems
  5. Filter keys: Reduce search space for faster results

🧪 Testing

Run the test suite:

# Run all tests
pytest

# With coverage report
pytest --cov=fuzzygrep --cov-report=html

# Run specific test file
pytest tests/test_searcher.py

# Verbose output
pytest -v

Current test coverage: 85%+


๐Ÿ› ๏ธ Development

Setup Development Environment

# Clone repository
git clone https://github.com/anggiAnand/fuzzygrep.git
cd fuzzygrep

# Install in development mode with all dependencies
pip install -e ".[dev,enhanced]"

# Run tests
pytest

# Format code
black fuzzygrep tests
isort fuzzygrep tests

# Lint
flake8 fuzzygrep
mypy fuzzygrep

Project Structure

fuzzygrep/
├── fuzzygrep/          # Main package
├── tests/              # Test suite
├── setup.py            # Package configuration
├── requirements.txt    # Dependencies
├── README.md           # Documentation
└── CHANGELOG.md        # Version history

๐Ÿ› Troubleshooting

Common Issues

Import Error: Missing dependencies

pip install -r requirements.txt

Slow performance on large files

# Install optional dependencies
pip install ijson pandas

Cache issues

# Clear cache
fuzzygrep cache-clear

# Check cache stats
fuzzygrep cache-stats

Out of memory errors

# Disable caching and indexing
fuzzygrep large.json --no-cache --no-index

๐Ÿ“ Configuration

Fuzzygrep can be configured via:

  1. Command-line options (highest priority)
  2. Environment variables
  3. Config file ~/.config/fuzzygrep/config.toml

Environment Variables

# The shell does not expand ~ inside double quotes, so use $HOME
export FUZZYGREP_CACHE_DIR="$HOME/.cache/fuzzygrep"
export FUZZYGREP_CACHE_TTL=300
export FUZZYGREP_MAX_WORKERS=4
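
The precedence described above can be sketched as a lookup that tries each source in turn. This is an illustration only: it assumes the numbered list is in priority order (command line, then environment, then config file), the option names cache_dir, cache_ttl, and max_workers are hypothetical, and the fallback values simply reuse the example values shown above rather than documented defaults.

```python
import os
from typing import Any, Dict, Mapping

# Placeholder fallbacks copied from the example values above (not documented defaults).
DEFAULTS: Dict[str, Any] = {
    "cache_dir": "~/.cache/fuzzygrep",
    "cache_ttl": 300,
    "max_workers": 4,
}

# Environment variable name and type for each option.
ENV_VARS = {
    "cache_dir": ("FUZZYGREP_CACHE_DIR", str),
    "cache_ttl": ("FUZZYGREP_CACHE_TTL", int),
    "max_workers": ("FUZZYGREP_MAX_WORKERS", int),
}


def resolve_config(cli: Mapping[str, Any], file_cfg: Mapping[str, Any],
                   env: Mapping[str, str] = os.environ) -> Dict[str, Any]:
    """Pick each option from CLI, then environment, then config file, then fallback."""
    resolved: Dict[str, Any] = {}
    for name, default in DEFAULTS.items():
        env_name, cast = ENV_VARS[name]
        if cli.get(name) is not None:
            value = cli[name]
        elif env_name in env:
            value = cast(env[env_name])
        elif name in file_cfg:
            value = file_cfg[name]
        else:
            value = default
        resolved[name] = value
    # Expand a leading ~ here, since the shell may not have done it.
    resolved["cache_dir"] = os.path.expanduser(str(resolved["cache_dir"]))
    return resolved
```

For instance, a --workers value given on the command line would win over FUZZYGREP_MAX_WORKERS, which in turn would win over a value in the config file.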

๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

How to Contribute

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/AmazingFeature)
  3. Commit your changes (git commit -m 'Add some AmazingFeature')
  4. Push to the branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

Code Style

  • Follow PEP 8
  • Use Black for formatting
  • Add type hints
  • Write docstrings
  • Include tests

📋 Roadmap

Version 1.1 (Coming Soon)

  • YAML and XML support
  • Regular expression search mode
  • Query bookmarks
  • Color themes (Nord, Dracula, Solarized)

Version 1.2

  • Multi-file search
  • Advanced filtering (by type, score threshold)
  • Excel (.xlsx) support
  • Configuration file support

Version 2.0

  • GUI mode (optional)
  • Real-time file watching
  • Plugin system
  • REST API

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


👤 Author

Anggi Ananda


๐Ÿ™ Acknowledgments


๐Ÿ“Š Statistics

GitHub stars GitHub forks GitHub issues


Made with โค๏ธ by Anggi Ananda
