busca

CLI and library to search for files with content that most closely match the lines of a reference string.
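
To make "most closely match the lines" concrete, here is a minimal sketch of a line-based similarity score built on Python's standard `difflib` module. It is an illustration only: busca's actual scoring method is not described here and may differ, and the `line_similarity` helper is a hypothetical name, not part of busca.

```python
from difflib import SequenceMatcher


def line_similarity(reference: str, candidate: str) -> float:
    """Return a 0.0-1.0 similarity score computed over whole lines.

    Hypothetical helper for illustration; not busca's implementation.
    """
    reference_lines = reference.splitlines()
    candidate_lines = candidate.splitlines()
    # autojunk=False stops difflib from ignoring lines that repeat often.
    matcher = SequenceMatcher(None, reference_lines, candidate_lines, autojunk=False)
    return matcher.ratio()


reference = "import os\nprint('hello')\nprint('world')\n"
edited = "import os\nprint('hello')\nprint('there')\n"
unrelated = "x = 1\ny = 2\n"

scores = {
    "identical": line_similarity(reference, reference),
    "edited": line_similarity(reference, edited),
    "unrelated": line_similarity(reference, unrelated),
}
```

Under this sketch an identical file scores 1.0, a file with one of three lines changed scores lower, and a file with no lines in common scores 0.0.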

https://user-images.githubusercontent.com/49008873/235590754-efdeb134-feb1-44ec-bbac-44ccb737261a.mov

Python Library

🐍 The Python library is published under the name busca_py because of a name conflict with an existing (possibly abandoned) project.

pip install busca_py

import busca_py as busca

reference_file_path = "./sample_dir_hello_world/file_1.py"
with open(reference_file_path, "r") as file:
    reference_string = file.read()

# Perform search with required parameters
all_file_matches = busca.search_for_lines(reference_string=reference_string, search_path="./sample_dir_hello_world")

# File matches are returned in descending order of percent match
closest_file_match = all_file_matches[0]
assert closest_file_match.path == reference_file_path
assert closest_file_match.percent_match == 1.0
assert closest_file_match.lines == reference_string

# Perform search for top 5 matches with additional filters to speed up runtime by skipping files that will not match
relevant_file_matches = busca.search_for_lines(
    reference_string=reference_string,
    search_path="./sample_dir_hello_world",
    max_lines=10_000,
    include_globs=["*.py"],
    count=5,
)

assert len(relevant_file_matches) < len(all_file_matches)

# Create new file match object
new_file_match = busca.FileMatch("file/path", 1.0, "file\ncontent")
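
Since each file match exposes `path`, `percent_match`, and `lines`, results can be post-processed with ordinary Python. The sketch below filters matches by a minimum score and renders a short report. `MatchRecord`, `filter_by_threshold`, and `summarize` are hypothetical names; `MatchRecord` is a stand-in with the same three attributes so the example runs without busca_py installed. It assumes `percent_match` is on a 0.0 to 1.0 scale, as the `== 1.0` assertion above suggests.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class MatchRecord:
    """Hypothetical stand-in with the same attributes as a busca file match."""

    path: str
    percent_match: float
    lines: str


def filter_by_threshold(matches: Iterable, threshold: float) -> List:
    """Keep matches whose percent_match is at least `threshold` (0.0-1.0)."""
    return [m for m in matches if m.percent_match >= threshold]


def summarize(matches: Iterable) -> List[str]:
    """Render one 'path  percent%' line per match, in the order given."""
    return [f"{m.path}  {m.percent_match * 100:.1f}%" for m in matches]


sample_matches = [
    MatchRecord("file_1.py", 1.0, "print('hello')\n"),
    MatchRecord("file_2.py", 0.5, "print('hi')\n"),
    MatchRecord("file_3.py", 0.125, "x = 1\n"),
]

strong_matches = filter_by_threshold(sample_matches, threshold=0.5)
report = summarize(strong_matches)
```

The same two helpers should work unchanged on the list returned by `busca.search_for_lines`, because they rely only on the attributes shown in the example above.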

Command Line Interface

CLI Usage

🧑‍💻️ To see usage documentation, run

busca -h

Output for v2.1.1

Simple utility to search for files with content that most closely match the lines of a reference string

Usage: busca --ref-file-path <REF_FILE_PATH> [OPTIONS]
       <SomeCommand> | busca [OPTIONS]

Options:
  -r, --ref-file-path <REF_FILE_PATH>  Local or absolute path to the reference comparison file. Overrides any piped input
  -s, --search-path <SEARCH_PATH>      Directory or file in which to search. Defaults to CWD
  -m, --max-lines <MAX_LINES>          The number of lines to consider when comparing files. Files with more lines will be skipped [default: 10000]
  -i, --include-glob <INCLUDE_GLOB>    Globs that qualify a file for comparison
  -x, --exclude-glob <EXCLUDE_GLOB>    Globs that disqualify a file from comparison
  -c, --count <COUNT>                  Number of results to display [default: 10]
  -h, --help                           Print help
  -V, --version                        Print version

Examples

Find files that most closely match the source file_5.py file in a search directory
busca --ref-file-path sample_dir_mix/file_5.py --search-path sample_dir_mix

? Select a file to compare:  
  sample_dir_mix/file_5.py                  ++++++++++  100.0%
> sample_dir_mix/file_5v2.py                ++++++++++   97.5%
  sample_dir_mix/nested_dir/file_7.py       ++++         42.3%
  sample_dir_mix/aldras/aldras_settings.py  ++           24.1%
  sample_dir_mix/aldras/aldras_core.py      ++           21.0%
  sample_dir_mix/file_3.py                  +            13.2%
  sample_dir_mix/file_1.py                  +            11.0%
  sample_dir_mix/file_2.py                  +             9.4%
  sample_dir_mix/aldras/aldras_execute.py   +             7.5%
  sample_dir_mix/file_4.py                  +             6.9%
[↑↓ to move, enter to select, type to filter]

Find files that most closely match the source path_to_reference.json file in a search directory
busca --ref-file-path path_to_reference.json --search-path path_to_search_dir

Change search to scan the current working directory
busca --ref-file-path path_to_reference.json

Narrow search to only consider .json files whose paths include the substring "foo" and that contain fewer than 1,000 lines
busca --ref-file-path path_to_reference.json --include-glob '*.json' --include-glob '**foo**' --max-lines 1000

Piped input mode to search the output of a command
# <SomeCommand> | busca [OPTIONS]
echo 'String to find in files.' | busca

MacOS piped input mode

📝 An open issue in crossterm, one of busca's dependencies, prevents interactive prompts on MacOS when input is piped. When busca detects a non-interactive mode, it still displays the file matches, but without the interactive prompt.

As a workaround, add the following wrapper function to your shell's .bashrc or .zshrc file:

# Wrap commands for busca search
busca_cmd_output() {
    eval "$* > /tmp/busca_search.tmp" && busca -r /tmp/busca_search.tmp
}

One-liners to add the wrapper function:

Bash:
echo -e 'busca_cmd_output() {\n\teval "$* > /tmp/busca_search.tmp" && busca -r /tmp/busca_search.tmp\n}' >> ~/.bashrc

Zsh:
echo -e 'busca_cmd_output() {\n\teval "$* > /tmp/busca_search.tmp" && busca -r /tmp/busca_search.tmp\n}' >> ~/.zshrc

Reload your shell so that the function becomes available, then pass it the command whose output you want to search:

# busca_cmd_output <SomeCommand>
busca_cmd_output echo 'String to find in files.'
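
The same workaround can be sketched in Python for scripts that already shell out to other tools: capture a command's output in a temporary file, then hand that file to busca with `-r`. `capture_to_temp_file` and `busca_command_for_output` are hypothetical helper names, and the temporary file name is an assumption rather than the fixed `/tmp/busca_search.tmp` path used by the shell function.

```python
import subprocess
import sys
import tempfile
from typing import List, Sequence


def capture_to_temp_file(command: Sequence[str]) -> str:
    """Run `command`, save its stdout to a temporary file, and return the path."""
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    # delete=False keeps the file on disk so busca can read it afterwards;
    # the caller is responsible for removing it when done.
    with tempfile.NamedTemporaryFile(
        mode="w", suffix=".tmp", prefix="busca_search_", delete=False
    ) as tmp_file:
        tmp_file.write(result.stdout)
        return tmp_file.name


def busca_command_for_output(command: Sequence[str]) -> List[str]:
    """Build the busca call that uses the captured output as the reference file."""
    return ["busca", "-r", capture_to_temp_file(command)]


# Demo with a portable command: the current Python interpreter prints one line.
demo_command = [sys.executable, "-c", "print('String to find in files.')"]
busca_args = busca_command_for_output(demo_command)
# To launch the search from a terminal: subprocess.run(busca_args)
```

Because busca then reads the reference from a file instead of a pipe, its prompt should stay interactive, just as with the shell wrapper.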

CLI Installation

MacOS

Homebrew
brew tap noahbaculi/busca
brew install busca

To update, run

brew update
brew upgrade busca

All platforms (Windows, MacOS, Linux)

Compile from source
  1. Install Rust using rustup.

  2. Clone this repo.

  3. In the root of this repo, run

    cargo build --release
    
  4. Add the compiled binary to your PATH, for example by copying it to your local bin directory.

    cp target/release/busca $HOME/bin/
    
