
Library to search for files with content that most closely match the lines of a reference string


busca


See CHANGELOG.md for release history.


CLI and library to search for files with content that most closely match the lines of a reference string.

Animated demo: busca ranks files by similarity and shows a colored line-diff


Python library

🐍 The Python library is published as busca_py because of a name conflict with an existing (possibly abandoned) project.

pip install busca_py

from pathlib import Path
import busca_py as busca


reference_file_path = "./sample_dir_hello_world/file_1.py"
with open(reference_file_path, "r") as file:
    reference_string = file.read()

# Perform a search with required parameters
all_file_comparisons = busca.search(
    reference_string=reference_string,
    search_path="./sample_dir_hello_world",
)

# Comparisons are returned in descending order of similarity_ratio
closest_file_comparison = all_file_comparisons[0]
assert closest_file_comparison.path == Path(reference_file_path)
assert closest_file_comparison.similarity_ratio == 1.0
assert closest_file_comparison.content == reference_string

# Perform a search for the top 5 comparisons with additional filters
# to speed up runtime by skipping files that will not match
relevant_file_comparisons = busca.search(
    reference_string=reference_string,
    search_path="./sample_dir_hello_world",
    max_file_lines=10_000,
    include_glob=["*.py"],
    count=5,
)

assert len(relevant_file_comparisons) < len(all_file_comparisons)

# Perform a search that drops candidates below a similarity floor
strong_file_comparisons = busca.search(
    reference_string=reference_string,
    search_path="./sample_dir_hello_world",
    include_glob=["*.py"],
    min_similarity_ratio=0.5,
)

assert all(fc.similarity_ratio >= 0.5 for fc in strong_file_comparisons)

# Create a new FileComparison object
new_file_comparison = busca.FileComparison("file/path", 1.0, "file\ncontent")
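
The CLI help further down states that the similarity floor is applied before the result limit. Below is a minimal sketch of that ordering, assuming the Python API behaves the same way. `Comparison` and `select` are hypothetical stand-ins (they are not part of busca_py) that mirror the `path`, `similarity_ratio`, and `content` attributes used above, so the block runs without the library installed:

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """Hypothetical stand-in mirroring the attributes of busca.FileComparison."""

    path: str
    similarity_ratio: float
    content: str


def select(comparisons, min_similarity_ratio=0.0, count=None):
    """Rank by similarity, drop results under the floor, then apply the limit."""
    ranked = sorted(comparisons, key=lambda c: c.similarity_ratio, reverse=True)
    kept = [c for c in ranked if c.similarity_ratio >= min_similarity_ratio]
    return kept if count is None else kept[:count]


sample = [
    Comparison("a.py", 0.25, "..."),
    Comparison("b.py", 1.0, "..."),
    Comparison("c.py", 0.75, "..."),
]
top = select(sample, min_similarity_ratio=0.5, count=5)
print([c.path for c in top])
```

Because the floor is applied first, asking for five results can return fewer than five when not enough candidates clear the threshold.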

Command line interface

CLI usage

🧑‍💻️ To see usage documentation, run

busca -h

Output for v3.0.0

Simple utility to search for files with content that most closely match the lines of a reference string

Usage: busca --ref-file-path <REF_FILE_PATH> [OPTIONS]
       <SomeCommand> | busca [OPTIONS]

Options:
  -r, --ref-file-path <REF_FILE_PATH>
          Local or absolute path to the reference comparison file. Overrides any piped input
  -s, --search-path <SEARCH_PATH>
          Directory or file in which to search. Defaults to CWD
  -m, --max-file-lines <MAX_FILE_LINES>
          The maximum number of lines a candidate file may have. Candidates with more lines (or zero lines) are skipped entirely [default: 10000]
  -i, --include-glob <INCLUDE_GLOB>
          Globs that qualify a file for comparison
  -x, --exclude-glob <EXCLUDE_GLOB>
          Globs that disqualify a file from comparison
  -c, --count <COUNT>
          Number of results to display [default: 10]
      --min-similarity-ratio <MIN_SIMILARITY_RATIO>
          Drop comparisons whose similarity ratio is below this value (in [0.0, 1.0]). Applied during the search, before the --count limit
      --format <FORMAT>
          Output format for the ranked results [default: human] [possible values: human, json]
      --with-content
          Include each file's content in JSON output. Ignored for the human format
      --no-interactive
          Print the ranked list instead of launching the interactive picker
  -h, --help
          Print help
  -V, --version
          Print version
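
When busca is driven from a script, the flags listed above can be assembled programmatically. The following is a sketch only: `build_busca_command` is a hypothetical helper, not something busca ships, and it assumes `--exclude-glob` can be repeated in the same way the examples repeat `--include-glob`:

```python
import shlex


def build_busca_command(
    ref_file_path,
    search_path=None,
    include_glob=(),
    exclude_glob=(),
    max_file_lines=None,
    count=None,
    min_similarity_ratio=None,
    output_format=None,
):
    """Assemble a busca argv list using only flags from the help text above."""
    argv = ["busca", "--ref-file-path", str(ref_file_path)]
    if search_path is not None:
        argv += ["--search-path", str(search_path)]
    for pattern in include_glob:  # one flag occurrence per glob
        argv += ["--include-glob", pattern]
    for pattern in exclude_glob:  # assumed to be repeatable like --include-glob
        argv += ["--exclude-glob", pattern]
    if max_file_lines is not None:
        argv += ["--max-file-lines", str(max_file_lines)]
    if count is not None:
        argv += ["--count", str(count)]
    if min_similarity_ratio is not None:
        argv += ["--min-similarity-ratio", str(min_similarity_ratio)]
    if output_format is not None:
        argv += ["--format", output_format]
    return argv


command = build_busca_command(
    "path_to_reference.json",
    include_glob=["*.json", "**/*foo*"],
    max_file_lines=1000,
)
# shlex.join quotes the globs so the line can be pasted into a shell.
print(shlex.join(command))
```

Passing the list form to `subprocess.run` avoids shell quoting issues with glob characters altogether.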

Examples

Find files that most closely match the source file_5.py file in a search directory
busca --ref-file-path sample_dir_mix/file_5.py --search-path sample_dir_mix

? Select a file to compare:  
  sample_dir_mix/file_5.py                  ++++++++++  100.0%
> sample_dir_mix/file_5v2.py                ++++++++++   97.2%
  sample_dir_mix/nested_dir/file_7.py       +++++        45.8%
  sample_dir_mix/aldras/aldras_core.py      ++           21.7%
  sample_dir_mix/aldras/aldras_settings.py  ++           21.2%
  sample_dir_mix/file_3.py                  ++           16.8%
  sample_dir_mix/file_1.py                  +            14.1%
  sample_dir_mix/file_2.py                  +            13.7%
  sample_dir_mix/aldras/aldras_execute.py   +            11.9%
  sample_dir_mix/file_4.py                  +             9.0%
[↑↓ to move, enter to select, type to filter]

Find files that most closely match the source path_to_reference.json file in a search directory
busca --ref-file-path path_to_reference.json --search-path path_to_search_dir

Change search to scan the current working directory
busca --ref-file-path path_to_reference.json

Narrow search to files with at most 1,000 lines that are either .json files or match the glob **/*foo*
busca --ref-file-path path_to_reference.json --include-glob '*.json' --include-glob '**/*foo*' --max-file-lines 1000

Piped input mode to search the output of a command
# <SomeCommand> | busca [OPTIONS]
echo 'String to find in files.' | busca

macOS piped input mode

📝 crossterm, one of busca's dependencies, has an open issue on macOS that blocks prompt interactivity with piped input. When busca detects a non-interactive mode, it prints the file comparisons without the interactive picker.

This can be worked around by adding the following wrapper function to your .bashrc or .zshrc file:

# Wrap commands for busca search
busca_cmd_output() {
    eval "$* > /tmp/busca_search.tmp" && busca -r /tmp/busca_search.tmp
}

One-liners to add the wrapper function:

Shell  Command
Bash   echo -e 'busca_cmd_output() {\n\teval "$* > /tmp/busca_search.tmp" && busca -r /tmp/busca_search.tmp\n}' >> ~/.bashrc
Zsh    echo -e 'busca_cmd_output() {\n\teval "$* > /tmp/busca_search.tmp" && busca -r /tmp/busca_search.tmp\n}' >> ~/.zshrc

Reload your shell for the function to become available, then prefix any command with it:

# busca_cmd_output <SomeCommand>
busca_cmd_output echo 'String to find in files.'
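
The same capture-then-reference idea can be sketched in Python for scripts that do not go through a shell function. This is an illustration under assumptions: `capture_to_reference_file` and `busca_argv_for` are hypothetical helper names, and the only busca flag used is `-r` from the help text above. The block builds the invocation but does not run busca itself:

```python
import subprocess
import sys
import tempfile


def capture_to_reference_file(command):
    """Run a command and save its stdout to a temporary file; return the path."""
    completed = subprocess.run(command, capture_output=True, text=True, check=True)
    with tempfile.NamedTemporaryFile(
        "w", prefix="busca_search_", suffix=".tmp", delete=False
    ) as handle:
        handle.write(completed.stdout)
        return handle.name


def busca_argv_for(command):
    """Build the busca invocation that reads the captured output via -r."""
    return ["busca", "-r", capture_to_reference_file(command)]


# Use the current interpreter as a portable stand-in for <SomeCommand>.
argv = busca_argv_for([sys.executable, "-c", "print('String to find in files.')"])
print(argv)
```

Unlike the fixed /tmp/busca_search.tmp path in the shell wrapper, a unique temporary file keeps concurrent searches from overwriting each other; the caller is responsible for deleting it afterwards.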

Structured output for scripts
busca --ref-file-path path_to_reference.py --include-glob '*.py' --format json

[
  {
    "path": "src/file_5.py",
    "similarity_ratio": 1.0
  },
  {
    "path": "src/file_5v2.py",
    "similarity_ratio": 0.8888889
  }
]

Add --with-content to include each file's body. --format json is always non-interactive; for the human grid without the picker, use --no-interactive.
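
A short sketch of consuming that JSON from Python follows. The sample text is copied from the output shown above; `near_duplicates` and its `threshold` parameter are hypothetical names chosen for illustration:

```python
import json

# Sample copied from the JSON output shown above.
raw_output = """
[
  {"path": "src/file_5.py", "similarity_ratio": 1.0},
  {"path": "src/file_5v2.py", "similarity_ratio": 0.8888889}
]
"""


def near_duplicates(json_text, threshold=0.8, exclude_exact=True):
    """Return paths at or above the threshold, optionally skipping exact matches."""
    results = json.loads(json_text)
    return [
        entry["path"]
        for entry in results
        if entry["similarity_ratio"] >= threshold
        and not (exclude_exact and entry["similarity_ratio"] == 1.0)
    ]


print(near_duplicates(raw_output))
```

Skipping exact matches is useful when the reference file itself lives inside the search path and would otherwise always rank first.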

busca uses these exit codes so scripts can branch on the result:

Exit code  Meaning
0          At least one comparison survived --min-similarity-ratio and --count
1          No comparisons matched
2          An error occurred (bad glob, missing search path, unreadable reference)

On an empty result busca writes nothing to stdout, prints No files found to stderr, and exits 1, so scripts should branch on the exit code rather than parse stdout for an empty array.
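
The branching described above might look like this in a Python caller. It is a sketch that assumes busca was invoked with `--format json`; `interpret_busca_result` is a hypothetical helper, and the sample inputs stand in for a real process result:

```python
import json


def interpret_busca_result(returncode, stdout, stderr=""):
    """Map busca's documented exit codes onto a result list or an exception."""
    if returncode == 0:
        return json.loads(stdout)  # at least one comparison survived
    if returncode == 1:
        return []  # no matches: stdout is empty, so do not try to parse it
    raise RuntimeError(f"busca failed with exit code {returncode}: {stderr.strip()}")


matches = interpret_busca_result(0, '[{"path": "src/file_5.py", "similarity_ratio": 1.0}]')
print(matches[0]["path"])
print(interpret_busca_result(1, "", "No files found"))
```

In practice the three arguments would come from the `returncode`, `stdout`, and `stderr` attributes of the object returned by `subprocess.run(..., capture_output=True, text=True)`.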

Versioning

  • Rust MSRV: 1.85 (enforced via Cargo.toml rust-version).
  • Python: 3.11 or later.
  • Semver: breaking changes ship on major version bumps. The Rust public surface covered by semver is Args, FileComparison, Error, run_search, run_search_with_progress, get_similarity_ratio, and format_file_comparisons. Items not in this list are implementation details and may change in any release.
  • Python public surface: busca_py.search and busca_py.FileComparison as declared in busca_py.pyi.

Migrating from 2.x to 3.x

Python callers should rename the function, its kwargs, and the result attributes:

# 2.x
results = busca_py.search_for_lines(
    reference_string=ref,
    search_path="./src",
    max_lines=10_000,
    include_globs=["*.py"],
    exclude_globs=["*.yml"],
)
top = results[0]
top.percent_match  # float
top.lines          # str

# 3.x
results = busca_py.search(
    reference_string=ref,
    search_path="./src",
    max_file_lines=10_000,
    include_glob=["*.py"],   # also accepts a bare string
    exclude_glob=["*.yml"],
)
top = results[0]
top.similarity_ratio  # float, see ADR-0001 for the metric change
top.content           # str

See CHANGELOG.md for the full rename table and the Rust-side migration notes.
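
For codebases with many 2.x call sites, the keyword renames shown above can be applied mechanically during the transition. A minimal sketch follows; `KWARG_RENAMES` and `upgrade_kwargs` are hypothetical names, and the mapping covers only the renames visible in the example above, not the full table in CHANGELOG.md:

```python
# Keyword renames taken from the 2.x and 3.x calls shown above.
KWARG_RENAMES = {
    "max_lines": "max_file_lines",
    "include_globs": "include_glob",
    "exclude_globs": "exclude_glob",
}


def upgrade_kwargs(old_kwargs):
    """Translate 2.x search_for_lines kwargs into 3.x search kwargs."""
    return {KWARG_RENAMES.get(name, name): value for name, value in old_kwargs.items()}


legacy_kwargs = {
    "reference_string": "print('hello')",
    "search_path": "./src",
    "max_lines": 10_000,
    "include_globs": ["*.py"],
}
print(upgrade_kwargs(legacy_kwargs))
```

The translated dictionary can then be unpacked into the 3.x call, for example `busca_py.search(**upgrade_kwargs(legacy_kwargs))`. Result attributes (`percent_match`, `lines`) still need to be renamed at each use.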

CLI installation

macOS

Homebrew
brew tap noahbaculi/busca
brew install busca

To update, run

brew update
brew upgrade busca

All platforms (Windows, macOS, Linux)

Compile from source
  1. Install Rust using rustup.

  2. Clone this repo.

  3. In the root of this repo, run

    cargo build --release
    
  4. Add the binary to your PATH, for example by copying it to your local bin directory.

    cp target/release/busca $HOME/bin/
    
