search-bs

Index and search Bikeshed (.bs) source documents from URLs with efficient change detection.

Features

  • Efficient change detection: Only re-indexes when content actually changes
    • Uses HTTP conditional requests (ETag, Last-Modified)
    • Falls back to SHA256 content hashing
  • Full-text search: Powered by SQLite FTS5 with BM25 ranking
  • Context-aware search: Shows N lines around each match with the --around flag
  • Exact line retrieval: Pull specific line ranges for agent consumption
  • Batch indexing: Index multiple documents from a config file with --all
  • Markdown context: Tracks current heading for each search result
  • GitHub integration: Automatically converts GitHub blob URLs to raw URLs
  • JSON output: Machine-readable output for all commands

Installation

pip install search-bikeshed

Or install from source:

git clone https://github.com/tarekziade/search-bikeshed
cd search-bikeshed
pip install -e .

Usage

Index a document

# Index from a GitHub blob URL (automatically converted to a raw URL)
search-bs index https://github.com/webmachinelearning/webnn/blob/main/index.bs --name webnn

# Index from a raw URL directly
search-bs index https://raw.githubusercontent.com/w3c/webrtc-pc/main/webrtc.bs --name webrtc

# Index all documents from config file
search-bs index --all
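
The blob-to-raw conversion mentioned above can be illustrated with a short sketch. This is not the tool's actual code: the function name to_raw_url is hypothetical, and the mapping assumes the usual github.com/OWNER/REPO/blob/REF/PATH layout.

```python
from urllib.parse import urlsplit, urlunsplit


def to_raw_url(url: str) -> str:
    """Rewrite a github.com blob URL to its raw.githubusercontent.com form.

    Hypothetical helper: URLs that do not look like blob URLs are returned
    unchanged.
    """
    parts = urlsplit(url)
    if parts.netloc != "github.com":
        return url
    segments = parts.path.strip("/").split("/")
    # Expected shape: owner / repo / "blob" / ref / path...
    if len(segments) < 5 or segments[2] != "blob":
        return url
    owner, repo, _blob, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, parts.query, ""))


print(to_raw_url("https://github.com/webmachinelearning/webnn/blob/main/index.bs"))
```

Raw URLs and non-GitHub URLs pass through untouched, so the same helper could be applied to every source before fetching.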

Batch indexing with config file

Create a config file at ~/.config/search-bs/sources.json:

[
  {
    "name": "webnn",
    "url": "https://github.com/webmachinelearning/webnn/blob/main/index.bs"
  },
  {
    "name": "webrtc",
    "url": "https://raw.githubusercontent.com/w3c/webrtc-pc/main/webrtc.bs"
  }
]

Then index all at once:

search-bs index --all
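
For readers who generate or validate sources.json from a script, the following is a minimal sketch of reading the format shown above. The load_sources function and its validation rules (non-empty name and url, no duplicate names) are assumptions for illustration, not the tool's documented behavior.

```python
import json
from pathlib import Path


def load_sources(path):
    """Read a sources.json file and return a {name: url} mapping.

    Hypothetical helper; the checks below are illustrative assumptions.
    """
    entries = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(entries, list):
        raise ValueError("expected a JSON array of sources")
    sources = {}
    for entry in entries:
        if not isinstance(entry, dict):
            raise ValueError(f"expected an object, got: {entry!r}")
        name, url = entry.get("name"), entry.get("url")
        if not name or not url:
            raise ValueError(f"entry needs 'name' and 'url': {entry!r}")
        if name in sources:
            raise ValueError(f"duplicate source name: {name}")
        sources[name] = url
    return sources
```

A call such as load_sources(Path.home() / ".config" / "search-bs" / "sources.json") would return one entry per configured document.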

Search indexed documents

# Basic search
search-bs search --name webnn "MLTensor"

# Phrase search
search-bs search --name webnn "graph builder"

# Search with context lines (show 3 lines around each match)
search-bs search --name webnn "MLContext" --around 3

# With JSON output
search-bs search --name webnn "MLContext" --json

# Show URLs in results
search-bs search --name webnn "operator" --show-url --max-results 10
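
To give a feel for what a line-level FTS5 index with BM25 ranking looks like, here is a small self-contained sketch using Python's sqlite3 module. The table layout, the build_index and search names, and the rule that a line starting with "#" is a heading are all assumptions made for this example; the real schema may differ.

```python
import sqlite3


def build_index(lines):
    """Index lines one row each, remembering the most recent heading."""
    db = sqlite3.connect(":memory:")
    # heading is stored but not searched; text is the full-text column.
    db.execute("CREATE VIRTUAL TABLE lines USING fts5(heading UNINDEXED, text)")
    heading = ""
    for lineno, text in enumerate(lines, start=1):
        if text.startswith("#"):  # assumed Markdown-style heading
            heading = text.lstrip("#").strip()
        db.execute(
            "INSERT INTO lines(rowid, heading, text) VALUES (?, ?, ?)",
            (lineno, heading, text),
        )
    return db


def search(db, query, max_results=10):
    """Return (line_number, heading, text) rows, best BM25 match first."""
    # Wrap the query in double quotes so FTS5 treats it as a phrase.
    phrase = '"' + query.replace('"', '""') + '"'
    rows = db.execute(
        "SELECT rowid, heading, text FROM lines "
        "WHERE lines MATCH ? ORDER BY bm25(lines) LIMIT ?",
        (phrase, max_results),
    )
    return rows.fetchall()


doc = [
    "# Intro",
    "The graph builder creates graphs.",
    "## API",
    "MLContext is the entry point.",
]
db = build_index(doc)
print(search(db, "graph builder"))
```

Using the line number as the rowid keeps search hits directly usable as arguments to a line-range lookup.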

Get exact line ranges

Retrieve specific line ranges from indexed documents (useful for agents):

# Get 40 lines starting from line 1234
search-bs get --name webnn --line 1234 --count 40

# JSON output
search-bs get --name webnn --line 1234 --count 40 --json
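
The line arithmetic behind get and --around can be sketched as follows. This assumes 1-based line numbers and a window that is clamped at the start and end of the document; the helper names get_lines and around are hypothetical.

```python
def get_lines(lines, start, count):
    """Return (line_number, text) pairs for `count` lines from 1-based `start`."""
    if start < 1 or count < 0:
        raise ValueError("start must be >= 1 and count must be >= 0")
    begin = start - 1
    window = lines[begin:begin + count]  # slicing clamps at the end of the list
    return [(begin + offset + 1, text) for offset, text in enumerate(window)]


def around(lines, lineno, n):
    """Return the matched line plus up to `n` lines on each side."""
    start = max(1, lineno - n)
    return get_lines(lines, start, lineno + n - start + 1)


doc = [f"line {i}" for i in range(1, 11)]
print(get_lines(doc, 3, 2))
print(around(doc, 1, 3))
```

Returning the line number with each line lets a caller quote an exact range back to the tool in a follow-up request.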

List indexed documents

# Human-readable list
search-bs docs

# JSON output
search-bs docs --json

How it works

  1. Indexing: Fetches .bs files from URLs and indexes them line-by-line with heading context
  2. Change detection: Uses HTTP conditional requests and content hashing to skip unchanged documents
  3. Search: Uses SQLite FTS5 full-text search with BM25 ranking for relevance

Data storage

By default, the index is stored in ~/.cache/search-bs/search-bs.sqlite3

You can override this location with the BIKESEARCH_HOME environment variable:

export BIKESEARCH_HOME=/custom/path
search-bs index ...
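
A script that needs to locate the index file could resolve the path along these lines. Note the assumption: this sketch supposes the database keeps the file name search-bs.sqlite3 directly inside BIKESEARCH_HOME, which the text above does not state explicitly. The database_path function is hypothetical.

```python
import os
from pathlib import Path


def database_path(environ=None):
    """Resolve the index location, honoring BIKESEARCH_HOME when set.

    Assumption: the file name stays search-bs.sqlite3 under the override.
    """
    env = os.environ if environ is None else environ
    home = env.get("BIKESEARCH_HOME")
    base = Path(home).expanduser() if home else Path.home() / ".cache" / "search-bs"
    return base / "search-bs.sqlite3"


print(database_path())
```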

Requirements

  • Python 3.8 or later
  • SQLite 3 with FTS5 support (typically included with Python 3.6+; availability depends on how the SQLite library used by Python was built)
  • No external dependencies (stdlib only)
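
To check whether a given Python installation has FTS5 available, a quick probe like the one below can help. It is a sketch with a hypothetical has_fts5 helper: it tries to create an FTS5 table in a throwaway in-memory database.

```python
import sqlite3


def has_fts5():
    """Return True if this Python's SQLite library can create FTS5 tables."""
    db = sqlite3.connect(":memory:")
    try:
        db.execute("CREATE VIRTUAL TABLE probe USING fts5(content)")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        db.close()


print("SQLite", sqlite3.sqlite_version, "FTS5 available:", has_fts5())
```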

License

MIT
