Semantic embedding validation tool for ADRs and code

These details have not been verified by PyPI

gundog

Gundog is a local semantic retrieval engine for high-volume corpora. It finds relevant code and documentation by understanding what you mean, not just matching keywords.

Point it at your docs and code. It embeds everything into vectors, builds a similarity graph connecting related files, and combines semantic search with keyword matching. Ask "how does auth work?" and it retrieves the login handler, session middleware, and the ADR that explains why you chose JWT, even if none of them contain the word "auth".

Use it for LLM context retrieval, exploring unfamiliar codebases, or as a dynamic documentation explorer. Runs entirely on your machine.

The Problem

Your codebase is full of connections that are never written down. The ADR explaining your auth strategy relates to the login handler, which relates to the session middleware, but nothing links them. Docs drift from implementation. Knowledge lives in silos.

Some tools already solve this problem. Credit where it's due: the core idea of gundog is based on the much more mature SeaGOAT project. But my particular needs were ever so slightly different.

I wanted a clean map of data chunks from widespread data sources, and of how they correlate, driven by a natural language query. SeaGOAT instead provides a flat but more accurate pointer to a specific data chunk from a single git repository. Basically, I wanted an Obsidian graph view of my docs, controlled by a natural language query, without having to go through the pain of using... well... Obsidian. And wrapping SeaGOAT with some scripts was limiting and also hard to distribute.

Gundog builds these connections automatically. Vector search finds semantically related content, BM25 catches exact keyword matches, and graph expansion surfaces files you didn't know to look for.

Install

pip install gundog

Optional extras:

pip install gundog[viz]    # for query graph visualization
pip install gundog[lance]  # for larger codebases (10k+ files)
pip install gundog[serve]  # for web UI server

Or from source

git clone https://github.com/adhityaravi/gundog.git
cd gundog
uv sync
uv run gundog --help

Quick Start

1. Create a config file (default: .gundog/config.yaml):

sources:
  - path: ./docs
    glob: "**/*.md"
  - path: ./src
    glob: "**/*.py"

storage:
  backend: numpy
  path: .gundog/index

2. Index your stuff:

gundog index

The first run downloads the embedding model (~130MB for the default). You can use any sentence-transformers model. Subsequent runs are incremental: only changed files are re-indexed.

3. Search:

gundog query "database connection pooling"

Returns ranked results with file paths and relevance scores.

Commands

You can use gundog with a config file or with CLI flags directly, but config files are recommended:

# With config file (default: .gundog/config.yaml)
gundog index
gundog index -c /path/to/config.yaml

# Without config file
gundog index --source ./docs:*.md --source ./src:*.py
gundog query "auth" --index .gundog/index

`gundog index`

Scans your configured sources, embeds the content, and builds a searchable index.

gundog index                                      # uses config file
gundog index --rebuild                            # fresh index from scratch
gundog index -s ./docs:*.md -s ./src:*.py         # no config needed
gundog index -s ./docs -i ./my-index              # custom index location
gundog index -s ./src:*.py -e '**/test_*'         # exclude test files

Source format: path, path:glob, or path:type:glob
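
To make the three source forms concrete, here is a minimal sketch of how such a spec could be split. The `Source` tuple and `parse_source` function are hypothetical names for illustration and are not taken from gundog's code. The sketch also assumes paths contain no colons, so it would need more care for Windows drive letters.

```python
from typing import NamedTuple, Optional


class Source(NamedTuple):
    """Hypothetical container for one parsed --source value."""
    path: str
    type: Optional[str]
    glob: Optional[str]


def parse_source(spec: str) -> Source:
    """Split 'path', 'path:glob', or 'path:type:glob' into its parts.

    Assumption: the path itself contains no ':' characters.
    """
    parts = spec.split(":")
    if len(parts) == 1:
        return Source(parts[0], None, None)
    if len(parts) == 2:
        return Source(parts[0], None, parts[1])
    if len(parts) == 3:
        return Source(parts[0], parts[1], parts[2])
    raise ValueError(f"expected path, path:glob, or path:type:glob, got {spec!r}")
```

Under these assumptions, `./src:code:*.py` reads as path `./src`, type `code`, and glob `*.py`.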

Exclude patterns use fnmatch syntax (e.g., **/test_*, **/__pycache__/*).
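
If you want to check a pattern before indexing, Python's standard `fnmatch` module follows the same syntax. The helper below is only a sketch: `is_excluded` is a hypothetical name, and matching against a slash-separated relative path is an assumption about how gundog applies the patterns.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def is_excluded(rel_path: str, patterns: Iterable[str]) -> bool:
    """Return True if rel_path matches any fnmatch-style exclude pattern.

    In fnmatch, '*' also matches '/' characters, so '**/test_*'
    matches any path that contains '/test_'.
    """
    return any(fnmatchcase(rel_path, pattern) for pattern in patterns)


patterns = ["**/test_*", "**/__pycache__/*"]
print(is_excluded("src/tests/test_auth.py", patterns))
print(is_excluded("src/auth.py", patterns))
```

One consequence of plain fnmatch semantics is that `**/test_*` needs a `/` before `test_`, so a bare `test_auth.py` with no directory part would not match in this sketch.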

Exclusion templates provide predefined patterns for common languages:

gundog index -s ./src:*.py --exclusion-template python  # ignores __pycache__, .venv, etc.
gundog index -s ./src:*.ts --exclusion-template typescript

Available templates: python, javascript, typescript, go, rust, java

`gundog query`

Finds relevant files for a natural language query.

gundog query "error handling strategy"
gundog query "authentication" --top 5        # limit results
gundog query "caching" --no-expand           # skip graph expansion
gundog query "auth" --index .gundog/index    # specify index directly
gundog query "api design" --type docs        # filter by type (if sources have types)
gundog query "auth flow" --graph             # opens visual graph of results

`gundog serve`

Starts a web UI for interactive queries with a visual graph.

gundog serve                              # starts at http://127.0.0.1:8000
gundog serve --port 3000                  # custom port
gundog serve --title "My Project"         # custom title

File links are auto-detected from git repos. Files in a git repo with a remote get clickable links to GitHub/GitLab. Non-git files show the path on hover.
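
As a rough illustration of how a remote can be turned into a clickable link, the sketch below maps a git remote URL plus a branch and relative path to a web URL. The function names, the `branch` parameter, and the URL layouts (`/blob/` for GitHub, `/-/blob/` for GitLab) are assumptions for illustration. The source does not describe how gundog builds its links.

```python
import re
from typing import Optional


def remote_to_web_base(remote: str) -> Optional[str]:
    """Convert an SSH or HTTPS git remote into an https://host/owner/repo base."""
    # SSH form: git@host:owner/repo(.git)
    match = re.match(r"git@([^:]+):(.+?)(?:\.git)?$", remote)
    if match is None:
        # HTTPS form: https://host/owner/repo(.git)
        match = re.match(r"https?://([^/]+)/(.+?)(?:\.git)?$", remote)
    if match is None:
        return None
    host, repo_path = match.groups()
    return f"https://{host}/{repo_path}"


def file_link(remote: str, branch: str, rel_path: str) -> Optional[str]:
    """Build a browsable link to rel_path, or None for unrecognized remotes."""
    base = remote_to_web_base(remote)
    if base is None:
        return None
    # Assumed layouts: GitLab-style hosts use '/-/blob/', others '/blob/'.
    separator = "/-/blob/" if "gitlab" in base else "/blob/"
    return f"{base}{separator}{branch}/{rel_path}"
```

Returning `None` for unrecognized remotes mirrors the fallback described above, where non-git files only show their path.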

Requires the serve extra: pip install gundog[serve]

How It Works

Embedding: Files are converted to vectors using sentence-transformers. Similar concepts end up as nearby vectors.
Hybrid Search: Combines semantic (vector) search with keyword (BM25) search using Reciprocal Rank Fusion. Queries like "UserAuthService" find exact matches even when embeddings might miss them.
Storage: Vectors stored locally via numpy+JSON (default) or LanceDB for scale. No external services.
Graph: Documents above a similarity threshold get connected, enabling traversal from direct matches to related files.
Query: Your query gets embedded, compared against stored vectors, fused with keyword results, and ranked. Scores are rescaled so 0% = baseline, 100% = perfect match. Irrelevant queries return nothing.
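
The hybrid step can be sketched with a generic Reciprocal Rank Fusion. This is the textbook formula, not gundog's actual code. The constant `k = 60` is the conventional default, and applying per-ranker weights this way is an assumption, as are the file names.

```python
from typing import Dict, List, Optional, Tuple


def reciprocal_rank_fusion(
    rankings: Dict[str, List[str]],
    weights: Optional[Dict[str, float]] = None,
    k: int = 60,
) -> List[Tuple[str, float]]:
    """Fuse several best-first rankings into one scored list.

    Each document gets sum(weight / (k + rank)) over the rankings it
    appears in, with ranks starting at 1.
    """
    if weights is None:
        weights = {name: 1.0 for name in rankings}
    scores: Dict[str, float] = {}
    for name, ranked_docs in rankings.items():
        for rank, doc in enumerate(ranked_docs, start=1):
            scores[doc] = scores.get(doc, 0.0) + weights[name] / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


fused = reciprocal_rank_fusion(
    {
        "vector": ["login.py", "session.py", "adr-auth.md"],
        "bm25": ["auth_service.py", "login.py"],
    },
    weights={"vector": 0.5, "bm25": 0.5},
)
for doc, score in fused:
    print(f"{doc}: {score:.4f}")
```

Because RRF works on ranks rather than raw scores, the vector and BM25 scores never need to be on the same scale. A file that appears in both lists, like `login.py` here, rises above files that appear in only one.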

Configuration

Full config options:

sources:
  - path: ./docs
    glob: "**/*.md"
  - path: ./src
    glob: "**/*.py"
    type: code                    # optional - for filtering with --type
    exclusion_template: python    # optional - predefined excludes
    exclude:                      # optional - additional patterns to skip
      - "**/test_*"

embedding:
  # Any sentence-transformers model works: https://sbert.net/docs/sentence_transformer/pretrained_models.html
  model: BAAI/bge-small-en-v1.5  # default (~130MB), good balance of speed/quality

storage:
  backend: numpy      # or "lancedb" for larger corpora
  path: .gundog/index

graph:
  similarity_threshold: 0.7  # min similarity to create edge
  expand_threshold: 0.5      # min edge weight for query expansion
  max_expand_depth: 2        # how far to traverse during expansion

hybrid:
  enabled: true       # combine vector + keyword search (default: on)
  bm25_weight: 0.5    # keyword search weight
  vector_weight: 0.5  # semantic search weight

chunking:
  enabled: false      # split files into chunks (opt-in)
  max_tokens: 512     # tokens per chunk
  overlap_tokens: 50  # overlap between chunks

The type field is optional. If you want to filter results by category (e.g., --type code), assign types to your sources. Any string works. To use a type without a config file, use the format path:type:glob when specifying sources.
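
One way to read the graph settings in the config above is the sketch below. It connects files whose cosine similarity clears `similarity_threshold`, then walks outward from the direct matches up to `max_expand_depth` hops along edges of at least `expand_threshold`. The function names, the toy vectors, and the exact semantics of each threshold are assumptions, not gundog's implementation.

```python
import math
from collections import deque
from typing import Dict, Iterable, List


def cosine(a: List[float], b: List[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_graph(
    vectors: Dict[str, List[float]], similarity_threshold: float = 0.7
) -> Dict[str, Dict[str, float]]:
    """Connect every pair of files whose similarity meets the threshold."""
    graph: Dict[str, Dict[str, float]] = {name: {} for name in vectors}
    names = list(vectors)
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            weight = cosine(vectors[first], vectors[second])
            if weight >= similarity_threshold:
                graph[first][second] = weight
                graph[second][first] = weight
    return graph


def expand(
    graph: Dict[str, Dict[str, float]],
    seeds: Iterable[str],
    expand_threshold: float = 0.5,
    max_expand_depth: int = 2,
) -> Dict[str, int]:
    """Breadth-first expansion from seeds; returns file -> hop distance."""
    depth_of = {seed: 0 for seed in seeds}
    queue = deque(depth_of)
    while queue:
        node = queue.popleft()
        if depth_of[node] >= max_expand_depth:
            continue
        for neighbor, weight in graph[node].items():
            if weight >= expand_threshold and neighbor not in depth_of:
                depth_of[neighbor] = depth_of[node] + 1
                queue.append(neighbor)
    return depth_of


# Toy 2-D "embeddings" that form a chain: adr - login - session - billing.
vectors = {
    "adr-auth.md": [1.0, 0.0],
    "login.py": [0.8, 0.3],
    "session.py": [0.6, 0.8],
    "billing.py": [0.0, 1.0],
}
graph = build_graph(vectors)
print(expand(graph, ["adr-auth.md"]))
```

With a depth of 2, the walk from `adr-auth.md` reaches `login.py` and `session.py` but stops before `billing.py`, which is three hops away.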

Chunking

For large files, enable chunking to get better search results. Instead of embedding whole files (which dilutes signal), chunking splits files into overlapping segments:

chunking:
  enabled: true
  max_tokens: 512   # ~2000 characters per chunk
  overlap_tokens: 50

Results are automatically deduplicated by file, showing the best-matching chunk.
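
The two ideas above, overlapping windows and per-file deduplication, can be sketched as follows. The names are hypothetical and tokens are modelled as plain list items. How gundog actually tokenizes text and scores chunks is not described here.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def chunk_tokens(
    tokens: Sequence, max_tokens: int = 512, overlap_tokens: int = 50
) -> List[list]:
    """Split tokens into windows of max_tokens that share overlap_tokens."""
    if not 0 <= overlap_tokens < max_tokens:
        raise ValueError("overlap_tokens must be >= 0 and < max_tokens")
    step = max_tokens - overlap_tokens
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + max_tokens]))
        if start + max_tokens >= len(tokens):
            break  # the last window already reaches the end of the file
    return chunks


def best_chunk_per_file(hits: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Keep only the highest chunk score for each file path."""
    best: Dict[str, float] = {}
    for path, score in hits:
        if score > best.get(path, float("-inf")):
            best[path] = score
    return best


print(chunk_tokens(list(range(10)), max_tokens=4, overlap_tokens=1))
print(best_chunk_per_file([("a.md", 0.4), ("a.md", 0.9), ("b.py", 0.5)]))
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, at the cost of embedding a few tokens twice.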

What Gundog Doesn't Do

Chat: It's retrieval, not generation. Feed results to your LLM of choice.
Cloud anything: Everything runs locally. Your code stays on your machine.

Why "Gundog"?

Gundogs retrieve things. That's the whole job. Point at what you want, they fetch it. Small, focused, good at one thing.

Development

uv sync --extra dev      # install dev dependencies
uv run ruff check .      # lint
uv run ruff format .     # format
uv run pyright src       # type check
uv run pytest            # test
uv run tox               # run all checks

Small. Lightweight. Ferocious.

Release history

- 0.4.1: Dec 20, 2025
- 0.4.0: Dec 20, 2025
- 0.3.1: Dec 18, 2025
- 0.3.0: Dec 18, 2025
- 0.2.0: Dec 17, 2025
- 0.1.2 (this version): Dec 14, 2025
- 0.1.1: Dec 14, 2025

Download files

Source Distribution

gundog-0.1.2.tar.gz (318.6 kB)

Uploaded Dec 14, 2025 Source

Built Distribution

gundog-0.1.2-py3-none-any.whl (50.4 kB)

Uploaded Dec 14, 2025 Python 3

File details

Details for the file gundog-0.1.2.tar.gz.

File metadata

Download URL: gundog-0.1.2.tar.gz
Upload date: Dec 14, 2025
Size: 318.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gundog-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`558ef57358eb1ec0e171b092e570475322ffa8ee53bf52709b544daae12721f9`
MD5	`b608e29b39c85555fd468f36da8c30a6`
BLAKE2b-256	`1b91f7c0a8b1f3c11205edb863ba434688aa6d3df6e7c4abcf9425476c7c10b6`

Provenance

The following attestation bundles were made for gundog-0.1.2.tar.gz:

Publisher: release.yaml on adhityaravi/gundog

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gundog-0.1.2.tar.gz
- Subject digest: 558ef57358eb1ec0e171b092e570475322ffa8ee53bf52709b544daae12721f9
- Sigstore transparency entry: 763944636
- Sigstore integration time: Dec 14, 2025
Source repository:
- Permalink: adhityaravi/gundog@8ce002417311f0b04443b5bcb00fb1503a3e1885
- Branch / Tag: refs/heads/main
- Owner: https://github.com/adhityaravi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yaml@8ce002417311f0b04443b5bcb00fb1503a3e1885
- Trigger Event: workflow_dispatch

File details

Details for the file gundog-0.1.2-py3-none-any.whl.

File metadata

Download URL: gundog-0.1.2-py3-none-any.whl
Upload date: Dec 14, 2025
Size: 50.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gundog-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`38ad19c998a0530545144ce58651a094ca703f68ce3c2b583f3b1fa2dc073709`
MD5	`9377738e64edb259350b6f5b2cf14def`
BLAKE2b-256	`42b25c126cb9fefe7fe30c1f20cbcbf4ee5055075318d247dd9aed5fbc292412`

Provenance

The following attestation bundles were made for gundog-0.1.2-py3-none-any.whl:

Publisher: release.yaml on adhityaravi/gundog

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: gundog-0.1.2-py3-none-any.whl
- Subject digest: 38ad19c998a0530545144ce58651a094ca703f68ce3c2b583f3b1fa2dc073709
- Sigstore transparency entry: 763944637
- Sigstore integration time: Dec 14, 2025
Source repository:
- Permalink: adhityaravi/gundog@8ce002417311f0b04443b5bcb00fb1503a3e1885
- Branch / Tag: refs/heads/main
- Owner: https://github.com/adhityaravi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yaml@8ce002417311f0b04443b5bcb00fb1503a3e1885
- Trigger Event: workflow_dispatch
