Skip to main content

Full-stack AI enablement platform

Project description

๐Ÿฌ Dolphin

Hybrid search across all your repositories.

PyPI npm MIT License


Dolphin indexes your repositories and lets you perform hybrid (semantic + keyword) search across them.

Quickstart

# Install
uv pip install pb-dolphin

# Set your OpenAI key (used for embeddings)
export OPENAI_API_KEY="sk-..."

# Initialize, add a repo, and search
dolphin init
dolphin add-repo my-project /path/to/project
dolphin index my-project
dolphin search "database connection pooling"

Dolphin indexes your code with language-aware chunking, embeds it, and returns ranked results.

Want live re-indexing as you edit files? Start the server:

dolphin serve

Agent Integration

A small companion MCP server is available at bunx dolphin-mcp. Add this to your AI app's MCP config:

{
  "mcpServers": {
    "dolphin": {
      "command": "bunx",
      "args": ["dolphin-mcp"]
    }
  }
}

Make sure dolphin serve is running, and your agent can now search, retrieve chunks, and read files from your indexed repos.

Additionally, a Claude skill is available in this repo's marketplace as a personal Plugin.

How it works

  You / Agent
        |
        v
  โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”
  โ”‚             Dolphin                   โ”‚
  โ”‚                                       โ”‚
  โ”‚   CLI โ”€โ”€โ”€ REST API โ”€โ”€โ”€ MCP Bridge     โ”‚
  โ”‚               |                       โ”‚
  โ”‚        โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”                โ”‚
  โ”‚        v             v                โ”‚
  โ”‚    LanceDB       SQLite               โ”‚
  โ”‚   (vectors)    (metadata + BM25)      โ”‚
  โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Indexing: Your code is scanned, split into semantic chunks using language-aware AST parsers, embedded via OpenAI, and stored in LanceDB (vectors) and SQLite (metadata + full-text).

Searching: Your query is embedded and matched against both vector similarity and BM25 keyword relevance. Results are fused with Reciprocal Rank Fusion, optionally reranked with a cross-encoder, and returned as structured snippets with file paths, line numbers, and scores.

Features

Intelligent hybrid search

  • Hybrid vector + BM25 keyword search with RRF fusion
  • Optional cross-encoder reranking for +20-30% ranking improvement
  • MMR diversity to reduce redundant results
  • Filter by repo, language, path, or glob pattern

Language-aware indexing

  • AST-based chunking for Python, TypeScript, JavaScript, Markdown, SQL, and Svelte
  • Fallback text chunking for everything else
  • Respects .gitignore and an optional repo-specific Dolphin config (dolphin init --repo)

Live sync

  • File-watching built into dolphin serve so edits are re-indexed automatically
  • Git-aware: handles branch switches gracefully

Multiple interfaces

  • dolphin CLI with compact, verbose, and JSON output modes
  • FastAPI server with full search and retrieval endpoints
  • MCP server for integration via bunx dolphin-mcp

CLI reference

Command What it does
dolphin init Create config at ~/.dolphin/config.toml
dolphin add-repo <name> <path> Register a repository
dolphin index <name> Index (or re-index) a repository
dolphin search <query> Search across indexed repos
dolphin serve Start API server with file-watching
dolphin status Show indexed repos and stats
dolphin repos List registered repositories
dolphin rm-repo <name> Remove a repo and its data
dolphin config --show Display current config

Search options

dolphin search "error handling" \
  --repo myapp \
  --lang py \
  --path src/ \
  --top-k 10 \
  --verbose          # or --json for scripting

Configuration

Dolphin auto-creates its config at ~/.dolphin/config.toml when you run dolphin init. The defaults work well out of the box.

default_embed_model = "small"   # "small" (faster) or "large" (better)

[retrieval]
top_k = 8

[retrieval.hybrid_search]
enabled = true
fusion_method = "rrf"

For per-repo overrides (custom ignore patterns, chunking settings), run dolphin init --repo inside a repository.

Full config reference: docs/ARCHITECTURE.md

Optional: cross-encoder reranking

For the best possible search quality, enable cross-encoder reranking. This re-scores results pairwise against your query using an ML model.

uv pip install "pb-dolphin[reranking]"

Then in ~/.dolphin/config.toml:

[retrieval.reranking]
enabled = true

Trade-offs: ~2GB disk for model weights, 2-3x slower searches.

Requirements

Dependency Purpose
Python 3.12+ Core runtime
uv Python package management
OpenAI API key Embedding generation
Bun MCP bridge runtime (optional)
Git Repository scanning

Troubleshooting

Server not responding?

curl http://127.0.0.1:7777/v1/health   # check health
lsof -i :7777                           # check port
dolphin serve                            # start it

No search results?

dolphin status                                    # verify repos are indexed
dolphin index <repo-name> --full --force          # force re-index

MCP not connecting?

  • Make sure dolphin serve is running
  • Check that Bun is installed: bun --version
  • Set DOLPHIN_API_URL if the server isn't at http://127.0.0.1:7777

License

MIT โ€” Plastic Beach, LLC

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pb_dolphin-0.2.4.tar.gz (245.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pb_dolphin-0.2.4-py3-none-any.whl (286.6 kB view details)

Uploaded Python 3

File details

Details for the file pb_dolphin-0.2.4.tar.gz.

File metadata

  • Download URL: pb_dolphin-0.2.4.tar.gz
  • Upload date:
  • Size: 245.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pb_dolphin-0.2.4.tar.gz
Algorithm Hash digest
SHA256 4cec7d6bc374d046ddaa930e1b0fd5b9f611bbd63b295360398c197d7380f41f
MD5 437f0767583c91bbcf0f3aba86207067
BLAKE2b-256 f59fd04bbb8ed6828ac40229d3a84cfe160e6867274461d9b0b875dfc49d7efa

See more details on using hashes here.

Provenance

The following attestation bundles were made for pb_dolphin-0.2.4.tar.gz:

Publisher: publish-kb.yml on plasticbeachllc/dolphin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pb_dolphin-0.2.4-py3-none-any.whl.

File metadata

  • Download URL: pb_dolphin-0.2.4-py3-none-any.whl
  • Upload date:
  • Size: 286.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pb_dolphin-0.2.4-py3-none-any.whl
Algorithm Hash digest
SHA256 2aebf9a7d6280466a4e6c139392d049a3c241825f7d639664713168b3c0ba1cb
MD5 6fce7e77d03374d22c581e688f2fe04f
BLAKE2b-256 7ce563e8a50e56aeb6893a17493a5a26bcee7a34e42ee54aeee4314c276140d0

See more details on using hashes here.

Provenance

The following attestation bundles were made for pb_dolphin-0.2.4-py3-none-any.whl:

Publisher: publish-kb.yml on plasticbeachllc/dolphin

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page