Full-stack AI enablement platform

๐Ÿฌ dolphin

PyPI Version · License: MIT

A semantic code indexing and search system with multiple interfaces. This repository currently ships the Knowledge Bank (Python) and MCP server (TypeScript/Bun) as stable, release-targeted components.

Quick Start

Installation

Core Installation (~200MB)

# install with uv (recommended)
uv pip install pb-dolphin

# ensure OPENAI_API_KEY is set as an environment variable
export OPENAI_API_KEY="sk-your-key-here"

The accompanying MCP server can be run with bunx dolphin-mcp.

Optional: Cross-Encoder Reranking (~2GB additional)

To improve search quality further (roughly +20-30% MRR):

uv pip install "pb-dolphin[reranking]"

See Advanced Features for more information.

Basic Usage

We recommend using uv run to execute all commands for maximum compatibility.

# Initialize global knowledge store and index a repository
dolphin init
dolphin add-repo my-project /path/to/project
dolphin index my-project

# Search your indexed code
dolphin search "authentication logic"

# Start API server
dolphin serve

Core Commands

  • dolphin init - Initialize configuration (auto-creates ~/.dolphin/config.toml)
  • dolphin init --repo - Create repo-specific config in current directory
  • dolphin add-repo <name> <path> - Register a repository for indexing
  • dolphin index <name> - Index a repository with language-aware chunking
  • dolphin search <query> - Search indexed code semantically
  • dolphin serve - Start REST API server (port 7777)
  • dolphin config --show - Display current configuration

Architecture

High-Level Overview

┌──────────────────────────────────────────┐
│   AI Interfaces (Claude, Continue, etc)  │
└──────────────┬───────────────────────────┘
               │ MCP Protocol
               ▼
┌──────────────────────────────────────────┐
│          Dolphin Knowledge Base          │
│  ┌─────────────┐    ┌─────────────────┐  │
│  │ MCP Bridge  │◄──►│ REST API        │  │
│  │ (TypeScript)│    │ (Python/FastAPI)│  │
│  └─────────────┘    └────────┬────────┘  │
└──────────────────────────────┼───────────┘
                               │
               ┌───────────────┴────────────┐
               ▼                            ▼
          ┌─────────┐                ┌──────────┐
          │LanceDB  │                │ SQLite   │
          │(Vectors)│                │(Metadata)│
          └─────────┘                └──────────┘

Repository Layout & Tooling

  • Python backend (Knowledge Bank) (kb/)
    • Tooling: uv (pyproject.toml, uv.lock)
    • Commands: uv run dolphin ..., uv run pytest ...
  • MCP bridge (mcp-bridge/)
    • Tooling: Bun (package.json, bun.lock)
    • Commands: cd mcp-bridge && bun install && bun test
  • Shared telemetry/IPC (shared/)
    • Tooling: npm (package.json, node_modules/)
    • Commands: cd shared && npm install && npm test

At the repo root:

  • package.json acts as a workspace aggregator with convenience scripts (npm run build:all, npm run lint:all, npm run format).
  • Use just targets (just test-all, just check) for the canonical, cross-project workflows.

Key Features

  • File-Watch Indexing - By default, indexing is triggered automatically when files change
  • Language-Aware Chunking - Code parsing for Python, TypeScript, JavaScript, Markdown
  • Semantic Search
    • OpenAI embeddings with LanceDB vector storage
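    • Hybrid search: approximate nearest-neighbor (ANN) vector search plus BM25 keyword search, fused with Reciprocal Rank Fusion (RRF)
    • Optional re-ranking with a cross-encoder
    • Maximal Marginal Relevance (MMR) to improve the relevance and diversity of results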
  • Interfaces
    • dolphin CLI app
    • FastAPI server with search, retrieval, and metadata endpoints
    • MCP server implementation, runnable with bunx dolphin-mcp
  • Configuration - Per-repo chunking and ignore configuration
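
The hybrid search described above combines a vector ranking and a BM25 ranking with RRF. The following is a minimal sketch of the general technique, not Dolphin's actual implementation: the function name, the chunk IDs, and the constant k = 60 (a commonly used default) are assumptions made for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk IDs into one list, best first.

    Each item earns 1 / (k + rank) from every list it appears in, where
    rank starts at 1. Items ranked well by several retrievers rise to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Sort by fused score (descending); break ties by ID for determinism.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk IDs returned by the two retrievers.
vector_hits = ["auth.py:10", "login.py:42", "session.py:7"]
bm25_hits = ["login.py:42", "tokens.py:3", "auth.py:10"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
print(fused)
```

RRF only needs ranks, not raw scores, which is why it is a convenient way to combine cosine similarities and BM25 scores that live on different scales.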

Configuration

Dolphin uses a multi-level configuration system:

  1. Repo-specific (./.dolphin/config.toml) - Optional per-repository chunking settings
  2. User-global (~/.dolphin/config.toml) - Auto-created on first use

Configuration TOMLs

You can use dolphin init to initialize your global config and edit from there.

# ~/.dolphin/config.toml
default_embed_model = "large"  # or "small"

[embedding]
provider = "openai"
batch_size = 100

[retrieval]
top_k = 8
score_cutoff = 0.0

To generate a repo-specific config, use dolphin init --repo at the repository root.

Environment Variables

# Required when using OpenAI embeddings (recommended for production)
export OPENAI_API_KEY="sk-your-openai-api-key-here"

API Key Management

For security and future-proofing, Dolphin automatically manages a KB API key that secures the Knowledge Base HTTP endpoints.

Auto-Provisioning:

  • Running dolphin init or dolphin serve automatically creates ~/.dolphin/kb_api_key
  • The MCP bridge (bunx dolphin-mcp) auto-provisions the key on startup
  • The key is a 64-character hex string with file permissions set to 0600 (user-only)

Environment Variable Override (Advanced):

For CI/CD, testing, or remote deployments, you can override the auto-provisioned key:

export DOLPHIN_API_KEY="your-custom-key-here"
# OR
export DOLPHIN_KB_API_KEY="your-custom-key-here"

Environment variables take precedence over the file-based key.
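
The lookup order described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: the function name is hypothetical, and the relative priority of DOLPHIN_API_KEY over DOLPHIN_KB_API_KEY is a guess, since the text only says that environment variables win over the file.

```python
import os
import secrets
from pathlib import Path

# Assumption: checked in this order; the README does not specify which wins.
ENV_VARS = ("DOLPHIN_API_KEY", "DOLPHIN_KB_API_KEY")


def resolve_kb_api_key(key_file=None, environ=None):
    """Return the KB API key: environment first, then file, else provision."""
    environ = os.environ if environ is None else environ
    key_file = Path(key_file) if key_file else Path.home() / ".dolphin" / "kb_api_key"

    # 1. Environment variables take precedence over the file-based key.
    for name in ENV_VARS:
        value = environ.get(name, "").strip()
        if value:
            return value

    # 2. Fall back to a previously provisioned key file.
    if key_file.is_file():
        return key_file.read_text(encoding="utf-8").strip()

    # 3. Provision a new 64-character hex key, readable by the user only.
    key = secrets.token_hex(32)  # 32 random bytes -> 64 hex characters
    key_file.parent.mkdir(parents=True, exist_ok=True)
    fd = os.open(key_file, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(key)
    return key
```

Creating the file with os.open and mode 0o600 avoids a window in which the key exists on disk with broader permissions, which would happen if the file were written first and restricted afterwards.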

MCP Configuration

The companion MCP server is small and can be run with bunx without a separate install step. Add the following to your AI application's MCP configuration:

{
  "mcpServers": {
    "dolphin": {
      "command": "bunx",
      "args": ["dolphin-mcp"]
    }
  }
}

Set DOLPHIN_API_URL if your KB server is not running at http://127.0.0.1:7777.

Note: Make sure you are running the HTTP retrieval server: uv run dolphin serve

Available MCP tools: search, chunk_get, file_lines, store_info, metadata_get, repos_list, health

REST API

# Start server
dolphin serve

# Health check (unauthenticated)
curl http://127.0.0.1:7777/v1/health

# Most v1 endpoints require an API key
export DOLPHIN_API_KEY="$(cat ~/.dolphin/kb_api_key)"

# List repositories
curl -H "X-API-Key: $DOLPHIN_API_KEY" http://127.0.0.1:7777/v1/repos

# Search "authentication"
curl -X POST http://127.0.0.1:7777/v1/search \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $DOLPHIN_API_KEY" \
  -d '{"query": "authentication", "top_k": 5}'
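
The same search call can be made from Python using only the standard library. This sketch mirrors the curl example above (endpoint, headers, and JSON body are taken from it); the function names are hypothetical, and because the response schema is not documented here, the parsed JSON is returned as-is.

```python
import json
import urllib.request


def build_search_request(query, api_key, top_k=5, base_url="http://127.0.0.1:7777"):
    """Build the POST /v1/search request without sending it."""
    payload = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/search",
        data=payload,
        method="POST",
        headers={"Content-Type": "application/json", "X-API-Key": api_key},
    )


def search(query, api_key, **kwargs):
    """Send the search request and return the decoded JSON response."""
    request = build_search_request(query, api_key, **kwargs)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

With the server running, search("authentication", api_key) would issue the request shown in the curl example, where api_key is the contents of ~/.dolphin/kb_api_key or the value of DOLPHIN_API_KEY.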

Advanced Features

Cross-Encoder Reranking

Cross-encoder reranking improves search result relevance by re-scoring each candidate result jointly with the query using an ML model, which can improve ranking quality by roughly 20-30% (Nogueira & Cho, 2019).

Performance Impact:

  • โš ๏ธ 2-3x slower searches - cross-encoder is compute-intensive
  • โš ๏ธ ~2GB install size - requires torch and sentence-transformers

Installation

uv pip install "pb-dolphin[reranking]"

Configuration

Enable in your ~/.dolphin/config.toml:

[retrieval.reranking]
enabled = true  # Enable cross-encoder reranking
model = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # HuggingFace model
device = ""  # Auto-detect (CPU or CUDA if available)
batch_size = 32  # Higher = faster but more memory
candidate_multiplier = 4  # Rerank top_k × multiplier candidates
score_threshold = 0.3  # Minimum relevance score (0-1)

Restart the API server to apply changes:

uv run dolphin serve
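
To illustrate how candidate_multiplier and score_threshold could interact with top_k, here is a dependency-free sketch. It is an assumption about the flow, not the project's implementation: rerank and toy_overlap_score are hypothetical names, and the toy scorer stands in for a real cross-encoder, which would score each (query, text) pair with the configured model.

```python
def toy_overlap_score(query, text):
    """Stand-in scorer: fraction of query words that appear in the text."""
    query_words = set(query.lower().split())
    text_words = set(text.lower().split())
    return len(query_words & text_words) / len(query_words) if query_words else 0.0


def rerank(query, candidates, score_fn, top_k=8, candidate_multiplier=4,
           score_threshold=0.3):
    """Re-score the first top_k * candidate_multiplier candidates.

    `candidates` is assumed to be ordered by the first-stage retriever.
    Returns up to top_k (score, text) pairs, best first.
    """
    pool = candidates[: top_k * candidate_multiplier]
    scored = [(score_fn(query, text), text) for text in pool]
    # Drop candidates the scorer considers irrelevant.
    kept = [pair for pair in scored if pair[0] >= score_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]


docs = [
    "render the home page",
    "authentication middleware",
    "def login(): authentication logic here",
    "parse config",
    "authentication logic helper",
]
for score, text in rerank("authentication logic", docs, toy_overlap_score,
                          top_k=2, candidate_multiplier=2):
    print(f"{score:.2f}  {text}")
```

A larger candidate_multiplier gives the reranker more chances to promote a good result that the first stage ranked low, at the cost of scoring more pairs per query.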

File-Watching

The Dolphin server includes an integrated file watcher that keeps your Knowledge Bank synchronized in real time.

  • Automatic: When you run dolphin serve, it automatically starts watching all registered repositories.
  • Git-Aware: The indexer respects .gitignore rules. The watcher handles Git branch switching, updating the index to match the new working tree.
  • Custom Control: You can explicitly specify which repos to watch with --watch <repo-name> or disable watching via --no-watch. If watching is disabled, indexing can be manually triggered via dolphin index <name>.

Configuring Embedding Models

Dolphin uses a consistent embedding model across your repositories to simplify global search. The embedding model can be configured globally in your config.toml:

default_embed_model = "large"  # Options: "small" or "large"

Currently only OpenAI embeddings are supported.

Development Status

Current: Release candidate (v0.2.0) for Knowledge Bank + MCP

  • ✅ Core indexing and search pipeline
  • ✅ Language-aware chunking (Python, TS, JS, Markdown)
  • ✅ REST API with MCP bridge available at bunx dolphin-mcp
  • ✅ Cross-encoder reranking support
  • ✅ Hybrid search (BM25 + Vector)

Requirements

  • Python ≥3.12
  • OpenAI API key (for embeddings)
  • Bun (for MCP bridge)
  • Git (for repository scanning)
  • uv (for Python dependencies)

Testing

just test

See docs/TESTING.md for complete testing procedures.

Documentation

  • High-level architecture: docs/ARCHITECTURE.md
  • Testing guide: docs/TESTING.md
  • Benchmarking: docs/BENCHMARKING.md
  • Profiling: docs/PROFILING.md

Troubleshooting

Quick Diagnostics

# Check API server
curl http://127.0.0.1:7777/v1/health

# Check indexed repositories
dolphin kb status

# Re-index a repository
dolphin kb index <repo-name> --full --force

Common Issues

API not responding:

  • Start the server: dolphin serve
  • Check port conflicts: lsof -i :7777

No search results:

  • Verify repositories are indexed: dolphin kb status
  • Try with lower score cutoff in search parameters
  • Re-index: dolphin kb index <repo-name> --full --force

MCP not connecting:

  • Verify API server is running: curl http://127.0.0.1:7777/v1/health
  • Check MCP bridge logs: tail -f mcp-bridge/logs/mcp.log
  • Verify Bun is installed: bun --version

For detailed troubleshooting, performance tips, and development workflows, see AGENTS.md.

Publication

Versions

Current versions:

  • Python Package (PyPI): 0.2.0 - pb-dolphin
  • MCP Bridge (npm): 0.2.0 - dolphin-mcp

License

MIT License

Acknowledgments

Built with LanceDB, OpenAI, FastAPI, Bun, and lots of other tech.


Experimental Components (WIP)

The following components are under active development and not part of the stable release scope.

Agent Core

An LLM orchestrator which directly leverages the Knowledge Bank to improve discovery, planning, and execution for AI agents.

VS Code Extension

Provides an interface for Agent Core and the Knowledge Bank capability. The extension manages the KB server lifecycle automatically.

Features

  • AI Chat Interface: Interact with Claude AI directly in VS Code
  • Knowledge Bank Integration: Automatically searches your indexed codebase for context
  • Real-time Streaming: See AI responses as they're generated
  • Tool Call Visualization: Monitor Knowledge Bank searches and other tool executions

Installation (Development)

# 1. Build the extension
cd vscode-extension
npm install
npm run compile

# 2. Build the webview
cd webview
bun install
bun run build
cd ../..

# 3. Launch Extension Development Host
# Open vscode-extension folder in VS Code and press F5

โš ๏ธ Note: Knowledge Bank + MCP are release-candidate quality; experimental components remain under active development. Use at your own risk.
