Full-stack AI enablement platform

🐬 dolphin

PyPi Version License: MIT

⚠️ EXPERIMENTAL - This library is under active development. APIs and interfaces are unstable and subject to change without notice.

A semantic code search and knowledge management system with AI-native interfaces (MCP, REST API, CLI).

Quick Start

Installation

Core Installation (~200MB)

# install with uv (recommended)
uv pip install pb-dolphin

# ⚠️ IMPORTANT: Ensure OPENAI_API_KEY is set as an environment variable
export OPENAI_API_KEY="sk-your-key-here"

Optional: Cross-Encoder Reranking (~2GB additional)

For improved search quality (+20-30% MRR, mean reciprocal rank):

uv pip install pb-dolphin[reranking]

Trade-off: Better relevance but 2-3x slower searches. See Advanced Features for configuration.

Basic Usage

# Initialize global knowledge store and index a repository
dolphin init
dolphin add-repo my-project /path/to/project
dolphin index my-project

# Search your indexed code
dolphin search "authentication logic"

# Start API server
dolphin serve
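
To repeat these steps for several repositories, the commands can be assembled from Python. This is a minimal sketch: the helper names `onboarding_commands` and `run_all` and the example name and path are hypothetical, and only the `dolphin` subcommands shown above are used. By default it prints the commands rather than running them.

```python
import shlex
import subprocess


def onboarding_commands(repos: dict[str, str]) -> list[list[str]]:
    """Build argv lists for init, then add-repo and index for each repository."""
    commands = [["dolphin", "init"]]
    for name, path in repos.items():
        commands.append(["dolphin", "add-repo", name, path])
        commands.append(["dolphin", "index", name])
    return commands


def run_all(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(onboarding_commands({"my-project": "/path/to/project"}))
```

Pass `dry_run=False` once the printed commands look right.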

Core Commands

  • dolphin init - Initialize configuration (auto-creates ~/.dolphin/config.toml)
  • dolphin init --repo - Create repo-specific config in current directory
  • dolphin add-repo <name> <path> - Register a repository for indexing
  • dolphin index <name> - Index a repository with language-aware chunking
  • dolphin search <query> - Search indexed code semantically
  • dolphin serve - Start REST API server (port 7777)
  • dolphin config --show - Display current configuration

Architecture

High-Level Overview

┌──────────────────────────────────────────┐
│   AI Interfaces (Claude, Continue, etc)  │
└──────────────┬───────────────────────────┘
               │ MCP Protocol
               ▼
┌──────────────────────────────────────────┐
│          Dolphin Knowledge Base          │
│  ┌─────────────┐    ┌─────────────────┐  │
│  │ MCP Bridge  │◄──►│ REST API        │  │
│  │ (TypeScript)│    │ (Python/FastAPI)│  │
│  └─────────────┘    └────────┬────────┘  │
└──────────────────────────────┼───────────┘
                               │
               ┌───────────────┴────────────┐
               ▼                            ▼
          ┌─────────┐                ┌──────────┐
          │LanceDB  │                │ SQLite   │
          │(Vectors)│                │(Metadata)│
          └─────────┘                └──────────┘

Key Features

  • Language-Aware Chunking - Code parsing for Python, TypeScript, JavaScript, Markdown
  • Semantic Search - OpenAI embeddings with LanceDB vector storage
  • REST API - FastAPI server with search, retrieval, and metadata endpoints
  • Unified CLI - Single dolphin command for all operations
  • Configuration - Per-repo chunking and ignore configuration
  • MCP Support - MCP server implementation available at bunx dolphin-mcp
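
To illustrate what language-aware chunking means in general, the sketch below splits a Python module into one chunk per top-level function or class using the standard `ast` module. It is not Dolphin's chunker: the `Chunk` type, the `chunk_python_source` helper, and the one-chunk-per-definition policy are assumptions made for this example.

```python
import ast
from dataclasses import dataclass


@dataclass
class Chunk:
    name: str
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive
    text: str


def chunk_python_source(source: str) -> list[Chunk]:
    """Return one chunk per top-level function or class definition."""
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks: list[Chunk] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            start = node.lineno
            # Start at the first decorator so it stays with its definition.
            if node.decorator_list:
                start = min(d.lineno for d in node.decorator_list)
            end = node.end_lineno
            text = "\n".join(lines[start - 1:end])
            chunks.append(Chunk(node.name, start, end, text))
    return chunks
```

Splitting on syntax boundaries keeps each chunk a complete definition, which a fixed-size text window cannot guarantee.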

Environment Variables

# Required when using OpenAI embeddings (recommended for production)
export OPENAI_API_KEY="sk-your-openai-api-key-here"

Configuration

Dolphin reads configuration from two levels:

  1. Repo-specific (./.dolphin/config.toml) - Optional per-repository chunking settings
  2. User-global (~/.dolphin/config.toml) - Auto-created on first use

You can use dolphin init to initialize your config and edit from there.

# ~/.dolphin/config.toml
default_embed_model = "large"  # or "small"

[embedding]
provider = "openai"
batch_size = 100

[retrieval]
top_k = 8
score_cutoff = 0.0
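
How `top_k` and `score_cutoff` might interact at query time can be sketched as follows. This is an illustration, not Dolphin's implementation: it assumes the score is cosine similarity and that the cutoff is applied before truncating to `top_k`. The `retrieve` helper and the toy vectors are hypothetical.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(
    query_vec: list[float],
    chunks: dict[str, list[float]],
    top_k: int = 8,
    score_cutoff: float = 0.0,
) -> list[tuple[float, str]]:
    """Score every chunk, drop those below the cutoff, keep the best top_k."""
    scored = [(cosine(query_vec, vec), chunk_id) for chunk_id, vec in chunks.items()]
    kept = [pair for pair in scored if pair[0] >= score_cutoff]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]
```

In this sketch, a cutoff of 0.0 only excludes negatively correlated vectors, so `top_k` does most of the filtering.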

MCP Configuration

The companion MCP bridge is small and can be run through Bun without a separate install step. Add it to your AI application's MCP configuration:

{
  "mcpServers": {
    "dolphin": {
      "command": "bunx",
      "args": ["dolphin-mcp"]
    }
  }
}

Make sure the HTTP retrieval server is running before you use the MCP tools: uv run dolphin serve

Available MCP tools: search_knowledge, fetch_chunk, fetch_lines, get_vector_store_info
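
If the application's config file already lists other MCP servers, the entry above can be merged in rather than pasted by hand. A minimal sketch, assuming the file uses the `mcpServers` layout shown above; the helper name `with_dolphin_server` is hypothetical and the location of the config file depends on the application.

```python
import json

# The server entry from the snippet above.
DOLPHIN_ENTRY = {"command": "bunx", "args": ["dolphin-mcp"]}


def with_dolphin_server(config_text: str) -> str:
    """Return the JSON config with a "dolphin" entry under "mcpServers"."""
    config = json.loads(config_text) if config_text.strip() else {}
    servers = config.setdefault("mcpServers", {})
    # Copy the entry so later edits to the result cannot mutate the constant.
    servers["dolphin"] = {"command": DOLPHIN_ENTRY["command"], "args": list(DOLPHIN_ENTRY["args"])}
    return json.dumps(config, indent=2)
```

Existing entries under `mcpServers` are preserved; only the `dolphin` key is added or replaced.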

REST API

# Start server
dolphin serve

# Search
curl -X POST http://127.0.0.1:7777/v1/search \
  -H "Content-Type: application/json" \
  -d '{"query": "authentication", "top_k": 5}'

# List repositories
curl http://127.0.0.1:7777/v1/repos

# Health check
curl http://127.0.0.1:7777/v1/health
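
The same search call can be made from Python with only the standard library. This sketch mirrors the curl example above (endpoint, port, and request body). The helper names `build_search_request` and `send_json` are hypothetical, and the response is assumed to be JSON, since its schema is not documented here.

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:7777"


def build_search_request(
    query: str, top_k: int = 5, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build the POST /v1/search request without sending it."""
    body = json.dumps({"query": query, "top_k": top_k}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/v1/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_json(request: urllib.request.Request, timeout: float = 10.0):
    """Send the request and decode the body as JSON (assumed response format)."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With `dolphin serve` running, `send_json(build_search_request("authentication"))` performs the search, and `send_json(urllib.request.Request(f"{BASE_URL}/v1/health"))` performs the health check.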

Advanced Features

Cross-Encoder Reranking

Cross-encoder reranking improves relevance by re-scoring each candidate result against the query with a cross-encoder model, yielding a 20-30% improvement in ranking quality (Nogueira & Cho, 2019).
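
Ranking quality here is reported as MRR (mean reciprocal rank). For reference, the standard definition with one relevant item per query can be computed as below. The function name and the toy result lists are hypothetical and are not measurements from Dolphin.

```python
def mean_reciprocal_rank(ranked_results: list[list[str]], relevant: list[str]) -> float:
    """Average of 1/rank of the relevant item per query (0 when it is missing)."""
    if not ranked_results:
        return 0.0
    total = 0.0
    for results, target in zip(ranked_results, relevant):
        if target in results:
            # list.index is 0-based; ranks are 1-based.
            total += 1.0 / (results.index(target) + 1)
    return total / len(ranked_results)


# Two toy queries: the relevant chunk is ranked 2nd, then 1st.
print(mean_reciprocal_rank([["a", "b", "c"], ["x", "y", "z"]], ["b", "x"]))  # 0.75
```

Moving a relevant chunk from rank 2 to rank 1 doubles its contribution, so MRR rewards improvements near the top of the list.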

Performance Impact:

  • ⚠️ 2-3x slower searches - cross-encoder is compute-intensive
  • ⚠️ ~2GB install size - requires torch and sentence-transformers

Installation

uv pip install pb-dolphin[reranking]

Configuration

Enable in your ~/.dolphin/config.toml:

[retrieval.reranking]
enabled = true  # Enable cross-encoder reranking
model = "cross-encoder/ms-marco-MiniLM-L-6-v2"  # HuggingFace model
device = ""  # Auto-detect (CPU or CUDA if available)
batch_size = 32  # Higher = faster but more memory
candidate_multiplier = 4  # Rerank top_k × multiplier candidates
score_threshold = 0.3  # Minimum relevance score (0-1)

Restart the API server to apply changes:

uv run dolphin serve
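
How `candidate_multiplier`, `score_threshold`, and `top_k` could fit together is sketched below. This is an assumption-laden illustration rather than Dolphin's code: the `rerank` helper is hypothetical, the scorer is passed in as a function, and the order of operations (take `top_k × candidate_multiplier` candidates, score, filter, truncate) is inferred from the config comments.

```python
from typing import Callable, Sequence

Pair = tuple[str, str]


def rerank(
    query: str,
    candidates: Sequence[str],
    score_fn: Callable[[list[Pair]], Sequence[float]],
    top_k: int = 8,
    candidate_multiplier: int = 4,
    score_threshold: float = 0.3,
) -> list[tuple[float, str]]:
    """Re-score the first top_k * candidate_multiplier candidates and keep the best."""
    pool = list(candidates)[: top_k * candidate_multiplier]
    # The scorer sees (query, text) pairs, as a cross-encoder does.
    scores = score_fn([(query, text) for text in pool])
    kept = [(float(s), text) for s, text in zip(scores, pool) if s >= score_threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept[:top_k]
```

With the reranking extra installed, `score_fn` could be the `predict` method of a sentence-transformers `CrossEncoder` loaded with the model name from the config. Whether its raw scores fall in the 0-1 range expected by `score_threshold` depends on the model's output activation, which this sketch does not handle.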

Development Status

Current: Beta (0.1.x)

  • ✅ Core indexing and search pipeline
  • ✅ Language-aware chunking (Python, TS, JS, Markdown)
  • ✅ REST API with MCP bridge available at bunx dolphin-mcp
  • ⚠️ Developmental stage

Upcoming:

  • Performance optimization
  • Production hardening
  • Evaluation framework
  • Expanded language support

Requirements

  • Python ≥3.12
  • OpenAI API key (for embeddings)
  • Bun (for MCP bridge)
  • Git (for repository scanning)

Testing

# Run all tests
uv run pytest

# Run specific test suite
uv run pytest tests/unit/
uv run pytest tests/integration/

License

MIT License

Acknowledgments

Built with LanceDB, OpenAI, FastAPI, and Bun


⚠️ Remember: This is experimental software under active development. Use at your own risk.

