Local-first code search via MCP/CLI

Coco[-S]earch -- Local-first hybrid semantic code search

PyPI Python >= 3.11 License: MIT Ruff uv pytest MCP

Bash C C++ C# CSS Dockerfile DTD Fortran Go Groovy HCL HTML Java JavaScript JSON Kotlin Markdown Pascal PHP Python R Ruby Rust Scala Solidity SQL Swift TOML TypeScript XML YAML

Docker Compose GitHub Actions GitLab CI Helm Template Helm Values Kubernetes

Coco[-S]earch is a local-first hybrid semantic code search tool. It combines vector similarity and keyword matching (via RRF fusion) to find code by meaning, not just text. Powered by CocoIndex for indexing, Tree-sitter for syntax-aware chunking and symbol extraction, PostgreSQL with pgvector for storage, and Ollama for local embeddings. No external APIs: everything runs on your machine.

Available as a CLI, MCP server, or interactive REPL. Incremental indexing, .gitignore-aware. Supports 31+ languages with symbol-level filtering for 14+, plus domain-specific grammars for structured config files.

Disclaimer

This project was originally built for personal use: a solo experiment in local-first, privacy-focused code search to accelerate self-onboarding to new codebases and explore spec-driven development. Initially scaffolded with GSD and refined by hand. Ships with a CLI, MCP tools, dashboards (TUI/WEB), a status API, reusable Claude SKILLS, and a Claude Code plugin for one-command setup.

Quick Start

  • Services:

    # 1. Clone this repository and start infrastructure:
    git clone https://github.com/VioletCranberry/coco-s.git && cd coco-s
    # Docker volumes are bind-mounted to ./docker_data/ inside the repository,
    # so infrastructure must be started from the cloned repo directory.
    docker compose up -d
    # 2. Verify services are ready.
    uvx cocosearch config check
    
  • Indexing your projects:

    # 3.1 Use WEB Dashboard:
    uvx cocosearch dashboard
    # 3.2 Use CLI:
    uvx cocosearch index .
    # 3.3 Use AI and MCP - see below.
    
  • Register with your AI assistant (pick one):

    Option A -- Plugin (recommended):

    claude plugin marketplace add VioletCranberry/coco-s
    claude plugin install cocosearch@cocosearch
    # All 7 skills + MCP server configured automatically
    

    CocoSearch plugin skills in Claude Code

    Option B -- Manual MCP registration:

    claude mcp add --scope user cocosearch -- \
      uvx cocosearch mcp --project-from-cwd
    

    Note: The MCP server automatically opens a web dashboard in your browser on a random port. Set COCOSEARCH_DASHBOARD_PORT=8080 to pin it to a fixed port, or COCOSEARCH_NO_DASHBOARD=1 to disable it.

    Install skills manually (for development):

    mkdir -p .claude/skills
    for skill in cocosearch-onboarding cocosearch-refactoring cocosearch-debugging cocosearch-quickstart cocosearch-explore cocosearch-new-feature cocosearch-subway; do
        ln -sfn "../../skills/$skill" ".claude/skills/$skill"
    done
    

Features

  • Hybrid search -- combines semantic similarity and keyword matching via RRF fusion to find code by meaning and by text.
  • Symbol filtering -- narrow results to functions, classes, methods, or interfaces; match symbol names with glob patterns.
  • Context expansion -- results automatically expand to enclosing function/class boundaries using Tree-sitter, so you see complete units of code.
  • Query caching -- exact and semantic cache for fast repeated queries (0.95 cosine threshold).
  • Parse health tracking -- per-language parse status, failure details, and staleness warnings when the index drifts from your branch.
  • Pipeline analysis -- cocosearch analyze runs the search pipeline with full diagnostics: see identifier detection, mode selection, RRF fusion breakdown, definition boost effects, and per-stage timings. Available as CLI and MCP tool.
  • Privacy-first -- everything runs locally. No external API calls, no telemetry.
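
The two-tier query cache described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not CocoSearch's implementation: the QueryCache class, its method names, the SHA-256 key, and the default cache size are hypothetical. Only the exact-then-semantic lookup order and the 0.95 cosine threshold come from the feature description.

```python
import hashlib
import math
from collections import OrderedDict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class QueryCache:
    """Hypothetical cache: exact key lookup first, then nearest embedding."""

    def __init__(self, max_size=128, threshold=0.95):
        self.max_size = max_size      # assumed LRU capacity
        self.threshold = threshold    # 0.95, as stated in the feature list
        self._entries = OrderedDict() # key -> (embedding, results)

    @staticmethod
    def _key(query):
        # Assumed normalization: trim and lowercase before hashing.
        return hashlib.sha256(query.strip().lower().encode("utf-8")).hexdigest()

    def get(self, query, embedding):
        key = self._key(query)
        if key in self._entries:  # tier 1: exact hit
            self._entries.move_to_end(key)
            return self._entries[key][1]
        # Tier 2: most similar cached query, if it clears the threshold.
        best_key, best_sim = None, self.threshold
        for cached_key, (cached_emb, _) in self._entries.items():
            sim = cosine(embedding, cached_emb)
            if sim >= best_sim:
                best_key, best_sim = cached_key, sim
        if best_key is None:
            return None
        self._entries.move_to_end(best_key)
        return self._entries[best_key][1]

    def put(self, query, embedding, results):
        key = self._key(query)
        self._entries[key] = (embedding, results)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict least recently used
```

The semantic tier is what lets a rephrased query reuse results: two differently worded queries miss the exact tier but can still hit if their embeddings are close enough.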

Interfaces

Search your code four ways; pick what fits your workflow:

| Interface | Best for | How to start |
|---|---|---|
| CLI | One-off searches, scripting, CI | cocosearch search "auth flow" |
| Interactive REPL | Exploratory sessions: tweak filters, switch indexes, iterate on queries without restarting | cocosearch search --interactive |
| Web Dashboard | Visual search + index management in the browser: filters, syntax-highlighted results, charts, dark/light theme | cocosearch dashboard |
| MCP Server | AI assistant integration (Claude Code, Claude Desktop, OpenCode) | cocosearch mcp --project-from-cwd |

CLI

# Index a project
uvx cocosearch index /path/to/project

# Search with natural language
uvx cocosearch search "authentication flow" --pretty

# Serve CocoSearch WEB dashboard
uvx cocosearch dashboard

# Analyze search pipeline (debug why results rank the way they do)
uvx cocosearch analyze "getUserById"

# Start interactive REPL
uvx cocosearch search --interactive

# View index stats with parse health
uvx cocosearch stats --pretty

❯ uv run cocosearch stats --pretty

Index: cocosearch
Source: GIT/personal/coco-s
Branch: main (0b6050b) · up to date
Status: Indexed
Files: 192 | Chunks: 2,023 | Size: 15.0 MB
Created: 2026-02-09 18:30
Last Updated: 2026-02-14 12:36 (0 days ago)

                        Language Distribution
┏━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Language     ┃  Files ┃   Chunks ┃ Distribution                   ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ py           │    162 │     1648 │ ██████████████████████████████ │
│ md           │     22 │      267 │ ████▊                          │
│ html         │      1 │      100 │ █▊                             │
│ json         │      3 │        3 │                                │
│ toml         │      1 │        2 │                                │
│ yaml         │      2 │        2 │                                │
│ docker-comp… │      1 │        1 │                                │
└──────────────┴────────┴──────────┴────────────────────────────────┘

                             Grammar Distribution
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━┓
┃ Grammar              ┃ Base Language  ┃  Files ┃   Chunks ┃  Recognition % ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━┩
│ docker-compose       │ yaml           │      1 │        1 │         100.0% │
└──────────────────────┴────────────────┴────────┴──────────┴────────────────┘

 Symbol Statistics
┏━━━━━━━━━━┳━━━━━━━┓
┃ Type     ┃ Count ┃
┡━━━━━━━━━━╇━━━━━━━┩
│ function │   927 │
│ class    │   229 │
└──────────┴───────┘

Parse health: 100.0% clean (162/162 files)
                Parse Status by Language
┏━━━━━━━━━━┳━━━━━━━┳━━━━━┳━━━━━━━━━┳━━━━━━━┳━━━━━━━━━━━━┓
┃ Language ┃ Files ┃  OK ┃ Partial ┃ Error ┃ No Grammar ┃
┡━━━━━━━━━━╇━━━━━━━╇━━━━━╇━━━━━━━━━╇━━━━━━━╇━━━━━━━━━━━━┩
│ python   │   162 │ 162 │       0 │     0 │          0 │
└──────────┴───────┴─────┴─────────┴───────┴────────────┘

# View index stats with parse health live
uvx cocosearch stats --live

# List all indexes
uvx cocosearch list --pretty

┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Name       ┃ Table                                      ┃ Branch                  ┃ Status  ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ cocosearch │ codeindex_cocosearch__cocosearch_chunks    │ main (ed00733)          │ Indexed │
└────────────┴────────────────────────────────────────────┴─────────────────────────┴─────────┘

For the full list of commands and flags, see CLI Reference.

Web Dashboard

cocosearch dashboard opens a browser UI at http://localhost:8080 with:

  • Code search -- natural language queries with language, symbol type, and hybrid search filters. Results show syntax-highlighted snippets, score badges, match type, and symbol metadata.
  • Index management -- create, reindex (incremental or fresh), and delete indexes from the browser.
  • Observability -- language distribution charts, parse health breakdown, staleness warnings, storage metrics.

Dashboard screenshots: dark theme, light theme, search results with file actions, file viewer modal.

Interactive REPL

cocosearch search --interactive starts a persistent search session:

cocosearch> authentication middleware
  [results...]
cocosearch> :lang python
  Language filter: python
cocosearch> error handling in views
  [results filtered to Python...]
cocosearch> :index other-project
  Switched to index: other-project

Settings persist across queries: change :limit, :lang, :context, or :index without restarting. Supports command history (up/down arrows) and inline filters (lang:python directly in queries).
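
To make the inline-filter idea concrete, here is a small sketch of how a query such as "error handling lang:python" could be split into free text and filters. The function name, the FILTER_KEYS set, and the parsing rules are assumptions for illustration; the only filter named in the text above is lang:, and the REPL's real parser may behave differently.

```python
import re

# Hypothetical set of recognized keys; only `lang:` is documented above.
FILTER_KEYS = {"lang"}
_FILTER = re.compile(r"^(\w+):(\S+)$")


def split_inline_filters(raw):
    """Separate `key:value` tokens from the free-text part of a query."""
    filters, words = {}, []
    for token in raw.split():
        match = _FILTER.match(token)
        if match and match.group(1) in FILTER_KEYS:
            filters[match.group(1)] = match.group(2)
        else:
            # Unknown keys (for example a URL scheme) stay in the query text.
            words.append(token)
    return " ".join(words), filters
```

Restricting matches to a known key set keeps tokens that merely contain a colon from being swallowed as filters.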

Where MCP wins

For codebases of meaningful size, CocoSearch reduces the number of MCP tool calls needed to find relevant code, often from 5-15 iterative grep/read cycles down to 1-2 semantic searches. This means fewer round-trips, less irrelevant content in the context window, and lower token consumption for exploratory and intent-based queries.

  • Exploratory/semantic queries: "how does authentication work", "where is error handling done", "find the caching logic".
    • Native approach: Claude does 5-15 iterative grep/glob/read cycles, each adding results to context. Lots of trial-and-error, irrelevant matches, and full-file reads.
    • CocoSearch: 1 search_code call returns ranked, pre-chunked results with smart context expansion to function/class boundaries. Dramatically fewer tokens in context.
  • Identifier search with fuzzy intent: "find the function that handles user signup".
    • Native grep requires Claude to guess the exact name (grep "signup", grep "register", grep "create_user"...). Each miss costs a round-trip + tokens.
    • CocoSearch's hybrid RRF (vector + keyword) handles this in 1 call.
  • Filtered searches: language/symbol type/symbol name filtering is built-in. Native tools require Claude to manually assemble glob patterns and filter results.

Useful Documentation

Components

  • Ollama -- runs the embedding model (nomic-embed-text) locally.
  • PostgreSQL + pgvector -- stores code chunks and their vector embeddings for similarity search.
  • CocoSearch -- CLI and MCP server that coordinates indexing and search.

Available MCP Tools

  • index_codebase -- index a directory for semantic search
  • search_code -- search indexed code with natural language queries
  • analyze_query -- pipeline diagnostics: understand why a query returns specific results
  • list_indexes -- list all available indexes
  • index_stats -- get statistics and parse health for an index
  • clear_index -- remove an index from the database

Available Skills

  • cocosearch-quickstart (SKILL.md): Use when setting up CocoSearch for the first time or indexing a new project. Guides through infrastructure check, indexing, and verification in under 2 minutes.
  • cocosearch-debugging (SKILL.md): Use when debugging an error, unexpected behavior, or tracing how code flows through a system. Guides root cause analysis using CocoSearch semantic and symbol search.
  • cocosearch-onboarding (SKILL.md): Use when onboarding to a new or unfamiliar codebase. Guides you through understanding architecture, key modules, and code patterns step-by-step using CocoSearch.
  • cocosearch-refactoring (SKILL.md): Use when planning a refactoring, extracting code into a new module, renaming across the codebase, or splitting a large file. Guides impact analysis and safe step-by-step execution using CocoSearch.
  • cocosearch-new-feature (SKILL.md): Use when adding new functionality: a new command, endpoint, module, handler, or capability. Guides placement, pattern matching, and integration using CocoSearch.
  • cocosearch-explore (SKILL.md): Use for codebase exploration: answering questions about how code works, tracing flows, or researching a topic. Autonomous mode for subagent/plan mode research; interactive mode for user-facing "how does X work?" explanations.
  • cocosearch-subway (SKILL.md): Use when the user wants to visualize codebase structure as an interactive London Underground-style subway map. AI-generated visualization using CocoSearch tools for exploration.

How Search Works

 Query: "authentication flow"
 ─────────────────────────────────────────────────────────────────────
                              │
                    ┌─────────▼──────────┐
                    │   Query Analysis   │  Detect identifiers
                    │  (camelCase, etc.) │  → auto-enable hybrid
                    └─────────┬──────────┘
                              │
                    ┌─────────▼──────────┐
                    │  Ollama Embedding  │  nomic-embed-text
                    │   768-dim vector   │  (runs locally)
                    └─────────┬──────────┘
                              │
              ┌───────────────┴───────────────┐
              │                               │
    ┌─────────▼──────────┐          ┌─────────▼──────────┐
    │  Vector Similarity │          │  Keyword Search    │
    │  (pgvector cosine) │          │  (tsvector FTS)    │
    └─────────┬──────────┘          └─────────┬──────────┘
              │                               │
              └───────────┬───────────────────┘
                          │
                ┌─────────▼──────────┐
                │    RRF Fusion      │  Reciprocal Rank Fusion
                │  + Definition 2x   │  merges both ranked lists
                └─────────┬──────────┘
                          │
                ┌─────────▼──────────┐
                │ Symbol & Language  │  --symbol-type function
                │     Filtering      │  --language python
                └─────────┬──────────┘
                          │
                ┌─────────▼──────────┐
                │ Context Expansion  │  Expand to enclosing
                │ (Tree-sitter)      │  function/class boundaries
                └─────────┬──────────┘
                          │
                ┌─────────▼──────────┐
                │   Query Cache      │  Exact hash + semantic
                │   (LRU + 0.95)     │  similarity fallback
                └─────────┬──────────┘
                          │
                          ▼
                   Ranked Results
 ─────────────────────────────────────────────────────────────────────
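
The fusion step in the diagram can be illustrated with a short sketch of Reciprocal Rank Fusion. The function name and signature are hypothetical. The constant k=60 is the value commonly used for RRF and is an assumption here, as is applying the 2x definition boost as a multiplier after fusion; the source only states that both ranked lists are merged and that definitions are boosted 2x.

```python
def rrf_fuse(vector_ranked, keyword_ranked, definitions=(), k=60, boost=2.0):
    """Fuse two ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each list contributes 1 / (k + rank) per item, with rank starting at 1.
    Items listed in `definitions` have their fused score multiplied by `boost`.
    Returns (chunk_id, score) pairs sorted by descending score.
    """
    scores = {}
    for ranked in (vector_ranked, keyword_ranked):
        for rank, chunk_id in enumerate(ranked, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    for chunk_id in definitions:
        if chunk_id in scores:
            scores[chunk_id] *= boost  # assumed placement of the 2x boost
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)
```

Because RRF uses only ranks, it needs no calibration between cosine distances and full-text scores, and a chunk that appears in both lists naturally outranks one that appears in only one.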

Supported Languages

CocoSearch indexes 31 programming languages. Symbol-aware languages (✓) support --symbol-type and --symbol-name filtering.

┏━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━┓
┃ Language   ┃ Extensions                  ┃ Symbols ┃
┡━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━┩
│ C          │ .c, .h                      │    ✓    │
│ C++        │ .cpp, .cc, .cxx, .hpp, .hxx │    ✓    │
│ C#         │ .cs                         │    ✗    │
│ CSS        │ .css, .scss                 │    ✓    │
│ DTD        │ .dtd                        │    ✗    │
│ Fortran    │ .f, .f90, .f95, .f03        │    ✗    │
│ Go         │ .go                         │    ✓    │
│ Groovy     │ .groovy, .gradle            │    ✗    │
│ HTML       │ .html, .htm                 │    ✗    │
│ Java       │ .java                       │    ✓    │
│ JavaScript │ .js, .mjs, .cjs, .jsx       │    ✓    │
│ JSON       │ .json                       │    ✗    │
│ Kotlin     │ .kt, .kts                   │    ✗    │
│ Markdown   │ .md, .mdx                   │    ✗    │
│ Pascal     │ .pas, .dpr                  │    ✗    │
│ PHP        │ .php                        │    ✓    │
│ Python     │ .py, .pyw, .pyi             │    ✓    │
│ R          │ .r, .R                      │    ✗    │
│ Ruby       │ .rb                         │    ✓    │
│ Rust       │ .rs                         │    ✓    │
│ Scala      │ .scala                      │    ✓    │
│ Solidity   │ .sol                        │    ✗    │
│ SQL        │ .sql                        │    ✗    │
│ Swift      │ .swift                      │    ✗    │
│ TOML       │ .toml                       │    ✗    │
│ TypeScript │ .ts, .tsx, .mts, .cts       │    ✓    │
│ XML        │ .xml                        │    ✗    │
│ YAML       │ .yaml, .yml                 │    ✗    │
│ Bash       │ .sh, .bash, .zsh            │    ✓    │
│ Dockerfile │ Dockerfile                  │    ✗    │
│ HCL        │ .tf, .hcl, .tfvars          │    ✓    │
└────────────┴─────────────────────────────┴─────────┘

How chunking works

Chunking strategy depends on the language:

  • Tree-sitter chunking (~20 languages): CocoIndex's SplitRecursively uses Tree-sitter internally to split at syntax-aware boundaries (function/class edges). Covers Python, JavaScript, TypeScript, Go, Rust, Java, C, C++, C#, Ruby, PHP, and others in CocoIndex's built-in list.
  • Custom handler chunking (6 languages): HCL, Dockerfile, Bash, Go Template, Scala, and Groovy use regex-based CustomLanguageSpec separators tuned for their syntax, because no Tree-sitter grammar is available for these in CocoIndex.
  • Text fallback: Languages not recognized by either tier (Markdown, JSON, YAML, TOML, etc.) are split on blank lines and whitespace boundaries.
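
As a rough picture of the text-fallback tier, the sketch below splits on blank lines and then packs paragraphs greedily up to a byte budget. It is a simplification under stated assumptions: the function name and the greedy packing are hypothetical, it does not model chunk overlap or the further whitespace-level splitting mentioned above, and CocoIndex's actual splitter is not reproduced here.

```python
import re


def split_text(text, chunk_size=1000):
    """Split on blank lines, then merge paragraphs while they fit.

    `chunk_size` is measured in UTF-8 bytes. A single paragraph larger than
    `chunk_size` is kept whole in this sketch rather than split further.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        candidate = f"{current}\n\n{para}" if current else para
        if current and len(candidate.encode("utf-8")) > chunk_size:
            chunks.append(current)  # budget exceeded: close the current chunk
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```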

Independently of chunking, CocoSearch runs its own Tree-sitter queries (.scm files in src/cocosearch/indexer/queries/) to extract symbol metadata: function, class, method, and interface names and signatures. This powers --symbol-type and --symbol-name filtering. Symbol extraction is available for 14 languages.

In short: CocoIndex's Tree-sitter tells you where to cut; the .scm files tell you what's inside each piece.

See Adding Languages for details on how these tiers work and how to add new languages or grammars.

Supported Grammars

Beyond language-level support, CocoSearch recognizes grammars: domain-specific schemas within a base language. A language is matched by file extension (e.g., .yaml -> YAML), while a grammar is matched by file path and content patterns (e.g., .github/workflows/ci.yml containing on: + jobs: -> GitHub Actions). Grammars provide structured chunking and richer metadata compared to generic text chunking.

┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Grammar        ┃ File Format ┃ Path Patterns                                                          ┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ docker-compose │ yaml        │ docker-compose*.yml, docker-compose*.yaml, compose*.yml, compose*.yaml │
│ github-actions │ yaml        │ .github/workflows/*.yml, .github/workflows/*.yaml                      │
│ gitlab-ci      │ yaml        │ .gitlab-ci.yml                                                         │
│ helm-template  │ gotmpl      │ **/templates/*.yaml, **/templates/**/*.yaml, **/templates/*.yml,       │
│                │             │ **/templates/**/*.yml                                                  │
│ helm-values    │ yaml        │ **/values.yaml, **/values-*.yaml                                       │
│ kubernetes     │ yaml        │ *.yaml, *.yml                                                          │
└────────────────┴─────────────┴────────────────────────────────────────────────────────────────────────┘

How grammar matching works

Priority: Grammar match > Language match > TextHandler fallback.

A grammar is matched by file path patterns and optionally by content patterns. For example, a YAML file at .github/workflows/ci.yml containing on: + jobs: is recognized as GitHub Actions, not generic YAML. This enables structured chunking by job/step and richer metadata extraction (job names, service names, stages).
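
The priority order above can be sketched as a small classifier. Everything named here is hypothetical: the GRAMMARS and LANGUAGES tables, the classify function, and in particular the "services:" marker for docker-compose, which is an illustrative guess. Only the GitHub Actions path patterns and its on: + jobs: content check come from the text.

```python
from fnmatch import fnmatchcase

# Hypothetical registry: (grammar, path globs, substrings that must all appear).
GRAMMARS = [
    ("github-actions",
     [".github/workflows/*.yml", ".github/workflows/*.yaml"],
     ["on:", "jobs:"]),
    ("docker-compose",
     ["docker-compose*.yml", "docker-compose*.yaml", "compose*.yml", "compose*.yaml"],
     ["services:"]),  # assumed content marker, not taken from the source
]
LANGUAGES = {".yml": "yaml", ".yaml": "yaml", ".py": "python"}


def classify(path, content):
    """Return (tier, name) using grammar > language > text priority."""
    for name, globs, markers in GRAMMARS:
        path_ok = any(fnmatchcase(path, glob) for glob in globs)
        if path_ok and all(marker in content for marker in markers):
            return ("grammar", name)
    for ext, language in LANGUAGES.items():
        if path.endswith(ext):
            return ("language", language)
    return ("text", "text")
```

Checking content as well as path is what keeps an unrelated YAML file that happens to live under .github/workflows/ from being chunked as a workflow.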

Configuration

Create cocosearch.yaml in your project root to customize indexing:

indexing:
  # See also https://cocoindex.io/docs/ops/functions#supported-languages
  include_patterns:
    - "*.py"
    - "*.js"
    - "*.ts"
    - "*.go"
    - "*.rs"
  exclude_patterns:
    - "*_test.go"
    - "*.min.js"
  chunk_size: 1000 # bytes
  chunk_overlap: 300 # bytes
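
One way to read the include and exclude patterns is sketched below. The semantics are an assumption made for illustration: patterns are matched against the file name only, and an exclude match wins over an include match. The is_indexed helper is hypothetical, and CocoSearch and CocoIndex may resolve patterns differently (for example against full relative paths).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def is_indexed(path, include_patterns, exclude_patterns):
    """Assumed rule: index a file if its name matches an include pattern
    and matches no exclude pattern."""
    name = PurePosixPath(path).name
    if any(fnmatchcase(name, pattern) for pattern in exclude_patterns):
        return False
    return any(fnmatchcase(name, pattern) for pattern in include_patterns)
```

With the configuration shown above, pkg/server.go would be indexed under this reading, while pkg/server_test.go and web/app.min.js would be skipped.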

Testing

Tests use pytest. All tests are unit tests, fully mocked, and require no infrastructure. Markers are auto-applied based on directory -- no need to add them manually.

uv run pytest                                          # Run all unit tests
uv run pytest tests/unit/search/test_cache.py -v       # Single file
uv run pytest -k "test_rrf_double_match" -v            # Single test by name
uv run pytest tests/unit/handlers/ -v                  # Handler tests

Troubleshooting

Dashboard shows "Indexing" but CLI shows "Indexed"

The web dashboard and CLI now share a status sync mechanism: when the dashboard detects a live indexing thread, it corrects the database status so both interfaces agree. If you still see a discrepancy, check whether indexing is genuinely running (CPU usage, docker stats for Ollama activity).

Index appears stuck in "Indexing" status

After 1 hour with no progress updates, the status auto-recovers to "Indexed". You can also run cocosearch index . again to force a fresh index, which will reset the status.

High CPU after indexing appears complete

Ollama may still be processing embeddings in its queue. Check with docker stats or ps aux | grep ollama. CocoIndex may also perform background cleanup after the main indexing loop finishes.
