
trailhead

Codebase indexing and semantic search using tree-sitter parsing, vector embeddings, and SQLite graph storage.

Command-line code indexing and semantic search tool. It parses source files into a property graph (modules, classes, functions, and their relationships) stored in SQLite, generates text embeddings with sentence-transformers, and exposes everything through a CLI and HTTP API.

  • Single CLI command: th
  • Text embeddings powered by sentence-transformers (models cached locally)
  • Polyglot code indexing via tree-sitter (Python built-in; 12 additional languages optional)
  • Property graph persisted in a single SQLite file with optional vector search
  • Warm-model FastAPI server keeps the embedding model loaded in memory
  • Background file watcher incrementally re-indexes on change
  • Interactive browser UI for querying and visualizing the code graph

Requirements

  • Python 3.10+

Install

pip install trailhead

Language support

Python is supported out of the box. Additional languages are installed as optional extras. Only the packages you install will be active; missing ones are silently skipped at startup.

Install individual languages:

pip install "trailhead[javascript]"
pip install "trailhead[typescript]"
pip install "trailhead[rust]"
pip install "trailhead[go]"
pip install "trailhead[java]"
pip install "trailhead[csharp]"
pip install "trailhead[c]"
pip install "trailhead[cpp]"
pip install "trailhead[ruby]"
pip install "trailhead[php]"
pip install "trailhead[bash]"
pip install "trailhead[html]"

Or install everything at once:

pip install "trailhead[all-languages]"

Development install

To get the latest unreleased changes, install directly from the repository:

git clone https://github.com/McIndi/trailhead.git
cd trailhead
python -m venv .venv
source .venv/bin/activate  # Windows: .\.venv\Scripts\Activate.ps1
pip install -e ".[dev]"

Extra              Language          File extensions
python (built-in)  Python            .py
javascript         JavaScript        .js .mjs .cjs
typescript         TypeScript / TSX  .ts .tsx
rust               Rust              .rs
go                 Go                .go
java               Java              .java
csharp             C#                .cs
c                  C                 .c .h
cpp                C++               .cpp .cc .cxx .hpp .hxx .h++
ruby               Ruby              .rb
php                PHP               .php
bash               Bash / Shell      .sh .bash
html               HTML              .html .htm

Quick start

The typical workflow is: index your source tree once, then serve and query it.

# 1. Index a project (the DB defaults to .trailhead/db.sqlite; overridden here)
th index . --sqlite-db ./.trailhead/graph.db --embed-model sentence-transformers/all-MiniLM-L6-v2

# 2. Start the server (watches for changes, keeps embeddings warm)
th serve . --sqlite-db ./.trailhead/graph.db --model sentence-transformers/all-MiniLM-L6-v2

# 3. Open the browser UI (Windows; use `open` on macOS or `xdg-open` on Linux)
start http://localhost:8000

# 4. Or query from the CLI or another terminal
th query similar "HTTP route registration"
curl "http://localhost:8000/api/query/similar?text=HTTP+route+registration"

The server re-indexes changed files automatically in the background. You do not need to re-run th index while the server is running.

Usage

embed

Generate an embedding for a piece of text:

th embed "A short sentence to embed"
th embed "A short sentence to embed" --model sentence-transformers/all-mpnet-base-v2

The command prints the embedding as a JSON array of floats.
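
Because the output is plain JSON, it is easy to consume from a script. A minimal parsing sketch (the sample vector below is illustrative, not real model output; in practice the string would come from `subprocess.run(["th", "embed", text], capture_output=True)`):

```python
import json

def parse_embedding(stdout: str) -> list[float]:
    """Parse the JSON array printed by `th embed` into a list of floats."""
    vec = json.loads(stdout)
    if not isinstance(vec, list):
        raise ValueError("expected a JSON array of floats")
    return [float(x) for x in vec]

sample = "[0.12, -0.03, 0.98]"  # illustrative, not real model output
print(parse_embedding(sample))  # [0.12, -0.03, 0.98]
```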

Optional cache override:

$env:TRAILHEAD_CACHE_DIR = "C:\models\cache"
th embed "A short sentence to embed"
th embed "A short sentence to embed" --cache-dir "C:\another\cache"

index

Index a directory of source files. The graph is persisted to .trailhead/db.sqlite by default (smart sync: full build on first run, incremental on subsequent runs):

th index .

Use --in-memory to build the graph without writing to disk and print a summary:

th index . --in-memory
th index . --in-memory --output json

Preview which files would be indexed without parsing or writing any SQLite state:

th index . --dry-run
th index . --dry-run --output json

Watch for file changes and reindex incrementally (Ctrl-C to stop):

th index . --watch
th index . --sqlite-db ./.trailhead/graph.db --embed-model sentence-transformers/all-MiniLM-L6-v2 --watch

Use a custom database path or add embeddings:

th index . --sqlite-db ./.trailhead/graph.db
th index . --sqlite-db ./.trailhead/graph.db --embed-model sentence-transformers/all-MiniLM-L6-v2
th index . --sqlite-db ./.trailhead/graph.db --embed-model sentence-transformers/all-MiniLM-L6-v2 --embed-cache-dir C:\models\cache

When sqlite-vector can be loaded, trailhead also initializes vector search for the vertex_embeddings.embedding column. If extension loading is unavailable on your platform build, embeddings are still stored as Float32 BLOBs in SQLite.
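
If you need to read those BLOBs outside trailhead, they can be decoded with the standard library. This sketch assumes the common packed little-endian float32 layout (an assumption; verify against your database before relying on it):

```python
import struct

def decode_f32_blob(blob: bytes) -> list[float]:
    """Decode a packed float32 vector (assumed little-endian) into Python floats."""
    n = len(blob) // 4
    return list(struct.unpack(f"<{n}f", blob))

# Round-trip demo with a hand-built blob
blob = struct.pack("<3f", 0.5, -1.0, 2.0)
print(decode_f32_blob(blob))  # [0.5, -1.0, 2.0]
```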

Source discovery respects .gitignore and .trailheadignore in the indexed directory. Both use gitignore-style patterns, and .trailheadignore is applied after .gitignore so Trailhead-specific rules take precedence. If you change either ignore file, delete .trailhead/db.sqlite and run th index again to rebuild the index with the new rules.

th index --dry-run --output json returns a file-preview payload with this schema:

{
  "root": "C:/path/to/project",
  "count": 2,
  "files": ["src/app.py", "src/lib/util.py"]
}
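
That payload is straightforward to consume from a script, for example to sanity-check what would be indexed before a full run (the payload literal below just mirrors the schema above):

```python
import json

payload = json.loads("""
{
  "root": "C:/path/to/project",
  "count": 2,
  "files": ["src/app.py", "src/lib/util.py"]
}
""")

# Basic checks against the documented schema
assert payload["count"] == len(payload["files"])
py_files = [f for f in payload["files"] if f.endswith(".py")]
print(py_files)  # ['src/app.py', 'src/lib/util.py']
```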

serve

Run the warm-model API server with a background indexer. The server watches the source tree, keeps the SQLite graph fresh, and reuses the loaded embedding model across index updates. The database defaults to .trailhead/db.sqlite under the watched directory:

th serve .
th serve . --model sentence-transformers/all-MiniLM-L6-v2
th serve . --sqlite-db ./.trailhead/graph.db --model sentence-transformers/all-MiniLM-L6-v2

The browser UI is available at http://localhost:8000 once the server starts.

query

Run a read-only SQL query against the SQLite database (defaults to ./.trailhead/db.sqlite):

th query sql --sql "SELECT label, COUNT(*) AS n FROM vertices GROUP BY label ORDER BY label"
th query sql --sqlite-db ./.trailhead/graph.db --sql "SELECT label, COUNT(*) AS n FROM vertices GROUP BY label ORDER BY label"

Run a semantic similarity query against stored vertex embeddings:

th query similar "find sqlite vector initialization code"
th query similar "find sqlite vector initialization code" --sqlite-db ./.trailhead/graph.db
th query similar "graph persistence" --sqlite-db ./.trailhead/graph.db --label function --k 5 --output json

HTTP API

When the server is running, the full API schema is available at:

http://localhost:8000/openapi.json
http://localhost:8000/docs

The schema documents every endpoint, parameter name, type, default, and constraint. Check it first before probing endpoints manually.

Endpoints

Method  Path                             Description
GET     /                                Browser UI
GET     /api/health                      Server status and configuration
POST    /api/embed                       Embed a single text string
POST    /api/embed/batch                 Embed multiple texts
POST    /api/query/sql                   Run a read-only SQL query
GET     /api/query/templates             List built-in query templates
GET     /api/query/templates/{name}      Get a template's SQL
POST    /api/query/templates/{name}/run  Run a template against the database
GET     /api/query/similar               Semantic similarity search (parameters: text, k)
GET     /api/graph/vertices              Search vertices by name, label, or path
GET     /api/graph/traverse              Traverse the graph from a vertex

SQL schema

The two core tables are:

vertices — one row per code symbol:

Column           Type     Notes
id               TEXT     UUID, used for graph traversal
label            TEXT     module, class, function, external
name             TEXT     Symbol name
path             TEXT     Absolute file path
line             INTEGER  Line number (null for modules)
complexity       INTEGER  McCabe complexity (functions only)
properties_json  TEXT     JSON blob with source, docstring, and all other properties

edges — relationships between vertices:

Column           Type  Notes
id               TEXT  UUID
label            TEXT  defines, has_method, imports, calls
out_v_id         TEXT  Source vertex id
in_v_id          TEXT  Target vertex id
properties_json  TEXT  Always {} currently

Edge labels and their meaning:

Label       Meaning
defines     Module → class or function it defines
has_method  Class → method
imports     Module → external symbol it imports
calls       Function → function it calls
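
Combined with the schema above, these edge labels answer typical graph questions in plain SQL. A self-contained sketch (an in-memory database seeded with illustrative rows, not trailhead's own loader):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE vertices (id TEXT, label TEXT, name TEXT, path TEXT,
                       line INTEGER, complexity INTEGER, properties_json TEXT);
CREATE TABLE edges (id TEXT, label TEXT, out_v_id TEXT, in_v_id TEXT,
                    properties_json TEXT);
INSERT INTO vertices VALUES
  ('v1', 'function', 'handler',  'app.py', 10, 1, '{}'),
  ('v2', 'function', 'register', 'app.py', 30, 2, '{}');
INSERT INTO edges VALUES ('e1', 'calls', 'v1', 'v2', '{}');
""")

# Who calls `register`? Follow `calls` edges inward.
rows = con.execute("""
SELECT caller.name
FROM edges e
JOIN vertices caller ON caller.id = e.out_v_id
JOIN vertices callee ON callee.id = e.in_v_id
WHERE e.label = 'calls' AND callee.name = 'register'
""").fetchall()
print(rows)  # [('handler',)]
```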

source and docstring live inside properties_json rather than as top-level columns. To filter on them in SQL, use json_extract:

-- Functions whose source mentions "HTTPException"
SELECT name, path, line
FROM vertices
WHERE label = 'function'
  AND json_extract(properties_json, '$.source') LIKE '%HTTPException%'

-- Functions with a docstring
SELECT name, path
FROM vertices
WHERE label = 'function'
  AND json_extract(properties_json, '$.docstring') IS NOT NULL

HTTP query examples

# Health check
curl http://localhost:8000/api/health

# Embed text
curl -X POST http://localhost:8000/api/embed \
  -H "Content-Type: application/json" \
  -d '{"text": "hello world"}'

# Semantic search — note the result-count parameter is "k", not "limit"
curl "http://localhost:8000/api/query/similar?text=route+registration&k=5"

# Filter semantic search to functions only
curl "http://localhost:8000/api/query/similar?text=route+registration&k=5&label=function"

# SQL query
curl -X POST http://localhost:8000/api/query/sql \
  -H "Content-Type: application/json" \
  -d '{"sql": "SELECT label, COUNT(*) AS n FROM vertices GROUP BY label"}'

# Find a vertex by name, then get its id for traversal
curl "http://localhost:8000/api/graph/vertices?name=ui_dashboard&label=function"

# Traverse outward along call edges only (shows what a function calls)
curl "http://localhost:8000/api/graph/traverse?vertex_id=<id>&direction=out&depth=2&edge_labels=calls"

# Traverse inward along call edges only (shows what calls a function)
curl "http://localhost:8000/api/graph/traverse?vertex_id=<id>&direction=in&depth=2&edge_labels=calls"

# Run a built-in query template
curl http://localhost:8000/api/query/templates
curl -X POST http://localhost:8000/api/query/templates/function_complexity/run

Built-in query templates

Templates are pre-built SQL queries that you can run without writing any SQL yourself. Categories:

Category      Templates
quality       function_complexity, missing_docstrings, undocumented_public_api, todo_fixme_inventory
testing       symbols_not_represented_by_tests, test_coverage_ratio_by_file, largest_untested_symbols
architecture  duplicate_symbol_names, dependency_hotspots, external_dependency_pressure
calls         most_called_functions, call_graph_hubs
data_health   missing_source_for_functions, orphan_edges

Typical workflow for code exploration

  1. Find a starting point — use semantic search or /api/graph/vertices?name=... to locate a vertex and grab its id.
  2. Understand its call chain — traverse outward with edge_labels=calls to see what it calls; inward to see its callers.
  3. Understand its structure — traverse with edge_labels=defines,has_method to see what a module or class contains.
  4. Run quality checks — use the built-in templates for complexity, missing docs, or dependency hotspots without writing SQL.
  5. Ad-hoc queries — use /api/query/sql with json_extract to filter on source content, docstrings, or any property.

Tests

pytest

Project Layout

.
|-- pyproject.toml
|-- README.md
|-- src/
|   `-- trailhead/
|       |-- __init__.py
|       |-- __main__.py
|       |-- cli/
|       |   |-- __init__.py
|       |   |-- __main__.py
|       |   |-- app.py
|       |   `-- commands/
|       |       |-- __init__.py
|       |       |-- embed.py
|       |       |-- index.py
|       |       |-- query.py
|       |       `-- serve.py
|       |-- server/
|       |   |-- __init__.py
|       |   |-- __main__.py
|       |   |-- app.py
|       |   `-- templates/
|       |       `-- query_ui.html
|       `-- services/
|           |-- config/
|           |   `-- cache.py
|           |-- indexing/
|           |   |-- __init__.py
|           |   |-- graph.py
|           |   |-- graph_query.py
|           |   |-- live_indexer.py
|           |   |-- parser.py          # re-exports parse_python_file (backwards compat)
|           |   |-- query.py
|           |   |-- sqlite_store.py
|           |   |-- walker.py
|           |   `-- adapters/          # language adapter registry
|           |       |-- __init__.py    # auto-registers available adapters
|           |       |-- base.py        # LanguageAdapter ABC + shared utilities
|           |       |-- registry.py    # extension → adapter map, parse_file()
|           |       |-- python.py      # Python (built-in)
|           |       |-- javascript.py  # JavaScript (optional)
|           |       |-- typescript.py  # TypeScript / TSX (optional)
|           |       |-- rust.py        # Rust (optional)
|           |       |-- go.py          # Go (optional)
|           |       |-- java.py        # Java (optional)
|           |       |-- csharp.py      # C# (optional)
|           |       |-- c.py           # C (optional)
|           |       |-- cpp.py         # C++ (optional)
|           |       |-- ruby.py        # Ruby (optional)
|           |       |-- php.py         # PHP (optional)
|           |       |-- bash.py        # Bash / Shell (optional)
|           |       `-- html.py        # HTML (optional)
|           `-- embeddings/
|               |-- generator.py
|               `-- model_store.py
`-- tests/
    |-- conftest.py
    |-- test_indexing.py
    |-- test_query.py
    |-- test_server.py
    `-- test_smoke.py

Adding a custom language adapter

Any language with a tree-sitter Python binding can be supported in three steps:

# 1. Create your adapter (e.g. my_adapters/kotlin.py)
from trailhead.services.indexing.adapters.base import LanguageAdapter, _node_text, _complexity
from trailhead.services.indexing.graph import PropertyGraph, Vertex
from pathlib import Path

class KotlinAdapter(LanguageAdapter):
    extensions = frozenset({".kt", ".kts"})

    @classmethod
    def is_available(cls) -> bool:
        try:
            import tree_sitter_kotlin  # noqa: F401
            return True
        except ImportError:
            return False

    def parse(self, path: Path, graph: PropertyGraph) -> Vertex:
        import tree_sitter_kotlin as tskotlin
        from tree_sitter import Language, Parser
        source = path.read_bytes()
        language = Language(tskotlin.language())
        parser = Parser(language)
        tree = parser.parse(source)
        module_v = graph.add_vertex("module", name=path.stem, path=str(path))
        # ... walk tree and add vertices/edges ...
        return module_v

# 2. Register it at startup (e.g. in your app's __init__ or conftest)
from trailhead.services.indexing.adapters import register
register(KotlinAdapter())

# 3. Done — th index, serve, and query all pick it up automatically.

What each adapter should produce:

Vertex label  Meaning                             Required properties
module        One per source file                 name, path
class         Class / struct / interface / trait  name, path, line
function      Function / method                   name, path, line, source, complexity
external      Imported module name                name
Edges: defines (module→class, module→function), has_method (class→function), imports (module→external), calls (function→function).

Download files

Source distribution

trailhead-0.1.2.tar.gz (67.4 kB)

Built distribution

trailhead-0.1.2-py3-none-any.whl (93.0 kB)

File details

trailhead-0.1.2.tar.gz

  • Size: 67.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

Hashes for trailhead-0.1.2.tar.gz:

Algorithm    Hash digest
SHA256       94f09db7367f4f7b4699652bef2ec32c9987c66f86d0ebfaaa08d9eb3979f409
MD5          f107de85102dd4c1674615426b4cd991
BLAKE2b-256  3b84a93d5fe26a8d8426d29af7a3164b8bf5cd887816bdc112f89aa0375e62c8

Provenance

Attestation bundles for trailhead-0.1.2.tar.gz were published by publish.yml on McIndi/trailhead. Values reflect the state when the release was signed and may no longer be current.

File details

trailhead-0.1.2-py3-none-any.whl

  • Size: 93.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

Hashes for trailhead-0.1.2-py3-none-any.whl:

Algorithm    Hash digest
SHA256       e966de326c0bf494a2571af3df7f8ff81e1aca289aff53f6cf4eefea4f44b1f7
MD5          f1a26e1d406f98758184704a50b29d27
BLAKE2b-256  395fbd678b2eac4f7d44ca3974d1ec1b1f0436e9f11fcc07a22a2d609eb080a8

Provenance

Attestation bundles for trailhead-0.1.2-py3-none-any.whl were published by publish.yml on McIndi/trailhead. Values reflect the state when the release was signed and may no longer be current.
