Semantic code indexing and memory for AI coding agents
Project description
ultrasync
Semantic indexing and search for codebases. Exposes an MCP server for integration with coding agents.
TLDR
(not written by AI)
- Semantic, lexical, and RRF indexing and search for codebases.
- Results in sub-second response times for most search queries on indexed codebases.
- Index and blob data stored in the .ultrasync directory; be sure to gitignore it.
- Structured memory and recall tools, without:
- Additional LLM calls
- Convoluted, vibe slopped together abstractions or interfaces
- Pollution of repository with arbitrary "human readable" Markdown slop files
- Network-speed, pattern-recognition-based heuristics for classifying contexts and codebase insights (like inferring TODOs, comments, code smells, etc.).
- Intentionally fuzzy heuristics to improve p50 performance dramatically and reduce token spend for some of the most common codebase understanding tasks.
- Exposes an MCP server for integration with coding agents, virtually no configuration required. No additional processes. Fully local.
Quickstart
uv tool install "ultrasync-mcp[cli,lexical,secrets]"
# or, if you have sync:
uv tool install "ultrasync-mcp[cli,lexical,secrets,sync]"
Then update the MCP server configuration for your coding agent of choice (currently Claude Code and Codex are supported), e.g.:
{
"ultrasync": {
"type": "stdio",
"command": "uv",
"args": [
"tool",
"run",
"--from",
"ultrasync-mcp",
"ultrasync",
"mcp"
],
"env": {
// Uncomment below if using remote sync
// "ULTRASYNC_REMOTE_SYNC": "true",
// "ULTRASYNC_SYNC_URL": "https://mcp.ultrasync.dev",
// "ULTRASYNC_SYNC_TOKEN": "uss_...",
// ULTRASYNC_TOOLS defaults to search,memory
// If using remote sync, add "sync" to the list
"ULTRASYNC_TOOLS": "search,memory,sync"
}
}
}
Features
Indexing Architecture
Two-layer JIT + AOT system:
- JIT (Just-In-Time): On-demand file indexing with change detection via mtime/content hash. Lazy embedding computation with persistent vector caching. Checkpointed progress for resumable large jobs.
- AOT (Ahead-Of-Time): Rust-based GlobalIndex with mmapped index.dat (open-addressing hash table, 24-byte buckets) and blob.dat (raw contiguous bytes). Zero-copy slices for Hyperscan pattern scanning.
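To make the AOT layout concrete, here is a minimal sketch of open-addressed lookup over a flat bucket buffer. The (key hash, blob offset, length) field layout is a hypothetical stand-in for the real 24-byte bucket format, but the probing logic is the standard technique: hash to a slot, then scan forward until the key or an empty slot is found, reading buckets in place without copying.

```python
import struct

# Hypothetical 24-byte bucket: key hash, blob offset, blob length.
BUCKET = struct.Struct("<QQQ")

def build(entries, n_buckets):
    """Build a linear-probing table as one contiguous bytes blob."""
    table = [(0, 0, 0)] * n_buckets
    for h, off, length in entries:
        i = h % n_buckets
        while table[i][0] != 0:          # slot taken: probe the next one
            i = (i + 1) % n_buckets
        table[i] = (h, off, length)
    return b"".join(BUCKET.pack(*b) for b in table)

def lookup(data, n_buckets, h):
    """Probe an mmapped-style buffer; unpack_from reads in place."""
    i = h % n_buckets
    while True:
        key, off, length = BUCKET.unpack_from(data, i * BUCKET.size)
        if key == h:
            return off, length
        if key == 0:                     # empty slot: key is absent
            return None
        i = (i + 1) % n_buckets

data = build([(7, 100, 32), (23, 200, 64)], 16)  # 7 and 23 collide mod 16
assert lookup(data, 16, 7) == (100, 32)
assert lookup(data, 16, 23) == (200, 64)
```

The same `lookup` works unchanged over an `mmap.mmap` object, which is what makes exact-key hits sub-millisecond: no deserialization, just one or two bucket reads.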
Storage Layer
- LMDB Tracker: Persists file/symbol/memory metadata with blob offsets, vector offsets, content hashes, context classifications, and session thread associations.
- Blob Storage: Append-only source code storage with atomic file locking (fcntl) for multi-process safety.
- Vector Storage: Persistent append-only vectors.dat with compaction support. Waste diagnostics track live/dead bytes and auto-compact at >25% waste and >1MB reclaimable.
- Lexical Index: Optional Tantivy-backed BM25 full-text search with code-aware tokenization (snake_case, camelCase, PascalCase, kebab-case).
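Code-aware tokenization is what lets a query like "user id" hit `getUserById` and `user_id` alike. A sketch of the idea (not ultrasync's actual tokenizer): split on snake_case/kebab-case separators, then on camelCase/PascalCase humps, and lowercase everything.

```python
import re

def code_tokens(identifier: str) -> list[str]:
    # Split on snake_case / kebab-case separators, then split each part
    # on camelCase / PascalCase humps (keeping acronym runs like "HTTP"
    # together); lowercase so "userId" and "user_id" produce equal tokens.
    parts = re.split(r"[_\-]+", identifier)
    tokens = []
    for part in parts:
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [t.lower() for t in tokens]

assert code_tokens("getUserById") == ["get", "user", "by", "id"]
assert code_tokens("HTTPServer") == ["http", "server"]
assert code_tokens("parse-config_file") == ["parse", "config", "file"]
```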
Search
Multi-strategy search engine:
- AOT index lookup (exact key match, sub-millisecond)
- Semantic vector search (cosine similarity)
- Lexical BM25 search (keyword/exact symbol)
- Grep fallback
- Opportunistic JIT indexing of discovered files
Search modes:
- semantic: Vector similarity, best for conceptual queries
- hybrid: RRF combining semantic + lexical results
- lexical: BM25 keyword matching, best for exact symbol names
Additional features: Recency biasing, threshold filtering, grep cache (semantic search over previous grep/glob results), memory integration (prior decisions/constraints).
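The hybrid mode's Reciprocal Rank Fusion is simple enough to sketch in full: each result list contributes `1/(k + rank)` per document, so files that rank well under both semantic and lexical retrieval float to the top without any score normalization. The `k=60` constant is the conventional default from the RRF literature, not necessarily what ultrasync uses.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["auth.py", "session.py", "token.py"]
lexical = ["token.py", "auth.py", "jwt.py"]
fused = rrf([semantic, lexical])
assert fused[0] == "auth.py"  # ranked highly by both strategies
```

Because RRF only consumes ranks, it sidesteps the problem that cosine similarities and BM25 scores live on incompatible scales.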
Memory System
- Structured memories with taxonomy tagging (task, insights, context, symbol_keys) and semantic embeddings.
- Auto-extraction from transcripts via Hyperscan pattern matching (269 patterns across 18 categories: decision, bug, fix, constraint, tradeoff, pitfall, assumption, discovery, architecture, etc.).
- Deduplication prevents storing duplicate memories.
- LRU eviction with configurable max_memories (default 1000). Eviction scoring combines access frequency + recency + age.
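A plausible shape for the eviction score (the exact weights and decay curves here are hypothetical, not ultrasync's): frequency scaled by recency, minus an age penalty, with the lowest-scoring memory evicted first once max_memories is exceeded.

```python
import time

def eviction_score(access_count, last_access, created_at, now=None):
    """Lower score = evicted first. Hypothetical weighting that combines
    access frequency, recency of last use, and total age."""
    now = now if now is not None else time.time()
    recency = 1.0 / (1.0 + (now - last_access) / 3600)  # decays per idle hour
    age_penalty = (now - created_at) / 86400            # days since creation
    return access_count * recency - 0.1 * age_penalty

now = 1_000_000.0
hot = eviction_score(access_count=10, last_access=now - 60,
                     created_at=now - 86400, now=now)
cold = eviction_score(access_count=1, last_access=now - 7 * 86400,
                      created_at=now - 30 * 86400, now=now)
assert hot > cold  # frequently, recently used memories survive eviction
```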
Pattern Matching
- Hyperscan integration for high-performance bulk regex scanning on mmapped content.
- Context detection (24 types, no LLM required):
- Application: auth, frontend, backend, api, data, testing, ui, billing
- Infrastructure: iac, k8s, cloud-aws/azure/gcp, cicd, containers, gitops, observability, service-mesh, secrets, serverless, config-mgmt
- Anchor detection: Structural entry points (routes, models, schemas, validators, handlers, services, repositories, events, jobs, middleware) with line-level granularity.
- Insight extraction: Auto-detected markers (TODO, FIXME, HACK, BUG, NOTE, INVARIANT, ASSUMPTION, DECISION, CONSTRAINT, PITFALL, OPTIMIZE, DEPRECATED, SECURITY).
Classification
- Taxonomy-based classification with cosine similarity scoring between content and category keywords.
- 17 default categories: models, serialization, validation, core, handlers, services, config, logging, errors, caching, utils, io, networking, indexing, embedding, tests.
- Context index for ~1ms LMDB lookups via files_by_context.
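The classification mechanism can be illustrated with bag-of-words vectors standing in for real embeddings (ultrasync uses semantic embeddings; the two category keyword sets below are invented for the example). Content is scored against each category's keywords by cosine similarity and assigned to the best match:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Tiny stand-in keyword sets; the real taxonomy has 17 categories.
CATEGORIES = {
    "validation": Counter("validate schema field constraint check".split()),
    "networking": Counter("socket http request connection retry".split()),
}

def classify(text: str) -> str:
    doc = Counter(text.lower().split())
    return max(CATEGORIES, key=lambda c: cosine(doc, CATEGORIES[c]))

assert classify("validate the request schema field by field") == "validation"
```

With embeddings in place of word counts, the same argmax-over-cosine structure handles synonyms and paraphrase that keyword overlap misses.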
Graph Memory
- Nodes: file, symbol, decision, constraint, memory types with scoped storage (repo, session, task).
- Edges: Adjacency lists with O(1) neighbor lookup. Builtin relations: DEFINES, USES, DERIVES_FROM, CALLS, etc.
- Policy storage: Decisions, constraints, procedures as versioned key-value entries with temporal diff queries.
- Bootstrap: Auto-creates nodes/edges from FileTracker data.
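A minimal sketch of the node/edge shape described above (data-structure illustration only, not ultrasync's API): typed nodes in a dict, and adjacency lists keyed by (source, relation) so neighbor lookup is an O(1) average-case hash probe.

```python
from collections import defaultdict

class GraphMemory:
    """Minimal sketch: typed nodes plus adjacency lists keyed by
    (source, relation) for O(1) average-case neighbor lookup."""

    def __init__(self):
        self.nodes: dict[str, dict] = {}
        self.edges: dict[tuple[str, str], list[str]] = defaultdict(list)

    def put_node(self, node_id: str, kind: str, **attrs):
        self.nodes[node_id] = {"kind": kind, **attrs}

    def put_edge(self, src: str, relation: str, dst: str):
        self.edges[(src, relation)].append(dst)

    def neighbors(self, src: str, relation: str) -> list[str]:
        return self.edges[(src, relation)]

g = GraphMemory()
g.put_node("auth.py", "file")
g.put_node("login", "symbol")
g.put_edge("auth.py", "DEFINES", "login")
assert g.neighbors("auth.py", "DEFINES") == ["login"]
```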
Conventions
- Convention storage with semantic search, 10 categories (naming, style, pattern, security, performance, testing, architecture, documentation, accessibility, error-handling).
- Priority levels: required, recommended, optional.
- Pattern-based violation checking with line numbers.
- Auto-discovery from linter configs (eslint, biome, ruff, prettier, oxlint, etc.).
- Export/import (YAML/JSON) for team sharing.
Call Graph & IR Extraction
- Static call graph: Symbol nodes with definition locations, call sites with line numbers, caller/callee relationships.
- Stack-agnostic IR extraction:
- Entities: data models with fields, types, relationships
- Endpoints: HTTP routes with auth, schemas, business rules
- Flows: feature flows from route to data layer
- Jobs: background tasks and scheduled work
- Services: external integrations (Stripe, Resend, etc.)
- Flow tracing from endpoint through call graph to data layer.
- Markdown export for LLM consumption.
Session Threads
- Thread routing via centroid embeddings with similarity-based assignment.
- Context tracking: files accessed, user queries, tool usage.
- Transcript watching with multi-agent support (Claude Code, Codex). Leader election (fcntl lock) for single watcher per project.
- Search learning: Tracks weak searches, indexes files from grep/glob fallbacks, builds query-file associations.
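Centroid-based routing reduces to one comparison per thread. In this sketch (2-d vectors and the 0.8 threshold are illustrative, not ultrasync's values), a query embedding is assigned to the most similar thread centroid, or to no thread when nothing clears the threshold, signaling the caller to start a new one:

```python
def route(query_vec, threads, threshold=0.8):
    """Assign a query to the most similar thread centroid, or return
    None when no centroid clears the similarity threshold."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)

    best = max(threads, key=lambda t: cos(query_vec, threads[t]), default=None)
    if best is not None and cos(query_vec, threads[best]) >= threshold:
        return best
    return None  # caller creates a new thread

threads = {"auth-work": [1.0, 0.0], "billing-work": [0.0, 1.0]}
assert route([0.9, 0.1], threads) == "auth-work"
assert route([0.7, 0.7], threads) is None  # ambiguous: start a new thread
```

In practice the centroid would be updated incrementally as queries join a thread, so a thread's topic can drift with the conversation.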
MCP Server
70+ tools exposed via Model Context Protocol:
- Indexing: index_file, index_directory, full_index, add_symbol, reindex_file, delete_file/symbol/memory
- Search: search, memory_search, search_grep_cache, list_contexts, files_by_context, list_insights, insights_by_type
- Memory: memory_write_structured, memory_search_structured, memory_get, memory_list_structured
- Session threads: session_thread_list/get/search_queries, session_thread_for_file, session_thread_stats
- Patterns: pattern_load, pattern_scan, pattern_scan_memories, pattern_list
- Anchors: anchor_list_types, anchor_scan_file, anchor_scan_indexed, anchor_find_files
- Conventions: convention_add/list/search/get/delete, convention_for_context, convention_check, convention_discover, convention_export/import
- IR: ir_extract, ir_trace_endpoint, ir_summarize
- Graph: graph_put/get_node, graph_put/delete_edge, graph_get_neighbors, graph_put/get/list_kv, graph_diff_since, graph_bootstrap, graph_relations
- Utilities: get_stats, recently_indexed, compute_hash, get_source, compact_vectors, watcher_start/stop/reprocess
Tool Categories
By default, only essential tools are loaded to reduce noise in agent
tool lists. Control which tools are exposed via the ULTRASYNC_TOOLS
environment variable:
# Default: search + memory only (recommended for most use cases)
ULTRASYNC_TOOLS=search,memory
# Enable all 70+ tools
ULTRASYNC_TOOLS=all
# Enable specific categories
ULTRASYNC_TOOLS=search,memory,index,sync
Available categories:
| Category | Tools |
|---|---|
| search | search, get_source |
| memory | memory_write, memory_search, memory_get, ... |
| index | index_file, index_directory, full_index, ... |
| watcher | watcher_stats, watcher_start/stop/reprocess |
| sync | sync_connect, sync_status, sync_push_*, ... |
| session | session_thread_list/get/search_queries, ... |
| patterns | pattern_load, pattern_scan, pattern_list |
| anchors | anchor_list_types, anchor_scan_*, ... |
| conventions | convention_add/list/search/get/delete, ... |
| ir | ir_extract, ir_trace_endpoint, ir_summarize |
| graph | graph_put/get_node, graph_*_edge, ... |
| context | search_grep_cache, list_contexts, ... |
Installation
We recommend installing ultrasync as a tool with uv tool or uvx.
# install with CLI and lexical+hybrid search support (recommended)
uv tool install "ultrasync-mcp[cli,lexical]"
Currently Supported Agents
- Claude Code
- OpenAI Codex
- Others coming soon
MCP Installation
Add the following to your mcpServers or equivalent configuration:
{
"ultrasync": {
"type": "stdio",
"command": "/path/to/uv",
"args": [
"tool",
"run",
"ultrasync",
"mcp"
]
}
}
To enable additional tool categories, add the env field:
{
"ultrasync": {
"type": "stdio",
"command": "/path/to/uv",
"args": ["tool", "run", "ultrasync", "mcp"],
"env": {
"ULTRASYNC_TOOLS": "search,memory,index,sync"
}
}
}
Usage
# Start MCP server
uv tool run ultrasync serve
# Index a directory
uv tool run ultrasync index .
# Interactive TUI
uv tool run ultrasync voyager
Development
# Install with dev dependencies
uv sync --group dev
# Build Rust extension
uv run maturin develop -m crates/ultrasync_index/Cargo.toml
# Install pre-commit hooks (using prek - faster rust-based runner)
cargo install prek
prek install
# Run hooks manually
prek run --all-files
# Lint and format (also runs via pre-commit)
ruff check src/ultrasync
ruff format src/ultrasync
cargo fmt --manifest-path crates/ultrasync_index/Cargo.toml
cargo clippy --manifest-path crates/ultrasync_index/Cargo.toml
# Run tests
uv run pytest tests/ -v
cargo test --manifest-path crates/ultrasync_index/Cargo.toml
Team Sync
Private, self-hosted memory and context sharing for development teams. Centralized convention management, shared decision/constraint policies, and cross-session knowledge persistence without external dependencies.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file ultrasync_mcp-1.0.0.tar.gz.
File metadata
- Download URL: ultrasync_mcp-1.0.0.tar.gz
- Size: 354.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d91ed829ad66e04c7886a1d00081a6ebf838ba45185273e0b235e72c130b67bc |
| MD5 | a2b917025f5c05fdc4fb2e5fc9632baf |
| BLAKE2b-256 | 88592c9a578857534dcd861bae3c551e30e25dd85b26a16c4869695e4c86806a |
Provenance
The following attestation bundles were made for ultrasync_mcp-1.0.0.tar.gz:
Publisher: publish.yml on darvid/ultrasync
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ultrasync_mcp-1.0.0.tar.gz
- Subject digest: d91ed829ad66e04c7886a1d00081a6ebf838ba45185273e0b235e72c130b67bc
- Sigstore transparency entry: 786336113
- Permalink: darvid/ultrasync@6332a554265987abad946e27751744eb2a6801e8
- Branch / Tag: refs/heads/main
- Owner: https://github.com/darvid
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6332a554265987abad946e27751744eb2a6801e8
- Trigger Event: workflow_dispatch
File details
Details for the file ultrasync_mcp-1.0.0-py3-none-any.whl.
File metadata
- Download URL: ultrasync_mcp-1.0.0-py3-none-any.whl
- Size: 393.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bc19502beb7ae3cb825e38e98a04baec60da0b1e08a8c207c8e7ec5cf31b6cb3 |
| MD5 | 05a69507e3119b03760cb1acad585d58 |
| BLAKE2b-256 | fd13f650d7b27b461dde37e02bd70e1668684dfa9bf2c75236f0fd45af62983c |
Provenance
The following attestation bundles were made for ultrasync_mcp-1.0.0-py3-none-any.whl:
Publisher: publish.yml on darvid/ultrasync
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ultrasync_mcp-1.0.0-py3-none-any.whl
- Subject digest: bc19502beb7ae3cb825e38e98a04baec60da0b1e08a8c207c8e7ec5cf31b6cb3
- Sigstore transparency entry: 786336125
- Permalink: darvid/ultrasync@6332a554265987abad946e27751744eb2a6801e8
- Branch / Tag: refs/heads/main
- Owner: https://github.com/darvid
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6332a554265987abad946e27751744eb2a6801e8
- Trigger Event: workflow_dispatch