Semantic code indexing and memory for AI coding agents
Project description
ultrasync
Semantic indexing and search for codebases. Exposes an MCP server for integration with coding agents.
TLDR
(not written by AI)
- Semantic, lexical, and RRF indexing and search for codebases.
- Results in sub-second response times for most search queries on indexed codebases.
- Index and blob data stored in the .ultrasync directory; be sure to gitignore it.
- Structured memory and recall tools, without:
- Additional LLM calls
- Convoluted, vibe slopped together abstractions or interfaces
- Pollution of repository with arbitrary "human readable" Markdown slop files
- Network-speed, pattern-recognition-based heuristics for classifying contexts and codebase insights (like inferring TODOs, comments, code smells, etc.).
- Intentionally fuzzy heuristics to improve p50 performance dramatically and reduce token spend for some of the most common codebase understanding tasks.
- Exposes an MCP server for integration with coding agents, virtually no configuration required. No additional processes. Fully local.
Quickstart
uv tool install "ultrasync-mcp[cli,lexical,secrets]"
# or, if you have sync:
uv tool install "ultrasync-mcp[cli,lexical,secrets,sync]"
Then update the MCP server configuration for your coding agent of choice (currently Claude Code and Codex are supported), e.g.:
{
"ultrasync": {
"type": "stdio",
"command": "uv",
"args": [
"tool",
"run",
"--from",
"ultrasync-mcp",
"ultrasync",
"mcp"
],
"env": {
// Uncomment below if using remote sync
// "ULTRASYNC_REMOTE_SYNC": "true",
// "ULTRASYNC_SYNC_URL": "https://mcp.ultrasync.dev",
// "ULTRASYNC_SYNC_TOKEN": "uss_...",
// ULTRASYNC_TOOLS defaults to search,memory
// If using remote sync, add "sync" to the list
"ULTRASYNC_TOOLS": "search,memory,sync"
}
}
}
Features
Indexing Architecture
Two-layer JIT + AOT system:
- JIT (Just-In-Time): On-demand file indexing with change detection via mtime/content hash. Lazy embedding computation with persistent vector caching. Checkpointed progress for resumable large jobs.
- AOT (Ahead-Of-Time): Rust-based GlobalIndex with mmapped index.dat (open-addressing hash table, 24-byte buckets) and blob.dat (raw contiguous bytes). Zero-copy slices for Hyperscan pattern scanning.
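To make the AOT layout concrete, here is a minimal sketch of open-addressed lookup over a flat bucket buffer. The (key hash, blob offset, length) field layout is a hypothetical stand-in for the real 24-byte bucket format, but the probing logic is the standard technique: hash to a slot, then scan forward until the key or an empty slot is found, reading buckets in place without copying.

```python
import struct

# Hypothetical 24-byte bucket: key hash, blob offset, blob length.
BUCKET = struct.Struct("<QQQ")

def build(entries, n_buckets):
    """Build a linear-probing table as one contiguous bytes blob."""
    table = [(0, 0, 0)] * n_buckets
    for h, off, length in entries:
        i = h % n_buckets
        while table[i][0] != 0:          # slot taken: probe the next one
            i = (i + 1) % n_buckets
        table[i] = (h, off, length)
    return b"".join(BUCKET.pack(*b) for b in table)

def lookup(data, n_buckets, h):
    """Probe an mmapped-style buffer; unpack_from reads in place."""
    i = h % n_buckets
    while True:
        key, off, length = BUCKET.unpack_from(data, i * BUCKET.size)
        if key == h:
            return off, length
        if key == 0:                     # empty slot: key is absent
            return None
        i = (i + 1) % n_buckets

data = build([(7, 100, 32), (23, 200, 64)], 16)  # 7 and 23 collide mod 16
assert lookup(data, 16, 7) == (100, 32)
assert lookup(data, 16, 23) == (200, 64)
```

The same `lookup` works unchanged over an `mmap.mmap` object, which is what makes exact-key hits sub-millisecond: no deserialization, just one or two bucket reads.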
Storage Layer
- LMDB Tracker: Persists file/symbol/memory metadata with blob offsets, vector offsets, content hashes, context classifications, and session thread associations.
- Blob Storage: Append-only source code storage with atomic file locking (fcntl) for multi-process safety.
- Vector Storage: Persistent append-only vectors.dat with compaction support. Waste diagnostics track live/dead bytes and auto-compact at >25% waste and >1MB reclaimable.
- Lexical Index: Optional Tantivy-backed BM25 full-text search with code-aware tokenization (snake_case, camelCase, PascalCase, kebab-case).
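Code-aware tokenization is what lets a query like "user id" hit `getUserById` and `user_id` alike. A sketch of the idea (not ultrasync's actual tokenizer): split on snake_case/kebab-case separators, then on camelCase/PascalCase humps, and lowercase everything.

```python
import re

def code_tokens(identifier: str) -> list[str]:
    # Split on snake_case / kebab-case separators, then split each part
    # on camelCase / PascalCase humps (keeping acronym runs like "HTTP"
    # together); lowercase so "userId" and "user_id" produce equal tokens.
    parts = re.split(r"[_\-]+", identifier)
    tokens = []
    for part in parts:
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [t.lower() for t in tokens]

assert code_tokens("getUserById") == ["get", "user", "by", "id"]
assert code_tokens("HTTPServer") == ["http", "server"]
assert code_tokens("parse-config_file") == ["parse", "config", "file"]
```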
Search
Multi-strategy search engine:
- AOT index lookup (exact key match, sub-millisecond)
- Semantic vector search (cosine similarity)
- Lexical BM25 search (keyword/exact symbol)
- Grep fallback
- Opportunistic JIT indexing of discovered files
Search modes:
- semantic: Vector similarity, best for conceptual queries
- hybrid: RRF combining semantic + lexical results
- lexical: BM25 keyword matching, best for exact symbol names
Additional features: Recency biasing, threshold filtering, grep cache (semantic search over previous grep/glob results), memory integration (prior decisions/constraints).
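The hybrid mode's Reciprocal Rank Fusion is simple enough to sketch in full: each result list contributes `1/(k + rank)` per document, so files that rank well under both semantic and lexical retrieval float to the top without any score normalization. The `k=60` constant is the conventional default from the RRF literature, not necessarily what ultrasync uses.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1/(k + rank)."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

semantic = ["auth.py", "session.py", "token.py"]
lexical = ["token.py", "auth.py", "jwt.py"]
fused = rrf([semantic, lexical])
assert fused[0] == "auth.py"  # ranked highly by both strategies
```

Because RRF only consumes ranks, it sidesteps the problem that cosine similarities and BM25 scores live on incompatible scales.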
Memory System
- Structured memories with taxonomy tagging (task, insights, context, symbol_keys) and semantic embeddings.
- Auto-extraction from transcripts via Hyperscan pattern matching (269 patterns across 18 categories: decision, bug, fix, constraint, tradeoff, pitfall, assumption, discovery, architecture, etc.).
- Deduplication prevents storing duplicate memories.
- LRU eviction with configurable max_memories (default 1000). Eviction scoring combines access frequency + recency + age.
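A plausible shape for the eviction score (the exact weights and decay curves here are hypothetical, not ultrasync's): frequency scaled by recency, minus an age penalty, with the lowest-scoring memory evicted first once max_memories is exceeded.

```python
import time

def eviction_score(access_count, last_access, created_at, now=None):
    """Lower score = evicted first. Hypothetical weighting that combines
    access frequency, recency of last use, and total age."""
    now = now if now is not None else time.time()
    recency = 1.0 / (1.0 + (now - last_access) / 3600)  # decays per idle hour
    age_penalty = (now - created_at) / 86400            # days since creation
    return access_count * recency - 0.1 * age_penalty

now = 1_000_000.0
hot = eviction_score(access_count=10, last_access=now - 60,
                     created_at=now - 86400, now=now)
cold = eviction_score(access_count=1, last_access=now - 7 * 86400,
                      created_at=now - 30 * 86400, now=now)
assert hot > cold  # frequently, recently used memories survive eviction
```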
Pattern Matching
- Hyperscan integration for high-performance bulk regex scanning on mmapped content.
- Context detection (24 types, no LLM required):
- Application: auth, frontend, backend, api, data, testing, ui, billing
- Infrastructure: iac, k8s, cloud-aws/azure/gcp, cicd, containers, gitops, observability, service-mesh, secrets, serverless, config-mgmt
- Anchor detection: Structural entry points (routes, models, schemas, validators, handlers, services, repositories, events, jobs, middleware) with line-level granularity.
- Insight extraction: Auto-detected markers (TODO, FIXME, HACK, BUG, NOTE, INVARIANT, ASSUMPTION, DECISION, CONSTRAINT, PITFALL, OPTIMIZE, DEPRECATED, SECURITY).
Classification
- Taxonomy-based classification with cosine similarity scoring between content and category keywords.
- 17 default categories: models, serialization, validation, core, handlers, services, config, logging, errors, caching, utils, io, networking, indexing, embedding, tests.
- Context index for ~1ms LMDB lookups via files_by_context.
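The classification mechanism can be illustrated with bag-of-words vectors standing in for real embeddings (ultrasync uses semantic embeddings; the two category keyword sets below are invented for the example). Content is scored against each category's keywords by cosine similarity and assigned to the best match:

```python
import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Tiny stand-in keyword sets; the real taxonomy has 17 categories.
CATEGORIES = {
    "validation": Counter("validate schema field constraint check".split()),
    "networking": Counter("socket http request connection retry".split()),
}

def classify(text: str) -> str:
    doc = Counter(text.lower().split())
    return max(CATEGORIES, key=lambda c: cosine(doc, CATEGORIES[c]))

assert classify("validate the request schema field by field") == "validation"
```

With embeddings in place of word counts, the same argmax-over-cosine structure handles synonyms and paraphrase that keyword overlap misses.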
Graph Memory
- Nodes: file, symbol, decision, constraint, memory types with scoped storage (repo, session, task).
- Edges: Adjacency lists with O(1) neighbor lookup. Builtin relations: DEFINES, USES, DERIVES_FROM, CALLS, etc.
- Policy storage: Decisions, constraints, procedures as versioned key-value entries with temporal diff queries.
- Bootstrap: Auto-creates nodes/edges from FileTracker data.
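A minimal sketch of the node/edge shape described above (data-structure illustration only, not ultrasync's API): typed nodes in a dict, and adjacency lists keyed by (source, relation) so neighbor lookup is an O(1) average-case hash probe.

```python
from collections import defaultdict

class GraphMemory:
    """Minimal sketch: typed nodes plus adjacency lists keyed by
    (source, relation) for O(1) average-case neighbor lookup."""

    def __init__(self):
        self.nodes: dict[str, dict] = {}
        self.edges: dict[tuple[str, str], list[str]] = defaultdict(list)

    def put_node(self, node_id: str, kind: str, **attrs):
        self.nodes[node_id] = {"kind": kind, **attrs}

    def put_edge(self, src: str, relation: str, dst: str):
        self.edges[(src, relation)].append(dst)

    def neighbors(self, src: str, relation: str) -> list[str]:
        return self.edges[(src, relation)]

g = GraphMemory()
g.put_node("auth.py", "file")
g.put_node("login", "symbol")
g.put_edge("auth.py", "DEFINES", "login")
assert g.neighbors("auth.py", "DEFINES") == ["login"]
```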
Conventions
- Convention storage with semantic search, 10 categories (naming, style, pattern, security, performance, testing, architecture, documentation, accessibility, error-handling).
- Priority levels: required, recommended, optional.
- Pattern-based violation checking with line numbers.
- Auto-discovery from linter configs (eslint, biome, ruff, prettier, oxlint, etc.).
- Export/import (YAML/JSON) for team sharing.
Call Graph & IR Extraction
- Static call graph: Symbol nodes with definition locations, call sites with line numbers, caller/callee relationships.
- Stack-agnostic IR extraction:
- Entities: data models with fields, types, relationships
- Endpoints: HTTP routes with auth, schemas, business rules
- Flows: feature flows from route to data layer
- Jobs: background tasks and scheduled work
- Services: external integrations (Stripe, Resend, etc.)
- Flow tracing from endpoint through call graph to data layer.
- Markdown export for LLM consumption.
Session Threads
- Thread routing via centroid embeddings with similarity-based assignment.
- Context tracking: files accessed, user queries, tool usage.
- Transcript watching with multi-agent support (Claude Code, Codex). Leader election (fcntl lock) for single watcher per project.
- Search learning: Tracks weak searches, indexes files from grep/glob fallbacks, builds query-file associations.
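Centroid-based routing reduces to one comparison per thread. In this sketch (2-d vectors and the 0.8 threshold are illustrative, not ultrasync's values), a query embedding is assigned to the most similar thread centroid, or to no thread when nothing clears the threshold, signaling the caller to start a new one:

```python
def route(query_vec, threads, threshold=0.8):
    """Assign a query to the most similar thread centroid, or return
    None when no centroid clears the similarity threshold."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(y * y for y in b) ** 0.5
        return dot / (na * nb)

    best = max(threads, key=lambda t: cos(query_vec, threads[t]), default=None)
    if best is not None and cos(query_vec, threads[best]) >= threshold:
        return best
    return None  # caller creates a new thread

threads = {"auth-work": [1.0, 0.0], "billing-work": [0.0, 1.0]}
assert route([0.9, 0.1], threads) == "auth-work"
assert route([0.7, 0.7], threads) is None  # ambiguous: start a new thread
```

In practice the centroid would be updated incrementally as queries join a thread, so a thread's topic can drift with the conversation.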
MCP Server
70+ tools exposed via Model Context Protocol:
- Indexing: index_file, index_directory, full_index, add_symbol, reindex_file, delete_file/symbol/memory
- Search: search, memory_search, search_grep_cache, list_contexts, files_by_context, list_insights, insights_by_type
- Memory: memory_write_structured, memory_search_structured, memory_get, memory_list_structured
- Session threads: session_thread_list/get/search_queries, session_thread_for_file, session_thread_stats
- Patterns: pattern_load, pattern_scan, pattern_scan_memories, pattern_list
- Anchors: anchor_list_types, anchor_scan_file, anchor_scan_indexed, anchor_find_files
- Conventions: convention_add/list/search/get/delete, convention_for_context, convention_check, convention_discover, convention_export/import
- IR: ir_extract, ir_trace_endpoint, ir_summarize
- Graph: graph_put/get_node, graph_put/delete_edge, graph_get_neighbors, graph_put/get/list_kv, graph_diff_since, graph_bootstrap, graph_relations
- Utilities: get_stats, recently_indexed, compute_hash, get_source, compact_vectors, watcher_start/stop/reprocess
Tool Categories
By default, only essential tools are loaded to reduce noise in agent
tool lists. Control which tools are exposed via the ULTRASYNC_TOOLS
environment variable:
# Default: search + memory only (recommended for most use cases)
ULTRASYNC_TOOLS=search,memory
# Enable all 70+ tools
ULTRASYNC_TOOLS=all
# Enable specific categories
ULTRASYNC_TOOLS=search,memory,index,sync
Available categories:
| Category | Tools |
|---|---|
| search | search, get_source |
| memory | memory_write, memory_search, memory_get, ... |
| index | index_file, index_directory, full_index, ... |
| watcher | watcher_stats, watcher_start/stop/reprocess |
| sync | sync_connect, sync_status, sync_push_*, ... |
| session | session_thread_list/get/search_queries, ... |
| patterns | pattern_load, pattern_scan, pattern_list |
| anchors | anchor_list_types, anchor_scan_*, ... |
| conventions | convention_add/list/search/get/delete, ... |
| ir | ir_extract, ir_trace_endpoint, ir_summarize |
| graph | graph_put/get_node, graph_*_edge, ... |
| context | search_grep_cache, list_contexts, ... |
Installation
We recommend installing ultrasync as a tool with uv tool or uvx.
# install with CLI and lexical+hybrid search support (recommended)
uv tool install "ultrasync-mcp[cli,lexical]"
Currently Supported Agents
- Claude Code
- OpenAI Codex
- Others coming soon
MCP Installation
Add the following to your mcpServers or equivalent configuration:
{
"ultrasync": {
"type": "stdio",
"command": "/path/to/uv",
"args": [
"tool",
"run",
"ultrasync",
"mcp"
]
}
}
To enable additional tool categories, add the env field:
{
"ultrasync": {
"type": "stdio",
"command": "/path/to/uv",
"args": ["tool", "run", "ultrasync", "mcp"],
"env": {
"ULTRASYNC_TOOLS": "search,memory,index,sync"
}
}
}
Usage
# Start MCP server
uv tool run ultrasync serve
# Index a directory
uv tool run ultrasync index .
# Interactive TUI
uv tool run ultrasync voyager
Development
# Install with dev dependencies
uv sync --group dev
# Build Rust extension
uv run maturin develop -m crates/ultrasync_index/Cargo.toml
# Install pre-commit hooks (using prek - faster rust-based runner)
cargo install prek
prek install
# Run hooks manually
prek run --all-files
# Lint and format (also runs via pre-commit)
ruff check src/ultrasync
ruff format src/ultrasync
cargo fmt --manifest-path crates/ultrasync_index/Cargo.toml
cargo clippy --manifest-path crates/ultrasync_index/Cargo.toml
# Run tests
uv run pytest tests/ -v
cargo test --manifest-path crates/ultrasync_index/Cargo.toml
Team Sync
Private, self-hosted memory and context sharing for development teams. Centralized convention management, shared decision/constraint policies, and cross-session knowledge persistence without external dependencies.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file ultrasync_mcp-1.0.0.tar.gz.
File metadata
- Download URL: ultrasync_mcp-1.0.0.tar.gz
- Size: 354.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d91ed829ad66e04c7886a1d00081a6ebf838ba45185273e0b235e72c130b67bc |
| MD5 | a2b917025f5c05fdc4fb2e5fc9632baf |
| BLAKE2b-256 | 88592c9a578857534dcd861bae3c551e30e25dd85b26a16c4869695e4c86806a |
Provenance
The following attestation bundles were made for ultrasync_mcp-1.0.0.tar.gz:
Publisher: publish.yml on darvid/ultrasync
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ultrasync_mcp-1.0.0.tar.gz
- Subject digest: d91ed829ad66e04c7886a1d00081a6ebf838ba45185273e0b235e72c130b67bc
- Sigstore transparency entry: 786336113
- Permalink: darvid/ultrasync@6332a554265987abad946e27751744eb2a6801e8
- Branch / Tag: refs/heads/main
- Owner: https://github.com/darvid
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6332a554265987abad946e27751744eb2a6801e8
- Trigger Event: workflow_dispatch
File details
Details for the file ultrasync_mcp-1.0.0-py3-none-any.whl.
File metadata
- Download URL: ultrasync_mcp-1.0.0-py3-none-any.whl
- Size: 393.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bc19502beb7ae3cb825e38e98a04baec60da0b1e08a8c207c8e7ec5cf31b6cb3 |
| MD5 | 05a69507e3119b03760cb1acad585d58 |
| BLAKE2b-256 | fd13f650d7b27b461dde37e02bd70e1668684dfa9bf2c75236f0fd45af62983c |
Provenance
The following attestation bundles were made for ultrasync_mcp-1.0.0-py3-none-any.whl:
Publisher: publish.yml on darvid/ultrasync
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ultrasync_mcp-1.0.0-py3-none-any.whl
- Subject digest: bc19502beb7ae3cb825e38e98a04baec60da0b1e08a8c207c8e7ec5cf31b6cb3
- Sigstore transparency entry: 786336125
- Permalink: darvid/ultrasync@6332a554265987abad946e27751744eb2a6801e8
- Branch / Tag: refs/heads/main
- Owner: https://github.com/darvid
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6332a554265987abad946e27751744eb2a6801e8
- Trigger Event: workflow_dispatch