Sub-millisecond code search via sparse trigram indexing.
ix
ix builds a compressed trigram index that is typically 2-3× the source
size for pure code, and can be smaller than the source for
repetitive or binary-heavy repos (measured: 0.13× on a 1 GB mixed-content
repo). The compaction pipeline — delta encoding → protobuf varint →
ZSTD level 3 — achieves 88% reduction vs raw u32 storage and 60%
additional savings on top of varint alone. The CDX trigram table uses
a B-tree page architecture (block index → ZSTD-compressed 1024-entry
blocks) for sub-50μs random access into compressed data.
This eliminates the linear-scan bottleneck of traditional tools on large codebases. Target hardware floor: 2015 CPU, 8 GB RAM.
Documentation
| For | Read |
|---|---|
| Getting started (tutorial) | docs/QUICKSTART.md |
| CLI flag reference | ix --help |
| Running the daemon | docs/DAEMON-RUNBOOK.md |
| .ixd.toml config | docs/.ixd.toml.md |
| Socket API (tool builders) | docs/SOCKET-API.md |
| Index delta format | docs/DELTA-FORMAT.md |
| Performance benchmarks | docs/BENCHMARKS.md |
| Contributing | docs/CONTRIBUTING.md |
| Release history | CHANGELOG.md |
| Upgrade from v0.7.x | docs/v0.8.0-UPGRADE-GUIDE.md |
Install
cargo install moeix
Installs two binaries:
- ix: CLI search tool
- ixd: background daemon (requires the notify feature, enabled by default)
You only need ix for search. Install ixd if you want continuous indexing.
Quick Start
# Build the index
ix --build /path/to/repo
# Literal search
ix "fn validate"
# Regex search
ix --regex "fn\s+\w+_handler"
# Context lines around each match
ix --context 3 "TODO"
# Show query statistics
ix --stats "struct Config"
# Only matching file paths
ix --files-only "error"
# Count matches only
ix --count "TODO"
# Filter by file extension
ix --type rs --type py "fn main"
Daemon
ixd watches one or more directories for file changes and incrementally
updates the index:
# Single directory
ixd /path/to/repo
# Multiple directories (v0.9+)
ixd /project-a /project-b /project-c
Each directory runs on its own thread with its own index, watcher, beacon, and Unix domain socket. Signal handling and memory monitoring are shared across all directories.
Service Management (Linux / systemd)
# Install as a user-level systemd service
ix service install /path/to/repo
# Start, stop, restart, or check the status of the service
ix service start
ix service stop
ix service restart
ix service status
The service starts automatically on login and survives reboots. See docs/DAEMON-RUNBOOK.md for the full operations guide.
Daemon Socket
The daemon exposes a Unix domain socket for external consumers (editors, tooling):
$XDG_RUNTIME_DIR/ixd/{hash}.sock
The protocol is NDJSON: one JSON object per newline-terminated line. See
docs/SOCKET-API.md for details. The ix CLI reads the index
file directly, not through the socket.
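The NDJSON framing can be handled with the standard library alone. The sketch below shows only the line framing; it sends no requests, and the sample payloads are made up for illustration. The real message schema is defined in docs/SOCKET-API.md, and the function name `for_each_ndjson_line` is hypothetical.

```rust
use std::io::{BufRead, BufReader, Read};

/// Call `handle` once per non-empty line of an NDJSON stream and return
/// how many lines were seen. Each line is one complete JSON document;
/// parsing it is left to whichever JSON library you use.
fn for_each_ndjson_line<R: Read>(
    source: R,
    mut handle: impl FnMut(&str),
) -> std::io::Result<usize> {
    let mut count = 0;
    for line in BufReader::new(source).lines() {
        let line = line?;
        let trimmed = line.trim();
        if trimmed.is_empty() {
            continue; // tolerate blank lines between messages
        }
        handle(trimmed);
        count += 1;
    }
    Ok(count)
}

fn main() -> std::io::Result<()> {
    // Stand-in for bytes arriving from the socket; field names are invented.
    let sample: &[u8] = b"{\"kind\":\"example\"}\n{\"kind\":\"another\"}\n";
    let seen = for_each_ndjson_line(sample, |msg| println!("message: {msg}"))?;
    println!("{seen} messages");
    Ok(())
}
```

Against a running daemon, the same function should accept a `std::os::unix::net::UnixStream` opened on the socket path, since that type implements `Read`. The loop then ends only when the daemon closes the connection.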
Configuring the Daemon
Scope what the daemon watches and indexes with .ixd.toml:
# .ixd.toml
watch_roots = ["src", "lib"]
exclude_patterns = [".git", "node_modules", "target", "vendor"]
See docs/.ixd.toml.md for full schema and examples.
How It Works
- Extract: ix --build walks the directory, extracts byte-level trigrams (skipping null bytes to nullify binary noise), and caps at 64 offset samples per trigram for files >1 MB.
- Accumulate: Trigrams are grouped into posting lists (one per unique trigram). An external sort with 500K-entry flush threshold keeps RAM constant regardless of repository size.
- Compress: Posting lists and the trigram table use the same pipeline: delta-encode adjacent file IDs and offsets → protobuf varint → ZSTD level 3. The CDX trigram table is organized as a B-tree: a 12-byte-per-1024-entry block index for O(log N) lookup, then decompress one ~5 KB block to find the target.
- Plan: On search, the query is decomposed into trigrams. The block index finds the target block, one ZSTD call decompresses it, and a linear scan finds the posting list offset.
- Verify: Candidates are filtered through per-file bloom filters (256 B, 0.7% false-positive rate), then streamed through a regex matcher with constant memory usage.
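The Extract and Plan steps can be illustrated with a small in-memory model. This is a minimal sketch, not the crate's implementation: it keeps everything in a `HashMap`, ignores offsets, sampling, and compression, and assumes that "skipping null bytes" means dropping any three-byte window that contains a null byte. All function names are hypothetical.

```rust
use std::collections::{BTreeSet, HashMap};

/// Byte-level trigrams of `data`, dropping windows that contain a null byte.
fn trigrams(data: &[u8]) -> BTreeSet<[u8; 3]> {
    data.windows(3)
        .filter(|w| !w.contains(&0))
        .map(|w| [w[0], w[1], w[2]])
        .collect()
}

/// Posting lists: each trigram maps to the sorted set of file IDs containing it.
fn build_index(files: &[&[u8]]) -> HashMap<[u8; 3], BTreeSet<u32>> {
    let mut index: HashMap<[u8; 3], BTreeSet<u32>> = HashMap::new();
    for (id, content) in files.iter().enumerate() {
        for t in trigrams(content) {
            index.entry(t).or_default().insert(id as u32);
        }
    }
    index
}

/// Files containing every trigram of `query`. These are only candidates:
/// each one still has to be verified by actually matching the query.
/// Queries shorter than three bytes yield no trigrams and return an empty
/// set here; a real tool would need a fallback for that case.
fn candidates(index: &HashMap<[u8; 3], BTreeSet<u32>>, query: &[u8]) -> BTreeSet<u32> {
    let mut result: Option<BTreeSet<u32>> = None;
    for t in trigrams(query) {
        let posting = index.get(&t).cloned().unwrap_or_default();
        result = Some(match result {
            None => posting,
            Some(acc) => acc.intersection(&posting).copied().collect(),
        });
    }
    result.unwrap_or_default()
}

fn main() {
    let files: [&[u8]; 3] = [b"fn validate(x)", b"fn main() {}", b"\0\0\0bin\0"];
    let index = build_index(&files);
    println!("{:?}", candidates(&index, b"fn validate"));
}
```

Intersecting posting lists is what lets the search skip most files: only files that contain every trigram of the query reach the verification step.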
Compaction Pipeline (measured)
Raw u32 entries → delta-encode → varint → ZSTD level 3
10.6 MB 2-3× smaller 60% more 88% total reduction
(1.3 MB final)
| Stage | What it catches | Typical savings |
|---|---|---|
| Null-byte skip | Binary files (30-80% null bytes) | near-zero trigram cost |
| Offset sampling | Repeated patterns in large files | 64 offsets max per trigram |
| Delta encoding | Sequential file IDs, clustered offsets | 2-3× vs raw u32 |
| Protobuf varint | Small values fit in 1 byte (<128) | dense trigrams stay compact |
| ZSTD level 3 | Byte-pattern redundancy in varint runs | 60% on top of varint |
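The delta and varint stages are simple enough to sketch with the standard library. The example below is illustrative rather than the crate's code: it assumes a strictly ascending list of file IDs and well-formed input on decode, and it omits the final ZSTD stage, which needs an external compression library. Function names are hypothetical.

```rust
/// Append `value` as an unsigned LEB128 varint, the encoding protobuf uses:
/// seven payload bits per byte, high bit set on every byte except the last.
fn write_varint(mut value: u32, out: &mut Vec<u8>) {
    while value >= 0x80 {
        out.push((value as u8 & 0x7f) | 0x80);
        value >>= 7;
    }
    out.push(value as u8);
}

/// Delta-encode an ascending list of IDs, then varint-encode each gap.
fn encode_sorted(ids: &[u32]) -> Vec<u8> {
    let mut out = Vec::new();
    let mut prev = 0u32;
    for &id in ids {
        write_varint(id - prev, &mut out); // gaps are small, so most fit in 1 byte
        prev = id;
    }
    out
}

/// Inverse of `encode_sorted`; assumes the bytes were produced by it.
fn decode_sorted(bytes: &[u8]) -> Vec<u32> {
    let mut ids = Vec::new();
    let (mut prev, mut value, mut shift) = (0u32, 0u32, 0u32);
    for &b in bytes {
        value |= u32::from(b & 0x7f) << shift;
        if (b & 0x80) == 0 {
            prev += value; // undo the delta step
            ids.push(prev);
            value = 0;
            shift = 0;
        } else {
            shift += 7;
        }
    }
    ids
}

fn main() {
    let ids = [1000u32, 1001, 1003, 1010, 5000];
    let encoded = encode_sorted(&ids);
    println!("raw: {} bytes, encoded: {} bytes", ids.len() * 4, encoded.len());
    assert_eq!(decode_sorted(&encoded), ids);
}
```

In this example five IDs shrink from 20 bytes of raw u32 storage to 7 bytes, because three of the five gaps are below 128 and fit in a single byte each.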
Index Format (v1.3)
All integers little-endian, offsets absolute from file start, 8-byte aligned.
| Section | Size (example, 70 files) | Description |
|---|---|---|
| Header | 256 B | magic IX01, version, flags, CRC, section offsets |
| File table | 3.4 KB | 48 B per file: path offset, content hash, size, mtime |
| Posting lists | 1,332 KB (90.1%) | Per-trigram file entries: delta+varint+ZSTD |
| CDX trigram table | 122 KB (8.3%) | 4.9 B/trigram (75% vs naive 20 B) |
| CDX block index | 312 B (0.02%) | 12 B per 1024-entry block, O(log N) binary search |
| Bloom filters | 18 KB (1.2%) | 256 B per file, 5 hashes, 0.7% FPR |
| String pool | 1.9 KB | Interned file paths |
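The two-level CDX lookup can be sketched as a binary search over per-block first keys followed by a linear scan inside one block. The layout below is an assumption made for illustration: the source gives the block index entry size (12 B) but not its fields, and the real blocks are ZSTD-compressed, which this sketch skips. With the 312 B block index from the table (26 entries), the binary search needs at most five probes.

```rust
/// One decompressed block: (trigram key, posting-list offset) pairs sorted by key.
type Block = Vec<(u32, u64)>;

/// Hypothetical two-level table: `first_keys[i]` is the smallest key in `blocks[i]`.
struct TrigramTable {
    first_keys: Vec<u32>,
    blocks: Vec<Block>,
}

impl TrigramTable {
    /// Sort the entries and cut them into blocks of `block_len` (must be > 0).
    fn build(mut entries: Vec<(u32, u64)>, block_len: usize) -> Self {
        entries.sort_unstable_by_key(|e| e.0);
        let blocks: Vec<Block> = entries.chunks(block_len).map(|c| c.to_vec()).collect();
        let first_keys = blocks.iter().map(|b| b[0].0).collect();
        TrigramTable { first_keys, blocks }
    }

    fn lookup(&self, key: u32) -> Option<u64> {
        // Binary search: count the blocks whose first key is <= key.
        // Only the last of those blocks can contain the key.
        let n = self.first_keys.partition_point(|&first| first <= key);
        let block = self.blocks.get(n.checked_sub(1)?)?;
        // In the real format this is where one block would be decompressed.
        block.iter().find(|e| e.0 == key).map(|e| e.1)
    }
}

/// Pack a three-byte trigram into an integer key that sorts bytewise.
fn pack(t: [u8; 3]) -> u32 {
    u32::from(t[0]) << 16 | u32::from(t[1]) << 8 | u32::from(t[2])
}

fn main() {
    let entries = vec![(pack(*b"fn "), 0), (pack(*b"val"), 64), (pack(*b"ate"), 128)];
    let table = TrigramTable::build(entries, 2);
    println!("{:?}", table.lookup(pack(*b"val")));
    println!("{:?}", table.lookup(pack(*b"zzz")));
}
```

Keeping only one small key per block in the top level is what keeps the block index tiny relative to the table it points into.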
CDX compression is always-on since v1.3. Not backward compatible with v1.1/v1.2 — rebuild indexes after upgrading:
rm -rf .ix/
ix --build .
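The per-file Bloom filters listed in the format table (256 B, 5 hashes) can be modelled as follows. This is a sketch under stated assumptions: the source does not say which hash functions are used or what exactly is inserted, so the example uses the standard library's `DefaultHasher` with double hashing and inserts trigrams. The `expected_fpr` helper applies the textbook estimate (1 - e^(-kn/m))^k, under which a 0.7% rate corresponds to roughly 190 distinct items per 2048-bit filter.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

const BYTES: usize = 256; // filter size per file, from the format table
const BITS: u64 = (BYTES * 8) as u64;
const HASHES: u64 = 5;

struct Bloom {
    bits: [u8; BYTES],
}

impl Bloom {
    fn new() -> Self {
        Bloom { bits: [0; BYTES] }
    }

    /// Five bit positions for `item`, derived from two base hashes
    /// (double hashing; the hash choice is an assumption of this sketch).
    fn positions(item: &[u8]) -> impl Iterator<Item = usize> {
        let mut h1 = DefaultHasher::new();
        item.hash(&mut h1);
        let a = h1.finish();
        let mut h2 = DefaultHasher::new();
        (item, 0x9e37u16).hash(&mut h2);
        let b = h2.finish() | 1; // odd step so the probes differ
        (0..HASHES).map(move |i| (a.wrapping_add(i.wrapping_mul(b)) % BITS) as usize)
    }

    fn insert(&mut self, item: &[u8]) {
        for p in Self::positions(item) {
            self.bits[p / 8] |= 1u8 << (p % 8);
        }
    }

    /// False means definitely absent; true means possibly present.
    fn may_contain(&self, item: &[u8]) -> bool {
        Self::positions(item).all(|p| self.bits[p / 8] & (1u8 << (p % 8)) != 0)
    }
}

/// Textbook false-positive estimate for n items, m bits, k hash functions.
fn expected_fpr(n: f64, m: f64, k: f64) -> f64 {
    (1.0 - (-k * n / m).exp()).powf(k)
}

fn main() {
    let mut filter = Bloom::new();
    for t in [b"fn ", b"val", b"ate"] {
        filter.insert(t);
    }
    println!("may contain \"val\": {}", filter.may_contain(b"val"));
    println!("estimated FPR at 190 items: {:.4}", expected_fpr(190.0, 2048.0, 5.0));
}
```

A Bloom filter never reports a false negative, so a `false` answer lets the search skip a file without reading it; a `true` answer still requires the regex verification step.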
Performance
Measured on a 2015-era CPU (Haswell equivalent), 8 GB RAM. All ratios verified from actual indexes.
| Workload | Source | Index | Ratio |
|---|---|---|---|
| Source code (70 files) | 576 KB | 1,477 KB | 2.56× |
| Mixed-content repo (426 files) | 1,069 MB | 138 MB | 0.13× |
| Metric | Value | Notes |
|---|---|---|
| Posting data vs raw u32 | 88% reduction | 10.6 MB → 1.3 MB |
| ZSTD on varint buffer | 60% savings | varint 3.3 MB → zstd 1.3 MB |
| CDX trigram table vs naive | 75% smaller | 4.9 B vs 20 B per entry |
| Block index overhead | 0.02% of index | 12 B per 1024 trigrams |
| CDX lookup latency | <50 μs | block index search + 1 ZSTD call |
| Build RAM peak | <8 MB | HashMap flushes at 500K entries |
| Safety ceiling | 60% RAM | ResourceGuard (llmosafe), 80% fallback |
| Cold start | <3 s | From disk to first result |
| Selective query (10% match) | 40 ms | 10× fewer files than ripgrep |
ix wins when the trigram index eliminates most files from scanning.
On small repos or queries where every file matches, linear-scan tools
like ripgrep are faster.
Feature Flags
| Flag | Default | Description |
|---|---|---|
| notify | yes | File watcher + daemon (ixd) |
| decompress | no | gz/zst/bz2/xz decompression |
| archive | no | zip/tar archive support |
| full | no | All optional features |
Library
ix is also a library (moeix on crates.io, ix as the crate name):
[dependencies]
moeix = "0.11"
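use ix::reader::Reader;
use ix::executor::{Executor, QueryOptions};
use ix::planner::Planner;

// The boxed error type is an assumption; use the crate's own error type if you prefer.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Open an index previously built with `ix --build`.
    let reader = Reader::open(".ix/shard.ix")?;
    let plan = Planner::plan("struct Config", false);
    let mut executor = Executor::new(&reader);
    // Run the plan; returns the matches together with query statistics.
    let (matches, stats) = executor.execute(&plan, &QueryOptions::default())?;
    Ok(())
}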
See docs.rs/moeix for the full API reference.
Building
cargo build --all-features
cargo test --all-features
cargo clippy --all-features -- -D warnings
Requires Rust 1.85+.
License
MIT