desksearch

Private semantic search engine for your local files, powered by Starbucks 2D Matryoshka embeddings

These details have not been verified by PyPI

Project description

DeskSearch

Search your files by meaning. Faster than Spotlight. 100% private.

0.83ms search · 13MB binary · 100% offline · 50k+ files · 10× ONNX speedup

DeskSearch Demo

DeskSearch is a local semantic search engine that understands what you're looking for — not just the words you type. Index your documents, code, and emails in seconds. Search in under a millisecond. Everything stays on your machine.

Available as a Python package (pip install desksearch) and a standalone Rust binary (13MB, zero dependencies).

DeskSearch — Search Home

DeskSearch — Search Results

DeskSearch — Dashboard

More screenshots

Settings — Speed Tiers & File Types
DeskSearch — Settings

Data Sources — Connectors
DeskSearch — Data Sources

File Explorer
DeskSearch — Files

Features

	Feature	What it means
🔍	Semantic search	Finds "Q3 earnings" when you search "quarterly revenue" — understands meaning, not just keywords
⚡	Sub-millisecond search	0.83ms p50 latency powered by Starbucks 2D Matryoshka embeddings
🔒	100% private	All processing runs locally. No cloud. No telemetry. No data ever leaves your machine
📄	30+ file types	PDF, DOCX, PPTX, XLSX, code (20+ languages), email, Jupyter notebooks, archives
👁	Live reindexing	Built-in file watcher auto-detects changes — your index is always current
🎨	Beautiful web UI	React frontend with dark mode, live search, file preview, and keyboard shortcuts
🦀	Rust core	13MB self-contained binary with PDF/DOCX/PPTX/XLSX parsers, hybrid BM25+dense search, embedded frontend
☕	3 speed tiers	Fast (2-layer, 32d) · Middle (4-layer, 64d) · Pro (6-layer, 128d) — you pick the trade-off
🔌	Connector plugins	Local files, email (.eml/.mbox), Chrome bookmarks, Slack exports — extensible architecture
⚙️	ONNX acceleration	10× embedding speedup (171 chunks/sec) with INT8 quantization support
📊	Advanced filters	Filter by file type, date range, size · Sort by relevance, date, size, name · Export as JSON/CSV/text
⭐	Favorites & recents	Bookmark important files and track recently opened documents

Quick Start

Python

pip install desksearch
desksearch
# → opens http://localhost:3777

Rust

curl -fsSL https://github.com/wshuai190/desksearch/releases/latest/download/desksearch -o desksearch
chmod +x desksearch
./desksearch
# → opens http://localhost:3777

On first run, DeskSearch walks you through an onboarding wizard to pick folders and a speed tier. After that, one command is all you need.

CLI Examples

# Semantic search from the terminal
desksearch search "machine learning papers"
desksearch search "budget spreadsheet" --type xlsx --json

# Index specific folders
desksearch index ~/Projects ~/Research

# Check index health
desksearch status
desksearch doctor            # full health check

# Manage watched folders
desksearch folders add ~/Notes
desksearch folders list

# Switch speed tier
desksearch config set search_speed pro

# Run as a background daemon
desksearch daemon start
desksearch daemon install    # auto-start on login (macOS LaunchAgent)

# Benchmark your setup
desksearch benchmark --files 1000

All commands support --json for scripting and automation.

DeskSearch vs. Alternatives

	DeskSearch	Spotlight	Everything	Alfred
Search type	Semantic + keyword	Keyword	Filename only	Keyword
Understands meaning	✅ Yes	❌	❌	❌
File content search	✅ 30+ formats	Limited	❌	Via plugins
Search latency	~1ms	~50ms	~1ms	~50ms
Privacy	100% local	Local (Siri opt-in)	Local	Local
Code-aware	✅ 20+ languages	Minimal	❌	❌
Extensible	Plugins + REST API	❌	❌	Workflows
Open source	✅ MIT	❌	❌	❌
Cross-platform	macOS, Linux	macOS only	Windows only	macOS only

Powered by Starbucks Embeddings

DeskSearch uses the Starbucks 2D Matryoshka embedding model, which enables flexible layer × dimension truncation for speed/quality trade-offs.

Paper: Starbucks: Improved Training for 2D Matryoshka Embeddings Shengyao Zhuang*, Shuai Wang*, Fabio Zheng, Bevan Koopman, Guido Zuccon — ECIR 2026 (*equal contribution)

Unlike traditional embeddings that require fixed dimensions, Starbucks lets you choose both the number of transformer layers AND the embedding dimension at inference time — no retraining needed. DeskSearch leverages this to offer three speed tiers from a single model.

If you use DeskSearch or the Starbucks model in your research, please cite:

@inproceedings{wang2026starbucks,
  title={Starbucks: Improved Training for 2D Matryoshka Embeddings},
  author={Zhuang, Shengyao and Wang, Shuai and Zheng, Fabio and Koopman, Bevan and Zuccon, Guido},
  booktitle={ECIR},
  year={2026}
}

Architecture

DeskSearch runs hybrid retrieval — every query hits a Tantivy BM25 index and a FAISS dense vector index in parallel, then merges results via Reciprocal Rank Fusion (RRF). Embeddings come from the Starbucks 2D Matryoshka model with layer and dimension truncation, running on ONNX Runtime for 10× speedup over PyTorch (171 chunks/sec vs 17). The indexing pipeline parses files across 6 parallel workers, chunks at sentence boundaries, and embeds in batches of 256 for maximum throughput. A connector plugin system (v0.6.0) lets you pull in data from local files, email, Chrome bookmarks, and Slack exports via a unified API.

Your Files (PDF, DOCX, Markdown, Code, ...)
              │
              ▼  Parse → Chunk → Embed
   ┌──────────────────────────────────────┐
   │   30+ parsers → 512-char chunks      │
   │   → Starbucks 2D Matryoshka (ONNX)   │
   └───────────┬──────────────────────────┘
               │
       ┌───────┴────────┐
       ▼                ▼
  BM25 (Tantivy)   FAISS (dense)
  keyword index     semantic index
       │                │
       └───────┬────────┘
               ▼
    Reciprocal Rank Fusion
               │
               ▼
     Ranked Results + Snippets

Connectors (v0.6.0)

DeskSearch supports pluggable data connectors to index content from multiple sources:

Connector	What it does
Local files	File system scanning with scheduled sync and live re-indexing
Email	Parse `.eml` and `.mbox` files with sender, subject, and date extraction
Chrome bookmarks	Read your Chrome profile's bookmark hierarchy
Slack export	Import Slack ZIP exports with username resolution

Manage connectors via the API (/api/connectors/v2/) or the web UI settings panel. The ConnectorRegistry handles discovery, configuration, and sync scheduling.

Speed Tiers

Tier	Layers	Dimensions	Best for
`fast`	2	32	Large corpora, older hardware
`middle`	4	64	Default — balanced speed and quality
`pro`	6	128	Best accuracy, research use

desksearch config set search_speed pro

API

DeskSearch exposes a REST API on localhost:3777:

curl "http://localhost:3777/api/search?q=quarterly+revenue&limit=5"
curl "http://localhost:3777/api/status"
curl "http://localhost:3777/api/health"

Python SDK

from desksearch import DeskSearch

with DeskSearch() as ds:
    results = ds.search("quarterly revenue", limit=5)
    for r in results:
        print(f"{r.rank}. {r.filename} ({r.score:.3f})")
        print(f"   {r.snippet}\n")

Contributing

Contributions welcome. DeskSearch is MIT-licensed.

git clone https://github.com/wshuai190/desksearch.git
cd desksearch
pip install -e ".[dev]"
pytest

License

MIT © Shuai Wang

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

0.6.5

Mar 28, 2026

0.6.4

Mar 28, 2026

This version

0.6.3

Mar 28, 2026

0.6.2

Mar 28, 2026

0.6.1

Mar 28, 2026

0.6.0

Mar 28, 2026

0.5.0

Mar 24, 2026

0.4.0

Mar 23, 2026

0.3.0

Mar 23, 2026

0.2.0

Mar 23, 2026

0.1.5

Mar 23, 2026

0.1.4

Mar 23, 2026

0.1.3

Mar 23, 2026

0.1.2

Mar 23, 2026

0.1.1

Mar 23, 2026

0.1.0

Mar 23, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release.See tutorial on generating distribution archives.

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

desksearch-0.6.3-py3-none-any.whl (263.2 kB view details)

Uploaded Mar 28, 2026 Python 3

File details

Details for the file desksearch-0.6.3-py3-none-any.whl.

File metadata

Download URL: desksearch-0.6.3-py3-none-any.whl
Upload date: Mar 28, 2026
Size: 263.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for desksearch-0.6.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`233e676c9788c07f361fdec1eb3b3f88d5d495ceb034bc48d6faf69545a593a0`
MD5	`1b542c53f1e47889e46c83075f8b16f3`
BLAKE2b-256	`114899276553dcb8ae3c793c0de96c76acdde43ec2af390739eb8661e2988fd8`

See more details on using hashes here.

desksearch 0.6.3

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

DeskSearch

Features

Quick Start

Python

Rust

CLI Examples

DeskSearch vs. Alternatives

Powered by Starbucks Embeddings

Architecture

Connectors (v0.6.0)

Speed Tiers

API

Python SDK

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distributions

Built Distribution

File details

File metadata

File hashes