Extract searchable knowledge from any document. Expose it to LLMs via MCP.

punt-quarry

Requires Python 3.13+.

Unlock the knowledge trapped on your hard drive. Works with Claude Desktop, Claude Code, and the macOS menu bar.

Quick Start

Claude Desktop

Download punt-quarry.mcpb and double-click to install. Claude Desktop will prompt you for a data directory.

Attach a document to your conversation and ask Claude to index it:

"Index this report"

"What does it say about Q3 margins?"

That's it. Everything runs locally — no API keys, no cloud accounts. The embedding model (~500 MB) downloads automatically on first use.

Claude Code / CLI

curl -fsSL https://raw.githubusercontent.com/punt-labs/quarry/0cbb5e3/install.sh | sh

Manual install (if you already have uv):

uv tool install punt-quarry
quarry install
quarry doctor

Verify before running:

curl -fsSL https://raw.githubusercontent.com/punt-labs/quarry/0cbb5e3/install.sh -o install.sh
shasum -a 256 install.sh
cat install.sh
sh install.sh
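If you script the install, the checksum step above can also be done from Python. This is a minimal sketch that computes the same kind of digest `shasum -a 256 install.sh` prints; the helper names `sha256_of_bytes` and `sha256_of` are made up for illustration, and you still need a digest from a source you trust to compare against.

```python
import hashlib
from pathlib import Path


def sha256_of_bytes(data: bytes) -> str:
    """Return the hex SHA-256 digest of a byte string."""
    return hashlib.sha256(data).hexdigest()


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a (small) file such as install.sh."""
    return sha256_of_bytes(path.read_bytes())


if __name__ == "__main__":
    script = Path("install.sh")
    if script.exists():
        print(sha256_of(script))
    else:
        print("install.sh not found; download it first")
```

Compare the printed digest by eye or with `==` before running `sh install.sh`.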

Then start using it:

quarry ingest-file notes.md      # index a file
quarry search "my topic"         # search by meaning, not keywords

What You Can Do

Index anything you have. PDFs, scanned documents, images, spreadsheets, presentations, source code, Markdown, LaTeX, DOCX, HTML, and webpages. Quarry reads each format the way you would and extracts the knowledge inside.

Search by meaning. "What did the Q3 report say about margins?" finds relevant passages even if they never use the word "margins." Semantic search matches on the meaning of your query rather than on the exact words you typed.

Give your LLM access. As an MCP server, Quarry lets Claude Desktop and Claude Code search your indexed documents directly. Ask Claude about something in your files and it pulls the relevant context automatically.

Keep things organized. Named databases separate work from personal. Directory sync re-indexes your registered folders when you run quarry sync (live filesystem monitoring is on the roadmap). Collections group documents within a database.
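To make "search by meaning" concrete, here is a toy sketch of ranking passages by cosine similarity between vectors. It is not Quarry's code: the three-dimensional vectors are written by hand, whereas a real embedding model produces much longer ones, and the names `cosine_similarity`, `rank`, `PASSAGES`, and `QUERY_VEC` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query_vec: list[float], passages: list[tuple[str, list[float]]]):
    """Sort (text, vector) pairs by similarity to the query, best first."""
    return sorted(
        passages,
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )


# Hand-written toy vectors; a real embedding model would produce these.
PASSAGES = [
    ("Gross profit as a share of revenue fell in Q3.", [0.9, 0.1, 0.0]),
    ("The office moved to a new building in July.", [0.0, 0.2, 0.9]),
]
QUERY_VEC = [0.8, 0.2, 0.1]  # stands in for the query "Q3 margins"

if __name__ == "__main__":
    for text, _vec in rank(QUERY_VEC, PASSAGES):
        print(text)
```

The first passage ranks highest even though it never contains the word "margins", because its vector points in nearly the same direction as the query's.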

Supported Formats

| Source | What happens |
|---|---|
| PDF (text pages) | Text extraction via PyMuPDF |
| PDF (image pages) | OCR (local by default; optional cloud backend) |
| Images (PNG, JPG, TIFF, BMP, WebP) | OCR (local by default; optional cloud backend) |
| Spreadsheets (XLSX, CSV) | Tabular serialization preserving structure |
| Presentations (PPTX) | Slide-per-chunk with tables and speaker notes |
| HTML / webpages | Boilerplate stripping, converted to Markdown |
| Text files (TXT, MD, LaTeX, DOCX) | Split by headings, sections, or paragraphs |
| Source code (30+ languages) | AST parsing into functions and classes |

Using with Claude Desktop

The easiest way to install is the .mcpb file — download and double-click. Claude Desktop handles the rest.

Alternatively, quarry install (from the CLI) also configures Claude Desktop automatically.

Manual setup (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "quarry": {
      "command": "/path/to/uvx",
      "args": ["--from", "punt-quarry", "quarry", "mcp"]
    }
  }
}

Use the absolute path to uvx (e.g. /opt/homebrew/bin/uvx). quarry install resolves this automatically.
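If you prefer to generate the manual entry rather than type it, the sketch below looks up uvx on your PATH and builds the same JSON shape shown above while keeping any servers already present. It only prints the result and does not write to the Claude Desktop config file; the function names are hypothetical, and quarry install remains the supported route.

```python
import json
import os
import shutil


def quarry_server_entry(uvx_path: str) -> dict:
    """Build the "quarry" server entry from the manual setup above."""
    return {
        "command": uvx_path,
        "args": ["--from", "punt-quarry", "quarry", "mcp"],
    }


def add_quarry(config: dict, uvx_path: str) -> dict:
    """Return a copy of config with a "quarry" server, keeping other servers."""
    servers = dict(config.get("mcpServers", {}))
    servers["quarry"] = quarry_server_entry(uvx_path)
    return {**config, "mcpServers": servers}


if __name__ == "__main__":
    found = shutil.which("uvx")  # None when uvx is not on PATH
    if found is None:
        print("uvx not found on PATH; install uv first")
    else:
        print(json.dumps(add_quarry({}, os.path.abspath(found)), indent=2))
```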

Note: Uploaded files in Claude Desktop live in a sandbox that Quarry cannot access. Use ingest_content for uploaded content, or provide local file paths to ingest_file.

Menu Bar App (macOS)

Quarry Menu Bar is a native macOS companion app that puts your knowledge base one click away. It sits in the menu bar and lets you search across all your indexed documents without switching apps.

  • Semantic search with instant results
  • Switch between named databases
  • Syntax-highlighted results for code, Markdown, and prose
  • Detail view with full page context

The app manages its own quarry serve process automatically — no manual server setup needed. Requires macOS 14 (Sonoma) or later and punt-quarry installed.

Using with Claude Code

quarry install configures Claude Code automatically. To set up manually:

claude mcp add quarry -- uvx --from punt-quarry quarry mcp

Once configured, Claude Code can call these tools on your behalf:

| Tool | What it does |
|---|---|
| search_documents | Semantic search with optional filters |
| ingest_file | Index a file by path |
| ingest_url | Fetch and index a webpage |
| ingest_sitemap | Crawl a sitemap and ingest all discovered URLs |
| ingest_content | Index inline text (for uploads, clipboard, etc.) |
| get_documents | List indexed documents |
| get_page | Get raw text for a specific page |
| delete_document | Remove a document |
| list_collections | List collections |
| delete_collection | Remove a collection |
| register_directory | Register a directory for sync |
| deregister_directory | Remove a directory registration |
| sync_all_registrations | Re-index all registered directories |
| list_registrations | List registered directories |
| list_databases | List named databases |
| use_database | Switch to a different database |
| status | Database stats |
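You never write these calls by hand, since Claude builds them for you, but it can help to see roughly what one looks like on the wire. MCP tool invocations are JSON-RPC 2.0 requests with the method `tools/call`. The sketch below builds such a request for search_documents; the argument names `query` and `limit` are assumptions, as the tool's actual input schema is not listed here.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC requests carry a unique id


def tool_call(name: str, arguments: dict) -> dict:
    """Build an MCP "tools/call" request as a plain dictionary."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    # "query" and "limit" are hypothetical argument names.
    request = tool_call("search_documents", {"query": "Q3 margins", "limit": 5})
    print(json.dumps(request, indent=2))
```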

CLI Reference

# Ingest
quarry ingest-file report.pdf                  # index a file
quarry ingest-file report.pdf --overwrite      # replace existing data
quarry ingest-url https://example.com/page     # index a webpage
quarry ingest-sitemap https://docs.example.com/sitemap.xml  # crawl a sitemap
quarry ingest-sitemap URL --include '/docs/*' --exclude '/docs/v1/*' --limit 50

# Search
quarry search "revenue trends"                 # semantic search
quarry search "revenue" --limit 5              # limit results
quarry search "tests" --page-type code         # only code results
quarry search "revenue" --source-format .xlsx  # only spreadsheet results
quarry search "deploy" --document README.md    # search within one document

# Manage documents
quarry list                                    # list indexed documents
quarry delete report.pdf                       # remove a document
quarry collections                             # list collections

# Directory sync
quarry register ~/Documents/notes              # register a directory for sync
quarry sync                                    # re-index all registered directories
quarry registrations                           # list registered directories
quarry deregister notes                        # remove a directory registration

# System
quarry doctor                                  # health check
quarry databases                               # list all databases with stats
quarry serve                                   # start HTTP API server

Named Databases

Keep separate databases for different purposes:

quarry ingest-file report.pdf --db work
quarry ingest-file recipe.md --db personal
quarry search "revenue" --db work
quarry databases                               # list all databases

Each database is fully isolated — its own vector index and sync registry. The default database is called default.

You can point MCP servers at different databases:

{
  "mcpServers": {
    "work": {
      "command": "/path/to/uvx",
      "args": ["--from", "punt-quarry", "quarry", "mcp", "--db", "work"]
    }
  }
}
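If you keep several databases, you can generate one server entry per database instead of copying the block by hand. A minimal sketch, assuming each server is named after its database as in the example above; `servers_for` is a hypothetical helper and the uvx path is a placeholder you should replace with your own.

```python
import json


def servers_for(databases: list[str], uvx_path: str) -> dict:
    """Build an mcpServers config with one Quarry server per named database."""
    return {
        "mcpServers": {
            db: {
                "command": uvx_path,
                "args": ["--from", "punt-quarry", "quarry", "mcp", "--db", db],
            }
            for db in databases
        }
    }


if __name__ == "__main__":
    # Placeholder path; use the absolute path to uvx on your machine.
    print(json.dumps(servers_for(["work", "personal"], "/path/to/uvx"), indent=2))
```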

Configuration

Quarry works with zero configuration. These environment variables are available for customization:

| Variable | Default | Description |
|---|---|---|
| OCR_BACKEND | local | local (offline, no setup) or textract (AWS, better for degraded scans) |
| QUARRY_ROOT | ~/.quarry/data | Base directory for all databases (log path configured separately via LOG_PATH) |
| CHUNK_MAX_CHARS | 1800 | Max characters per chunk (~450 tokens) |
| CHUNK_OVERLAP_CHARS | 200 | Overlap between consecutive chunks |
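To see how the two chunk settings interact, here is a simplified sliding-window sketch. It is not Quarry's chunker, which splits on headings, sections, or paragraphs depending on the format; it only shows how a maximum size and an overlap divide a long text. The function `chunk_text` is hypothetical, and the defaults are read from the environment variables listed above.

```python
import os


def chunk_text(text: str, max_chars: int, overlap: int) -> list[str]:
    """Split text into windows of max_chars that share `overlap` characters."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if not 0 <= overlap < max_chars:
        raise ValueError("overlap must be >= 0 and smaller than max_chars")
    step = max_chars - overlap  # how far each window advances
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + max_chars])
        if start + max_chars >= len(text):
            break  # this window already reached the end of the text
        start += step
    return chunks


# Defaults mirror the table above.
MAX_CHARS = int(os.environ.get("CHUNK_MAX_CHARS", "1800"))
OVERLAP = int(os.environ.get("CHUNK_OVERLAP_CHARS", "200"))

if __name__ == "__main__":
    sample = "word " * 1000  # 5000 characters of filler text
    pieces = chunk_text(sample, MAX_CHARS, OVERLAP)
    print(len(pieces), [len(piece) for piece in pieces])
```

A larger overlap makes it less likely that a sentence is cut off from its context at a chunk boundary, at the cost of more chunks to embed and store.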

For advanced settings (Textract polling, embedding model, paths), see Advanced Configuration.

Cloud Backends (Optional)

Quarry works entirely offline by default. Cloud backends are available for specialized use cases.

AWS Textract (OCR)

Better character accuracy on degraded scans, faxes, and low-resolution images. For clean digital documents, local OCR produces equivalent search results.

export OCR_BACKEND=textract
export AWS_ACCESS_KEY_ID=...
export AWS_SECRET_ACCESS_KEY=...
export AWS_DEFAULT_REGION=us-east-1
export S3_BUCKET=my-bucket

See docs/AWS-SETUP.md for IAM policies and full setup.
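A quick pre-flight check can catch a missing variable before a long ingestion run. This sketch checks only the variables exported above; whether your setup needs all of them (for example, if credentials come from an AWS profile instead of keys) is an assumption to verify against docs/AWS-SETUP.md, and `missing_textract_vars` is a hypothetical helper.

```python
import os
from collections.abc import Mapping

# Variable names taken from the export lines above.
TEXTRACT_VARS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_DEFAULT_REGION",
    "S3_BUCKET",
)


def missing_textract_vars(env: Mapping[str, str]) -> list[str]:
    """List unset Textract variables; empty when the local backend is used."""
    if env.get("OCR_BACKEND", "local") != "textract":
        return []
    return [name for name in TEXTRACT_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_textract_vars(os.environ)
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("OCR environment looks complete")
```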

SageMaker Embedding

Cloud-accelerated embedding for large-scale batch ingestion (thousands of files). Search always uses the local model regardless of this setting.

export EMBEDDING_BACKEND=sagemaker
export SAGEMAKER_ENDPOINT_NAME=quarry-embedding

Deploy with ./infra/manage-stack.sh deploy. See docs/AWS-SETUP.md for details.

Roadmap

  • Google Drive connector
  • quarry sync --watch for live filesystem monitoring
  • PII detection and redaction

For product vision and positioning, see PR/FAQ.

Development

uv run ruff check .
uv run ruff format --check .
uv run mypy src/ tests/
uv run pytest                  # run the test suite

Quarry is fully typed (py.typed) and can be used as a Python library. See CONTRIBUTING.md for setup, architecture, and how to add new formats.

License

MIT


