Extended storage, searchable memory and instant retrieval

Sahara

Sahara turns folders on your computer into searchable memory. Find files by meaning, ask questions with cited sources, and expose the same local index to MCP clients. External drives, MinIO, and AWS storage are optional extensions, not prerequisites.

Local-first: indexing and semantic search run on your computer. No account, API key, storage bucket, or additional drive is required for the core search experience.

Latest release: v0.2.1 (June 7, 2026) adds index-only setup, multiple content roots, verified offload/fetch, one-command Claude Desktop setup, and trusted sahara-memory packaging. See the changelog.

[Figure: Fictional Sahara retrieval examples: timeline reconstruction, vendor lookup, and honest missing-data handling]

Fictional documents shown in a generic MCP client. Sahara retrieves from the configured local index, cites its sources, and reports when a requested detail is absent.

What Sahara Does

  • Searches PDFs, DOCX files, notes, code, and other text documents by meaning
  • Answers questions over indexed files with source paths and supporting snippets
  • Indexes multiple folders without copying them to a storage backend
  • Exposes read-only search and Q&A tools through MCP
  • Optionally syncs selected folders to a drive, NAS, MinIO, or AWS S3
  • Can offload verified stored files while keeping their indexed content searchable

Sahara is a single-user CLI and local retrieval service. It is not a hosted cloud service, autonomous agent, or general filesystem access layer.

Quick Start

Sahara requires Python 3.11 or newer.

The Python distribution is named sahara-memory, but it installs the sahara command. Do not run pip install sahara; that name belongs to the unrelated OpenStack project.
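Because the distribution name and the command name differ, it can help to confirm what is actually installed. The following is a minimal sketch, not part of Sahara: it checks the interpreter against the documented 3.11 minimum and looks up the `sahara-memory` distribution with the standard library. The helper names `python_is_supported` and `installed_version` are illustrative assumptions.

```python
from __future__ import annotations

import sys
from importlib import metadata

MIN_PYTHON = (3, 11)          # minimum version stated in this README
DIST_NAME = "sahara-memory"   # PyPI distribution name stated in this README


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True when the interpreter meets the documented minimum."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def installed_version(dist_name: str = DIST_NAME) -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python OK:", python_is_supported())
    print(f"{DIST_NAME} version:", installed_version() or "not installed")
```

If the second line prints "not installed" while a `sahara` command still exists on your PATH, that command may belong to a different package.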

Install Sahara from PyPI:

python3 -m pip install "sahara-memory[search,mcp]"

sahara init --mode basic --folder ~/Documents
sahara index
sahara search "my tax return from 2024" --snippet

On Windows, use py -3.11 -m pip instead of python3 -m pip.

The first sahara index downloads a local embedding model of roughly 200 MB. Hugging Face may show an unauthenticated-download warning; no account or token is required.

Add more folders whenever you need them:

sahara folder add ~/Projects
sahara index

Every folder added this way remains index-only unless you explicitly enable storage sync for it.

Ask Questions

Semantic search does not require an LLM. sahara ask adds an optional answer-generation step after Sahara retrieves the relevant local passages.
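The two stages can be pictured with a toy example. This is a sketch under stated assumptions, not Sahara's implementation: the three-dimensional vectors stand in for real embeddings, and `retrieve` and `build_prompt` are hypothetical names. Stage one ranks passages by cosine similarity and needs no LLM; stage two only assembles the retrieved passages into a prompt that an answer provider could receive.

```python
from __future__ import annotations

import math


def cosine(a, b) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec, passages, k=2):
    """Stage 1: rank (source, text, vector) passages by similarity. No LLM involved."""
    ranked = sorted(passages, key=lambda p: cosine(query_vec, p[2]), reverse=True)
    return ranked[:k]


def build_prompt(question, hits) -> str:
    """Stage 2 input: number the retrieved passages so an answer can cite them."""
    context = "\n".join(
        f"[{i}] {source}: {text}" for i, (source, text, _) in enumerate(hits, 1)
    )
    return f"Answer using only the numbered sources.\n{context}\nQuestion: {question}"


# Toy corpus with made-up vectors (assumption: real embeddings are much larger).
PASSAGES = [
    ("lease.pdf", "Pets are allowed with a deposit.", [0.9, 0.1, 0.0]),
    ("taxes.pdf", "2024 federal return summary.", [0.0, 0.2, 0.9]),
    ("notes.md", "Call the landlord about the dog.", [0.7, 0.3, 0.1]),
]

hits = retrieve([1.0, 0.0, 0.0], PASSAGES, k=2)
prompt = build_prompt("what does the lease say about pets?", hits)
```

Here `hits` is already a usable search result; `prompt` only matters if an answer provider is configured.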

Local answers with Ollama

Install Ollama, then download Sahara's default model:

ollama pull mistral
sahara ask --snippet "what does the lease say about pets?"

New Sahara installations use Ollama as the default answer provider. The current Mistral download is approximately 4.4 GB. If Ollama is not already running, launch the application or run ollama serve in another terminal.

OpenAI without Ollama

Ollama is not required when you prefer OpenAI:

export OPENAI_API_KEY="your-api-key"
sahara config set answer_provider openai
sahara ask --snippet "what does the lease say about pets?"

Windows PowerShell:

$env:OPENAI_API_KEY = "your-api-key"
sahara config set answer_provider openai
sahara ask --snippet "what does the lease say about pets?"

Sahara stores the provider preference, not the API key. When OpenAI is selected, the question and retrieved snippets needed to answer it are sent to OpenAI. OpenAI API billing is separate from a ChatGPT subscription.
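The split between stored preference and environment-only secret can be sketched as follows. This is an illustration, not Sahara's code: `load_provider_settings` is a hypothetical helper, and the `"ollama"` default value is an assumption based on the statement that new installations use Ollama. Only `answer_provider` and `OPENAI_API_KEY` come from the commands above.

```python
from __future__ import annotations

import os


def load_provider_settings(config: dict, environ=os.environ) -> dict:
    """Combine the saved provider preference with a key read at call time.

    The config dict holds only the provider name; the API key is never
    written back into it.
    """
    provider = config.get("answer_provider", "ollama")  # assumed default value
    settings = {"provider": provider}
    if provider == "openai":
        key = environ.get("OPENAI_API_KEY")
        if not key:
            raise RuntimeError("OPENAI_API_KEY is not set in the environment")
        settings["api_key"] = key
    return settings
```

Because the key lives only in the environment, a new shell session needs the export (or the PowerShell assignment) again unless you set it in your shell profile.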

See Answer Provider Setup for installation, model selection, privacy details, and troubleshooting.

Connect an MCP Client

Sahara exposes five read-only MCP tools for search, cited Q&A, chunk reads, folder listing, and index status. These tools operate only on Sahara's indexed corpus; they cannot browse arbitrary files or modify your data.
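One common way to enforce this kind of scoping is to resolve every requested path and accept it only when it falls under a configured root. The sketch below shows that general technique; it is an assumption about how such a check could look, not a description of Sahara's internals, and `is_within_roots` is a hypothetical name.

```python
from pathlib import Path


def is_within_roots(candidate, roots) -> bool:
    """Return True only if the resolved path sits under one of the allowed roots.

    Resolving first collapses ".." segments and symlinks, so a request such as
    "root/../secret.txt" cannot escape the indexed folders.
    """
    resolved = Path(candidate).resolve()
    return any(resolved.is_relative_to(Path(root).resolve()) for root in roots)
```

A server using this check would refuse any read whose path fails it, regardless of what the client asks for.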

Claude Desktop is the first tested client:

sahara mcp install-claude

Fully quit and reopen Claude Desktop, then confirm sahara appears under Connectors. The installer preserves existing settings and MCP servers, uses Sahara's absolute executable path, and creates a backup before changing an existing config.
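The behaviour described above (keep existing settings, use an absolute path, back up first) can be sketched like this. It is a simplified illustration, not the installer itself. Claude Desktop reads MCP servers from the `mcpServers` key of its JSON config; the `.bak` suffix, the function name `install_server`, and the `["mcp", "serve"]` arguments (taken from the sahara mcp serve command listed in this README) are assumptions. The config path differs by operating system, so it is passed in.

```python
from __future__ import annotations

import json
import shutil
from pathlib import Path


def install_server(config_path, executable: str, name: str = "sahara") -> dict:
    """Add one MCP server entry while preserving everything else in the config."""
    config_path = Path(config_path)
    config: dict = {}
    if config_path.exists():
        # Back up the existing file before changing it (".bak" is an assumed name).
        shutil.copy2(config_path, config_path.with_name(config_path.name + ".bak"))
        text = config_path.read_text(encoding="utf-8").strip()
        config = json.loads(text) if text else {}

    # setdefault keeps any servers that are already registered.
    servers = config.setdefault("mcpServers", {})
    servers[name] = {
        "command": str(Path(executable).resolve()),  # absolute executable path
        "args": ["mcp", "serve"],                    # assumed arguments
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Using an absolute path matters because desktop applications often start with a different PATH than your terminal.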

See Claude Desktop Setup for verification and troubleshooting, or MCP Integrations for the tool surface and authenticated remote transport.

Optional Storage

Start with local indexing. Add storage later without rebuilding the semantic index.

| Setup | What it provides | Status |
| --- | --- | --- |
| Basic | Local indexing across one or more folders | Core mode |
| Local drive | Copies selected folders to an external drive, NAS, or network share | Optional |
| AWS | Copies selected folders to S3, with optional Glacier features | Optional |

Attach a local drive:

sahara storage configure local --drive /Volumes/Archive/Sahara
sahara folder sync ~/Documents --enable
sahara sync

Attach AWS:

sahara storage configure aws \
  --bucket my-sahara-bucket \
  --region us-east-1
sahara folder sync ~/Documents --enable
sahara sync

MinIO and local-plus-Glacier modes are available through the interactive sahara init wizard. See Getting Started for storage credentials, content-root behavior, deletion semantics, and migration paths.

After a file has been synced and indexed, Sahara can free its source disk space while retaining search metadata:

sahara offload Documents/archive/report.pdf
sahara fetch Documents/archive/report.pdf

Offload verifies the stored copy before removing the local source. Ordinary filesystem deletion is not treated as offload.
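The verify-then-remove idea can be sketched in a few lines. This is a minimal illustration that assumes both copies are reachable as local paths and that verification means comparing SHA-256 digests; Sahara's actual verification method is not described here, and `sha256_of` and `offload` are hypothetical helpers.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def offload(source, stored_copy) -> bool:
    """Delete the local source only when the stored copy is byte-identical.

    Returns True if the source was removed, False if it was left in place.
    """
    source, stored_copy = Path(source), Path(stored_copy)
    if not stored_copy.is_file():
        return False
    if sha256_of(source) != sha256_of(stored_copy):
        return False
    source.unlink()
    return True
```

The ordering is the point: the destructive step runs last and only after the comparison succeeds, so a missing or corrupted stored copy leaves the source untouched.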

Privacy and Security

  • The semantic index is stored locally in ~/.sahara/state.db.
  • Indexing, embeddings, and sahara search stay local.
  • Ollama answer generation stays local.
  • OpenAI receives the question and retrieved snippets when explicitly selected.
  • MCP is read-only and scoped to indexed content.
  • Remote MCP requires authentication by default and supports tool, folder, and snippet limits.
  • Optional storage encryption uses client-side AES-256-GCM.

Review SECURITY.md before exposing MCP remotely or relying on encrypted storage.

Supported Content

Sahara extracts text from:

  • PDF and DOCX documents
  • Markdown, reStructuredText, and plain text
  • Python, JavaScript, TypeScript, JSON, YAML, TOML, CSV, HTML, and XML
  • Other files that can be safely detected as UTF-8 text
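A typical heuristic for the last item reads a sample of the file, rejects it if it contains NUL bytes, and then checks that the sample decodes as UTF-8. The sketch below shows that general approach; the sample size and the function name `looks_like_utf8_text` are assumptions, and Sahara's own detection may differ.

```python
import codecs


def looks_like_utf8_text(path, sample_size: int = 8192) -> bool:
    """Heuristically decide whether a file is UTF-8 text from its first bytes."""
    with open(path, "rb") as fh:
        sample = fh.read(sample_size)

    # NUL bytes are a strong hint of binary content.
    if b"\x00" in sample:
        return False

    # If the sample was cut off at sample_size, a multi-byte character may be
    # split at the end, so only treat the data as complete for short files.
    whole_file = len(sample) < sample_size
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(sample, final=whole_file)
    except UnicodeDecodeError:
        return False
    return True
```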

Current limitations:

  • Scanned PDFs and images are not searchable because OCR is not implemented yet.
  • Audio and video transcription are not supported.
  • Sahara is designed for one user and one local index.
  • The project is beta; keep independent backups of important files.

Use sahara index-report to inspect indexed files, unsupported content, and failures.

Create a .saharaignore file in any indexed folder to exclude content using gitignore-style patterns:

.env*
secrets/
node_modules/
*.tmp

Start from the example ignore file for common operating-system, editor, build, and credential exclusions.
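To see how the four patterns above behave, here is a deliberately simplified matcher. It covers only a subset of gitignore semantics (no negation with `!`, no `**`, no patterns anchored with a leading slash) and is not Sahara's matcher; `is_ignored` is a hypothetical helper for illustration.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

PATTERNS = [".env*", "secrets/", "node_modules/", "*.tmp"]


def is_ignored(rel_path: str, patterns=PATTERNS) -> bool:
    """Check a folder-relative POSIX path against simplified ignore patterns."""
    parts = PurePosixPath(rel_path).parts
    for pattern in patterns:
        pattern = pattern.strip()
        if not pattern or pattern.startswith("#"):
            continue  # skip blank lines and comments
        if pattern.endswith("/"):
            # Directory pattern: match any parent directory of the file.
            name = pattern.rstrip("/")
            if any(fnmatchcase(part, name) for part in parts[:-1]):
                return True
        elif any(fnmatchcase(part, pattern) for part in parts):
            # File pattern: match any path component by name.
            return True
    return False
```

With these patterns, `.env.local`, `secrets/key.pem`, `app/node_modules/x.js`, and `build/cache.tmp` are excluded, while `README.md` is kept.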

Core Commands

| Command | Purpose |
| --- | --- |
| sahara init --mode basic --folder PATH | Create an index-only local library |
| sahara folder add/list/remove | Manage indexed folders |
| sahara index [--force] | Build or refresh the semantic index |
| sahara index-report | Inspect indexing coverage and failures |
| sahara search QUERY | Find files and passages by meaning |
| sahara ask --snippet QUESTION | Generate an answer and show supporting sources |
| sahara mcp install-claude | Connect Sahara to Claude Desktop |
| sahara mcp serve | Run the read-only MCP server |

Storage and operational command groups

| Command group | Purpose |
| --- | --- |
| sahara storage ... | Configure, inspect, or disable optional storage |
| sahara folder sync ... | Choose which indexed folders also sync |
| sahara sync/push/pull/status | Inspect and execute storage synchronization |
| sahara offload/fetch | Free and restore local space with verification |
| sahara encryption ... | Configure or rotate storage encryption |
| sahara doctor | Diagnose configuration and connectivity |
| sahara daemon ... | Manage background watching and synchronization |
| sahara config ... | Inspect or change configuration |

See the complete command reference, or run sahara --help and sahara COMMAND --help for live CLI help.

Documentation

License

Sahara is available under the MIT License.

Download files

Source Distribution

sahara_memory-0.2.1.tar.gz (1.2 MB)

Uploaded Source

Built Distribution

sahara_memory-0.2.1-py3-none-any.whl (98.3 kB)

Uploaded Python 3

File details

Details for the file sahara_memory-0.2.1.tar.gz.

File metadata

  • Download URL: sahara_memory-0.2.1.tar.gz
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sahara_memory-0.2.1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | b19c62c628ab25c5ca8ad5a52030ae9998463004bd1b58d65dc6d541de4dba0a |
| MD5 | 589107c89221e4b6ba7830c0f91e3f7d |
| BLAKE2b-256 | e82d5e13178ac92d30688ba37cc3a654c9eafed371cdf6314a5c9ea366643a57 |

Provenance

The following attestation bundles were made for sahara_memory-0.2.1.tar.gz:

Publisher: publish.yml on nidheesh-p/sahara

File details

Details for the file sahara_memory-0.2.1-py3-none-any.whl.

File metadata

  • Download URL: sahara_memory-0.2.1-py3-none-any.whl
  • Size: 98.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sahara_memory-0.2.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 07d27d9d6949156e7285ad3573b37e63a5cc1a8f0ccbc0b888102f2486dd22ec |
| MD5 | 56a368a19e5a14c22e4388e0247a50a0 |
| BLAKE2b-256 | 4b77a57846236c4610ccb610cbe0c87cee7d07e38022e0c00b926643e80bc426 |

Provenance

The following attestation bundles were made for sahara_memory-0.2.1-py3-none-any.whl:

Publisher: publish.yml on nidheesh-p/sahara
