Skip to main content

Semantic search over the Epstein Files using zvec + sentence-transformers

Project description

epstein-search 🔍

PyPI License: MIT Python 3.10+

Semantic search over the publicly available Epstein Files — court documents, FBI reports, and DOJ publications — using AI-powered vector search.

Built with zvec and pre-computed embeddings from devankit7873/EpsteinFiles-Vector-Embeddings-ChromaDB.

No API key needed for search. Runs entirely locally.


☕ Support This Project

This is free, open-source research tooling. If you find it useful, please consider supporting:

Buy Me A Coffee

Or tip via crypto:

Chain Address
SOL 76fCU6va3cGrbak4i9mwFfdYx1QsJJrqrxViFDUTsXUL
ETH 0xE808754a18A893A3eeFE01780D822C902680d1B7
BASE 0xE808754a18A893A3eeFE01780D822C902680d1B7

Quick Start

pip install epstein-search

# One-time setup — downloads pre-built index (~100K+ document chunks)
epstein-search setup

# Search — free, local, no API key needed
epstein-search search "flight logs to the island"

# Filter by document type
epstein-search search "testimony" --doc-type deposition

# AI-powered answers (local Ollama = free, or any LLM API)
epstein-search ask "Who appears most in the flight logs?" --model ollama/llama3

Commands

Command Description API Key?
epstein-search setup Download & build search index No
epstein-search chat Interactive mode (recommended) Only for cloud LLMs
epstein-search search "query" One-off semantic search No
epstein-search ask "question" One-off RAG answer Only for cloud LLMs
epstein-search info Show index stats No
epstein-search ingest Build from scratch (advanced) No

Interactive Mode (Recommended)

The easiest way to use epstein-search:

epstein-search chat

Type questions naturally. No flags needed. Commands inside chat:

Command Description
/search Switch to search-only mode (no LLM)
/ask Switch to RAG mode (LLM answers)
/model openai/llama3 Change LLM on the fly
/topk 5 Change number of results
/info Show current settings
/quit Exit

Tip: Set your model once in .env so you never need to specify it:

EPSTEIN_LLM_MODEL=openai/your-model-name

Search Options

epstein-search search "financial records" --top-k 5
epstein-search search "testimony" --source "FBI"
epstein-search search "flight logs" --doc-type flight_log
epstein-search search "court order" --json-output

Document Type Filters

Filter Description
court_filing Court motions, orders, filings
deposition Sworn testimony, Q&A transcripts
fbi_report FBI investigation reports
flight_log Aircraft flight records
financial Bank records, wire transfers
other Miscellaneous documents

Ask (RAG)

The ask command retrieves relevant documents and generates an answer using any LLM via LiteLLM.

Free options (no API key):

# Run Ollama locally — completely free
epstein-search ask "Key individuals mentioned" --model ollama/llama3
epstein-search ask "What do the flight logs show?" --model ollama/mistral

LM Studio (local, no API key):

  1. Download LM Studio, load a model, start the local server (default: port 1234)
  2. Add to your .env file:
OPENAI_API_BASE=http://localhost:1234/v1
OPENAI_API_KEY=lm-studio
EPSTEIN_LLM_MODEL=openai/your-model-name
  1. Then just run:
epstein-search chat   # interactive mode, no flags needed

Cloud LLM options (API key required):

# Gemini (requires GEMINI_API_KEY)
epstein-search ask "Summarize the findings" --model gemini/gemini-3-flash-preview

# OpenAI (requires OPENAI_API_KEY)
epstein-search ask "Timeline of events" --model gpt-4o

# Anthropic (requires ANTHROPIC_API_KEY)
epstein-search ask "Financial connections" --model anthropic/claude-sonnet-4-20250514
# Show source documents alongside the answer
epstein-search ask "Flight patterns" --show-sources

How It Works

pip install epstein-search
         ↓
epstein-search setup
  → Downloads 100K+ pre-computed embeddings (all-MiniLM-L6-v2)
  → Imports into local zvec vector database
         ↓
epstein-search search "your query"
  → Embeds query locally (sentence-transformers, no API key)
  → Vector similarity search via zvec
  → Returns matching document chunks

Dataset

Embeddings: devankit7873/EpsteinFiles-Vector-Embeddings-ChromaDB — 100K+ chunks, 384-dim vectors, based on the Epstein Files 20K corpus.

All content is from publicly available sources:

  • U.S. Department of Justice Epstein Library
  • House Oversight Committee releases
  • Unsealed federal court documents
  • FBI reports and DOJ publications

License

MIT — see LICENSE

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

epstein_search-0.1.1.tar.gz (18.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

epstein_search-0.1.1-py3-none-any.whl (23.1 kB view details)

Uploaded Python 3

File details

Details for the file epstein_search-0.1.1.tar.gz.

File metadata

  • Download URL: epstein_search-0.1.1.tar.gz
  • Upload date:
  • Size: 18.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.19

File hashes

Hashes for epstein_search-0.1.1.tar.gz
Algorithm Hash digest
SHA256 2a5b84e1e56a526ac60ccc91e2e70a95666e50e497989d5f6aa4ef23abf0d386
MD5 0718fee69fc9133dc9fe21c1e13a2606
BLAKE2b-256 7fa37a42c7ec1e3c60b0ca12ca5ae839af5e326915dbae13a8d8e4544b81f8f4

See more details on using hashes here.

File details

Details for the file epstein_search-0.1.1-py3-none-any.whl.

File metadata

File hashes

Hashes for epstein_search-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 1273a357fcb4005a89588647888fb85a136e5b86a25bd15c04c20ae26629365b
MD5 437ca5fb979a0a37cd23b07eb2dc1b07
BLAKE2b-256 70c70d98dc43bd94dc89e970f2d14598b19ac120150234a8dc1c5af412c3bf4f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page