Semantic search over the Epstein Files using zvec + sentence-transformers
Project description
epstein-search 🔍
Semantic search over the publicly available Epstein Files — court documents, FBI reports, and DOJ publications — using AI-powered vector search.
Built with zvec and pre-computed embeddings from devankit7873/EpsteinFiles-Vector-Embeddings-ChromaDB.
No API key needed for search. Runs entirely locally.
☕ Support This Project
This is free, open-source research tooling. If you find it useful, please consider supporting:
Or tip via crypto:
| Chain | Address |
|---|---|
| SOL | 76fCU6va3cGrbak4i9mwFfdYx1QsJJrqrxViFDUTsXUL |
| ETH | 0xE808754a18A893A3eeFE01780D822C902680d1B7 |
| BASE | 0xE808754a18A893A3eeFE01780D822C902680d1B7 |
Quick Start
pip install epstein-search
# One-time setup — downloads pre-built index (~100K+ document chunks)
epstein-search setup
# Search — free, local, no API key needed
epstein-search search "flight logs to the island"
# Filter by document type
epstein-search search "testimony" --doc-type deposition
# AI-powered answers (local Ollama = free, or any LLM API)
epstein-search ask "Who appears most in the flight logs?" --model ollama/llama3
Commands
| Command | Description | API Key? |
|---|---|---|
epstein-search setup |
Download & build search index | No |
epstein-search chat |
Interactive mode (recommended) | Only for cloud LLMs |
epstein-search search "query" |
One-off semantic search | No |
epstein-search ask "question" |
One-off RAG answer | Only for cloud LLMs |
epstein-search info |
Show index stats | No |
epstein-search ingest |
Build from scratch (advanced) | No |
Interactive Mode (Recommended)
The easiest way to use epstein-search:
epstein-search chat
Type questions naturally. No flags needed. Commands inside chat:
| Command | Description |
|---|---|
/search |
Switch to search-only mode (no LLM) |
/ask |
Switch to RAG mode (LLM answers) |
/model openai/llama3 |
Change LLM on the fly |
/topk 5 |
Change number of results |
/info |
Show current settings |
/quit |
Exit |
Tip: Set your model once in .env so you never need to specify it:
EPSTEIN_LLM_MODEL=openai/your-model-name
Search Options
epstein-search search "financial records" --top-k 5
epstein-search search "testimony" --source "FBI"
epstein-search search "flight logs" --doc-type flight_log
epstein-search search "court order" --json-output
Document Type Filters
| Filter | Description |
|---|---|
court_filing |
Court motions, orders, filings |
deposition |
Sworn testimony, Q&A transcripts |
fbi_report |
FBI investigation reports |
flight_log |
Aircraft flight records |
financial |
Bank records, wire transfers |
other |
Miscellaneous documents |
Ask (RAG)
The ask command retrieves relevant documents and generates an answer using any LLM via LiteLLM.
Free options (no API key):
# Run Ollama locally — completely free
epstein-search ask "Key individuals mentioned" --model ollama/llama3
epstein-search ask "What do the flight logs show?" --model ollama/mistral
LM Studio (local, no API key):
- Download LM Studio, load a model, start the local server (default: port 1234)
- Add to your
.envfile:
OPENAI_API_BASE=http://localhost:1234/v1
OPENAI_API_KEY=lm-studio
EPSTEIN_LLM_MODEL=openai/your-model-name
- Then just run:
epstein-search chat # interactive mode, no flags needed
Cloud LLM options (API key required):
# Gemini (requires GEMINI_API_KEY)
epstein-search ask "Summarize the findings" --model gemini/gemini-3-flash-preview
# OpenAI (requires OPENAI_API_KEY)
epstein-search ask "Timeline of events" --model gpt-4o
# Anthropic (requires ANTHROPIC_API_KEY)
epstein-search ask "Financial connections" --model anthropic/claude-sonnet-4-20250514
# Show source documents alongside the answer
epstein-search ask "Flight patterns" --show-sources
How It Works
pip install epstein-search
↓
epstein-search setup
→ Downloads 100K+ pre-computed embeddings (all-MiniLM-L6-v2)
→ Imports into local zvec vector database
↓
epstein-search search "your query"
→ Embeds query locally (sentence-transformers, no API key)
→ Vector similarity search via zvec
→ Returns matching document chunks
Dataset
Embeddings: devankit7873/EpsteinFiles-Vector-Embeddings-ChromaDB — 100K+ chunks, 384-dim vectors, based on the Epstein Files 20K corpus.
All content is from publicly available sources:
- U.S. Department of Justice Epstein Library
- House Oversight Committee releases
- Unsealed federal court documents
- FBI reports and DOJ publications
License
MIT — see LICENSE
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file epstein_search-0.1.1.tar.gz.
File metadata
- Download URL: epstein_search-0.1.1.tar.gz
- Upload date:
- Size: 18.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.19
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
2a5b84e1e56a526ac60ccc91e2e70a95666e50e497989d5f6aa4ef23abf0d386
|
|
| MD5 |
0718fee69fc9133dc9fe21c1e13a2606
|
|
| BLAKE2b-256 |
7fa37a42c7ec1e3c60b0ca12ca5ae839af5e326915dbae13a8d8e4544b81f8f4
|
File details
Details for the file epstein_search-0.1.1-py3-none-any.whl.
File metadata
- Download URL: epstein_search-0.1.1-py3-none-any.whl
- Upload date:
- Size: 23.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.19
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1273a357fcb4005a89588647888fb85a136e5b86a25bd15c04c20ae26629365b
|
|
| MD5 |
437ca5fb979a0a37cd23b07eb2dc1b07
|
|
| BLAKE2b-256 |
70c70d98dc43bd94dc89e970f2d14598b19ac120150234a8dc1c5af412c3bf4f
|