Local-first RAG library — ingest files and SQLite, query semantically, pipe results into any AI agent
Project description
Remex turns any folder of documents — PDFs, Word files, notes, spreadsheets, code — into a private, searchable knowledge base. Ask questions in plain language and get answers grounded in your own files, with sources cited.
Everything runs on your machine. No cloud account. No API key required for search. Bring your own AI provider (Anthropic, OpenAI, or a local Ollama instance) only when you want synthesised answers.
Remex Studio
Native desktop app for Windows. No terminal required.
⚠️ Windows SmartScreen warning: Windows may display a "Windows protected your PC" warning when you download or install Remex Studio. This happens because the app is not yet code-signed with a paid certificate. The software is safe and fully open source, so feel free to audit the source code in this repository. To proceed, click "More info", then "Run anyway".
What you can do
| 🔍 Semantic search | Vector similarity search across one or more collections simultaneously |
| 🤖 AI Answer | Ask a question, get a synthesised answer with cited sources (Anthropic · OpenAI · Ollama) |
| 📄 12 file formats | .pdf .docx .md .txt .csv .json .jsonl + .html .pptx .xlsx .epub .odt (optional package) |
| 🗄 SQLite ingest | Embed rows from any table alongside your files |
| ♻️ Incremental ingest | SHA-256 hash check — only changed files are re-processed |
| 🎯 Source filter | Narrow results to one or more documents before searching or asking AI |
| 🔎 Chunk viewer | Expand any result to read the full chunk, navigate with keyboard arrows |
| 📦 Collections manager | Rename, describe, purge, bulk-delete sources, one-click re-ingest |
| 📤 Export | JSON · CSV · Markdown · BibTeX · RIS · CSL-JSON · Obsidian vault |
| 👁️ Watch folders | Re-ingest automatically when files change inside chosen directories |
| 🔬 All embedding models | MiniLM, bge-base, bge-large, multilingual, nomic-embed long-context, custom HuggingFace/FastEmbed names |
| 🌙 Themes | Light, dark, auto (follows OS) + sixteen accent colours |
| 🔎 Searchable query history | Filter past queries by substring |
| ⌨️ Keyboard-driven | Press ? anywhere in Studio for the full shortcuts reference |
| ⚙️ Optional packages | Install extra file formats, AI integrations, or sentence chunking from Settings → General at any time |
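The incremental ingest row in the table above mentions a SHA-256 hash check. The sketch below illustrates that general idea only; it is not Remex's implementation. The function names and the `seen` dictionary are hypothetical stand-ins for whatever Remex actually stores between runs.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def changed_files(folder: Path, seen: dict[str, str]) -> list[Path]:
    """Return files whose digest differs from `seen`, updating `seen` in place.

    `seen` maps a file path (as a string) to the digest recorded on the
    previous run. It is a hypothetical stand-in for persisted state.
    """
    changed = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        digest = file_sha256(path)
        if seen.get(str(path)) != digest:
            changed.append(path)
            seen[str(path)] = digest
    return changed
```

Calling `changed_files` twice on an unmodified folder returns every file the first time and an empty list the second time, so only new or edited files would need re-embedding.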
Remex is free and open-source. Every feature ships in the box — no tiers, no license keys, no payment required.
Python CLI & Library
pip install remex-cli # core — ingest + query (7 formats)
pip install "remex-cli[formats]" # + .pptx .xlsx .epub .html .odt
pip install "remex-cli[ai]" # + Anthropic & OpenAI embeddings / generation
pip install "remex-cli[sentence]" # + sentence-aware chunking (NLTK)
pip install "remex-cli[api]" # + FastAPI sidecar (used by Studio)
pip install "remex-cli[all]" # everything above
Quick start
# Scaffold a project
remex init
# Ingest a folder of documents
remex ingest ./docs
# Semantic search
remex query "how does authentication work?"
# AI-synthesised answer (requires ANTHROPIC_API_KEY, OPENAI_API_KEY, or a running Ollama)
remex query "how does authentication work?" --ai
Command reference
| Command | Description |
|---|---|
| remex init [path] | Scaffold docs/, remex.toml, and .gitignore entries |
| remex ingest <dir> | Ingest files from a directory into a collection |
| remex ingest-sqlite <db> | Ingest rows from a SQLite table |
| remex query <text> | Semantic search; add --ai for an AI-synthesised answer |
| remex sources | List all ingested source paths in a collection |
| remex stats | Show chunk and source counts |
| remex delete-source <path> | Remove all chunks for a specific source |
| remex purge | Remove chunks whose source file no longer exists on disk |
| remex reset | Wipe an entire collection |
| remex list-collections | List all collections in a database |
| remex serve | Start the FastAPI sidecar on localhost:8000 |
remex <command> --help # full option reference for any command
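To make the idea behind `remex ingest-sqlite` more concrete, here is a minimal sketch of how table rows could be turned into text documents before embedding. It uses only Python's standard `sqlite3` module. The function name, the `column: value` text layout, and the `table:rowid` source identifier are assumptions for illustration, not Remex's actual behaviour.

```python
import sqlite3


def rows_as_documents(conn: sqlite3.Connection, table: str) -> list[tuple[str, str]]:
    """Return (source_id, text) pairs, one per row of `table`."""
    # Table names cannot be bound as parameters, so quote the identifier.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT rowid, * FROM {quoted} ORDER BY rowid")
    columns = [col[0] for col in cursor.description][1:]  # skip the rowid column
    documents = []
    for rowid, *values in cursor:
        # One "column: value" line per non-NULL field (assumed layout).
        text = "\n".join(
            f"{name}: {value}"
            for name, value in zip(columns, values)
            if value is not None
        )
        documents.append((f"{table}:{rowid}", text))
    return documents
```

Each returned pair could then be chunked and embedded like any other document. The sketch assumes an ordinary rowid table; a table declared `WITHOUT ROWID` would need a different key.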
Use as a library
from remex import ingest, query
# Ingest a folder
result = ingest("./docs", collection_name="my-kb")
print(f"{result.chunks_stored} chunks stored")
# Search
results = query("how does auth work?", collection_name="my-kb")
for r in results:
    print(f"[{r.score:.3f}] {r.source} → {r.text[:120]}")
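Query results can be handed to any AI agent as plain text. The sketch below shows one possible way to assemble a cited prompt from results. `Hit` is a hypothetical stand-in that mirrors the `score`, `source`, and `text` attributes used above; `build_prompt` and its `max_chars` budget are illustrative assumptions, not part of the Remex API.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """Hypothetical stand-in for one query result (score, source, text)."""
    score: float
    source: str
    text: str


def build_prompt(question: str, hits: list[Hit], max_chars: int = 4000) -> str:
    """Assemble a prompt that cites each chunk by a numbered source tag."""
    parts = []
    used = 0
    for i, hit in enumerate(hits, start=1):
        snippet = f"[{i}] {hit.source}\n{hit.text.strip()}"
        if used + len(snippet) > max_chars:
            break  # stop once the context budget would be exceeded
        parts.append(snippet)
        used += len(snippet)
    context = "\n\n".join(parts)
    return (
        "Answer using only the sources below and cite them as [n].\n\n"
        f"{context}\n\nQuestion: {question}"
    )
```

The returned string can be sent to whichever provider or local model you use; numbering the sources lets the model's answer point back to specific files.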
Configuration
Drop a remex.toml in your project root (or run remex init to generate one):
[remex]
db = "./remex_db"
collection = "my-kb"
embedding_model = "all-MiniLM-L6-v2"
# chunk_size = 768 # characters per chunk (512-1024 works well)
# overlap = 150 # ~20% overlap preserves context at boundaries
# min_chunk_size = 50 # discard chunks shorter than this
# chunking = "recursive" # "recursive" (default) | "sentence" | "word"
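To see how `chunk_size`, `overlap`, and `min_chunk_size` interact, here is a minimal character-window chunker. It is a simplified illustration and not Remex's "recursive" strategy; the function name is hypothetical and the defaults simply mirror the commented values above.

```python
def chunk_text(
    text: str,
    chunk_size: int = 768,
    overlap: int = 150,
    min_chunk_size: int = 50,
) -> list[str]:
    """Split text into fixed-size character windows that share `overlap` chars."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap  # each window starts this far after the last
    chunks = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if len(chunk) >= min_chunk_size:
            chunks.append(chunk)  # chunks below the minimum are discarded
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

With the values shown, each window starts 618 characters after the previous one, so the last 150 characters of one chunk are repeated at the start of the next.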
CLI flags always override remex.toml values.
Supported embedding models
| Preset | Model | Size | Notes |
|---|---|---|---|
| Light | all-MiniLM-L6-v2 | 22 MB | Default, fast, good accuracy |
| Balanced | intfloat/e5-base-v2 | 438 MB | Better retrieval quality |
| Multilingual | paraphrase-multilingual-MiniLM-L12-v2 | 470 MB | 50+ languages |
| Large (Pro) | BAAI/bge-large-en-v1.5 | 1.3 GB | Best English accuracy |
| E5 Large (Pro) | intfloat/e5-large-v2 | 1.3 GB | Strong retrieval benchmark |
| Long ctx (Pro) | nomic-ai/nomic-embed-text-v1.5 | 547 MB | 8,192-token context window |
Any model from SBERT, HuggingFace sentence-similarity, or Ollama can be used by typing the model name directly.
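Whichever model is chosen, it maps text to vectors, and semantic search ranks chunks by how close their vectors are to the query vector. The sketch below illustrates this with cosine similarity on toy vectors. The metric and the function names are assumptions for illustration; the source does not state which similarity measure or vector store Remex uses.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query_vec: list[float],
    chunk_vecs: dict[str, list[float]],
    k: int = 3,
) -> list[tuple[float, str]]:
    """Return the k (score, chunk_id) pairs most similar to the query vector."""
    scored = [(cosine(query_vec, vec), name) for name, vec in chunk_vecs.items()]
    return sorted(scored, reverse=True)[:k]
```

Real embeddings have hundreds of dimensions, and a vector database replaces the linear scan, but the ranking principle is the same.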
Building from source
Studio requires Rust, Node.js 20+, and the Tauri prerequisites for Windows.
# Python CLI
pip install -e ".[dev]"
pytest
# Studio (dev server with hot-reload)
cd studio
npm install
npm run tauri dev
# Studio (production build)
npm run tauri build
See studio/README.md for the full build guide.
Changelog · Contributing · Licensing · GitHub
Python CLI: Apache-2.0 · Studio (v1.3.0+): FSL-1.1-MIT — see LICENSES.md
File details
Details for the file remex_cli-1.5.2.tar.gz.
File metadata
- Download URL: remex_cli-1.5.2.tar.gz
- Upload date:
- Size: 1.2 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fd05d73649807d37e151bbfa658dc09e6bf95094cd4d3958bf58948ee6c2ea0f |
| MD5 | 1522d48ad4b721e69cd13e6039bf4198 |
| BLAKE2b-256 | bea1123ac0fedfa497d8f032da235cf5d3edf1426b58441ef8c6a34e317c6a5a |
Provenance
The following attestation bundles were made for remex_cli-1.5.2.tar.gz:
Publisher: publish.yml on adm-crow/remex
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: remex_cli-1.5.2.tar.gz
- Subject digest: fd05d73649807d37e151bbfa658dc09e6bf95094cd4d3958bf58948ee6c2ea0f
- Sigstore transparency entry: 1584559207
- Sigstore integration time:
- Permalink: adm-crow/remex@c9bedc16d2e33c84a7db0e3cc34f3b0eb800f1bf
- Branch / Tag: refs/tags/v1.5.2
- Owner: https://github.com/adm-crow
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c9bedc16d2e33c84a7db0e3cc34f3b0eb800f1bf
- Trigger Event: push
File details
Details for the file remex_cli-1.5.2-py3-none-any.whl.
File metadata
- Download URL: remex_cli-1.5.2-py3-none-any.whl
- Upload date:
- Size: 60.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dc7db016b509fef56c8fc5b7f50b857a7dd4831629623ba7b337a86e168eb9e6 |
| MD5 | c209b20743209a88a9b41f457706b8d2 |
| BLAKE2b-256 | a1f6e22ecb12d1ad50d2fcfb49586e0f8842c5e79ffe0bb400c9603073a26c9b |
Provenance
The following attestation bundles were made for remex_cli-1.5.2-py3-none-any.whl:
Publisher: publish.yml on adm-crow/remex
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: remex_cli-1.5.2-py3-none-any.whl
- Subject digest: dc7db016b509fef56c8fc5b7f50b857a7dd4831629623ba7b337a86e168eb9e6
- Sigstore transparency entry: 1584559356
- Sigstore integration time:
- Permalink: adm-crow/remex@c9bedc16d2e33c84a7db0e3cc34f3b0eb800f1bf
- Branch / Tag: refs/tags/v1.5.2
- Owner: https://github.com/adm-crow
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c9bedc16d2e33c84a7db0e3cc34f3b0eb800f1bf
- Trigger Event: push