Local-first RAG library — ingest files and SQLite, query semantically, pipe results into any AI agent

Remex

Unleash the power of your files with local AI.


Remex turns any folder of documents — PDFs, Word files, notes, spreadsheets, code — into a private, searchable knowledge base. Ask questions in plain language and get answers grounded in your own files, with sources cited.

Everything runs on your machine. No cloud account. No API key required for search. Bring your own AI provider (Anthropic, OpenAI, or a local Ollama instance) only when you want synthesised answers.



Remex Studio

Native desktop app for Windows. No terminal required.

⬇ Download the latest release

⚠️ Windows SmartScreen warning: Windows may display a "Windows protected your PC" warning when you download or install Remex Studio. This happens because the app is not yet code-signed with a paid certificate. The software is safe and fully open source, so feel free to audit the source code in this repository. To proceed, click "More info", then "Run anyway".


What you can do

| Feature | Description |
| --- | --- |
| 🔍 Semantic search | Vector similarity search across one or more collections simultaneously |
| 🤖 AI Answer | Ask a question, get a synthesised answer with cited sources (Anthropic · OpenAI · Ollama) |
| 📄 12 file formats | .pdf .docx .md .txt .csv .json .jsonl + .html .pptx .xlsx .epub .odt (optional package) |
| 🗄 SQLite ingest | Embed rows from any table alongside your files |
| ♻️ Incremental ingest | SHA-256 hash check: only changed files are re-processed |
| 🎯 Source filter | Narrow results to one or more documents before searching or asking AI |
| 🔎 Chunk viewer | Expand any result to read the full chunk, navigate with keyboard arrows |
| 📦 Collections manager | Rename, describe, purge, bulk-delete sources, one-click re-ingest |
| 📤 Export | JSON · CSV · Markdown · BibTeX · RIS · CSL-JSON · Obsidian vault |
| 👁️ Watch folders | Re-ingest automatically when files change inside chosen directories |
| 🔬 All embedding models | MiniLM, bge-base, bge-large, multilingual, nomic-embed long-context, custom HuggingFace/FastEmbed names |
| 🌙 Themes | Light, dark, auto (follows OS) + sixteen accent colours |
| 🔎 Searchable query history | Filter past queries by substring |
| ⌨️ Keyboard-driven | Press ? anywhere in Studio for the full shortcuts reference |
| ⚙️ Optional packages | Install extra file formats, AI integrations, or sentence chunking from Settings → General at any time |
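
The incremental ingest feature above rests on content hashing: a file is re-processed only when its SHA-256 digest differs from the one recorded last time. The following is a minimal sketch of that idea, not Remex's actual implementation. The `seen` mapping and both function names are hypothetical stand-ins for whatever store and helpers Remex uses internally.

```python
import hashlib
from pathlib import Path


def file_sha256(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB blocks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for block in iter(lambda: fh.read(65536), b""):
            digest.update(block)
    return digest.hexdigest()


def changed_files(folder: Path, seen: dict[str, str]) -> list[Path]:
    """Return files whose digest differs from `seen`, updating `seen` in place.

    `seen` maps a file path (as a string) to the digest recorded on the
    previous run. It is a hypothetical stand-in for a persistent store.
    """
    changed = []
    for path in sorted(p for p in folder.rglob("*") if p.is_file()):
        digest = file_sha256(path)
        if seen.get(str(path)) != digest:
            changed.append(path)
            seen[str(path)] = digest
    return changed
```

On a first run every file counts as changed; on later runs only files whose bytes differ are returned, so renaming a timestamp or touching a file without editing it would not trigger re-processing in this sketch.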

Remex is free and open-source. Every feature ships in the box — no tiers, no license keys, no payment required.



Python CLI & Library

pip install remex-cli                    # core — ingest + query (7 formats)
pip install "remex-cli[formats]"         # + .pptx .xlsx .epub .html .odt
pip install "remex-cli[ai]"              # + Anthropic & OpenAI embeddings / generation
pip install "remex-cli[sentence]"        # + sentence-aware chunking (NLTK)
pip install "remex-cli[api]"             # + FastAPI sidecar (used by Studio)
pip install "remex-cli[all]"             # everything above

Quick start

# Scaffold a project
remex init

# Ingest a folder of documents
remex ingest ./docs

# Semantic search
remex query "how does authentication work?"

# AI-synthesised answer (requires ANTHROPIC_API_KEY, OPENAI_API_KEY, or a running Ollama)
remex query "how does authentication work?" --ai

Command reference

| Command | Description |
| --- | --- |
| `remex init [path]` | Scaffold docs/, remex.toml, and .gitignore entries |
| `remex ingest <dir>` | Ingest files from a directory into a collection |
| `remex ingest-sqlite <db>` | Ingest rows from a SQLite table |
| `remex query <text>` | Semantic search; add --ai for an AI-synthesised answer |
| `remex sources` | List all ingested source paths in a collection |
| `remex stats` | Show chunk and source counts |
| `remex delete-source <path>` | Remove all chunks for a specific source |
| `remex purge` | Remove chunks whose source file no longer exists on disk |
| `remex reset` | Wipe an entire collection |
| `remex list-collections` | List all collections in a database |
| `remex serve` | Start the FastAPI sidecar on localhost:8000 |

remex <command> --help    # full option reference for any command

Use as a library

from remex import ingest, query

# Ingest a folder
result = ingest("./docs", collection_name="my-kb")
print(f"{result.chunks_stored} chunks stored")

# Search
results = query("how does auth work?", collection_name="my-kb")
for r in results:
    print(f"[{r.score:.3f}] {r.source}: {r.text[:120]}")
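
To pipe search results into an AI agent, one option is to format them as a numbered, source-cited context block. The sketch below assumes only that each result exposes `score`, `source`, and `text`, as in the loop above. `Hit` and `build_context` are hypothetical names introduced for illustration and are not part of the remex package.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Hit:
    """Stand-in for one search result, mirroring the fields used above."""
    score: float
    source: str
    text: str


def build_context(hits: Iterable, max_chars: int = 500) -> str:
    """Format hits as a numbered, source-cited context block for a prompt."""
    blocks = []
    for i, hit in enumerate(hits, start=1):
        # Collapse runs of whitespace, then truncate long chunks.
        snippet = " ".join(hit.text.split())[:max_chars]
        blocks.append(f"[{i}] {hit.source} (score {hit.score:.3f})\n{snippet}")
    return "\n\n".join(blocks)
```

With remex installed, something like `build_context(query("how does auth work?", collection_name="my-kb"))` should yield a string that can be prepended to an agent prompt, provided the result objects carry those three attributes.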


Configuration

Drop a remex.toml in your project root (or run remex init to generate one):

[remex]
db              = "./remex_db"
collection      = "my-kb"
embedding_model = "all-MiniLM-L6-v2"

# chunk_size     = 768          # characters per chunk (512-1024 works well)
# overlap        = 150          # ~20% overlap preserves context at boundaries
# min_chunk_size = 50           # discard chunks shorter than this
# chunking       = "recursive"  # "recursive" (default) | "sentence" | "word"
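
To show how `chunk_size`, `overlap`, and `min_chunk_size` interact, here is a minimal sketch of a plain sliding-window character chunker. It is a simplification and not Remex's "recursive" strategy, which presumably also respects separators; the function name `chunk_text` is hypothetical.

```python
def chunk_text(text: str, chunk_size: int = 768, overlap: int = 150,
               min_chunk_size: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - overlap           # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        piece = text[start:start + chunk_size]
        if len(piece) >= min_chunk_size:  # drop fragments that are too short
            chunks.append(piece)
    return chunks
```

With the values shown in the config, each window advances 618 characters, so the last 150 characters of one chunk reappear at the start of the next. That shared span is what preserves context across boundaries.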

CLI flags always override remex.toml values.



Supported embedding models

| Preset | Model | Size | Notes |
| --- | --- | --- | --- |
| Light | all-MiniLM-L6-v2 | 22 MB | Default: fast, good accuracy |
| Balanced | intfloat/e5-base-v2 | 438 MB | Better retrieval quality |
| Multilingual | paraphrase-multilingual-MiniLM-L12-v2 | 470 MB | 50+ languages |
| Large (Pro) | BAAI/bge-large-en-v1.5 | 1.3 GB | Best English accuracy |
| E5 Large (Pro) | intfloat/e5-large-v2 | 1.3 GB | Strong retrieval benchmark |
| Long ctx (Pro) | nomic-ai/nomic-embed-text-v1.5 | 547 MB | 8,192-token context window |

Any model from SBERT, HuggingFace sentence-similarity, or Ollama can be used by typing the model name directly.
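
Whichever model is chosen, semantic search comes down to comparing the query's embedding vector with the stored chunk vectors. The toy sketch below ranks chunks by cosine similarity using hand-written vectors. That Remex scores with cosine similarity specifically is an assumption, and `cosine` and `top_k` are hypothetical helper names.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], chunk_vecs: dict[str, list[float]],
          k: int = 3) -> list[tuple[float, str]]:
    """Return the k (score, chunk_id) pairs most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in chunk_vecs.items()]
    return sorted(scored, reverse=True)[:k]
```

Real embeddings have hundreds of dimensions rather than two or three, and a vector database would use an index instead of scanning every chunk, but the ranking principle is the same.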



Building from source

Studio requires Rust, Node.js 20+, and the Tauri prerequisites for Windows.

# Python CLI
pip install -e ".[dev]"
pytest

# Studio (dev server with hot-reload)
cd studio
npm install
npm run tauri dev

# Studio (production build)
npm run tauri build

See studio/README.md for the full build guide.



Changelog · Contributing · Licensing · GitHub

Python CLI: Apache-2.0 · Studio (v1.3.0+): FSL-1.1-MIT — see LICENSES.md

Download files

Source Distribution

remex_cli-1.5.2.tar.gz (1.2 MB)

Uploaded Source

Built Distribution

remex_cli-1.5.2-py3-none-any.whl (60.7 kB)

Uploaded Python 3

File details

Details for the file remex_cli-1.5.2.tar.gz.

File metadata

  • Download URL: remex_cli-1.5.2.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for remex_cli-1.5.2.tar.gz
Algorithm Hash digest
SHA256 fd05d73649807d37e151bbfa658dc09e6bf95094cd4d3958bf58948ee6c2ea0f
MD5 1522d48ad4b721e69cd13e6039bf4198
BLAKE2b-256 bea1123ac0fedfa497d8f032da235cf5d3edf1426b58441ef8c6a34e317c6a5a

Provenance

The following attestation bundles were made for remex_cli-1.5.2.tar.gz:

Publisher: publish.yml on adm-crow/remex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file remex_cli-1.5.2-py3-none-any.whl.

File metadata

  • Download URL: remex_cli-1.5.2-py3-none-any.whl
  • Upload date:
  • Size: 60.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for remex_cli-1.5.2-py3-none-any.whl
Algorithm Hash digest
SHA256 dc7db016b509fef56c8fc5b7f50b857a7dd4831629623ba7b337a86e168eb9e6
MD5 c209b20743209a88a9b41f457706b8d2
BLAKE2b-256 a1f6e22ecb12d1ad50d2fcfb49586e0f8842c5e79ffe0bb400c9603073a26c9b

Provenance

The following attestation bundles were made for remex_cli-1.5.2-py3-none-any.whl:

Publisher: publish.yml on adm-crow/remex

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
