Installable RAG + MCP skills framework with a reliability-loop workflow.

rag-ai-scientist

Installable toolkit for local RAG indexing + MCP serving in scientific workflows.


rag-ai-scientist gives you:

  • a CLI to initialize and build a local vector database from your references,
  • an MCP server entrypoint for Cursor/agent integrations,
  • packaged reusable skills under rag_ai_scientist/skills/.

Installation

From source (recommended while developing)

uv venv .venv
source .venv/bin/activate
uv pip install -e .

If uv is not available, fall back to:

python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e .

Recommended isolation: keep this in a dedicated environment (for example venvs/rag-ai-scientist) rather than reusing analysis environments such as ecalgnn311.

Verify install

python -m pip show rag-ai-scientist
rag-ai-scientist --help
python -c "import rag_ai_scientist; print(rag_ai_scientist.__version__)"

Quickstart

  1. Initialize configs/references.yaml for your analysis repo:

rag-ai-scientist init-references \
  --project-root . \
  --references-dir /path/to/references

  2. Build the local RAG database:

rag-ai-scientist setup-rag --project-root . --force

  3. Start the MCP server:

rag-ai-scientist mcp --project-root .

CLI Commands

init-references

Creates configs/references.yaml with source paths, chunking, and doc-type rules.

Useful options:

  • --references-dir: path containing .pdf/.md/.txt/.tex/.py/.rst files
  • --collection-name: collection name (default: rag-ai-scientist)
  • --chunk-size, --chunk-overlap
  • --scientific-chunk-size, --scientific-chunk-overlap
  • --force: overwrite an existing config
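
The generated file describes source paths, chunking, and doc-type rules. The exact schema is whatever init-references writes; a hypothetical sketch, with key names inferred from the options above:

```yaml
# Hypothetical sketch -- the actual keys are defined by init-references.
references_dir: /path/to/references
collection_name: rag-ai-scientist
chunking:
  chunk_size: 1000
  chunk_overlap: 200
  scientific_chunk_size: 1500
  scientific_chunk_overlap: 300
doc_types:          # extensions listed in the options above
  - .pdf
  - .md
  - .txt
  - .tex
  - .py
  - .rst
```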

setup-rag

Indexes references and writes ChromaDB to .cursor/rag_db.

Useful options:

  • --force: rebuild from scratch
  • --collection-name: override the configured collection
  • --chunk-size, --chunk-overlap: runtime overrides

mcp

Starts the stdio MCP server for Cursor or compatible MCP clients.

Cursor MCP Configuration

Example ~/.cursor/mcp.json entry:

{
  "mcpServers": {
    "rag-ai-scientist": {
      "command": "rag-ai-scientist",
      "args": ["mcp", "--project-root", "/absolute/path/to/analysis-repo"]
    }
  }
}

Running Agents With Separate Training Environments

If agents should run training/inference scripts and update configs, use two environments in parallel:

  • rag-ai-scientist environment: runs MCP server and agent logic.
  • analysis/training environment: runs model training and inference commands.

This avoids dependency conflicts while still letting agents orchestrate the full workflow for another repository.
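
The core of the pattern is launching commands with an explicit interpreter path instead of ambient shell state. A minimal sketch (the analysis-env path is hypothetical):

```python
import subprocess

def run_in_env(python_path, script_args):
    """Run a script under an explicitly chosen interpreter, e.g. the
    analysis environment's python, instead of whatever is on PATH."""
    return subprocess.run(
        [python_path, *script_args],
        capture_output=True,
        text=True,
    )

# From the MCP/agent environment, target the analysis env explicitly:
# run_in_env("/path/to/analysis-env/bin/python",
#            ["scripts/train.py", "--epochs", "1"])
```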

Recommended architecture

  1. Keep a dedicated environment for rag-ai-scientist:

cd /path/to/rag-ai-scientist-installable
uv venv .venv
source .venv/bin/activate
uv pip install -e .

  2. Keep your analysis repository and its own environment separate:
  • repo: /path/to/analysis-repo
  • env: /path/to/analysis-env (conda or venv)
  3. Start MCP from the rag-ai-scientist environment, but point it to the analysis repo:

rag-ai-scientist mcp --project-root /path/to/analysis-repo

  4. Let agents launch analysis commands explicitly inside the analysis environment (for example via conda run -p), instead of relying on ambient shell state.

Safe command wrapper for agent execution

Create a wrapper script in the analysis repo (example: /path/to/analysis-repo/scripts/run_training.sh) and let agents call only this script:

#!/usr/bin/env bash
set -euo pipefail

ANALYSIS_ENV="/path/to/analysis-env"
ANALYSIS_REPO="/path/to/analysis-repo"

cd "$ANALYSIS_REPO"
exec conda run -p "$ANALYSIS_ENV" python scripts/train.py "$@"

This gives deterministic execution and avoids accidental environment drift.

Guardrails for autonomous edits and runs

  • Restrict editable files to a whitelist (for example configs/**/*.yaml).
  • Keep one output directory per run (runs/<timestamp>_<tag>).
  • Save the exact config snapshot and command used for each run.
  • Use a lock file to prevent concurrent training launches.
  • Require human approval before expensive or long GPU jobs.
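
The run-directory, config-snapshot, and lock-file guardrails can be sketched together in one launcher script. Paths and the config name are hypothetical; flock is the util-linux utility:

```shell
#!/usr/bin/env bash
set -euo pipefail

TAG="${1:-baseline}"
RUN_DIR="runs/$(date +%Y%m%d_%H%M%S)_${TAG}"
mkdir -p "$RUN_DIR"

# Save the exact config snapshot and command used for this run.
if [ -f configs/train.yaml ]; then
  cp configs/train.yaml "$RUN_DIR/config_snapshot.yaml"
fi
echo "python scripts/train.py --config configs/train.yaml" > "$RUN_DIR/command.txt"

# Use a lock file to prevent concurrent training launches.
exec 9>"runs/.train.lock"
if ! flock -n 9; then
  echo "another training run is active" >&2
  exit 1
fi

echo "run directory: $RUN_DIR"
```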

Package Layout

rag_ai_scientist/
  cli.py                  # Installable CLI entrypoint
  mcp_server.py           # MCP server implementation
  skills/                 # Packaged reusable skills
rag/
  index_documents.py      # Indexing backend used by setup-rag
configs/
  references.example.yaml # Example indexing config

Development

python -m pip install -e .
python -m pip install build
python -m build

License

  • Open-source: AGPL-3.0-or-later (LICENSE)
  • Commercial: see LICENSE-COMMERCIAL.md

Security Notes

  • Never commit secrets (.env, API keys, tokens).
  • Keep local vector stores and credentials in gitignored paths.
  • Review indexed sources before sharing databases externally.
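
A minimal .gitignore sketch covering these points (the rag_db path matches what setup-rag writes; the .env entry is a common convention):

```
# local vector store written by setup-rag
.cursor/rag_db/
# secrets
.env
```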
