Installable RAG + MCP skills framework with a reliability-loop workflow.
rag-ai-scientist
Installable toolkit for local RAG indexing + MCP serving in scientific workflows.
rag-ai-scientist gives you:
- a CLI to initialize and build a local vector database from your references,
- an MCP server entrypoint for Cursor/agent integrations,
- packaged reusable skills under rag_ai_scientist/skills/.
Installation
From PyPI (recommended for end users)
Project page: rag-ai-scientist on PyPI
python -m pip install rag-ai-scientist==0.1.0
Or install the latest published release:
python -m pip install rag-ai-scientist
From source (recommended while developing)
uv venv .venv
source .venv/bin/activate
uv pip install -e .
If uv is not available, fall back to:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e .
Recommended isolation: keep this in a dedicated environment (for example
venvs/rag-ai-scientist) rather than reusing analysis environments such as
ecalgnn311.
Verify install
python -m pip show rag-ai-scientist
rag-ai-scientist --help
python -c "import rag_ai_scientist; print(rag_ai_scientist.__version__)"
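The same check can be scripted, for example in a CI step. This is a minimal sketch that uses only the standard library; the helper name `installed_version` is hypothetical and not part of the package.

```python
from __future__ import annotations

from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name: str = "rag-ai-scientist") -> str | None:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("rag-ai-scientist is not installed in this environment")
    else:
        print(f"rag-ai-scientist {found}")
```

Because it reads distribution metadata instead of importing the package, it also works when an import would fail for unrelated reasons.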
Quickstart
- Initialize configs/references.yaml for your analysis repo:
rag-ai-scientist init-references \
--project-root . \
--references-dir /path/to/references
- Build the local RAG database:
rag-ai-scientist setup-rag --project-root . --force
- Start the MCP server:
rag-ai-scientist mcp --project-root .
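If you prefer to script the first two steps, a sketch along these lines may help. It only assembles the commands shown above and runs them with `subprocess`; the function names are hypothetical, and the MCP server is left out because it is a long-running process.

```python
from __future__ import annotations

import subprocess


def quickstart_commands(project_root: str, references_dir: str) -> list[list[str]]:
    """Build the init-references and setup-rag commands from the Quickstart."""
    return [
        [
            "rag-ai-scientist", "init-references",
            "--project-root", project_root,
            "--references-dir", references_dir,
        ],
        ["rag-ai-scientist", "setup-rag", "--project-root", project_root, "--force"],
    ]


def run_quickstart(project_root: str, references_dir: str) -> None:
    """Run both steps in order and stop at the first failure."""
    for command in quickstart_commands(project_root, references_dir):
        subprocess.run(command, check=True)


if __name__ == "__main__":
    run_quickstart(".", "/path/to/references")
```

Passing the arguments as a list avoids shell quoting problems when paths contain spaces.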
CLI Commands
init-references
Creates configs/references.yaml with source paths, chunking, and doc-type rules.
Useful options:
- --references-dir: path containing .pdf/.md/.txt/.tex/.py/.rst files
- --collection-name: default is rag-ai-scientist
- --chunk-size, --chunk-overlap
- --scientific-chunk-size, --scientific-chunk-overlap
- --force: overwrite existing config
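To build intuition for how a chunk size and a chunk overlap interact, here is a generic sliding-window sketch. It is an illustration only: it assumes both values are counted in characters, and it is not taken from the package, which may split by tokens, sentences, or document structure.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=2))
    # → ['abcd', 'cdef', 'efgh', 'ghij']
```

A larger overlap keeps more context across chunk boundaries at the cost of more chunks to embed and store.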
setup-rag
Indexes references and writes ChromaDB to .cursor/rag_db.
Useful options:
- --force: rebuild from scratch
- --collection-name: override config collection
- --chunk-size, --chunk-overlap: runtime overrides
mcp
Starts the stdio MCP server for Cursor or compatible MCP clients.
Cursor MCP Configuration
Example ~/.cursor/mcp.json entry:
{
"mcpServers": {
"rag-ai-scientist": {
"command": "rag-ai-scientist",
"args": ["mcp", "--project-root", "/absolute/path/to/analysis-repo"]
}
}
}
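If ~/.cursor/mcp.json already lists other servers, you may want to merge this entry instead of overwriting the file. The sketch below does that with the standard library; the function names are hypothetical, and it assumes the file, when present, contains a JSON object.

```python
from __future__ import annotations

import json
from pathlib import Path


def merge_mcp_entry(config: dict, project_root: str,
                    name: str = "rag-ai-scientist") -> dict:
    """Return a copy of config with the rag-ai-scientist server entry added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {
        "command": "rag-ai-scientist",
        # project_root should be an absolute path, as in the example above.
        "args": ["mcp", "--project-root", project_root],
    }
    merged["mcpServers"] = servers
    return merged


def update_mcp_config(config_path: Path, project_root: str) -> dict:
    """Read the config if it exists, merge the entry, and write it back."""
    existing = json.loads(config_path.read_text()) if config_path.exists() else {}
    merged = merge_mcp_entry(existing, project_root)
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(merged, indent=2) + "\n")
    return merged


if __name__ == "__main__":
    update_mcp_config(Path.home() / ".cursor" / "mcp.json",
                      "/absolute/path/to/analysis-repo")
```

Existing entries under mcpServers are preserved; only the rag-ai-scientist key is added or replaced.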
Running Agents With Separate Training Environments
If agents should run training/inference scripts and update configs, use two environments in parallel:
- rag-ai-scientist environment: runs MCP server and agent logic.
- analysis/training environment: runs model training and inference commands.
This avoids dependency conflicts while still letting agents orchestrate the full workflow for another repository.
Recommended architecture
- Keep a dedicated environment for rag-ai-scientist:
cd /path/to/rag-ai-scientist-installable
uv venv .venv
source .venv/bin/activate
uv pip install -e .
- Keep your analysis repository and its own environment separate:
  - repo: /path/to/analysis-repo
  - env: /path/to/analysis-env (conda or venv)
- Start MCP from the rag-ai-scientist environment, but point it to the analysis repo:
rag-ai-scientist mcp --project-root /path/to/analysis-repo
- Let agents launch analysis commands explicitly inside the analysis environment (for example via conda run -p), instead of relying on ambient shell state.
Safe command wrapper for agent execution
Create a wrapper script in the analysis repo (example:
/path/to/analysis-repo/scripts/run_training.sh) and let agents call only this
script:
#!/usr/bin/env bash
set -euo pipefail
ANALYSIS_ENV="/path/to/analysis-env"
ANALYSIS_REPO="/path/to/analysis-repo"
cd "$ANALYSIS_REPO"
exec conda run -p "$ANALYSIS_ENV" python scripts/train.py "$@"
This pins the interpreter and working directory for every run, which avoids accidental environment drift.
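If the agent logic is written in Python, the same idea can be expressed without a shell script. This is a sketch under the same assumptions as the wrapper above (a conda environment addressed by prefix and a scripts/train.py entrypoint); the function names are hypothetical.

```python
from __future__ import annotations

import subprocess


def training_command(analysis_env: str, extra_args: list[str]) -> list[str]:
    """Build the `conda run -p ENV python scripts/train.py ...` command."""
    return ["conda", "run", "-p", analysis_env,
            "python", "scripts/train.py", *extra_args]


def run_training(analysis_env: str, analysis_repo: str,
                 extra_args: list[str]) -> subprocess.CompletedProcess:
    """Run training from the analysis repo inside the analysis environment."""
    return subprocess.run(
        training_command(analysis_env, extra_args),
        cwd=analysis_repo,  # same effect as `cd "$ANALYSIS_REPO"` in the script
        check=True,         # raise if the training command exits non-zero
    )


if __name__ == "__main__":
    run_training("/path/to/analysis-env", "/path/to/analysis-repo",
                 ["--config", "configs/example.yaml"])  # example arguments only
```

As with the shell wrapper, the environment and repository paths are fixed in one place, so the agent never depends on whichever environment happens to be active.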
Guardrails for autonomous edits and runs
- Restrict editable files to a whitelist (for example configs/**/*.yaml).
- Keep one output directory per run (runs/<timestamp>_<tag>).
- Save the exact config snapshot and command used for each run.
- Use a lock file to prevent concurrent training launches.
- Require human approval before expensive or long GPU jobs.
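Several of these guardrails can be sketched with the standard library alone. The helpers below are hypothetical and not shipped with the package. They assume paths are given relative to the analysis repo, and they use `fnmatch`, where `*` also matches `/`, so `configs/*.yaml` covers nested files as well.

```python
from __future__ import annotations

import fnmatch
import json
import os
import posixpath
import time
from contextlib import contextmanager
from pathlib import Path

# Assumed whitelist; adjust to the files agents may edit.
ALLOWED_PATTERNS = ("configs/*.yaml", "configs/**/*.yaml")


def is_editable(rel_path: str, patterns: tuple[str, ...] = ALLOWED_PATTERNS) -> bool:
    """Check a repo-relative path against the whitelist after normalizing it."""
    normalized = posixpath.normpath(rel_path.replace("\\", "/"))
    if posixpath.isabs(normalized) or normalized == ".." or normalized.startswith("../"):
        return False  # reject absolute paths and paths escaping the repo
    return any(fnmatch.fnmatchcase(normalized, pattern) for pattern in patterns)


@contextmanager
def training_lock(lock_path: Path):
    """Hold an exclusive lock file; raises FileExistsError if one already exists."""
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        os.write(fd, str(os.getpid()).encode())
        yield
    finally:
        os.close(fd)
        os.unlink(lock_path)


def start_run(runs_dir: Path, tag: str, config_text: str, command: list[str]) -> Path:
    """Create runs/<timestamp>_<tag> and store the config snapshot and command."""
    stamp = time.strftime("%Y%m%d-%H%M%S")
    run_dir = Path(runs_dir) / f"{stamp}_{tag}"
    run_dir.mkdir(parents=True, exist_ok=False)
    (run_dir / "config.snapshot.yaml").write_text(config_text)
    (run_dir / "command.json").write_text(json.dumps(command))
    return run_dir
```

A stale lock file left behind by a crashed process has to be removed by hand in this sketch, which doubles as a prompt for a human to check what happened.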
Package Layout
rag_ai_scientist/
  cli.py                    # Installable CLI entrypoint
  mcp_server.py             # MCP server implementation
  skills/                   # Packaged reusable skills
rag/
  index_documents.py        # Indexing backend used by setup-rag
configs/
  references.example.yaml   # Example indexing config
Development
python -m pip install -e .
python -m pip install build
python -m build
License
- Open-source: AGPL-3.0-or-later (LICENSE)
- Commercial: see LICENSE-COMMERCIAL.md
Security Notes
- Never commit secrets (.env, API keys, tokens).
- Keep local vector stores and credentials in gitignored paths.
- Review indexed sources before sharing databases externally.
File details
Details for the file rag_ai_scientist-0.1.1.tar.gz.
File metadata
- Download URL: rag_ai_scientist-0.1.1.tar.gz
- Upload date:
- Size: 21.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 86a7e5a119c8dfed3a6a25036f3d54d7c98299411eb04d44c5d8c647af4a9ab7 |
| MD5 | 8333692e11376e1d4fcf0940dc3b24ca |
| BLAKE2b-256 | 07f20968f2eed2bed8d2829c880afd04eb92ec3926cb4b8ed52b2e326ae4f158 |
File details
Details for the file rag_ai_scientist-0.1.1-py3-none-any.whl.
File metadata
- Download URL: rag_ai_scientist-0.1.1-py3-none-any.whl
- Upload date:
- Size: 26.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 986b695a6df0ac8360244d8e2b31cd742516dc0492aee404accfecb1b1b6e823 |
| MD5 | 326fcfe5aae29530a078ffd2e2376839 |
| BLAKE2b-256 | 1c63f4743e6081bce8435b261910ddda1b136f31a3cd9f0034cfdbf08a956874 |
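A downloaded file can be compared against the SHA256 digests listed above. This is a minimal sketch using `hashlib`; the helper names and the local file path are assumptions for illustration.

```python
from __future__ import annotations

import hashlib


def sha256_of(path: str, block_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in blocks to bound memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def matches_published(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Assumed local path to the downloaded wheel.
    wheel = "rag_ai_scientist-0.1.1-py3-none-any.whl"
    published = "986b695a6df0ac8360244d8e2b31cd742516dc0492aee404accfecb1b1b6e823"
    print("digest matches" if matches_published(wheel, published) else "digest MISMATCH")
```

pip can enforce the same check at install time through hash-checking mode with a requirements file.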