Installable RAG + MCP skills framework with a reliability-loop workflow.

rag-ai-scientist

Installable toolkit for local RAG indexing + MCP serving in scientific workflows.

rag-ai-scientist gives you:

  • a CLI to initialize and build a local vector database from your references,
  • an MCP server entrypoint for Cursor/agent integrations,
  • packaged reusable skills under rag_ai_scientist/skills/.

Installation

From PyPI (recommended for end users)

Project page: rag-ai-scientist on PyPI

python -m pip install rag-ai-scientist==0.1.0

Or install the latest published release:

python -m pip install rag-ai-scientist

From source (recommended while developing)

uv venv .venv
source .venv/bin/activate
uv pip install -e .

If uv is not available, fall back to:

python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e .

Recommended isolation: install the package in a dedicated environment (for example venvs/rag-ai-scientist) rather than reusing an analysis environment such as ecalgnn311, so that its dependencies cannot conflict with your analysis stack.

Verify install

python -m pip show rag-ai-scientist
rag-ai-scientist --help
python -c "import rag_ai_scientist; print(rag_ai_scientist.__version__)"
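
The same check can be done from Python without importing the package. This is a minimal sketch that relies only on the standard library's importlib.metadata; the helper name installed_version is hypothetical and not part of rag-ai-scientist.

```python
from importlib import metadata
from typing import Optional


def installed_version(distribution: str = "rag-ai-scientist") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("rag-ai-scientist is not installed in this environment")
    else:
        print(f"rag-ai-scientist {version}")
```

Reading the distribution metadata avoids import side effects and works even if the package does not expose __version__.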

Quickstart

  1. Initialize configs/references.yaml for your analysis repo:

rag-ai-scientist init-references \
  --project-root . \
  --references-dir /path/to/references

  2. Build the local RAG database:

rag-ai-scientist setup-rag --project-root . --force

  3. Start the MCP server:

rag-ai-scientist mcp --project-root .
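
If you prefer to script the first two steps, for example in a project bootstrap script, they could be chained from Python as sketched below. The function names are hypothetical; only the subcommands and flags shown in the Quickstart are used. The mcp step is left out because it starts a long-running server.

```python
import shlex
import subprocess
from typing import List


def quickstart_commands(project_root: str, references_dir: str) -> List[List[str]]:
    """Build the argv lists for the init-references and setup-rag steps."""
    return [
        ["rag-ai-scientist", "init-references",
         "--project-root", project_root,
         "--references-dir", references_dir],
        ["rag-ai-scientist", "setup-rag",
         "--project-root", project_root, "--force"],
    ]


def run_quickstart(project_root: str, references_dir: str) -> None:
    """Run each step in order; check=True stops at the first failing command."""
    for argv in quickstart_commands(project_root, references_dir):
        print("+", shlex.join(argv))
        subprocess.run(argv, check=True)
```

Calling run_quickstart(".", "/path/to/references") would mirror steps 1 and 2 above.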

CLI Commands

init-references

Creates configs/references.yaml with source paths, chunking, and doc-type rules.

Useful options:

  • --references-dir: directory containing .pdf, .md, .txt, .tex, .py, or .rst files
  • --collection-name: collection name (default: rag-ai-scientist)
  • --chunk-size, --chunk-overlap
  • --scientific-chunk-size, --scientific-chunk-overlap
  • --force: overwrite an existing config
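
To build intuition for how a chunk size and a chunk overlap interact, here is a generic sliding-window splitter. It is an illustration only: it counts characters, and it is an assumption that the package chunks in a comparable way. The actual chunker and its units (characters or tokens) are not documented here, and chunk_text is a hypothetical name.

```python
from typing import List


def chunk_text(text: str, chunk_size: int, chunk_overlap: int) -> List[str]:
    """Split text into windows of chunk_size characters overlapping by chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must satisfy 0 <= chunk_overlap < chunk_size")

    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
    return chunks
```

A larger overlap repeats more context between neighbouring chunks at the cost of a larger index.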

setup-rag

Indexes the references and writes the ChromaDB database to .cursor/rag_db.

Useful options:

  • --force: rebuild from scratch
  • --collection-name: override the collection set in the config
  • --chunk-size, --chunk-overlap: runtime overrides
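
Before starting the MCP server it can be useful to confirm that an index has been written. The sketch below only checks that .cursor/rag_db exists and is non-empty. It assumes the directory lives under the project root, which the text above does not state explicitly, and it does not validate the database contents. Both function names are hypothetical.

```python
from pathlib import Path


def rag_db_path(project_root: str) -> Path:
    """Assumed location of the database relative to the project root."""
    return Path(project_root) / ".cursor" / "rag_db"


def rag_db_looks_built(project_root: str) -> bool:
    """True if the database directory exists and contains at least one entry."""
    db_dir = rag_db_path(project_root)
    return db_dir.is_dir() and any(db_dir.iterdir())
```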

mcp

Starts the MCP server over stdio for Cursor or other compatible MCP clients.

Cursor MCP Configuration

Example ~/.cursor/mcp.json entry:

{
  "mcpServers": {
    "rag-ai-scientist": {
      "command": "rag-ai-scientist",
      "args": ["mcp", "--project-root", "/absolute/path/to/analysis-repo"]
    }
  }
}
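
If mcp.json already lists other servers, the entry can be merged in rather than pasted by hand. The following is a sketch under the assumption that the file uses the mcpServers layout shown above; add_mcp_server is a hypothetical helper, not part of the package.

```python
import json
from pathlib import Path


def add_mcp_server(config_path: Path, project_root: str,
                   name: str = "rag-ai-scientist") -> dict:
    """Add or update one server entry in mcp.json, keeping any other entries."""
    config = {}
    if config_path.exists():
        raw = config_path.read_text(encoding="utf-8")
        if raw.strip():
            config = json.loads(raw)

    servers = config.setdefault("mcpServers", {})
    servers[name] = {
        "command": "rag-ai-scientist",
        # Store an absolute path, as in the example entry above.
        "args": ["mcp", "--project-root", str(Path(project_root).resolve())],
    }

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

For example, add_mcp_server(Path.home() / ".cursor" / "mcp.json", "/absolute/path/to/analysis-repo") would write the entry shown above.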

Running Agents With Separate Training Environments

If agents should run training/inference scripts and update configs, use two environments in parallel:

  • rag-ai-scientist environment: runs MCP server and agent logic.
  • analysis/training environment: runs model training and inference commands.

This avoids dependency conflicts while still letting agents orchestrate the full workflow for another repository.

Recommended architecture

  1. Keep a dedicated environment for rag-ai-scientist:

cd /path/to/rag-ai-scientist-installable
uv venv .venv
source .venv/bin/activate
uv pip install -e .

  2. Keep your analysis repository and its own environment separate:
  • repo: /path/to/analysis-repo
  • env: /path/to/analysis-env (conda or venv)
  3. Start MCP from the rag-ai-scientist environment, but point it to the analysis repo:

rag-ai-scientist mcp --project-root /path/to/analysis-repo

  4. Let agents launch analysis commands explicitly inside the analysis environment (for example via conda run -p), instead of relying on ambient shell state.
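
The last step can be made concrete with a small launcher on the agent side. This is a sketch, assuming the analysis environment is a conda environment addressed by its prefix path; the function names are hypothetical.

```python
import subprocess
from typing import List, Sequence


def conda_run_argv(env_prefix: str, command: Sequence[str]) -> List[str]:
    """Prefix a command so it runs inside the conda environment at env_prefix."""
    if not command:
        raise ValueError("command must not be empty")
    return ["conda", "run", "-p", env_prefix, *command]


def run_in_analysis_env(env_prefix: str, repo: str,
                        command: Sequence[str]) -> subprocess.CompletedProcess:
    """Run the command from the analysis repo, inside the analysis environment."""
    return subprocess.run(conda_run_argv(env_prefix, command), cwd=repo, check=True)
```

Passing the environment prefix and working directory explicitly means the result does not depend on which environment happens to be active in the agent's shell.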

Safe command wrapper for agent execution

Create a wrapper script in the analysis repo (example: /path/to/analysis-repo/scripts/run_training.sh) and let agents call only this script:

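#!/usr/bin/env bash
# Exit on errors, unset variables, and failed pipeline stages.
set -euo pipefail

ANALYSIS_ENV="/path/to/analysis-env"
ANALYSIS_REPO="/path/to/analysis-repo"

# Always run from the repo root so relative paths resolve the same way.
cd "$ANALYSIS_REPO"
# Run training inside the analysis environment, forwarding all arguments.
exec conda run -p "$ANALYSIS_ENV" python scripts/train.py "$@"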

Because the wrapper pins both the environment path and the working directory, every run uses the same interpreter and dependencies regardless of the agent's shell state.

Guardrails for autonomous edits and runs

  • Restrict editable files to a whitelist (for example configs/**/*.yaml).
  • Keep one output directory per run (runs/<timestamp>_<tag>).
  • Save the exact config snapshot and command used for each run.
  • Use a lock file to prevent concurrent training launches.
  • Require human approval before expensive or long GPU jobs.
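
Three of these guardrails (the edit whitelist, per-run output directories with snapshots, and the lock file) can be sketched with the standard library alone. All names below are hypothetical and the layout is one possible interpretation: the whitelist is expressed as "files under configs/ ending in .yaml", and the timestamp format is an arbitrary choice.

```python
import os
import shlex
import shutil
from contextlib import contextmanager
from datetime import datetime
from pathlib import Path
from typing import Iterator, Sequence


def is_editable(path: str, repo_root: str, allowed_dir: str = "configs",
                suffixes: Sequence[str] = (".yaml",)) -> bool:
    """True only for files under repo_root/allowed_dir with an allowed suffix."""
    root = (Path(repo_root) / allowed_dir).resolve()
    target = (Path(repo_root) / path).resolve()  # collapses ".." before checking
    try:
        target.relative_to(root)
    except ValueError:
        return False
    return target.suffix in suffixes


def create_run_dir(repo_root: str, tag: str, config_path: str,
                   command: Sequence[str]) -> Path:
    """Create runs/<timestamp>_<tag> with a config snapshot and the exact command."""
    stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
    run_dir = Path(repo_root) / "runs" / f"{stamp}_{tag}"
    run_dir.mkdir(parents=True, exist_ok=False)  # never reuse a run directory
    shutil.copy2(config_path, run_dir / Path(config_path).name)
    (run_dir / "command.txt").write_text(shlex.join(command) + "\n", encoding="utf-8")
    return run_dir


@contextmanager
def training_lock(lock_path: str) -> Iterator[None]:
    """Hold an exclusive lock file; raises FileExistsError if one already exists."""
    # O_EXCL makes creation atomic: exactly one caller can create the file.
    fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(str(os.getpid()))
        yield
    finally:
        os.unlink(lock_path)
```

A launcher would typically call is_editable before writing a config, then wrap create_run_dir and the training command inside `with training_lock(...)`. A lock left behind by a crashed process has to be removed manually in this simple version.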

Package Layout

rag_ai_scientist/
  cli.py                  # Installable CLI entrypoint
  mcp_server.py           # MCP server implementation
  skills/                 # Packaged reusable skills
rag/
  index_documents.py      # Indexing backend used by setup-rag
configs/
  references.example.yaml # Example indexing config

Development

python -m pip install -e .
python -m pip install build
python -m build

License

  • Open-source: AGPL-3.0-or-later (LICENSE)
  • Commercial: see LICENSE-COMMERCIAL.md

Security Notes

  • Never commit secrets (.env, API keys, tokens).
  • Keep local vector stores and credentials in gitignored paths.
  • Review indexed sources before sharing databases externally.
