
Installable RAG + MCP skills framework with a reliability-loop workflow.

rag-ai-scientist

Installable toolkit for local RAG indexing + MCP serving in scientific workflows.


rag-ai-scientist gives you:

  • a CLI to initialize and build a local vector database from your papers and notes,
  • an MCP server entrypoint for Cursor / agent integrations,
  • packaged skills under rag_ai_scientist/skills/ (workflow checklists—no Git clone needed).

End-user workflow (pip only — no GitHub)

You install from PyPI, create any folder for your project, put your research materials there, index once, then connect Cursor.

Full step-by-step: docs/GETTING_STARTED.md (install → references/ → init-references → setup-rag → MCP → update notes and rebuild).

Minimal command sequence (after pip install rag-ai-scientist):

mkdir -p ~/my-ai-scientist/references
cd ~/my-ai-scientist
# Add your own .md / .pdf files under references/

rag-ai-scientist init-references --project-root . --references-dir ./references
rag-ai-scientist setup-rag --project-root . --force
rag-ai-scientist mcp --project-root .    # usually configured once inside Cursor; see GETTING_STARTED

With the MCP server connected in Cursor:

  • query_analysis_knowledge answers from your indexed files.
  • get_skill loads packaged skills (e.g. cms-higgs-opendata) without indexing anything extra.
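
The init and indexing steps of the command sequence above can also be scripted, for example when preparing several project folders. This is a minimal sketch that uses only the subcommands and flags shown in this README; the helper names build_commands and run_setup are illustrative and are not part of the package.

```python
import subprocess
from pathlib import Path


def build_commands(project_root: Path, references_dir: Path) -> list[list[str]]:
    """Return the init-references and setup-rag invocations as argument lists."""
    root = str(project_root)
    return [
        ["rag-ai-scientist", "init-references",
         "--project-root", root, "--references-dir", str(references_dir)],
        ["rag-ai-scientist", "setup-rag", "--project-root", root, "--force"],
    ]


def run_setup(project_root: Path) -> None:
    """Create references/ if needed, then run both commands in order."""
    references_dir = project_root / "references"
    references_dir.mkdir(parents=True, exist_ok=True)
    for cmd in build_commands(project_root, references_dir):
        # check=True raises CalledProcessError if the CLI exits non-zero,
        # so indexing never runs after a failed init-references.
        subprocess.run(cmd, check=True, cwd=project_root)
```

Calling run_setup(Path.home() / "my-ai-scientist") mirrors the shell commands above, assuming rag-ai-scientist is on PATH in the environment that runs the script.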

You update your AI scientist by editing files under references/ (and configs/references.yaml if paths change), then running rag-ai-scientist setup-rag --project-root . --force again to rebuild the index.
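
Since the rebuild is triggered by hand, it can be useful to know whether anything under references/ has changed since the last run. The package is not documented here as offering such a check, so the following is only one possible approach: fingerprint the folder and compare against a stored value. The function names and the stamp file are hypothetical, and the .md / .pdf filter is an assumption based on the file types mentioned above.

```python
import hashlib
from pathlib import Path

# Assumption: only these file types are indexed, as suggested by the README.
INDEXED_SUFFIXES = {".md", ".pdf"}


def references_fingerprint(references_dir: Path) -> str:
    """Hash relative paths and contents of matching files, in sorted order."""
    digest = hashlib.sha256()
    files = sorted(
        p for p in references_dir.rglob("*")
        if p.is_file() and p.suffix.lower() in INDEXED_SUFFIXES
    )
    for path in files:
        digest.update(path.relative_to(references_dir).as_posix().encode("utf-8"))
        digest.update(b"\0")
        digest.update(path.read_bytes())
        digest.update(b"\0")
    return digest.hexdigest()


def needs_rebuild(references_dir: Path, stamp_file: Path) -> bool:
    """True if the folder differs from the fingerprint stored in stamp_file."""
    previous = stamp_file.read_text().strip() if stamp_file.exists() else ""
    return references_fingerprint(references_dir) != previous


def mark_indexed(references_dir: Path, stamp_file: Path) -> None:
    """Record the current fingerprint after a successful rebuild."""
    stamp_file.write_text(references_fingerprint(references_dir))
```

A wrapper script could call needs_rebuild first, run setup-rag --force only when it returns True, and then call mark_indexed.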


Installation

From PyPI (recommended)

python -m pip install rag-ai-scientist

Pinned example:

python -m pip install rag-ai-scientist==0.1.2

PyPI: rag-ai-scientist

Verify

rag-ai-scientist --help
python -c "import rag_ai_scientist; print(rag_ai_scientist.__version__)"
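
As an alternative check that does not import the package itself, the installed version can be read from package metadata with the standard library. This is a small sketch; installed_version is an illustrative helper name, not something the package provides.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "rag-ai-scientist") -> Optional[str]:
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None
```

With the pinned install shown above, installed_version() would be expected to return "0.1.2"; a return value of None means the package is not installed in the active environment.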

From source (maintainers / contributors only)

git clone <your fork or upstream URL>
cd rag-ai-scientist-installable   # or package repo name
python3 -m venv .venv && source .venv/bin/activate
python -m pip install -e .

Isolation tip: use a dedicated venv (e.g. ~/venvs/rag-ai-scientist) instead of mixing with heavy analysis stacks.


CLI commands

Command         | Purpose
init-references | Writes configs/references.yaml pointing at your references directory.
setup-rag       | Indexes sources into .cursor/rag_db.
mcp             | Starts the stdio MCP server; point --project-root at the same folder you indexed.

Common flags: --project-root, --force (rebuild index), --references-dir (with init-references).


Cursor MCP configuration

Register the server so Cursor runs it with your project path:

{
  "mcpServers": {
    "rag-ai-scientist": {
      "command": "rag-ai-scientist",
      "args": ["mcp", "--project-root", "/absolute/path/to/my-ai-scientist"]
    }
  }
}

See docs/GETTING_STARTED.md for optional .cursor/.env (LLM keys).
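
The JSON entry above can be generated instead of typed by hand, which helps avoid relative-path mistakes. The sketch below builds the same structure and merges it into an existing configuration file. The helper names are illustrative, and the location of Cursor's MCP configuration file is not stated in this README, so config_path is left as a parameter (commonly a mcp.json file under a .cursor directory, but confirm this for your Cursor version).

```python
import json
from pathlib import Path


def mcp_config(project_root: Path) -> dict:
    """Build the mcpServers entry shown above with an absolute project path."""
    return {
        "mcpServers": {
            "rag-ai-scientist": {
                "command": "rag-ai-scientist",
                "args": ["mcp", "--project-root", str(project_root.resolve())],
            }
        }
    }


def write_mcp_config(project_root: Path, config_path: Path) -> None:
    """Merge the entry into config_path, keeping other servers already listed."""
    existing = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = existing.setdefault("mcpServers", {})
    servers.update(mcp_config(project_root)["mcpServers"])
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(existing, indent=2) + "\n")
```

Resolving the path up front means the server is started against the same folder regardless of the working directory Cursor launches it from.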


Packaged skills and examples

  • Skills ship inside the installed package. Access via MCP get_skill (e.g. cms-higgs-opendata). No clone required.
  • docs/examples/README.md explains get_skill, Cursor wiring, and optional curated markdown for maintainers who ship a full docs tree. End users normally only need their own files under references/.

Running agents beside a separate lab environment

If training runs use a different conda/venv than rag-ai-scientist:

  1. Install rag-ai-scientist in its own small venv.
  2. Keep --project-root pointed at your research folder.
  3. Run heavy jobs from the agent via explicit wrappers (conda run, scripts). See docs/RUNBOOK.md, if present, for patterns.
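
An explicit wrapper of the kind mentioned in step 3 might look like the sketch below, which prefixes a command with conda run so it executes in the other environment. The helper names are illustrative, and the environment name and script in the usage note are placeholders.

```python
import subprocess
from typing import Sequence


def conda_run_command(env_name: str, command: Sequence[str]) -> list[str]:
    """Prefix a command so it executes inside the named conda environment."""
    return ["conda", "run", "-n", env_name, *command]


def run_in_env(env_name: str, command: Sequence[str]) -> int:
    """Run the command in the other environment and return its exit code."""
    completed = subprocess.run(conda_run_command(env_name, command))
    return completed.returncode
```

For example, conda_run_command("lab", ["python", "train.py"]) yields ["conda", "run", "-n", "lab", "python", "train.py"], which keeps the heavy dependencies out of the small venv that hosts rag-ai-scientist.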

Repository layout (when developing from source)

rag_ai_scientist/
  cli.py                  # CLI entrypoint
  mcp_server.py           # MCP server
  skills/                 # Packaged skills (ship in wheel)
rag/
  index_documents.py      # Indexer used by setup-rag
configs/
  references.example.yaml # Example only — users run init-references instead
docs/
  GETTING_STARTED.md      # Primary user guide (pip-only path)
  examples/               # Maintainer docs / optional narratives

Development & PyPI releases

Contributor workflow and release steps: DEV_README.md.


License

  • Open-source: AGPL-3.0-or-later (LICENSE)
  • Commercial: see LICENSE-COMMERCIAL.md

Security notes

  • Never commit secrets (.env, API keys).
  • Treat .cursor/rag_db as sensitive if your indexed PDFs are sensitive.
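
If the project folder is a Git repository, a quick check can flag sensitive paths that are not listed in .gitignore. This is a rough sketch under that assumption: it compares literal lines only and does not implement gitignore pattern matching, so a broader rule such as .cursor/ would still be reported as missing. The function name is illustrative; git check-ignore gives the authoritative answer.

```python
from pathlib import Path

# Paths this README names as secret or potentially sensitive.
SENSITIVE_PATHS = (".env", ".cursor/.env", ".cursor/rag_db")


def missing_ignore_entries(project_root: Path) -> list[str]:
    """Return sensitive paths with no literal matching line in .gitignore."""
    gitignore = project_root / ".gitignore"
    entries = set()
    if gitignore.exists():
        for raw in gitignore.read_text().splitlines():
            entry = raw.strip()
            if entry and not entry.startswith("#"):
                # Normalize "/.env" and ".cursor/rag_db/" to a bare path.
                entries.add(entry.strip("/"))
    return [path for path in SENSITIVE_PATHS if path not in entries]
```

An empty result means each listed path has a literal entry; anything returned is worth checking by hand before the first commit.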

