
MCP server for RAGStack knowledge base - search, chat, upload, and scrape


RAGStack MCP Server

MCP (Model Context Protocol) server for RAGStack knowledge bases. It lets AI assistants search and chat with your knowledge base, upload documents and media files to it, and scrape websites into it.

Installation

# Using uvx (recommended - no install needed)
uvx ragstack-mcp

# Or install globally
pip install ragstack-mcp

Configuration

Get your GraphQL endpoint and API key from the RAGStack dashboard under Settings → API Key.

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (Mac) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "ragstack-kb": {
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  }
}
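You may already have other MCP servers in claude_desktop_config.json. In that case, merge the ragstack-kb entry into the existing mcpServers object instead of overwriting the file. Below is a minimal Python sketch of that merge, using the macOS and Windows paths listed above. The helper names are hypothetical and are not part of ragstack-mcp.

```python
import json
import os
import sys
from pathlib import Path


def claude_desktop_config_path() -> Path:
    """Return the Claude Desktop config path documented for macOS or Windows."""
    if sys.platform == "darwin":
        return Path.home() / "Library" / "Application Support" / "Claude" / "claude_desktop_config.json"
    if sys.platform == "win32":
        return Path(os.environ["APPDATA"]) / "Claude" / "claude_desktop_config.json"
    raise RuntimeError("only the macOS and Windows paths are documented")


def add_ragstack_server(config_path: Path, endpoint: str, api_key: str) -> dict:
    """Insert or update the "ragstack-kb" entry while keeping any other servers."""
    config = json.loads(config_path.read_text()) if config_path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["ragstack-kb"] = {
        "command": "uvx",
        "args": ["ragstack-mcp"],
        "env": {
            "RAGSTACK_GRAPHQL_ENDPOINT": endpoint,
            "RAGSTACK_API_KEY": api_key,
        },
    }
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2))
    return config
```

For example, `add_ragstack_server(claude_desktop_config_path(), "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql", "da2-xxxxxxxxxxxx")` adds or refreshes only the RAGStack entry.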

Amazon Q CLI

Edit ~/.aws/amazonq/mcp.json:

{
  "mcpServers": {
    "ragstack-kb": {
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  }
}

Cursor

Open Settings → MCP Servers → Add Server, or edit .cursor/mcp.json:

{
  "mcpServers": {
    "ragstack-kb": {
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  }
}

VS Code + Cline

Edit .vscode/cline_mcp_settings.json:

{
  "mcpServers": {
    "ragstack-kb": {
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  }
}

VS Code + Continue

Edit ~/.continue/config.json and add an entry to the mcpServers array:

{
  "mcpServers": [
    {
      "name": "ragstack-kb",
      "command": "uvx",
      "args": ["ragstack-mcp"],
      "env": {
        "RAGSTACK_GRAPHQL_ENDPOINT": "https://xxx.appsync-api.us-east-1.amazonaws.com/graphql",
        "RAGSTACK_API_KEY": "da2-xxxxxxxxxxxx"
      }
    }
  ]
}

Available Tools

search_knowledge_base

Search for relevant documents in the knowledge base.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | The search query |
| max_results | int | No | 5 | Maximum number of results to return |
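Your AI assistant normally sends the tool call for you. To make the parameter mapping concrete, here is a sketch of the MCP JSON-RPC 2.0 `tools/call` request a client would send for this tool. A real client first completes the MCP `initialize` handshake. The helper names are hypothetical.

```python
import json
from itertools import count

# Monotonically increasing JSON-RPC request ids.
_request_ids = count(1)


def tools_call_request(name: str, arguments: dict) -> str:
    """Build an MCP JSON-RPC 2.0 "tools/call" request as a JSON string."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    })


def search_request(query: str, max_results: int = 5) -> str:
    """Request for search_knowledge_base; query is the only required argument."""
    if not query:
        raise ValueError("query is required")
    return tools_call_request(
        "search_knowledge_base", {"query": query, "max_results": max_results}
    )
```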

chat_with_knowledge_base

Ask questions and get AI-generated answers with source citations.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| query | string | Yes | - | Your question |
| conversation_id | string | No | null | ID used to maintain conversation context |

start_scrape_job

Scrape a website into the knowledge base.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Starting URL to scrape |
| max_pages | int | No | 50 | Maximum number of pages to scrape |
| max_depth | int | No | 3 | How many links deep to follow (0 = start page only) |
| scope | string | No | "HOSTNAME" | SUBPAGES, HOSTNAME, or DOMAIN |
| include_patterns | list[str] | No | null | Only scrape URLs matching these glob patterns |
| exclude_patterns | list[str] | No | null | Skip URLs matching these glob patterns |
| scrape_mode | string | No | "AUTO" | AUTO, FAST (HTTP only), or FULL (headless browser) |
| cookies | string | No | null | Cookie string for sites that require authentication |
| force_rescrape | bool | No | false | Re-scrape pages even if their content is unchanged |

Scope values:

  • SUBPAGES - Only URLs under the starting path
  • HOSTNAME - All pages on the same subdomain
  • DOMAIN - All subdomains of the domain
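The following sketch shows how the three scope values and the include/exclude glob patterns narrow the set of URLs a crawl would follow. It mirrors the documented semantics only and is not the scraper's actual code. It makes three assumptions:

- Patterns are matched against the full URL with case-sensitive glob matching.
- "Domain" is simplified to the last two hostname labels, which is wrong for suffixes such as co.uk.
- `in_scope` is a hypothetical name.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlparse


def registrable_domain(host: str) -> str:
    # Simplification: last two labels (docs.example.com -> example.com).
    return ".".join(host.split(".")[-2:])


def in_scope(start_url, candidate, scope="HOSTNAME",
             include_patterns=None, exclude_patterns=None) -> bool:
    start, cand = urlparse(start_url), urlparse(candidate)
    start_host, cand_host = start.hostname or "", cand.hostname or ""
    if scope == "SUBPAGES":
        # Same host, and the path sits at or below the starting path.
        base = start.path.rstrip("/") + "/"
        ok = cand_host == start_host and (cand.path.rstrip("/") + "/").startswith(base)
    elif scope == "HOSTNAME":
        ok = cand_host == start_host
    elif scope == "DOMAIN":
        ok = registrable_domain(cand_host) == registrable_domain(start_host)
    else:
        raise ValueError(f"unknown scope: {scope}")
    if not ok:
        return False
    # include_patterns: the URL must match at least one pattern.
    if include_patterns and not any(fnmatchcase(candidate, p) for p in include_patterns):
        return False
    # exclude_patterns: any match rejects the URL.
    if exclude_patterns and any(fnmatchcase(candidate, p) for p in exclude_patterns):
        return False
    return True
```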

Scrape mode values:

  • AUTO - Tries FAST first and falls back to FULL for single-page applications (SPAs)
  • FAST - HTTP requests only; faster, but may miss content rendered by JavaScript
  • FULL - Renders pages in a headless browser, so JavaScript-generated content is captured

get_scrape_job_status

Check the status of a scrape job.

| Parameter | Type | Required | Description |
|---|---|---|---|
| job_id | string | Yes | The scrape job ID |
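Scrape jobs run asynchronously, so a script or agent typically polls get_scrape_job_status until the job finishes. Below is a sketch of polling with exponential backoff. Two things are assumptions here:

- `get_status` stands in for however you invoke get_scrape_job_status.
- The terminal status strings are hypothetical; check the actual values the tool returns.

```python
import time
from typing import Callable

# Hypothetical terminal states; the real status values may differ.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "CANCELLED"}


def wait_for_scrape_job(get_status: Callable[[str], str], job_id: str,
                        interval: float = 5.0, max_interval: float = 60.0,
                        timeout: float = 1800.0, sleep=time.sleep) -> str:
    """Poll get_status(job_id), doubling the wait each time, until a terminal status."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(job_id)
        if status in TERMINAL_STATUSES:
            return status
        # Give up rather than sleep past the deadline.
        if time.monotonic() + interval > deadline:
            raise TimeoutError(f"scrape job {job_id} still {status}")
        sleep(interval)
        interval = min(interval * 2, max_interval)
```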

list_scrape_jobs

List recent scrape jobs.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| limit | int | No | 10 | Maximum number of jobs to return |

upload_document_url

Get a presigned URL to upload a document or media file.

| Parameter | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | Name of the file (e.g., 'report.pdf', 'meeting.mp4') |

Supported formats:

  • Documents: PDF, DOCX, XLSX, HTML, TXT, CSV, JSON, XML, EML, EPUB, Markdown
  • Images: JPG, PNG, GIF, WebP, AVIF, BMP, TIFF
  • Video: MP4, WebM
  • Audio: MP3, WAV, M4A, OGG, FLAC

Video and audio files are transcribed with Amazon Transcribe and split into segments for search.
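Before requesting an upload URL, a client can check the file extension against the supported formats listed above and flag media that will go through transcription. Below is a sketch built from that list. The exact extension spellings the server accepts (for example `.md`, `.jpeg`, `.tif`) are assumptions, and the function names are hypothetical.

```python
from pathlib import PurePath

# Extensions per category, transcribed from the supported-formats list.
SUPPORTED_EXTENSIONS = {
    "document": {".pdf", ".docx", ".xlsx", ".html", ".txt", ".csv", ".json",
                 ".xml", ".eml", ".epub", ".md"},
    "image": {".jpg", ".jpeg", ".png", ".gif", ".webp", ".avif", ".bmp", ".tif", ".tiff"},
    "video": {".mp4", ".webm"},
    "audio": {".mp3", ".wav", ".m4a", ".ogg", ".flac"},
}


def classify_upload(filename: str) -> str:
    """Return the format category for a filename, or raise if it is unsupported."""
    suffix = PurePath(filename).suffix.lower()
    for category, extensions in SUPPORTED_EXTENSIONS.items():
        if suffix in extensions:
            return category
    raise ValueError(f"unsupported file type: {filename!r}")


def needs_transcription(filename: str) -> bool:
    """Video and audio uploads are transcribed before indexing."""
    return classify_upload(filename) in {"video", "audio"}
```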

upload_image_url

Get a presigned URL to upload an image (step 1 of the image upload workflow).

| Parameter | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | Name of the image file (e.g., 'photo.jpg') |

Supported formats: JPEG, PNG, GIF, WebP, AVIF, BMP, TIFF

generate_image_caption

Generate an AI caption for an uploaded image using a vision model (step 2, optional).

| Parameter | Type | Required | Description |
|---|---|---|---|
| s3_uri | string | Yes | S3 URI returned by upload_image_url |

submit_image

Finalize an image upload and trigger indexing (step 3).

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| image_id | string | Yes | - | Image ID from upload_image_url |
| caption | string | No | null | Primary caption |
| user_caption | string | No | null | User-provided caption |
| ai_caption | string | No | null | AI-generated caption |
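The three image tools are meant to be chained: the output of step 1 feeds steps 2 and 3. Below is a sketch of that chaining, with the following assumptions:

- `call_tool` is a hypothetical stand-in for however your client invokes an MCP tool.
- `put_file` sends the bytes to the presigned URL. For an S3 presigned PUT URL, that would be an HTTP PUT of the raw bytes.
- The documentation names `s3_uri` and `image_id` as outputs of upload_image_url. The response keys `upload_url` and `caption` are hypothetical.
- Which caption fields the server expects together is also an assumption.

```python
from typing import Callable, Optional

# Hypothetical signature: tool name + arguments -> parsed result dict.
CallTool = Callable[[str, dict], dict]


def upload_image_with_caption(call_tool: CallTool,
                              put_file: Callable[[str, bytes], None],
                              filename: str, data: bytes,
                              user_caption: Optional[str] = None) -> dict:
    # Step 1: presigned URL plus identifiers for the new image.
    step1 = call_tool("upload_image_url", {"filename": filename})
    put_file(step1["upload_url"], data)  # "upload_url" key is assumed

    # Step 2 (optional): caption generated by the vision model.
    ai_caption = call_tool("generate_image_caption", {"s3_uri": step1["s3_uri"]})["caption"]

    # Step 3: finalize and trigger indexing; prefer the user's caption as primary.
    args = {"image_id": step1["image_id"], "ai_caption": ai_caption,
            "caption": user_caption or ai_caption}
    if user_caption:
        args["user_caption"] = user_caption
    return call_tool("submit_image", args)
```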

Configuration Tools (Read-Only)

get_configuration

Get all current RAGStack configuration settings organized by category.

Returns settings for:

  • Chat: Models, quotas, system prompt, document access
  • Metadata Extraction: Enabled, model, mode (auto/manual), max keys
  • Query-Time Filtering: Filter generation, multi-slice retrieval settings
  • Public Access: Which endpoints allow unauthenticated access
  • Document Processing: OCR backend, image caption prompt
  • Media Processing: Transcribe language, speaker diarization, segment duration
  • Budget: Alert thresholds

Note: Read-only. To modify settings, use the admin dashboard (Cognito auth required).


Metadata Analysis Tools

These tools help you understand and optimize metadata extraction and filtering.

get_metadata_stats

Get statistics about metadata keys extracted from documents.

Returns key names, data types, occurrence counts, sample values, and status.

get_filter_examples

Get AI-generated filter examples for metadata-based search queries.

Returns filter patterns with name, description, use case, and JSON filter syntax.

Filter syntax reference:

  • Basic operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $exists
  • Logical operators: $and, $or
  • Example: {"topic": {"$eq": "genealogy"}}
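To make the operator semantics concrete, below is a small evaluator that checks a metadata dict against a filter written in this syntax. It illustrates the operators listed above and is not the service's filtering engine. The following are assumptions:

- Multiple keys in one filter object are combined with AND.
- A key that is absent from the metadata fails every operator except `$exists`.
- Comparing mismatched types (such as a string and a number) is treated as "no match".

Real vector-store backends may differ on these points.

```python
import operator
from typing import Any

# Sentinel meaning "key not present in this document's metadata".
_MISSING = object()

_ORDERING = {"$gt": operator.gt, "$gte": operator.ge, "$lt": operator.lt, "$lte": operator.le}


def _check(op: str, actual: Any, expected: Any) -> bool:
    """Apply one comparison operator to a single metadata value."""
    if op == "$exists":
        return (actual is not _MISSING) == bool(expected)
    if actual is _MISSING:
        return False  # assumption: absent keys fail all other operators
    if op == "$eq":
        return actual == expected
    if op == "$ne":
        return actual != expected
    if op == "$in":
        return actual in expected
    if op == "$nin":
        return actual not in expected
    if op in _ORDERING:
        try:
            return _ORDERING[op](actual, expected)
        except TypeError:  # e.g. comparing a string with a number
            return False
    raise ValueError(f"unsupported operator: {op}")


def matches(filter_: dict, metadata: dict) -> bool:
    """True if metadata satisfies the filter; sibling keys are ANDed together."""
    for key, condition in filter_.items():
        if key == "$and":
            if not all(matches(sub, metadata) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(sub, metadata) for sub in condition):
                return False
        else:
            actual = metadata.get(key, _MISSING)
            if not all(_check(op, actual, expected) for op, expected in condition.items()):
                return False
    return True
```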

get_key_library

Get the complete metadata key library with all discovered keys.

Returns all keys available for filtering with data types and sample values.

check_key_similarity

Check if a proposed metadata key is similar to existing keys.

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| key_name | string | Yes | - | Proposed key name to check |
| threshold | float | No | 0.8 | Similarity threshold (0.0-1.0) |

Use this before adding documents with new keys to avoid duplicates.
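The following sketch shows how a 0.0 to 1.0 similarity threshold behaves when screening a proposed key against existing ones. It deliberately substitutes a different technique: string similarity from Python's `difflib.SequenceMatcher.ratio()`, after lowercasing and removing spaces, underscores and hyphens. The actual check_key_similarity tool may score similarity differently (for example with embeddings). The helper names are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize_key(name: str) -> str:
    # Case-insensitive, separator-insensitive: "Doc-Type" ~ "doc_type".
    return re.sub(r"[\s_\-]+", "", name.lower())


def similar_keys(key_name: str, existing: list[str],
                 threshold: float = 0.8) -> list[tuple[str, float]]:
    """Existing keys scoring at or above threshold, most similar first."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    target = normalize_key(key_name)
    scored = [(k, SequenceMatcher(None, target, normalize_key(k)).ratio()) for k in existing]
    hits = [(k, s) for k, s in scored if s >= threshold]
    return sorted(hits, key=lambda ks: ks[1], reverse=True)
```

Lowering the threshold flags looser matches. With this measure, "doc-type" vs "document_type" scores about 0.74, so it is flagged at 0.7 but not at the default 0.8.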

analyze_metadata

Trigger metadata analysis to discover keys and generate filter examples.

Note: This is a long-running operation (1-2 minutes). It samples up to 1000 vectors and uses LLM analysis.

Run this after ingesting new documents or when filter generation isn't working as expected.


Usage Examples

Once the server is configured, ask your AI assistant in plain language:

Search & Chat:

  • "Search my knowledge base for authentication best practices"
  • "What does our documentation say about API rate limits?"
  • "What was discussed in the team meeting about deadlines?" (searches video/audio transcripts)

Web Scraping:

  • "Scrape the React docs at react.dev/reference"
  • "Check the status of my scrape job"

Document, Image & Media Upload:

  • "Upload a new document called quarterly-report.pdf"
  • "Upload this image and generate a caption for it"
  • "Upload the meeting recording meeting-2024-01.mp4"

Metadata Analysis:

  • "What metadata keys are available for filtering?"
  • "Analyze the metadata in my knowledge base"
  • "Show me the filter examples"
  • "Check if 'author' is similar to any existing keys"

Configuration:

  • "What are my current RAGStack settings?"
  • "What model is being used for chat?"
  • "Is multi-slice retrieval enabled?"
  • "What are my quota limits?"
  • "What language is configured for transcription?"

Environment Variables

| Variable | Required | Description |
|---|---|---|
| RAGSTACK_GRAPHQL_ENDPOINT | Yes | Your RAGStack GraphQL API URL |
| RAGSTACK_API_KEY | Yes | Your RAGStack API key |
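A quick preflight check can catch a missing or malformed variable before an MCP client launches the server. Below is a minimal sketch. Requiring an https:// endpoint is an assumption based on the AppSync URLs in the examples above, and `check_ragstack_env` is a hypothetical helper.

```python
import os
from urllib.parse import urlparse

REQUIRED_VARS = ("RAGSTACK_GRAPHQL_ENDPOINT", "RAGSTACK_API_KEY")


def check_ragstack_env(env=None) -> dict:
    """Validate the two required variables; defaults to the process environment."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    endpoint = urlparse(env["RAGSTACK_GRAPHQL_ENDPOINT"])
    if endpoint.scheme != "https" or not endpoint.netloc:
        raise RuntimeError("RAGSTACK_GRAPHQL_ENDPOINT should be an https:// URL")
    return {name: env[name] for name in REQUIRED_VARS}
```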

Development

# Install dependencies (from a clone of the RAGStack-Lambda repository)
cd src/ragstack-mcp
uv sync

# Run locally
uv run ragstack-mcp

# Build package
uv build

# Publish to PyPI
uv publish

License

MIT



Download files


Source Distribution

ragstack_mcp-0.1.8.tar.gz (62.5 kB)

Built Distribution

ragstack_mcp-0.1.8-py3-none-any.whl (18.8 kB)

File details

Details for the file ragstack_mcp-0.1.8.tar.gz.

File metadata

  • Download URL: ragstack_mcp-0.1.8.tar.gz
  • Size: 62.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for ragstack_mcp-0.1.8.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 842ac6572d000f2706e4440aece2af908ccff8f7a972c2951d92b984f68e8f4a |
| MD5 | 975283c943283f6d39d640eebfe2897f |
| BLAKE2b-256 | 9fce50ee865154d7d0f1ad723bc0a7d30d8432fba5a067399c902eff3ee9248f |


Provenance

The following attestation bundles were made for ragstack_mcp-0.1.8.tar.gz:

Publisher: publish-mcp.yml on HatmanStack/RAGStack-Lambda

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ragstack_mcp-0.1.8-py3-none-any.whl.

File metadata

  • Download URL: ragstack_mcp-0.1.8-py3-none-any.whl
  • Size: 18.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for ragstack_mcp-0.1.8-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 166d85599dcd051b39e2bc882a7333a90d1739814bca7c3192e76b9860d82bbf |
| MD5 | 5c58e67d1dbd59e78b951ca7cf666d69 |
| BLAKE2b-256 | 1f0ec73589ff8bf11c4200de6d425430f8de83c58c7eb2d1f715955da315ff9f |


Provenance

The following attestation bundles were made for ragstack_mcp-0.1.8-py3-none-any.whl:

Publisher: publish-mcp.yml on HatmanStack/RAGStack-Lambda

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
