A Model Context Protocol (MCP) server for Confluence RAG with ChromaDB vector search

Confluence RAG Data Pipeline with MCP Protocol

A Model Context Protocol (MCP) server that provides relevant context from Confluence pages using RAG (Retrieval Augmented Generation).

Features

- Crawls Confluence spaces and pages
- Stores document vectors using ChromaDB
- Implements MCP protocol for context retrieval
- Supports filtering by space, labels, and metadata
- Handles attachments and comments
- Provides REST API endpoints

Requirements

- Python 3.9 or higher
- uv for dependency management
- A Confluence API access token
- ChromaDB for vector storage

Installation

Set up the Python environment:
- Make sure you have Python 3.9 or higher installed
```
python --version
```
- Install UV if you haven't already:
```
curl -LsSf https://astral.sh/uv/install.sh | sh
```

Clone and set up the project:

```
git clone <repository-url>
cd confluence-scraper-mcp
# Create a virtual environment
uv venv .venv
# Activate the virtual environment
source .venv/bin/activate
# Install dependencies
uv pip install -r requirements.txt
```

Configure the environment:

Create a .env file in the project root:

```
touch .env
```

Add the following configuration (adjust values as needed):

```
# Required settings
CONFLUENCE_BASE_URL=https://your-domain.atlassian.net
CONFLUENCE_TOKEN=your-api-token

# Optional: Confluence space key
CONFLUENCE_SPACE_KEY=optional-space-key

# Optional settings (with defaults)
INITIAL_CRAWL=false
CHROMA_PERSIST_DIR=./data/chroma
EMBEDDING_MODEL="all-MiniLM-L6-v2"
MAX_PAGES=1000
INCLUDE_ATTACHMENTS=true
INCLUDE_COMMENTS=true
```
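The settings above are plain environment variables. As a minimal sketch, assuming the server reads them roughly like this (the `Settings` class and `load_settings` helper are hypothetical, not the project's actual code), parsing them with the listed defaults could look like the following. Note that a .env file is not copied into `os.environ` automatically; a loader such as python-dotenv's `load_dotenv()` is one common way to do that.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

TRUE_VALUES = {"1", "true", "yes", "on"}


def _as_bool(value: str) -> bool:
    # Treat common truthy spellings as True; everything else is False.
    return value.strip().lower() in TRUE_VALUES


@dataclass(frozen=True)
class Settings:
    # Hypothetical container; field names mirror the .env keys above.
    confluence_base_url: str
    confluence_token: str
    confluence_space_key: Optional[str]
    initial_crawl: bool
    chroma_persist_dir: str
    embedding_model: str
    max_pages: int
    include_attachments: bool
    include_comments: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    # Fail early if a required setting is missing or empty.
    missing = [k for k in ("CONFLUENCE_BASE_URL", "CONFLUENCE_TOKEN") if not env.get(k)]
    if missing:
        raise ValueError(f"Missing required settings: {', '.join(missing)}")
    return Settings(
        confluence_base_url=env["CONFLUENCE_BASE_URL"].rstrip("/"),
        confluence_token=env["CONFLUENCE_TOKEN"],
        # An empty CONFLUENCE_SPACE_KEY is treated the same as an unset one.
        confluence_space_key=env.get("CONFLUENCE_SPACE_KEY") or None,
        initial_crawl=_as_bool(env.get("INITIAL_CRAWL", "false")),
        chroma_persist_dir=env.get("CHROMA_PERSIST_DIR", "./data/chroma"),
        embedding_model=env.get("EMBEDDING_MODEL", "all-MiniLM-L6-v2"),
        max_pages=int(env.get("MAX_PAGES", "1000")),
        include_attachments=_as_bool(env.get("INCLUDE_ATTACHMENTS", "true")),
        include_comments=_as_bool(env.get("INCLUDE_COMMENTS", "true")),
    )


if __name__ == "__main__":
    settings = load_settings({
        "CONFLUENCE_BASE_URL": "https://your-domain.atlassian.net/",
        "CONFLUENCE_TOKEN": "your-api-token",
    })
    print(settings.confluence_base_url, settings.max_pages, settings.include_comments)
```

Passing a plain dict instead of `os.environ` keeps the parser easy to test without touching the real environment.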

Usage

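Using uv (Recommended):

```
# Development mode with auto-reload
uv run uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload

# Run tests
uv run pytest

# Code formatting and checks
uvx black .
uvx isort .
uvx mypy .
```

uv run executes a command inside the project's .venv, so uvicorn and pytest can import the app's dependencies (if pytest is not in requirements.txt, add it with uv pip install pytest). uvx runs a tool in its own isolated environment, which suits standalone formatters and checkers but not commands that need the project's packages.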

Alternative: Using Virtual Environment:

```
# Activate virtual environment
source .venv/bin/activate

# Then run commands as usual
uvicorn app.main:app --host 0.0.0.0 --port 8000 --reload
```

Initial Setup:

```
# Start initial crawl of Confluence pages
curl -X POST http://localhost:8000/crawl

# Verify server health
curl http://localhost:8000/health
```
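A crawl or client request can fail if it runs before the server is ready. A minimal sketch of polling GET /health until it returns HTTP 200, using only the standard library; the retry count, delay, and the `wait_for_health` name are illustrative assumptions, and because the health response body is not documented here, only the status code is checked:

```python
import time
import urllib.request
from typing import Callable, Optional


def _http_status(url: str) -> int:
    # Non-2xx responses raise urllib.error.HTTPError, an OSError subclass.
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status


def wait_for_health(
    url: str = "http://localhost:8000/health",
    attempts: int = 10,
    delay_seconds: float = 1.0,
    fetch: Optional[Callable[[str], int]] = None,
) -> bool:
    """Poll the health endpoint until it answers HTTP 200 or attempts run out."""
    fetch = fetch or _http_status
    for attempt in range(attempts):
        try:
            if fetch(url) == 200:
                return True
        except OSError:
            # Connection refused, HTTP errors, and socket timeouts all land here.
            pass
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False


if __name__ == "__main__":
    print("healthy" if wait_for_health() else "server did not become healthy")
```

The injectable `fetch` parameter is a design choice for testability: it lets you simulate a slow-starting server without opening a socket.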

Use the MCP API:

```
# Get context for an LLM query
curl -X POST http://localhost:8000/mcp/context \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "Tell me about project X"}],
    "query": "project X documentation",
    "max_context_length": 1000
  }'

# The response will include relevant context from your Confluence pages
```
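The same request can be sent from Python with the standard library. A minimal sketch; `build_context_request` and `fetch_context` are hypothetical helpers, and because the response format is not documented here, the parsed JSON is returned as-is:

```python
import json
import urllib.request
from typing import Any, Dict, List, Optional

Message = Dict[str, str]


def build_context_request(
    query: str,
    messages: Optional[List[Message]] = None,
    max_context_length: int = 1000,
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    # Mirror the JSON body from the curl example above.
    payload = {
        "messages": messages if messages is not None else [{"role": "user", "content": query}],
        "query": query,
        "max_context_length": max_context_length,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/mcp/context",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_context(request: urllib.request.Request, timeout: float = 30.0) -> Any:
    # Send the request and return the decoded JSON response unchanged.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    request = build_context_request(
        "project X documentation",
        messages=[{"role": "user", "content": "Tell me about project X"}],
    )
    print(fetch_context(request))  # requires the server from the Usage section
```

Building the request separately from sending it makes the payload easy to inspect before any network call happens.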

Monitor and Maintain:

```
# View logs
tail -f logs/app.log

# Re-crawl Confluence (e.g., after updates)
curl -X POST http://localhost:8000/crawl
```

API Endpoints

- GET /health: Health check endpoint
- POST /crawl: Trigger Confluence crawl
- POST /mcp/context: Get relevant context for a query

Using with Code Assistants

This MCP server is specialized for Confluence documentation and uses RAG (Retrieval Augmented Generation) with ChromaDB, which sets it apart from general-purpose MCP servers in three ways:

Confluence Integration:
- Direct integration with the Confluence API
- Handles Confluence-specific content types (pages, attachments, comments)
- Preserves Confluence metadata (space keys, labels, authors)
Vector Search:
- Uses ChromaDB for semantic search instead of traditional text search
- Embeddings are generated with a sentence-transformers model (all-MiniLM-L6-v2 by default)
- Retrieves context by semantic similarity, so results can match on meaning rather than exact keywords
Filtering Capabilities:
- Can filter by Confluence space keys
- Supports label-based filtering
- Can include/exclude attachments and comments
- Configurable context length per endpoint
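To illustrate what semantic search with filtering means here, the following sketch applies space, label, and content-type filters, ranks the remaining pre-computed embeddings by cosine similarity, and keeps at most `top_k` results at or above a threshold. It is a pure-Python stand-in, not ChromaDB: the `Chunk` fields, the "any requested label" matching rule, and treating the TOP_K and SIMILARITY_THRESHOLD settings (listed under Development) as a result count and a cosine cutoff are all assumptions. In ChromaDB itself, the equivalent lookup would typically go through `Collection.query(query_embeddings=..., n_results=..., where=...)`, which reports distances rather than similarities.

```python
import math
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class Chunk:
    # Hypothetical record: one embedded piece of a Confluence page.
    text: str
    embedding: List[float]
    space_key: str
    labels: List[str] = field(default_factory=list)
    content_type: str = "page"  # assumed values: "page", "attachment", "comment"


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def search(
    query_embedding: Sequence[float],
    chunks: Sequence[Chunk],
    top_k: int = 3,
    threshold: float = 0.7,
    space_key: Optional[str] = None,
    labels: Optional[Sequence[str]] = None,
    include_comments: bool = True,
    include_attachments: bool = True,
) -> List[Tuple[float, Chunk]]:
    scored: List[Tuple[float, Chunk]] = []
    for chunk in chunks:
        # Metadata filters run before any similarity is computed.
        if space_key is not None and chunk.space_key != space_key:
            continue
        if labels and not set(labels) & set(chunk.labels):
            continue  # keep chunks carrying at least one requested label
        if chunk.content_type == "comment" and not include_comments:
            continue
        if chunk.content_type == "attachment" and not include_attachments:
            continue
        score = cosine_similarity(query_embedding, chunk.embedding)
        if score >= threshold:
            scored.append((score, chunk))
    # Highest similarity first; keep at most top_k results.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    docs = [
        Chunk("Auth endpoints", [1.0, 0.0], "API", ["api-reference"]),
        Chunk("Reviewer comment", [0.9, 0.1], "API", ["technical-docs"], "comment"),
        Chunk("Service layout", [0.0, 1.0], "ARCH", ["design"]),
    ]
    for score, chunk in search([1.0, 0.0], docs, space_key="API"):
        print(f"{score:.3f} {chunk.text}")
```

Real embeddings from all-MiniLM-L6-v2 have 384 dimensions; the two-dimensional vectors here only keep the example readable.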

This MCP server can be integrated with code assistants like GitHub Copilot to provide relevant context from your Confluence documentation. Here's how to set it up:

Start the MCP Server:

```
# Make sure the server is running
source .venv/bin/activate
uvicorn app.main:app --port 8000
```

Configure Your Code Assistant:

For GitHub Copilot:

Open VS Code settings (Cmd+,)
Search for "copilot chat"

Add a new MCP endpoint under "Copilot Chat: MCP Servers" using either:

Option 1: Direct URL

Use URL: http://localhost:8000/mcp/context
Note: This basic setup won't include filtering capabilities

Option 2: MCP Configuration File (Recommended)

An example configuration file is provided in examples/mcp.json
Supports Confluence-specific filtering
Can configure multiple endpoints for different spaces
Allows fine-tuning of context retrieval

```
{
  "endpoints": [
    {
      "name": "API Documentation",
      "url": "http://localhost:8000/mcp/context",
      "options": {
        "max_context_length": 2000,
        "filter": {
          "space_key": "API",
          "labels": ["technical-docs", "api-reference"],
          "include_comments": true,
          "include_attachments": false,
          "semantic_ranking": {
            "weight": 0.7,
            "model": "all-MiniLM-L6-v2"
          }
        }
      },
      "authentication": {
        "type": "none"
      }
    },
    {
      "name": "Architecture Docs",
      "url": "http://localhost:8000/mcp/context",
      "options": {
        "max_context_length": 3000,
        "filter": {
          "space_key": "ARCH",
          "labels": ["architecture", "design"],
          "include_comments": false,
          "include_attachments": true,
          "semantic_ranking": {
            "weight": 0.8,
            "model": "all-MiniLM-L6-v2"
          }
        }
      },
      "authentication": {
        "type": "none"
      }
    }
  ],
  "default_endpoint": "API Documentation"
}
```

Add the path to this file in VS Code settings under "Copilot Chat: MCP Configuration File"
See examples/mcp.json for a full example with multiple endpoints and filtering options
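Before pointing VS Code at the file, it can help to sanity-check it. A minimal sketch in Python; the rules it applies (unique endpoint names, a default_endpoint that matches one of them, a positive integer max_context_length, and a semantic_ranking weight between 0 and 1) are assumptions inferred from the example configuration, not a published schema:

```python
import json
from typing import Any, Dict, List


def validate_mcp_config(config: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means none found."""
    problems: List[str] = []
    endpoints = config.get("endpoints", [])
    names = [endpoint.get("name") for endpoint in endpoints]

    if len(names) != len(set(names)):
        problems.append("endpoint names are not unique")

    default = config.get("default_endpoint")
    if default is not None and default not in names:
        problems.append(f"default_endpoint {default!r} does not match any endpoint name")

    for endpoint in endpoints:
        name = endpoint.get("name", "<unnamed>")
        options = endpoint.get("options", {})
        length = options.get("max_context_length")
        if length is not None and (not isinstance(length, int) or length <= 0):
            problems.append(f"{name}: max_context_length must be a positive integer")
        weight = options.get("filter", {}).get("semantic_ranking", {}).get("weight")
        if weight is not None and (
            not isinstance(weight, (int, float)) or not 0.0 <= weight <= 1.0
        ):
            problems.append(f"{name}: semantic_ranking.weight must be between 0 and 1")
    return problems


if __name__ == "__main__":
    with open("examples/mcp.json", encoding="utf-8") as handle:
        for problem in validate_mcp_config(json.load(handle)) or ["no problems found"]:
            print(problem)
```

Returning a list of messages instead of raising on the first error reports every problem in one pass.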

Usage with Copilot:
- In VS Code, open Copilot Chat (Cmd+I)
- Your queries will now include relevant context from your Confluence pages
- Example: "How do I implement feature X?" will include context from related Confluence documentation
- You can also use the /doc command in Copilot Chat to explicitly search documentation

Tips for Better Results:
- Keep Confluence pages well-organized and up-to-date
- Use descriptive titles and labels in Confluence
- Re-crawl after significant documentation updates:
```
curl -X POST http://localhost:8000/crawl
```

Development

Install Development Dependencies:
```
uv pip install -r requirements.txt
```
Running commands without activating the environment: uv run executes a command inside the project's virtual environment, so there is no need to activate it first. uvx (an alias for uv tool run) runs standalone tools such as black and isort in an isolated environment:
```
# Run the FastAPI server
uv run uvicorn app.main:app --reload

# Run tests
uv run pytest

# Code formatting
uvx black .
uvx isort .
uvx mypy .
```

Environment Configuration: The project uses environment variables for configuration. Copy .env.example to .env and update the values:

```
CONFLUENCE_BASE_URL=https://your-domain.atlassian.net
CONFLUENCE_TOKEN=your-api-token
CONFLUENCE_SPACE_KEY=your-space-key
CHROMA_PERSIST_DIR=data/chroma
CHROMA_COLLECTION_NAME=confluence_docs
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
CHUNK_SIZE=512
CHUNK_OVERLAP=50
TOP_K=3
SIMILARITY_THRESHOLD=0.7
```
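CHUNK_SIZE and CHUNK_OVERLAP suggest that page text is split into overlapping windows before embedding. A minimal sliding-window sketch, assuming both values are measured in characters (the project may count tokens or words instead); `chunk_text` is a hypothetical name:

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 512, chunk_overlap: int = 50) -> List[str]:
    """Split text into windows of chunk_size characters that share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


if __name__ == "__main__":
    page = "".join(chr(97 + i % 26) for i in range(1000))
    print([len(c) for c in chunk_text(page)])
```

The overlap means a sentence cut at one window boundary still appears whole in the neighbouring window, at the cost of embedding some text twice.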

Contributing

Fork the repository
Create your feature branch (git checkout -b feature/amazing-feature)
Make your changes:
- Use uvx black . and uvx isort . to format code
- Use uvx mypy . for type checking
- Add tests for new features
- Update documentation as needed
Run tests (uv run pytest)
Commit your changes (git commit -m 'Add some amazing feature')
Push to the branch (git push origin feature/amazing-feature)
Open a Pull Request

License

MIT License. See LICENSE for more information.

Release history

- 0.1.3 (Jul 19, 2025)
- 0.1.2 (Jul 19, 2025), the version this page describes

Download files

Source distribution: confluence_scraper_mcp-0.1.2.tar.gz (21.9 kB, uploaded Jul 19, 2025 via twine/6.1.0 CPython/3.12.2; not uploaded using Trusted Publishing)
- SHA256: `20b6e375ddc88f7cd6eb3f5306e5b54bc0739b19c2da96afb71cd3f53e95fcff`
- MD5: `f7055d5f14ef29cd56f7262ac16481c4`
- BLAKE2b-256: `e968102f7b1b8b87f166d057e096dcc0de8ba1515e3a66fee016784e37d650fa`

Built distribution: confluence_scraper_mcp-0.1.2-py3-none-any.whl (15.7 kB, Python 3, uploaded Jul 19, 2025 via twine/6.1.0 CPython/3.12.2; not uploaded using Trusted Publishing)
- SHA256: `0a175cd3c10717cc16994efdef3faf339f5664852120221a9c65c5c447d3b243`
- MD5: `d40acab34d29a4df12c71193880a60a7`
- BLAKE2b-256: `6ea8ebd8d612c5aa02d738e6004703cfc4d841ec40417d923ba36ff5dd22b6db`
