Simple MCP server for downloading documentation websites

MCP Website Downloader

Simple MCP server for downloading documentation websites and preparing them for RAG indexing.

Features

  • Downloads complete documentation sites, or at least large chunks of them.
  • Tries to maintain link structure and navigation, though this does not really work yet.
  • Downloads and organizes assets (CSS, JS, images). The output is not very AI-friendly as is and probably needs parsing or vectorizing into a database first.
  • Creates an index for RAG systems. It currently seems to write an index in each folder; this has not been checked closely.
  • Simple single-purpose MCP interface.
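
Since the downloaded HTML probably needs parsing before it is useful for RAG, here is a minimal sketch of one way to do that with only the standard library. It is not part of this package: `TextExtractor`, `html_to_text`, and `chunk_words` are hypothetical helpers, and the 200-word chunk size is an arbitrary assumption.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped tag
        self._parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self._parts.append(data.strip())

    def text(self):
        return " ".join(self._parts)


def html_to_text(html):
    """Return the visible text of an HTML string as one line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


def chunk_words(text, size=200):
    """Split text into chunks of at most `size` words (hypothetical default)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]
```

Each chunk could then be embedded and stored in whatever vector database you use; that step is left out here because the project does not prescribe one.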

Installation

Fork and download the repository, then cd into it.

uv venv
.venv/Scripts/activate
uv pip install -e .

Add this entry under "mcpServers" in your claude_desktop_config.json, replacing the paths with your own:

   "mcp-windows-website-downloader": {
     "command": "uv",
     "args": [
       "--directory",
       "F:/GithubRepos/mcp-windows-website-downloader",
       "run",
       "mcp-windows-website-downloader",
       "--library",
       "F:/GithubRepos/mcp-windows-website-downloader/website_library"
     ]
   },

Other usage (optional; these steps are unverified and may be inaccurate):

  1. Start the server:
python -m mcp_windows_website_downloader.server --library docs_library
  2. Use through Claude Desktop or other MCP clients:
result = await server.call_tool("download", {
    "url": "https://docs.example.com"
})

Output Structure

docs_library/
  domain_name/
    index.html
    about.html
    docs/
      getting-started.html
      ...
    assets/
      css/
      js/
      images/
      fonts/
    rag_index.json
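
Assuming the layout above is what actually ends up on disk, the pages of one downloaded site could be listed with a short helper like the following sketch. `list_pages` is a hypothetical name, not something this package provides, and the choice to skip the assets/ folder is an assumption.

```python
from pathlib import Path


def list_pages(site_dir):
    """Return relative paths of HTML pages under site_dir, skipping assets/."""
    site_dir = Path(site_dir)
    pages = []
    for path in site_dir.rglob("*.html"):
        relative = path.relative_to(site_dir)
        # Anything under the top-level assets/ folder is not a page.
        if relative.parts[0] == "assets":
            continue
        pages.append(relative.as_posix())
    return sorted(pages)
```

For example, `list_pages("docs_library/domain_name")` would return entries such as `index.html` and `docs/getting-started.html` if those files exist.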

Development

The server follows standard MCP architecture:

src/
  mcp_windows_website_downloader/
    __init__.py
    server.py    # MCP server implementation
    core.py      # Core downloader functionality
    utils.py     # Helper utilities

Components

  • server.py: Main MCP server implementation that handles tool registration and requests
  • core.py: Core website downloading functionality with proper asset handling
  • utils.py: Helper utilities for file handling and URL processing

Design Principles

  1. Single Responsibility

    • Each module has one clear purpose
    • Server handles MCP interface
    • Core handles downloading
    • Utils handles common operations
  2. Clean Structure

    • Maintains original site structure
    • Organizes assets by type
    • Creates clear index for RAG systems
  3. Robust Operation

    • Proper error handling
    • Reasonable depth limits
    • Asset download verification
    • Clean URL/path processing
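
To illustrate what "clean URL/path processing" might look like, here is a rough sketch that maps a page URL to a relative file path while preserving the site structure. This is an assumption about the approach, not the actual code in utils.py; `url_to_local_path` is a hypothetical name, and the sketch is not hardened against every unusual URL.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlparse


def url_to_local_path(url):
    """Map a page URL to a relative path like 'docs.example.com/docs/intro.html'."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        raise ValueError(f"unsupported URL: {url!r}")

    # Drop empty, '.' and '..' segments.
    segments = [s for s in unquote(parsed.path).split("/") if s not in ("", ".", "..")]

    # Treat the site root and directory-style URLs as index pages;
    # give extension-less pages an .html suffix.
    if not segments or parsed.path.endswith("/"):
        segments.append("index.html")
    elif "." not in segments[-1]:
        segments[-1] += ".html"

    return str(PurePosixPath(parsed.hostname, *segments))
```

Query strings and fragments are ignored in this sketch, so two URLs that differ only in their query would map to the same file.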

RAG Index

The rag_index.json file contains:

{
  "url": "https://docs.example.com",
  "domain": "docs.example.com", 
  "pages": 42,
  "path": "/path/to/site"
}
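
A consumer could read this file and check the four fields shown above before indexing. The sketch below assumes the file really has this shape, which the project has not verified; `load_rag_index` and `EXPECTED_FIELDS` are hypothetical names.

```python
import json
from pathlib import Path

# Field names and types taken from the example above.
EXPECTED_FIELDS = {"url": str, "domain": str, "pages": int, "path": str}


def load_rag_index(index_path):
    """Load rag_index.json and check the four documented fields."""
    data = json.loads(Path(index_path).read_text(encoding="utf-8"))
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in data:
            raise KeyError(f"rag_index.json is missing {field!r}")
        if not isinstance(data[field], expected_type):
            raise TypeError(f"{field!r} should be {expected_type.__name__}")
    return data
```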

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Submit a pull request

License

MIT License - See LICENSE file

Error Handling

The server handles common issues:

  • Invalid URLs
  • Network errors
  • Asset download failures
  • Malformed HTML
  • Deep recursion
  • File system errors

Error responses follow the format:

{
  "status": "error",
  "error": "Detailed error message"
}

Success responses:

{
  "status": "success",
  "path": "/path/to/downloaded/site",
  "pages": 42
}
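
On the client side, the two response shapes above could be handled with something like this sketch. It assumes the tool result arrives as a JSON string or an already-parsed dict; `parse_download_response` and `DownloadError` are hypothetical names, not part of this package.

```python
import json


class DownloadError(Exception):
    """Raised when the tool reports status 'error'."""


def parse_download_response(raw):
    """Return (path, pages) from a success response, or raise DownloadError."""
    response = json.loads(raw) if isinstance(raw, str) else raw
    status = response.get("status")
    if status == "success":
        return response["path"], response["pages"]
    if status == "error":
        raise DownloadError(response.get("error", "unknown error"))
    raise ValueError(f"unexpected status: {status!r}")
```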
