
AI-powered legacy web application analysis and documentation generation via MCP


Legacy Web MCP Server

Legacy Web MCP Server implements the Model Context Protocol (MCP) to power automated discovery and analysis of legacy web applications. This repository provides the backend foundation for crawling websites, analyzing site structure, and generating comprehensive documentation artifacts that help teams understand and plan modernization efforts.

Getting Started

Prerequisites

  • Python 3.11+
  • uv for dependency management
  • Optional: Playwright browsers for enhanced crawling (uv run playwright install)

Installation

Option 1: Development Installation (uv)

uv sync

The command creates a virtual environment under .venv/ and installs runtime and development dependencies.

Option 2: Direct Execution (uvx)

Run the MCP server directly without installation using uvx:

# Install uv/uvx if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh

# Run directly from GitHub repository
uvx --from git+https://github.com/your-username/web-discovery-mcp-claude-1.git legacy-web-mcp

# Or run locally from project directory
uvx --from . legacy-web-mcp

Option 3: PyPI Installation (Recommended for End Users)

# Install and run from PyPI
uvx legacy-web-mcp

# Or install to use in scripts
pip install legacy-web-mcp

Running the Server

Development Mode (with uv)

uv run legacy-web-mcp

Direct Execution (with uvx)

# From GitHub
uvx --from git+https://github.com/your-username/web-discovery-mcp-claude-1.git legacy-web-mcp

# From local directory
uvx --from . legacy-web-mcp

# Using FastMCP CLI (alternative)
fastmcp run fastmcp.json

The entry point starts a FastMCP stdio server that provides comprehensive website discovery and analysis tools.

Development Tooling

  • Linting & Formatting: uv run ruff check and uv run ruff format
  • Static Typing: uv run mypy
  • Testing: uv run pytest

CI runs these same commands on every push via GitHub Actions.

MCP Client Configuration

To integrate this server with MCP-compatible clients (like Claude Desktop), add the following configuration:

Claude Desktop Configuration

Add to your Claude Desktop configuration file (claude_desktop_config.json, located in ~/Library/Application Support/Claude/ on macOS or %APPDATA%\Claude\ on Windows):

{
  "mcpServers": {
    "legacy-web-mcp": {
      "command": "uvx",
      "args": [
        "--from",
        "git+https://github.com/your-username/web-discovery-mcp-claude-1.git",
        "legacy-web-mcp"
      ]
    }
  }
}

Alternative Configurations

From Local Directory

{
  "mcpServers": {
    "legacy-web-mcp": {
      "command": "uvx",
      "args": ["--from", ".", "legacy-web-mcp"],
      "cwd": "/path/to/web-discovery-mcp-claude-1"
    }
  }
}

Using Development Environment

{
  "mcpServers": {
    "legacy-web-mcp": {
      "command": "uv",
      "args": ["run", "legacy-web-mcp"],
      "cwd": "/path/to/web-discovery-mcp-claude-1"
    }
  }
}

From PyPI

{
  "mcpServers": {
    "legacy-web-mcp": {
      "command": "uvx",
      "args": ["legacy-web-mcp"]
    }
  }
}

Available MCP Tools

The server provides the following tools:

  • ping - Server health and status information
  • health_check - Comprehensive system health report
  • validate_dependencies - Check Playwright browser installations
  • test_llm_connectivity - Verify LLM provider connections
  • show_config - Display current configuration (redacted)
  • discover_website - Discover and analyze website structure

Testing and Development

Manual Testing Scripts

The scripts/ directory contains comprehensive testing tools with full MCP support:

# Interactive testing of all tools
python scripts/test_mcp_client.py

# Test ALL tools directly via MCP client (including discover_website!)
python scripts/test_mcp_client.py ping
python scripts/test_mcp_client.py health_check
python scripts/test_mcp_client.py show_config
python scripts/test_mcp_client.py validate_dependencies
python scripts/test_mcp_client.py test_llm_connectivity
python scripts/test_mcp_client.py discover_website https://context7.com

# Alternative direct testing (bypasses MCP layer)
python scripts/test_discovery_direct.py https://example.com
python scripts/test_discovery_direct.py https://github.com

# Comprehensive test suite
python scripts/manual_test.py all
python scripts/manual_test.py health
python scripts/manual_test.py discover https://context7.com

✨ The MCP client script supports all tools, including discover_website, through a mock MCP session context that provides full logging and progress reporting.

Quick Demo

# Test website discovery via MCP client
uv run python scripts/test_mcp_client.py discover_website https://context7.com

# Output includes real-time MCP logging:
# [INFO] Validated target URL: https://context7.com
# [INFO] Initialized project context7-com_20250919-052810
# [INFO] Analyzed robots.txt directives
# [INFO] Manual crawl discovered 4 URLs
# ✅ Success! Full JSON result with discovered URLs

See scripts/README.md for detailed usage instructions.

Repository Layout

src/legacy_web_mcp/        # Application source code
├── mcp/                   # FastMCP bootstrap and MCP tools
├── discovery/             # Website discovery and crawling engine
├── storage/               # Project and data persistence
├── config/                # Configuration management
└── shared/                # Cross-cutting utilities

docs/                     # Documentation and specifications
├── architecture.md       # System architecture overview
├── mcp-context.md        # MCP Context system documentation
├── stories/              # Epic and story documentation
└── web_discovery/        # Discovery output examples

scripts/                  # Manual testing and development tools
tests/                    # pytest test suites

Documentation

Comprehensive documentation lives in the docs/ directory, including the architecture overview (docs/architecture.md), MCP Context system documentation (docs/mcp-context.md), and epic/story write-ups (docs/stories/).

Configuration

Environment Setup

  1. Optional: Copy .env.template to .env and configure your environment:

    # LLM API Keys (for future AI-powered analysis features)
    OPENAI_API_KEY=your_openai_key_here
    ANTHROPIC_API_KEY=your_anthropic_key_here
    GEMINI_API_KEY=your_gemini_key_here
    
    # Discovery settings
    DISCOVERY_TIMEOUT=60
    DISCOVERY_MAX_DEPTH=3
    OUTPUT_ROOT=docs/web_discovery
    
  2. Install Playwright browsers (optional, for enhanced crawling):

    uv run playwright install
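The discovery variables above can be read back with plain stdlib code. A minimal sketch of how such settings might be loaded (the variable names match the template above; the parsing and the DiscoverySettings class are illustrative, not the server's actual config module):

```python
import os
from dataclasses import dataclass
from typing import Mapping

@dataclass
class DiscoverySettings:
    timeout: int = 60                        # DISCOVERY_TIMEOUT (seconds)
    max_depth: int = 3                       # DISCOVERY_MAX_DEPTH
    output_root: str = "docs/web_discovery"  # OUTPUT_ROOT

def load_settings(env: Mapping[str, str] = os.environ) -> DiscoverySettings:
    """Read discovery settings from environment variables, with defaults."""
    return DiscoverySettings(
        timeout=int(env.get("DISCOVERY_TIMEOUT", "60")),
        max_depth=int(env.get("DISCOVERY_MAX_DEPTH", "3")),
        output_root=env.get("OUTPUT_ROOT", "docs/web_discovery"),
    )

print(load_settings({"DISCOVERY_TIMEOUT": "120"}))
```

Unset variables fall back to the defaults shown in the .env template.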
    

Configuration Management

  • Check current configuration: Use the show_config tool to inspect active settings

    python scripts/test_mcp_client.py show_config
    
  • Health monitoring: Get comprehensive system status

    python scripts/test_mcp_client.py health_check
    
  • Validate dependencies: Check Playwright browser installations

    python scripts/test_mcp_client.py validate_dependencies
    

Default settings and configuration documentation are in docs/stories/1.3.basic-configuration-management.md.

Website Discovery Features

The server provides comprehensive website discovery capabilities:

Discovery Methods

  • Sitemap parsing - Automatically finds and parses XML sitemaps
  • Robots.txt analysis - Extracts allowed/disallowed paths and additional sitemaps
  • Intelligent crawling - Discovers internal pages, external links, and static assets
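Robots.txt analysis of this kind can be reproduced with the standard library alone. A minimal sketch using urllib.robotparser, independent of this server's implementation:

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
# parse() accepts the robots.txt body as a sequence of lines
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("*", "https://example.com/about"))   # True
print(parser.can_fetch("*", "https://example.com/admin/"))  # False
print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```

site_maps() surfaces the additional sitemap URLs mentioned above, which a crawler can then fetch and parse.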

Output Formats

  • JSON inventory - Machine-readable site structure data
  • YAML inventory - Human-readable site structure overview
  • Project metadata - Discovery configuration and statistics

Quick Start Examples

# Discover a website structure
python scripts/test_discovery_direct.py https://example.com

# Discover with comprehensive output
python scripts/manual_test.py discover https://context7.com

Discovery results are stored in docs/web_discovery/ with timestamped project folders containing:

  • discovery/inventory.json - Complete site structure
  • discovery/inventory.yaml - Human-readable overview
  • metadata.json - Project configuration and stats
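The JSON inventory is plain machine-readable data, so downstream tooling can consume it with the stdlib. A minimal sketch, using a hypothetical inventory shape for illustration (check a real discovery/inventory.json for the actual schema):

```python
import json
import tempfile
from pathlib import Path

# Hypothetical inventory shape, for illustration only; the real schema
# lives in discovery/inventory.json inside each project folder.
inventory = {
    "internal_pages": ["https://example.com/", "https://example.com/about"],
    "external_links": ["https://github.com/example"],
    "assets": ["https://example.com/static/app.css"],
}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "inventory.json"
    path.write_text(json.dumps(inventory, indent=2))

    data = json.loads(path.read_text())
    print(f"{len(data['internal_pages'])} internal pages discovered")
```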

Continuous Integration

GitHub Actions workflow in .github/workflows/ci.yml runs linting, typing, and tests against Python 3.11 using uv. The workflow keeps dependencies consistent with the local development setup.
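A workflow of that shape might look like the following sketch (action versions and step layout are illustrative, not copied from the repository's ci.yml):

```yaml
name: CI
on: [push, pull_request]

jobs:
  checks:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: astral-sh/setup-uv@v5
        with:
          python-version: "3.11"
      - run: uv sync
      - run: uv run ruff check
      - run: uv run mypy
      - run: uv run pytest
```

Because CI invokes the same uv commands listed under Development Tooling, local runs and CI runs stay in lockstep.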

License

Distributed under the MIT License. See LICENSE for details.

Project details

Download files

  • Source distribution: legacy_web_mcp-0.1.1.tar.gz (167.2 kB)
  • Built distribution: legacy_web_mcp-0.1.1-py3-none-any.whl (197.5 kB)

Both files were uploaded via twine/6.2.0 on CPython 3.13.3, without Trusted Publishing.

File hashes

Hashes for legacy_web_mcp-0.1.1.tar.gz
Algorithm Hash digest
SHA256 117532a54b80dbbe8e70d3f0a88fa49d72fa21a408a2715e4358eb2d3010e8f9
MD5 fb25d3808babc5cbd73435bb7899a0ae
BLAKE2b-256 ea152097d94e1ddebab8a749e83a5f685be0cec813c88de8f4cd723d3e89dc11

Hashes for legacy_web_mcp-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 09fc2131010ca6d0818da49744b6dc62ef7b2cbc99e55442fb1224bc0ff647a8
MD5 dc71227ed0a43b15efbbff549234c49f
BLAKE2b-256 fd93569747757b885d50594133647dd9c1f1a188a0034c57590714ff28f19665
