AI-powered legacy web application analysis and documentation generation via MCP
Legacy Web MCP Server
Legacy Web MCP Server implements the Model Context Protocol (MCP) to power automated discovery and analysis of legacy web applications. This repository provides the backend foundation for crawling websites, analyzing site structure, and generating comprehensive documentation artifacts that help teams understand and plan modernization efforts.
Getting Started
Prerequisites
- Python 3.11+
- uv for dependency management
- Optional: Playwright browsers for enhanced crawling (uv run playwright install)
Installation
Option 1: Development Installation (uv)
uv sync
The command creates a virtual environment under .venv/ and installs runtime and development dependencies.
Option 2: Direct Execution (uvx)
Run the MCP server directly without installation using uvx:
# Install uv/uvx if not already installed
curl -LsSf https://astral.sh/uv/install.sh | sh
# Run directly from GitHub repository
uvx --from git+https://github.com/your-username/web-discovery-mcp-claude-1.git legacy-web-mcp
# Or run locally from project directory
uvx --from . legacy-web-mcp
Option 3: PyPI Installation (Recommended for End Users)
# Install and run from PyPI (when published)
uvx legacy-web-mcp
# Or install to use in scripts
pip install legacy-web-mcp
Running the Server
Development Mode (with uv)
uv run legacy-web-mcp
Direct Execution (with uvx)
# From GitHub
uvx --from git+https://github.com/your-username/web-discovery-mcp-claude-1.git legacy-web-mcp
# From local directory
uvx --from . legacy-web-mcp
# Using FastMCP CLI (alternative)
fastmcp run fastmcp.json
The entry point starts a FastMCP stdio server that provides comprehensive website discovery and analysis tools.
Development Tooling
- Linting & Formatting: uv run ruff check and uv run ruff format
- Static Typing: uv run mypy
- Testing: uv run pytest
CI runs these same commands on every push via GitHub Actions.
MCP Client Configuration
To integrate this server with MCP-compatible clients (like Claude Desktop), add the following configuration:
Claude Desktop Configuration
Add the following to your Claude Desktop configuration file, claude_desktop_config.json (located in ~/Library/Application Support/Claude/ on macOS or %APPDATA%\Claude\ on Windows):
{
"mcpServers": {
"legacy-web-mcp": {
"command": "uvx",
"args": [
"--from",
"git+https://github.com/your-username/web-discovery-mcp-claude-1.git",
"legacy-web-mcp"
]
}
}
}
Alternative Configurations
From Local Directory
{
"mcpServers": {
"legacy-web-mcp": {
"command": "uvx",
"args": ["--from", ".", "legacy-web-mcp"],
"cwd": "/path/to/web-discovery-mcp-claude-1"
}
}
}
Using Development Environment
{
"mcpServers": {
"legacy-web-mcp": {
"command": "uv",
"args": ["run", "legacy-web-mcp"],
"cwd": "/path/to/web-discovery-mcp-claude-1"
}
}
}
From PyPI (when published)
{
"mcpServers": {
"legacy-web-mcp": {
"command": "uvx",
"args": ["legacy-web-mcp"]
}
}
}
Available MCP Tools
The server provides the following tools:
- ping - Server health and status information
- health_check - Comprehensive system health report
- validate_dependencies - Check Playwright browser installations
- test_llm_connectivity - Verify LLM provider connections
- show_config - Display current configuration (redacted)
- discover_website - Discover and analyze website structure
Testing and Development
Manual Testing Scripts
The scripts/ directory contains comprehensive testing tools with full MCP support:
# Interactive testing of all tools
python scripts/test_mcp_client.py
# Call each tool individually through the MCP client (including discover_website)
python scripts/test_mcp_client.py ping
python scripts/test_mcp_client.py health_check
python scripts/test_mcp_client.py show_config
python scripts/test_mcp_client.py validate_dependencies
python scripts/test_mcp_client.py test_llm_connectivity
python scripts/test_mcp_client.py discover_website https://context7.com
# Alternative direct testing (bypasses MCP layer)
python scripts/test_discovery_direct.py https://example.com
python scripts/test_discovery_direct.py https://github.com
# Comprehensive test suite
python scripts/manual_test.py all
python scripts/manual_test.py health
python scripts/manual_test.py discover https://context7.com
✨ New feature: The MCP client script now supports all tools, including discover_website, by running them inside a mock MCP session context that provides full logging and progress reporting.
Quick Demo
# Test website discovery via MCP client
uv run python scripts/test_mcp_client.py discover_website https://context7.com
# Output includes real-time MCP logging:
# [INFO] Validated target URL: https://context7.com
# [INFO] Initialized project context7-com_20250919-052810
# [INFO] Analyzed robots.txt directives
# [INFO] Manual crawl discovered 4 URLs
# ✅ Success! Full JSON result with discovered URLs
See scripts/README.md for detailed usage instructions.
Repository Layout
src/legacy_web_mcp/ # Application source code
├── mcp/ # FastMCP bootstrap and MCP tools
├── discovery/ # Website discovery and crawling engine
├── storage/ # Project and data persistence
├── config/ # Configuration management
└── shared/ # Cross-cutting utilities
docs/ # Documentation and specifications
├── architecture.md # System architecture overview
├── mcp-context.md # MCP Context system documentation
├── stories/ # Epic and story documentation
└── web_discovery/ # Discovery output examples
scripts/ # Manual testing and development tools
tests/ # pytest test suites
Documentation
Comprehensive documentation is available in the docs/ directory:
- Architecture Overview - System design and component interaction
- MCP Context Guide - Understanding the MCP Context system, testing approaches, and best practices
- Distribution Guide - Complete guide for packaging and distributing with uvx
- Story Documentation - Epic and user story specifications
- Discovery Examples - Sample website discovery outputs
Configuration
Environment Setup
- Optional: Copy .env.template to .env and configure your environment:
# LLM API Keys (for future AI-powered analysis features)
OPENAI_API_KEY=your_openai_key_here
ANTHROPIC_API_KEY=your_anthropic_key_here
GEMINI_API_KEY=your_gemini_key_here
# Discovery settings
DISCOVERY_TIMEOUT=60
DISCOVERY_MAX_DEPTH=3
OUTPUT_ROOT=docs/web_discovery
- Install Playwright browsers (optional, for enhanced crawling):
uv run playwright install
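As an illustration of how these variables might be consumed, here is a minimal sketch of loading the discovery settings from a .env file. It is not the project's configuration loader: the parser handles only simple KEY=VALUE lines, the defaults are taken from the template values above, and the precedence order (process environment, then .env, then defaults) is an assumption.

```python
import os
from pathlib import Path

# Defaults copied from the template values above; the real server may use others.
DEFAULTS = {
    "DISCOVERY_TIMEOUT": "60",
    "DISCOVERY_MAX_DEPTH": "3",
    "OUTPUT_ROOT": "docs/web_discovery",
}


def parse_env_file(path: Path) -> dict[str, str]:
    """Minimal KEY=VALUE parser: skips blank lines and # comments, strips surrounding quotes."""
    values: dict[str, str] = {}
    if not path.exists():
        return values
    for raw in path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def load_discovery_settings(path: Path = Path(".env")) -> dict[str, object]:
    """Merge defaults, the .env file, and the process environment (assumed precedence, lowest first)."""
    merged = {**DEFAULTS, **parse_env_file(path)}
    merged.update({k: v for k, v in os.environ.items() if k in DEFAULTS})
    return {
        "timeout_seconds": int(merged["DISCOVERY_TIMEOUT"]),
        "max_depth": int(merged["DISCOVERY_MAX_DEPTH"]),
        "output_root": Path(merged["OUTPUT_ROOT"]),
    }
```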
Configuration Management
- Check current configuration: Use the show_config tool to inspect active settings:
python scripts/test_mcp_client.py show_config
- Health monitoring: Get a comprehensive system status report:
python scripts/test_mcp_client.py health_check
- Validate dependencies: Check Playwright browser installations:
python scripts/test_mcp_client.py validate_dependencies
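show_config reports its settings in redacted form. The project's exact redaction rule is not documented here; the sketch below assumes a simple key-name rule that masks any non-empty string value whose key ends in API_KEY, TOKEN, SECRET, or PASSWORD, applied to a flat dictionary of settings.

```python
import re

# Assumed rule: keys ending in one of these suffixes hold credentials.
SECRET_KEY = re.compile(r"(API_KEY|TOKEN|SECRET|PASSWORD)$", re.IGNORECASE)
MASK = "********"


def redact(settings: dict[str, object]) -> dict[str, object]:
    """Return a copy of a flat settings dict with secret-looking, non-empty string values masked."""
    return {
        key: MASK if SECRET_KEY.search(key) and isinstance(value, str) and value else value
        for key, value in settings.items()
    }
```

Leaving empty values unmasked keeps it visible which keys are unset, which is useful when diagnosing a failing test_llm_connectivity call.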
Default settings and configuration documentation are in docs/stories/1.3.basic-configuration-management.md.
Website Discovery Features
The server provides comprehensive website discovery capabilities:
Discovery Methods
- Sitemap parsing - Automatically finds and parses XML sitemaps
- Robots.txt analysis - Extracts allowed/disallowed paths and additional sitemaps
- Intelligent crawling - Discovers internal pages, external links, and static assets
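The three discovery methods can be approximated with the Python standard library, as in the sketch below. It is an illustration rather than the project's discovery engine: it parses robots.txt and sitemap content that has already been fetched, and the list of static-asset extensions used to classify links is a hypothetical choice.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin, urlparse
from urllib.robotparser import RobotFileParser

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
# Hypothetical set of extensions treated as static assets.
ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".svg", ".ico", ".pdf", ".woff2")


def parse_robots(robots_txt: str, base_url: str) -> tuple[RobotFileParser, list[str]]:
    """Parse robots.txt text; return the parser plus any Sitemap: URLs it declares."""
    parser = RobotFileParser()
    parser.set_url(urljoin(base_url, "/robots.txt"))
    parser.parse(robots_txt.splitlines())
    parser.modified()  # ensure the parser counts as loaded so can_fetch() uses the parsed rules
    return parser, list(parser.site_maps() or [])


def parse_sitemap(xml_text: str) -> list[str]:
    """Return <loc> URLs from a <urlset> (pages) or <sitemapindex> (nested sitemaps)."""
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def classify_link(href: str, page_url: str) -> tuple[str, str]:
    """Resolve href against the page URL and label it "asset", "internal", or "external"."""
    absolute = urljoin(page_url, href)
    parsed = urlparse(absolute)
    if parsed.path.lower().endswith(ASSET_EXTENSIONS):
        return "asset", absolute
    same_host = parsed.netloc == urlparse(page_url).netloc
    return ("internal" if same_host else "external"), absolute
```

A crawler built on these pieces would check parser.can_fetch("*", url) before visiting each internal URL. Note that classify_link does not special-case non-HTTP schemes such as mailto:, which a real crawler would filter out.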
Output Formats
- JSON inventory - Machine-readable site structure data
- YAML inventory - Human-readable site structure overview
- Project metadata - Discovery configuration and statistics
Quick Start Examples
# Discover a website structure
python scripts/test_discovery_direct.py https://example.com
# Discover with comprehensive output
python scripts/manual_test.py discover https://context7.com
Discovery results are stored in docs/web_discovery/ with timestamped project folders containing:
- discovery/inventory.json - Complete site structure
- discovery/inventory.yaml - Human-readable overview
- metadata.json - Project configuration and stats
Continuous Integration
GitHub Actions workflow in .github/workflows/ci.yml runs linting, typing, and tests against Python 3.11 using uv. The workflow keeps dependencies consistent with the local development setup.
License
Distributed under the MIT License. See LICENSE for details.
Download files
File details
Details for the file legacy_web_mcp-0.1.1.tar.gz.
File metadata
- Download URL: legacy_web_mcp-0.1.1.tar.gz
- Upload date:
- Size: 167.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 117532a54b80dbbe8e70d3f0a88fa49d72fa21a408a2715e4358eb2d3010e8f9 |
| MD5 | fb25d3808babc5cbd73435bb7899a0ae |
| BLAKE2b-256 | ea152097d94e1ddebab8a749e83a5f685be0cec813c88de8f4cd723d3e89dc11 |
File details
Details for the file legacy_web_mcp-0.1.1-py3-none-any.whl.
File metadata
- Download URL: legacy_web_mcp-0.1.1-py3-none-any.whl
- Upload date:
- Size: 197.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 09fc2131010ca6d0818da49744b6dc62ef7b2cbc99e55442fb1224bc0ff647a8 |
| MD5 | dc71227ed0a43b15efbbff549234c49f |
| BLAKE2b-256 | fd93569747757b885d50594133647dd9c1f1a188a0034c57590714ff28f19665 |