Skip to main content

MCP server for web search, PDF parsing, and content extraction

Project description

MCP Search Server

MCP (Model Context Protocol) server for web search, content extraction, and PDF parsing.

🔓 No API keys required! No registration needed! All tools work out of the box using free public APIs.

Features

  • Web Search: Search the web using DuckDuckGo
  • Wikipedia Search: Search and retrieve Wikipedia articles
  • Web Content Extraction: Extract clean text from web pages using multiple parsing methods
  • PDF Parsing: Extract text from PDF files
  • DateTime Tool: Get current date and time with timezone awareness
  • Geolocation: IP-based location detection with timezone, coordinates, and ISP info
  • Multi-Source Search: Parallel search across multiple sources

Installation

Prerequisites

  • Python 3.10 or higher
  • pip

Install from PyPI (recommended)

pip install mcp-search-server

Install from source

git clone https://github.com/KazKozDev/mcp-search-server.git
cd mcp-search-server
pip install -e .

Usage

Running the server

The server can be run directly:

python -m mcp_search_server.server

Or using the installed script:

mcp-search-server

Configuration for Claude Desktop

Add this to your Claude Desktop configuration file:

MacOS: ~/Library/Application Support/Claude/claude_desktop_config.json Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "search": {
      "command": "python",
      "args": [
        "-m",
        "mcp_search_server.server"
      ]
    }
  }
}

Or if you installed it as a package:

{
  "mcpServers": {
    "search": {
      "command": "mcp-search-server"
    }
  }
}

Configuration for other MCP clients

The server uses stdio transport, so it can be integrated with any MCP client that supports stdio.

Available Tools

1. search_web

Search the web using DuckDuckGo with optional time filtering.

Parameters:

  • query (string, required): The search query
  • limit (integer, optional): Maximum number of results (default: 10)
  • timelimit (string, optional): Filter by time - 'd' (past day), 'w' (past week), 'm' (past month), 'y' (past year), null (all time, default)

Examples:

{
  "query": "Python async programming",
  "limit": 5
}

Search for recent news (past day):

{
  "query": "latest AI developments",
  "limit": 10,
  "timelimit": "d"
}

2. search_wikipedia

Search Wikipedia for articles.

Parameters:

  • query (string, required): The search query
  • limit (integer, optional): Maximum number of results (default: 5)

Example:

{
  "query": "Machine Learning",
  "limit": 3
}

3. get_wikipedia_summary

Get a summary of a specific Wikipedia article.

Parameters:

  • title (string, required): The Wikipedia article title

Example:

{
  "title": "Artificial Intelligence"
}

4. extract_webpage_content

Extract clean text content from a web page.

Parameters:

  • url (string, required): The URL to extract content from

Example:

{
  "url": "https://example.com/article"
}

Features:

  • Multiple parsing methods (Readability, Newspaper3k, BeautifulSoup)
  • Automatic fallback if one method fails
  • Cleans boilerplate content (ads, navigation, etc.)

5. parse_pdf

Extract text from PDF files.

Parameters:

  • url (string, required): The URL of the PDF file
  • max_chars (integer, optional): Maximum characters to extract (default: 50000)

Example:

{
  "url": "https://example.com/document.pdf",
  "max_chars": 100000
}

Features:

  • Supports PyPDF2 and pdfplumber
  • Automatic library selection

6. search_multi

Search multiple sources in parallel (web + Wikipedia).

Parameters:

  • query (string, required): The search query
  • web_limit (integer, optional): Max web results (default: 5)
  • wiki_limit (integer, optional): Max Wikipedia results (default: 3)

Example:

{
  "query": "Python programming",
  "web_limit": 5,
  "wiki_limit": 3
}

Features:

  • Runs searches in parallel for faster results
  • Combines results from multiple sources
  • Returns structured output with clear source attribution

7. get_current_datetime

Get current date and time with timezone information. Essential for time-aware AI responses.

Parameters:

  • timezone (string, optional): Timezone name (default: "UTC")
  • include_details (boolean, optional): Include additional details (default: true)

Example:

{
  "timezone": "Europe/Moscow",
  "include_details": true
}

Returns:

  • ISO datetime string
  • Date and time components
  • Day of week, week number
  • Multiple formatted representations
  • Unix timestamp

Features:

  • Supports 596+ timezones worldwide
  • Automatic timezone conversion
  • Detailed formatting options
  • Graceful error handling for invalid timezones

8. list_timezones

List available timezones by region.

Parameters:

  • region (string, optional): Region filter - "all", "Europe", "America", "Asia", "Africa", "Australia" (default: "all")

Example:

{
  "region": "Europe"
}

Features:

  • Lists all available timezone names
  • Filter by continent/region
  • Useful for discovering correct timezone names

9. get_location_by_ip

Get geolocation information based on IP address. Returns country, city, timezone, coordinates, ISP, and more.

Parameters:

  • ip_address (string, optional): IP address to lookup (e.g., "8.8.8.8"). If not provided, detects the server's public IP location.

Example:

{
  "ip_address": "8.8.8.8"
}

Returns:

  • IP address
  • Country, region, city, ZIP code
  • Timezone (can be used with get_current_datetime!)
  • Latitude and longitude coordinates
  • ISP and organization information
  • AS number

Features:

  • Free API, no API key required
  • Automatic timezone detection for location-aware responses
  • Works with both IPv4 and IPv6
  • Graceful error handling for invalid/private IPs
  • Perfect companion to datetime tool for automatic timezone detection

Use Cases:

  • Auto-detect user's timezone for time-aware responses
  • Location-based content customization
  • Network diagnostics and IP analysis
  • Geographic data for analytics

Development

Install development dependencies

pip install -e ".[dev]"

Running tests

pytest

Code formatting

black src/

Linting

ruff check src/

Architecture

Tools

  • DuckDuckGo Search (tools/duckduckgo.py)

    • Async web scraping from DuckDuckGo HTML and Lite versions
    • Result caching (24 hours)
    • Retry logic with backoff
  • Wikipedia (tools/wikipedia.py)

    • Wikipedia API integration
    • Article search and summary retrieval
    • HTML cleaning
  • Link Parser (tools/link_parser.py)

    • Multiple parsing methods (Readability, Newspaper3k, BeautifulSoup)
    • Early exit optimization
    • Content cleaning
  • PDF Parser (tools/pdf_parser.py)

    • PyPDF2 and pdfplumber support
    • Automatic library selection
    • Page-by-page extraction with limits

Caching

The server uses local caching for search results:

  • Location: ~/.mcp-search-cache/
  • TTL: 24 hours
  • Format: JSON

Troubleshooting

PDF parsing not working

Install one of the PDF libraries:

pip install PyPDF2
# or
pip install pdfplumber

Web content extraction fails

The server tries multiple methods automatically:

  1. Readability (best for articles)
  2. Newspaper3k (good for news sites)
  3. BeautifulSoup (fallback for all sites)

If all methods fail, check:

  • The URL is accessible
  • The site doesn't block automated access
  • Your internet connection

Wikipedia search returns no results

  • Check your internet connection
  • Try a different search term
  • The Wikipedia API might be temporarily unavailable

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

mcp_search_server-0.1.0.tar.gz (31.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

mcp_search_server-0.1.0-py3-none-any.whl (35.7 kB view details)

Uploaded Python 3

File details

Details for the file mcp_search_server-0.1.0.tar.gz.

File metadata

  • Download URL: mcp_search_server-0.1.0.tar.gz
  • Upload date:
  • Size: 31.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for mcp_search_server-0.1.0.tar.gz
Algorithm Hash digest
SHA256 782a756e881b6a13b86a987794cd05d775e2e8bda003eeee90b60cf48942ed9f
MD5 cf183bd7eff80d383f5dae5e5c26a0e3
BLAKE2b-256 249ad989f8a3f9289b33b3d4632546a3627877263b0db8c044b9383fd78489d7

See more details on using hashes here.

File details

Details for the file mcp_search_server-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for mcp_search_server-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5498254e26a82ede3bf0dde309c53712af309f03c0c44dbeec27d4f0853f99db
MD5 ceed87ef2c13f5ad808e4d121fa3db92
BLAKE2b-256 4c7f6b4141219ca98a1341561e7f464788e1d8e84351fdc28dbec8b699471fe7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page