MCP server for web search, PDF parsing, and content extraction

MCP Search Server

mcp-name: io.github.KazKozDev/search

MCP (Model Context Protocol) server for web search, content extraction, and PDF parsing.

All tools work out of the box using free public APIs. No API keys required. No registration needed.

Context-Aware AI: Built-in tools for real-time datetime and geolocation detection give LLMs the ability to understand "here and now" - enabling timezone-aware responses, location-based content, and time-sensitive information without manual configuration.

Features

DateTime Tool: Get current date and time with timezone awareness
Geolocation: IP-based location detection with timezone, coordinates, and ISP info
Web Search: Smart multi-engine search with automatic fallback
- DuckDuckGo (primary): Fast, reliable, works out of the box
- Brave Search (fallback): Browser-based with anti-bot bypass
- Startpage (fallback): Privacy-focused Google proxy
- Qwant (fallback): European search engine
Wikipedia Search: Search and retrieve Wikipedia articles
Web Content Extraction: Extract clean text from web pages using multiple parsing methods
PDF Parsing: Extract text from PDF files
Multi-Source Search: Parallel search across multiple sources
Academic Search: Search arXiv, PubMed for scientific papers
GitHub Search: Find repositories and README files
Reddit Search: Search posts and comments
News Search: GDELT global news database
🆕 Credibility Assessment: Bayesian source credibility scoring with 30+ signals, domain age (WHOIS), citation network (PageRank), and uncertainty quantification - no API keys required
🆕 Text Summarization: Multi-strategy summarization (TF-IDF extractive, keyword-based, heuristic) - fast, accurate, no API keys required

Installation

Prerequisites

Python 3.10 or higher
pip

Install from PyPI (recommended)

pip install mcp-search-server

Install from source

git clone https://github.com/KazKozDev/mcp-search-server.git
cd mcp-search-server
pip install -e .

Optional: Browser-based search engines

To enable Brave Search and Startpage with anti-bot bypass (using Playwright):

# Install optional browser dependencies
pip install -e ".[browser]"

# Install Firefox browser (recommended - more stable on macOS)
playwright install firefox

# Alternative: Install Chromium browser
playwright install chromium

Note: DuckDuckGo works without Playwright. Browser support is only needed for the Brave and Startpage fallback engines.

Usage

Running the server

The server can be run directly:

python -m mcp_search_server.server

Or using the installed script:

mcp-search-server

Configuration for Claude Desktop

Add this to your Claude Desktop configuration file:

macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
Windows: %APPDATA%\Claude\claude_desktop_config.json

{
  "mcpServers": {
    "search": {
      "command": "python",
      "args": [
        "-m",
        "mcp_search_server.server"
      ]
    }
  }
}

Or if you installed it as a package:

{
  "mcpServers": {
    "search": {
      "command": "mcp-search-server"
    }
  }
}

Configuration for other MCP clients

The server uses stdio transport, so it can be integrated with any MCP client that supports stdio.
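
For clients without an MCP SDK, it may help to see what a tool invocation looks like on the wire. MCP's stdio transport exchanges newline-delimited JSON-RPC 2.0 messages, and tools are invoked with the `tools/call` method. The sketch below only builds such a message. The helper name `build_tool_call` is hypothetical, and a real session must complete the `initialize` handshake first, which is not shown here.

```python
import json


def build_tool_call(request_id: int, tool: str, arguments: dict) -> str:
    """Serialize one MCP `tools/call` request as a single JSON-RPC 2.0 line."""
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    # stdio transport: one JSON message per line, so keep the output compact
    # and terminate it with exactly one newline.
    return json.dumps(message, separators=(",", ":")) + "\n"


line = build_tool_call(1, "search_web", {"query": "Python async programming", "limit": 5})
print(line, end="")
```

The resulting line would be written to the server process's stdin; the response arrives as a JSON line on its stdout.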

Available Tools

1. search_web

Search the web with smart multi-engine fallback (DuckDuckGo → Qwant → Brave → Startpage).

Parameters:

query (string, required): The search query
limit (integer, optional): Maximum number of results (default: 10)
mode (string, optional): Search mode - 'web' (default) or 'news'
timelimit (string, optional): Filter by time - 'd' (past day), 'w' (past week), 'm' (past month), 'y' (past year), null (all time, default)
engine (string, optional): Specific search engine - 'duckduckgo', 'brave', 'startpage', 'qwant' (default: auto-fallback)
use_fallback (boolean, optional): Enable automatic fallback to other engines (default: true)
no_cache (boolean, optional): Disable cache (default: false)

Examples:

Auto-fallback search (recommended):

{
  "query": "Python async programming",
  "limit": 5,
  "use_fallback": true
}

Search using specific engine:

{
  "query": "machine learning",
  "limit": 10,
  "engine": "brave",
  "use_fallback": false
}

Search for recent news (past day):

{
  "query": "latest AI developments",
  "limit": 10,
  "mode": "news",
  "timelimit": "d"
}

2. search_wikipedia

Search Wikipedia for articles.

Parameters:

query (string, required): The search query
limit (integer, optional): Maximum number of results (default: 5)

Example:

{
  "query": "Machine Learning",
  "limit": 3
}

3. get_wikipedia_summary

Get a summary of a specific Wikipedia article.

Parameters:

title (string, required): The Wikipedia article title

Example:

{
  "title": "Artificial Intelligence"
}

4. extract_webpage_content

Extract clean text content from a web page.

Parameters:

url (string, required): The URL to extract content from

Example:

{
  "url": "https://example.com/article"
}

Features:

Multiple parsing methods (Readability, Newspaper3k, BeautifulSoup)
Automatic fallback if one method fails
Cleans boilerplate content (ads, navigation, etc.)
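
To illustrate what "cleaning boilerplate" means in practice, here is a small standard-library sketch of a last-resort text extractor. It is not the server's implementation (which uses Readability, Newspaper3k, and BeautifulSoup); the class name, function name, and the set of skipped tags are assumptions chosen for the example.

```python
from html.parser import HTMLParser

# Tags whose contents are treated as boilerplate in this sketch (assumed list).
SKIPPED_TAGS = {"script", "style", "nav", "header", "footer", "aside", "noscript"}


class TextExtractor(HTMLParser):
    """Collect visible text while ignoring anything inside SKIPPED_TAGS."""

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self._chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self) -> str:
        return " ".join(self._chunks)


def extract_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = (
    "<html><head><style>p{}</style></head><body>"
    "<nav>Home</nav><p>Hello <b>world</b></p>"
    "<script>var x = 1;</script></body></html>"
)
print(extract_text(page))
```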

5. parse_pdf

Extract text from PDF files.

Parameters:

url (string, required): The URL of the PDF file
max_chars (integer, optional): Maximum characters to extract (default: 50000)

Example:

{
  "url": "https://example.com/document.pdf",
  "max_chars": 100000
}

Features:

Supports PyPDF2 and pdfplumber
Automatic library selection
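
The `max_chars` limit can be thought of as a budget spent page by page. The following sketch shows one way such a limit might be applied. It operates on already-extracted page strings rather than a real PDF, and the function name and return fields are hypothetical.

```python
def collect_pages(pages, max_chars: int = 50000) -> dict:
    """Join page texts in order and stop once max_chars has been used up."""
    parts = []
    used = 0
    pages_read = 0
    truncated = False
    for text in pages:
        remaining = max_chars - used
        if remaining <= 0:
            truncated = True  # there were more pages than the budget allowed
            break
        if len(text) > remaining:
            text = text[:remaining]  # cut the page that crosses the limit
            truncated = True
        parts.append(text)
        used += len(text)
        pages_read += 1
    return {"text": "".join(parts), "pages_read": pages_read, "truncated": truncated}


demo = collect_pages(["abc", "defg", "hij"], max_chars=5)
print(demo)
```

Because `pages` can be any iterable, a generator that extracts one page at a time would avoid parsing pages beyond the limit.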

6. search_multi

Search multiple sources in parallel (web + Wikipedia).

Parameters:

query (string, required): The search query
web_limit (integer, optional): Max web results (default: 5)
wiki_limit (integer, optional): Max Wikipedia results (default: 3)

Example:

{
  "query": "Python programming",
  "web_limit": 5,
  "wiki_limit": 3
}

Features:

Runs searches in parallel for faster results
Combines results from multiple sources
Returns structured output with clear source attribution
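
A parallel search of this kind is commonly built on `asyncio.gather`. The sketch below uses two stub coroutines in place of the real web and Wikipedia searches; the function names and the shape of the combined result are assumptions, not the server's actual output format.

```python
import asyncio


async def fake_web_search(query: str, limit: int) -> list:
    await asyncio.sleep(0.01)  # stands in for network latency
    return [{"title": f"Web hit {i} for {query}"} for i in range(1, limit + 1)]


async def fake_wiki_search(query: str, limit: int) -> list:
    await asyncio.sleep(0.01)
    return [{"title": f"Wiki hit {i} for {query}"} for i in range(1, limit + 1)]


def as_section(outcome) -> dict:
    """Turn a result list or an exception into a uniform section."""
    if isinstance(outcome, Exception):
        return {"results": [], "error": str(outcome)}
    return {"results": outcome, "error": None}


async def search_multi(query: str, web_limit: int = 5, wiki_limit: int = 3) -> dict:
    # Both searches run concurrently; return_exceptions=True keeps one
    # failing source from discarding the other's results.
    web, wiki = await asyncio.gather(
        fake_web_search(query, web_limit),
        fake_wiki_search(query, wiki_limit),
        return_exceptions=True,
    )
    return {"query": query, "web": as_section(web), "wikipedia": as_section(wiki)}


combined = asyncio.run(search_multi("Python programming", web_limit=2, wiki_limit=1))
print(combined)
```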

7. get_current_datetime

Get current date and time with timezone information. Essential for time-aware AI responses.

Parameters:

timezone (string, optional): Timezone name (default: "UTC")
include_details (boolean, optional): Include additional details (default: true)

Example:

{
  "timezone": "Europe/Moscow",
  "include_details": true
}

Returns:

ISO datetime string
Date and time components
Day of week, week number
Multiple formatted representations
Unix timestamp

Features:

Supports 596+ timezones worldwide
Automatic timezone conversion
Detailed formatting options
Graceful error handling for invalid timezones

8. list_timezones

List available timezones by region.

Parameters:

region (string, optional): Region filter - "all", "Europe", "America", "Asia", "Africa", "Australia" (default: "all")

Example:

{
  "region": "Europe"
}

Features:

Lists all available timezone names
Filter by continent/region
Useful for discovering correct timezone names
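
Region filtering of this sort usually amounts to a prefix match on IANA timezone names. A minimal sketch, assuming names of the form `Region/City`; the helper name is hypothetical, and in a real setting the input list could come from `zoneinfo.available_timezones()`.

```python
def filter_timezones(names, region: str = "all") -> list:
    """Return sorted timezone names, optionally limited to one region prefix."""
    ordered = sorted(names)
    if region == "all":
        return ordered
    prefix = region + "/"
    return [name for name in ordered if name.startswith(prefix)]


sample = ["Europe/Moscow", "America/New_York", "Europe/Berlin", "UTC"]
print(filter_timezones(sample, "Europe"))
```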

9. get_location_by_ip

Get geolocation information based on IP address. Returns country, city, timezone, coordinates, ISP, and more.

Parameters:

ip_address (string, optional): IP address to lookup (e.g., "8.8.8.8"). If not provided, detects the server's public IP location.

Example:

{
  "ip_address": "8.8.8.8"
}

Returns:

IP address
Country, region, city, ZIP code
Timezone (can be used with get_current_datetime!)
Latitude and longitude coordinates
ISP and organization information
AS number

Features:

Free API, no API key required
Automatic timezone detection for location-aware responses
Works with both IPv4 and IPv6
Graceful error handling for invalid/private IPs
Perfect companion to datetime tool for automatic timezone detection

Use Cases:

Auto-detect user's timezone for time-aware responses
Location-based content customization
Network diagnostics and IP analysis
Geographic data for analytics
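
Invalid and private addresses can be screened locally before any lookup is attempted. The sketch below uses Python's `ipaddress` module for that check; the function name and the returned fields are assumptions, and the server's own validation may work differently.

```python
import ipaddress


def classify_ip(value: str) -> dict:
    """Decide whether an address is worth sending to a geolocation service."""
    try:
        address = ipaddress.ip_address(value.strip())
    except ValueError:
        return {"ok": False, "reason": "not a valid IPv4 or IPv6 address"}
    if not address.is_global:
        # Private, loopback, and link-local addresses have no public location.
        return {
            "ok": False,
            "version": address.version,
            "reason": "address is not publicly routable",
        }
    return {"ok": True, "version": address.version, "address": str(address)}


for candidate in ["8.8.8.8", "192.168.1.1", "2001:4860:4860::8888", "not-an-ip"]:
    print(candidate, classify_ip(candidate))
```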

10. assess_source_credibility 🆕

Assess the credibility of web sources using advanced Bayesian analysis with 30+ signals.

Parameters:

url (string, required): URL to assess
title (string, optional): Document title
content (string, optional): Full text content (improves accuracy)
metadata (object, optional): Structured metadata (year, authors, citations, doi, is_peer_reviewed)

Example:

{
  "url": "https://arxiv.org/abs/2301.00234",
  "title": "Deep Learning for Medical Imaging",
  "metadata": {
    "year": 2023,
    "is_peer_reviewed": true,
    "citations": 42
  }
}

Returns:

Credibility score (0-1)
Confidence interval (e.g., 0.75 ± 0.08)
Category (academic, news, code, forum, blog, government)
PageRank score from citation network
30+ individual signal scores
Recommendation (✓✓ Excellent / ✓ Good / ⚠ Caution / ✗ Limited)

Features:

Real Domain Age: WHOIS-based domain registration date checking
Citation Network: PageRank algorithm for link analysis
Bayesian Inference: Prior probabilities + likelihood + posterior
30+ Signals: Domain reputation, content quality, metadata analysis
Uncertainty Quantification: Confidence intervals based on evidence
No API Keys Required: All analysis runs locally
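
The "prior + likelihood + posterior" idea can be sketched with a generic odds-form Bayesian update: posterior odds equal prior odds multiplied by the likelihood ratio of each signal. The example assumes the signals are independent and uses made-up ratios; the tool's actual signals, weights, and confidence-interval calculation are not documented here and are not reproduced.

```python
import math


def posterior_credibility(prior: float, likelihood_ratios) -> float:
    """Combine a prior probability (0 < prior < 1) with positive per-signal likelihood ratios.

    A ratio above 1 is evidence for credibility, below 1 is evidence against.
    Working in log-odds keeps the product numerically stable.
    """
    log_odds = math.log(prior / (1.0 - prior))
    for ratio in likelihood_ratios:
        log_odds += math.log(ratio)
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical signals: one supportive (3.0) and one mildly negative (0.8).
score = posterior_credibility(0.5, [3.0, 0.8])
print(round(score, 3))
```

Starting from a neutral prior of 0.5, a single signal with ratio 3.0 alone would yield 0.75, since odds of 1:1 become 3:1.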

Optional Enhancement: Install WHOIS support for real domain age checking:

pip install "mcp-search-server[credibility]"

Documentation: See docs/CREDIBILITY_ASSESSMENT.md for detailed usage, examples, and technical details.

11. summarize_text 🆕

Summarize long text using multiple strategies (TF-IDF, keyword-based, or heuristic).

Parameters:

text (string, required): Text to summarize
strategy (string, optional): "auto" (default), "extractive_tfidf", "extractive_keyword", "heuristic"
compression_ratio (number, optional): Target compression 0.1-0.9 (default: 0.3 = 30%)

Example:

{
  "text": "Long article text here...",
  "strategy": "extractive_tfidf",
  "compression_ratio": 0.3
}

Returns:

Summary text
Method used (extractive-tfidf, extractive-keyword, heuristic-3sent)
Statistics (original/summary length, compression ratio, sentences)

Strategies:

extractive_tfidf (best): Uses TF-IDF scoring to select important sentences. Requires NLTK.
extractive_keyword: Prioritizes sentences with entities and key terms. Requires NLTK.
heuristic: Ultra-fast fallback (first + middle + last sentences). No dependencies.
auto: Automatically picks best available strategy.
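
The heuristic strategy (first + middle + last sentences) is simple enough to sketch in a few lines. The regular-expression sentence splitter below is an assumption made for the example, as is the function name; the server's own splitting rules may differ.

```python
import re


def heuristic_summary(text: str) -> str:
    """Keep the first, middle, and last sentence of a text."""
    # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]
    if len(sentences) <= 3:
        return " ".join(sentences)
    picks = [0, len(sentences) // 2, len(sentences) - 1]
    return " ".join(sentences[i] for i in picks)


print(heuristic_summary("One. Two. Three. Four. Five."))
```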

Features:

Fast: ~50ms for typical article (with NLTK), ~5ms (heuristic)
No API Keys: All processing local
Smart Selection: Maintains original sentence order
Graceful Degradation: Falls back if NLTK unavailable

Optional Enhancement: Install NLTK for better quality:

pip install "mcp-search-server[summarizer]"

Use Cases:

Summarize web articles before credibility assessment
Condense research papers for quick review
Extract key points from long documents
Generate previews for search results

Development

Install development dependencies

pip install -e ".[dev]"

Running tests

pytest

Code formatting

black src/

Linting

ruff check src/

Architecture

Tools

DuckDuckGo Search (tools/duckduckgo.py)
- Async web scraping from DuckDuckGo HTML and Lite versions
- Result caching (24 hours)
- Retry logic with backoff
Wikipedia (tools/wikipedia.py)
- Wikipedia API integration
- Article search and summary retrieval
- HTML cleaning
Link Parser (tools/link_parser.py)
- Multiple parsing methods (Readability, Newspaper3k, BeautifulSoup)
- Early exit optimization
- Content cleaning
PDF Parser (tools/pdf_parser.py)
- PyPDF2 and pdfplumber support
- Automatic library selection
- Page-by-page extraction with limits
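
"Retry logic with backoff" typically means waiting longer after each failed attempt. The sketch below shows a generic exponential backoff; the attempt count, base delay, and function names are assumptions and not the values used in tools/duckduckgo.py.

```python
import time


def retry_with_backoff(operation, attempts: int = 3, base_delay: float = 0.5, sleep=time.sleep):
    """Call operation() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * (2 ** attempt))


# Demo with a stub that fails twice; delays are recorded instead of slept.
calls = {"count": 0}


def flaky_request() -> str:
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated failure")
    return "ok"


delays = []
result = retry_with_backoff(flaky_request, sleep=delays.append)
print(result, delays)
```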

Caching

The server uses local caching for search results:

Location: ~/.mcp-search-cache/
TTL: 24 hours
Format: JSON

Troubleshooting

PDF parsing not working

Install one of the PDF libraries:

pip install PyPDF2
# or
pip install pdfplumber

Web content extraction fails

The server tries multiple methods automatically:

Readability (best for articles)
Newspaper3k (good for news sites)
BeautifulSoup (fallback for all sites)

If all methods fail, check:

The URL is accessible
The site doesn't block automated access
Your internet connection

Wikipedia search returns no results

Check your internet connection
Try a different search term
The Wikipedia API might be temporarily unavailable

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Release history

0.2.1 (Jan 3, 2026)
0.2.0 (Jan 3, 2026)
0.1.9 (Dec 26, 2025)
0.1.8 (Dec 25, 2025)
0.1.7 (Dec 25, 2025)
0.1.6 (Dec 25, 2025)
0.1.5 (Dec 25, 2025), this version
0.1.4 (Dec 24, 2025)
0.1.3 (Dec 24, 2025)
0.1.2 (Dec 16, 2025)
0.1.1 (Dec 16, 2025)
0.1.0 (Dec 16, 2025)

Download files

Source Distribution: mcp_search_server-0.1.5.tar.gz (73.3 kB), uploaded Dec 25, 2025
Built Distribution: mcp_search_server-0.1.5-py3-none-any.whl (81.5 kB), uploaded Dec 25, 2025, Python 3

File details

Details for the file mcp_search_server-0.1.5.tar.gz.

File metadata

Download URL: mcp_search_server-0.1.5.tar.gz
Upload date: Dec 25, 2025
Size: 73.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for mcp_search_server-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`e13bc47cb4e335e1a4ef8e7fca64d5ad8232a47924db2804e2dd06a514befb67`
MD5	`0770c185530a646c0a0038f4c45e8fde`
BLAKE2b-256	`f8b8e5244c4af7706f41eec7eef24e535e5aac95c01cdd5dffe87d4a405cef1a`

File details

Details for the file mcp_search_server-0.1.5-py3-none-any.whl.

File metadata

Download URL: mcp_search_server-0.1.5-py3-none-any.whl
Upload date: Dec 25, 2025
Size: 81.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for mcp_search_server-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2c620eaca2bbb377a7a870e9e7a0526c66208a5ad96b7fcca91a8e5bdb9e940c`
MD5	`ba550057ee4c00d19e25b9062cd79e2e`
BLAKE2b-256	`d13648697b5712939ea7b5b3df6e6207e3c5a924940f8cbd8a790b801bd8dd78`
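
A downloaded file can be checked against the SHA256 digests listed above with Python's `hashlib`. This is a small sketch with hypothetical helper names; it assumes the file has already been downloaded to a local path.

```python
import hashlib


def sha256_of_file(path, chunk_size: int = 65536) -> str:
    """Compute the SHA-256 hex digest of a file without loading it all into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex: str) -> bool:
    # Published digests are lowercase hex; normalize the input before comparing.
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("mcp_search_server-0.1.5.tar.gz", "e13bc47cb4e335e1a4ef8e7fca64d5ad8232a47924db2804e2dd06a514befb67")` should return True for an intact copy of the source distribution.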
