ContextNest 🦅
A cozy nest where all needed context is pre-assembled for the LLM. ContextNest provides a Model Context Protocol (MCP) server that enables AI assistants to access web scraping, knowledge base search, and document management capabilities.
Table of Contents
- Project Overview
- Architecture
- Installation
- Configuration
- Usage
- API Documentation
- Examples
- Detailed Module Documentation
- Contributing
- License
Project Overview
ContextNest is a Python-based MCP (Model Context Protocol) server that provides AI assistants with powerful context management capabilities. The system combines:
- Web Scraping: Automated content extraction from URLs using Playwright and BeautifulSoup
- Knowledge Base: Vector and full-text search capabilities with hybrid ranking
- Document Management: Storage and retrieval of documents in a DuckDB database
- MCP Integration: Seamless integration with AI assistants through the Model Context Protocol
Key Features
- Web Scraping: Extract content from web pages and convert to Markdown format
- Hybrid Search: Combine vector similarity search with full-text search (BM25) using Reciprocal Rank Fusion (RRF)
- CAPTCHA Handling: Automatic detection and handling of common CAPTCHA challenges
- Stealth Browsing: Anti-bot detection evasion techniques
- Resource Management: MCP resources for accessing stored files and metadata
- Logging: Comprehensive logging with structured messages using loguru
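As an illustration of the CAPTCHA-handling idea above, detection often starts with a simple heuristic scan of the page HTML. The sketch below is hypothetical and not ContextNest's actual `captcha_handler.py` logic; the marker strings are assumptions:

```python
# Hypothetical keyword-based CAPTCHA detector; the real captcha_handler.py
# may use different signals (DOM structure, iframes, response codes).

CAPTCHA_MARKERS = (
    "recaptcha",
    "hcaptcha",
    "cf-challenge",
    "verify you are human",
)

def looks_like_captcha(html: str) -> bool:
    """Return True if the page HTML contains a common CAPTCHA marker."""
    lowered = html.lower()
    return any(marker in lowered for marker in CAPTCHA_MARKERS)

print(looks_like_captcha('<div class="g-recaptcha"></div>'))  # True
print(looks_like_captcha("<p>Regular article text</p>"))      # False
```

A heuristic like this only decides *whether* a challenge is present; what to do next (retry, wait, or surface the failure) is a separate policy.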
Architecture
The ContextNest architecture is modular and consists of several key components:
```
contextnest/
├── mcp_server.py              # Main MCP server implementation
├── mcp_models.py              # Pydantic models for tool inputs
├── mcp_logger.py              # Structured logging implementation
├── server_tools.py            # Core tool logic implementations
├── web_scraper/               # Web scraping functionality
│   ├── scraper.py             # Main scraper class
│   ├── captcha_handler.py     # CAPTCHA detection and handling
│   └── markdown_converter.py  # HTML to Markdown conversion
└── micro_search/              # Search functionality
    ├── database.py            # DuckDB database management
    ├── insertion.py           # Document insertion logic
    ├── hybrid_search.py       # Vector + BM25 search with RRF
    └── db_preparation.py      # Database setup and preparation
```
Core Components
MCP Server (mcp_server.py)
The main server implementation using FastMCP that exposes tools and resources to AI assistants. It defines the MCP endpoints and handles the communication protocol.
Web Scraper (web_scraper/)
A comprehensive web scraping module that includes:
- Playwright-based browser automation
- CAPTCHA detection and handling
- Stealth techniques to avoid bot detection
- HTML to Markdown conversion
- Human-like behavior simulation
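To make the HTML-to-Markdown step concrete, here is a minimal sketch using only the standard library's `html.parser`. It is not ContextNest's `markdown_converter.py` (which builds on BeautifulSoup); it handles just headings, paragraphs, and links to show the shape of the transformation:

```python
from html.parser import HTMLParser

class TinyMarkdownConverter(HTMLParser):
    """Convert a small subset of HTML (h1, p, a) to Markdown."""

    def __init__(self):
        super().__init__()
        self.out = []
        self._href = None

    def handle_starttag(self, tag, attrs):
        if tag == "h1":
            self.out.append("# ")
        elif tag == "a":
            self._href = dict(attrs).get("href", "")
            self.out.append("[")

    def handle_endtag(self, tag):
        if tag in ("h1", "p"):
            self.out.append("\n\n")  # block elements end a Markdown block
        elif tag == "a":
            self.out.append(f"]({self._href})")
            self._href = None

    def handle_data(self, data):
        self.out.append(data)

def html_to_markdown(html: str) -> str:
    conv = TinyMarkdownConverter()
    conv.feed(html)
    return "".join(conv.out).strip()

print(html_to_markdown('<h1>Title</h1><p>See <a href="https://example.com">this</a>.</p>'))
```

A production converter additionally has to deal with nested lists, tables, code blocks, and malformed markup, which is why the real module delegates parsing to BeautifulSoup.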
Micro Search (micro_search/)
A powerful search module that provides:
- Vector similarity search using embeddings
- Full-text search with BM25 ranking
- Hybrid search combining both approaches with Reciprocal Rank Fusion
- DuckDB-based storage for documents and embeddings
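The fusion step can be sketched in a few lines. This is the standard Reciprocal Rank Fusion formula with the same `k`, `vector_weight`, and `fts_weight` parameters the `search` tool exposes; it is an illustration, not ContextNest's actual `hybrid_search.py`:

```python
# Illustrative Reciprocal Rank Fusion: each list contributes
# weight / (k + rank) per document, and fused results are sorted by the sum.

def rrf_fuse(vector_ranking, fts_ranking, k=60, vector_weight=1.0, fts_weight=1.0):
    """Fuse two ranked lists of doc IDs into one ranking."""
    scores = {}
    for weight, ranking in ((vector_weight, vector_ranking), (fts_weight, fts_ranking)):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + weight / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Doc "b" appears near the top of both lists, so it wins after fusion.
print(rrf_fuse(["a", "b", "c"], ["b", "c", "d"]))  # ['b', 'c', 'a', 'd']
```

The smoothing constant `k` dampens the advantage of rank-1 hits, so a document that is merely *good* in both rankings can beat one that is *best* in only one.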
MCP Logger (mcp_logger.py)
Specialized logging for MCP operations with structured, configurable output using loguru.
Installation
Prerequisites
- Python 3.13 or higher
- Node.js (for Playwright dependencies)
- System dependencies for Playwright (Chromium browser)
Setup
1. Clone the repository:

   ```bash
   git clone <repository-url>
   cd contextnest
   ```

2. Install dependencies using uv (recommended):

   ```bash
   uv sync
   ```

   Or using pip:

   ```bash
   pip install -r requirements.txt
   ```

3. Install Playwright browser dependencies:

   ```bash
   playwright install chromium
   ```

4. Set up the database:

   ```bash
   python -m contextnest.micro_search.db_preparation
   ```
Install from PyPI
You can also install ContextNest directly from PyPI:

```bash
pip install contextnest
```

Or use uvx to run the MCP server directly:

```bash
uvx contextnest
```
MCP Client Configuration
To use ContextNest with MCP-compatible clients (Claude Desktop, Cursor, etc.), add this to your MCP configuration:
```json
{
  "mcpServers": {
    "ContextNest": {
      "command": "uvx",
      "args": ["contextnest"]
    }
  }
}
```
Dependencies
ContextNest requires the following key dependencies:
- fastmcp: Model Context Protocol implementation
- playwright: Browser automation for web scraping
- duckdb: Database for document storage and search
- beautifulsoup4: HTML parsing
- loguru: Structured logging
- ollama: Optional LLM integration for embeddings
- google-genai: Google AI client for embeddings
Configuration
ContextNest uses default configurations that work out of the box, but can be customized:
Configuration Files
Configuration is handled through the MCP protocol, with default settings in the code. The database schema and search parameters are configured in the micro_search module.
Usage
Running the MCP Server
To start the ContextNest MCP server:
```bash
python -m contextnest.mcp_server
```
The server will start and wait for MCP client connections. It provides the following tools and resources:
Available Tools
1. Web Scraping (web_scrape)
Scrapes a URL and converts it to Markdown format, automatically saving the result to the default output directory.
Input Parameters:
- url (required): The URL to scrape
- save_path (optional): Custom path to save the Markdown locally
Example Usage:
When prompted by the MCP client:

```json
{
  "url": "https://example.com/article",
  "save_path": "/custom/path/article.md"
}
```
2. Search (search)
Performs a hybrid search (Vector + BM25) on the knowledge base.
Input Parameters:
- query (required): The search query
- limit (optional): Maximum number of results to return (default: 5)
- k (optional): Smoothing constant for RRF (default: 60)
- vector_weight (optional): Weight for vector search results (default: 1.0)
- fts_weight (optional): Weight for full-text search results (default: 1.0)
Example Usage:
When prompted by the MCP client:

```json
{
  "query": "machine learning algorithms",
  "limit": 10,
  "k": 60,
  "vector_weight": 1.0,
  "fts_weight": 1.0
}
```
3. Insert Knowledge (insert_knowledge)
Inserts a document into the knowledge base. This process chunks, embeds, and stores the content in DuckDB. This tool runs as a background task since it can take seconds to minutes to complete.
Input Parameters:
- url (required): The source URL of the content
- title (optional): The title of the content (extracted from the content if not provided)
- content (optional): The actual text content (scraped from the URL if not provided)
Example Usage:
When prompted by the MCP client:

```json
{
  "url": "https://example.com/important-document",
  "title": "Important Document Title",
  "content": "Full text content here..."
}
```
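Since `insert_knowledge` chunks content before embedding it, a fixed-size window with overlap is one common strategy. The sketch below is hypothetical; the real chunking lives in `micro_search/insertion.py` and may use different sizes or boundaries:

```python
# Hypothetical fixed-size character chunker with overlap, so that text
# spanning a chunk boundary still appears whole in at least one chunk.

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows for embedding."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("x" * 500, chunk_size=200, overlap=50)
print(len(chunks), [len(c) for c in chunks])  # 4 [200, 200, 200, 50]
```

Each chunk is then embedded and stored alongside the source URL, which is why the tool runs as a background task: embedding many chunks can take seconds to minutes.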
4. Read Metadata (read_metadata)
Reads the application's metadata file, exposing the database's logical links and configuration.
Input Parameters: None
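An illustrative version of such a metadata reader is shown below. The path and JSON schema here are assumptions for the example; the real tool reads ContextNest's own metadata file, whose location and format are internal to the package:

```python
import json
import tempfile
from pathlib import Path

# Hypothetical metadata reader: load a JSON file, tolerating absence.

def read_metadata(path: Path) -> dict:
    """Load a JSON metadata file, returning an empty dict if it is missing."""
    if not path.exists():
        return {}
    return json.loads(path.read_text(encoding="utf-8"))

with tempfile.TemporaryDirectory() as tmp:
    meta_path = Path(tmp) / "metadata.json"
    meta_path.write_text(json.dumps({"documents": 3}), encoding="utf-8")
    print(read_metadata(meta_path))            # {'documents': 3}
    print(read_metadata(Path(tmp) / "nope"))   # {}
```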
Available Resources
1. Output File Resource (contextnest://output/{filename})
Reads a markdown file from the ContextNest output directory.
Usage:
contextnest://output/filename.md
2. Metadata Resource (contextnest://metadata)
Reads the ContextNest metadata file.
Usage:
contextnest://metadata
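For a sense of how these URIs decompose, the standard library's `urlparse` handles custom schemes directly. This sketch is illustrative; the actual routing is done by the MCP server's resource handlers, not by client-side parsing:

```python
from urllib.parse import urlparse

# Sketch: split a contextnest:// URI into its resource kind and remainder.

def parse_resource_uri(uri: str) -> tuple[str, str]:
    """Return (resource kind, remainder) for a contextnest:// URI."""
    parsed = urlparse(uri)
    if parsed.scheme != "contextnest":
        raise ValueError(f"not a contextnest URI: {uri}")
    return parsed.netloc, parsed.path.lstrip("/")

print(parse_resource_uri("contextnest://output/filename.md"))  # ('output', 'filename.md')
print(parse_resource_uri("contextnest://metadata"))            # ('metadata', '')
```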
MCP Prompt
ContextNest also provides a system prompt to guide LLMs on how to use the available tools:
```
You are an intelligent assistant with access to the ContextNest knowledge base. Use the available tools to answer requests:
- Use 'search' for finding relevant documents using hybrid search (Vector + BM25).
- Use 'web_scrape' to ingest new content from URLs if the search yields insufficient results.
- Use 'insert_knowledge' to explicitly save important information.
- Always cite your sources when providing answers from the knowledge base.
```
API Documentation
MCPLogger
Specialized logger for MCP operations with structured, configurable logging.
```python
from contextnest.mcp_logger import MCPLogger, log_request, log_response, log_error

# Create a logger instance
logger = MCPLogger(level="INFO")

# Log MCP operations
logger.log_request("web_scrape", {"url": "https://example.com"})
logger.log_response("search", {"results": 5})
logger.log_error("insert_knowledge", Exception("Database error"))

# Use convenience functions that use the global logger
from contextnest.mcp_logger import info_mcp, debug_mcp, warning_mcp

info_mcp("Processing search query")
debug_mcp("Detailed debug information", query_time=0.123)
warning_mcp("Potential issue detected")
```
Web Scraper API
The web scraper provides both class-based and function-based interfaces:
```python
from contextnest.web_scraper import WebScraper, scrape_url

# Class-based approach with full control
async with WebScraper(headless=True) as scraper:
    markdown = await scraper.scrape("https://example.com")

# Function-based approach for simple scraping
markdown = await scrape_url("https://example.com", headless=True)
```
Micro Search API
The search functionality includes vector search, full-text search, and hybrid search:
```python
from contextnest.micro_search import HybridSearch, hybrid_search, insert_document

# Insert a document
insert_document(
    url="https://example.com/doc",
    title="Document Title",
    content="Document content..."
)

# Perform hybrid search using the convenience function
query_embedding = [0.1, 0.2, 0.3, ...]  # 768-dimensional vector
results = hybrid_search(
    query="search query",
    query_embedding=query_embedding,
    limit=5
)

# Or use the class directly for more control
searcher = HybridSearch()
results = searcher.search(
    query="search query",
    query_embedding=query_embedding,
    limit=5,
    k=60,
    vector_weight=1.0,
    fts_weight=1.0
)
```
Server Tools API
The server tools module contains the core logic for all MCP operations:
```python
from contextnest.server_tools import (
    web_scrape_logic,
    search_logic,
    insert_knowledge_logic,
    read_metadata_logic
)

# Each logic function can be called independently
result = await web_scrape_logic(input_data, ctx)
result = await search_logic(input_data, ctx)
result = await insert_knowledge_logic(input_data, ctx)
result = read_metadata_logic(input_data)
```
Examples
Full Cycle Example
The repository includes a full cycle example that demonstrates:
- Web scraping from a URL
- Content statistics analysis
- Document insertion into the database
- Hybrid search with vector + BM25 ranking
```python
from examples.full_cycle_example import run_full_cycle_example

run_full_cycle_example()
```
The example uses the GitHub repository page for the AI Dev Tools Zoomcamp as a source, and performs a search for "what's the first question" to demonstrate the complete workflow. The example includes URL caching to skip scraping if the URL already exists in the database.
Custom Usage
```python
import asyncio
from contextnest.web_scraper import scrape_url
from contextnest.micro_search import insert_document, hybrid_search
from contextnest.mcp_logger import info_mcp

async def custom_workflow():
    # Scrape content from a URL
    content = await scrape_url("https://example.com/article")

    # Insert into the knowledge base
    insert_document(
        url="https://example.com/article",
        title="Example Article",
        content=content
    )

    # Search for relevant content
    query_embedding = [0.1, 0.2, 0.3]  # Generated embedding
    results = hybrid_search(
        query="relevant topic",
        query_embedding=query_embedding,
        limit=3
    )
    info_mcp(f"Found {len(results)} results")

# Run the workflow
asyncio.run(custom_workflow())
```
Detailed Module Documentation
For more detailed information about specific modules, see the documentation in the docs/ directory:
- Web Scraper Module: Documentation for the web scraping functionality
- Micro Search Module: Documentation for the hybrid search system
- Main Documentation Index: Complete documentation index
Contributing
We welcome contributions to ContextNest! Here's how you can help:
Development Setup
- Fork the repository
- Create a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows: venv\Scripts\activate
  ```

- Install dependencies:

  ```bash
  pip install -e .
  ```

- Install Playwright dependencies:

  ```bash
  playwright install
  ```
Code Standards
- Follow PEP 8 style guidelines
- Write type hints for all public functions
- Include docstrings for all classes and functions
- Add tests for new functionality
- Keep dependencies minimal and well-justified
Pull Request Process
- Create a feature branch from the main branch
- Make your changes with clear, descriptive commit messages
- Add tests if applicable
- Update documentation as needed
- Submit a pull request with a clear description of your changes
Reporting Issues
When reporting issues, please include:
- Python version
- Operating system
- Steps to reproduce the issue
- Expected vs. actual behavior
- Any relevant error messages
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
For support, please open an issue in the GitHub repository. For questions about the Model Context Protocol, refer to the official MCP documentation.
Made with ❤️ for the AI development community.
File details

Details for the file contextnest-0.1.0.tar.gz.

File metadata

- Download URL: contextnest-0.1.0.tar.gz
- Upload date:
- Size: 145.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.18

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 3d7a04186de7837b1e55c8ac6e5c07e9ccb8925c39af10e90d0eec0f43b0fe52 |
| MD5 | 1687803d4ff5d5d8cb2812545937a984 |
| BLAKE2b-256 | 13b9b17d9b19f0caeebd2f93e90bf1ee5693c98ee037cc83dbc90086708f50d7 |
File details

Details for the file contextnest-0.1.0-py3-none-any.whl.

File metadata

- Download URL: contextnest-0.1.0-py3-none-any.whl
- Upload date:
- Size: 38.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.18

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 2b40cfe1dc4d16c78379ab0ff4b09e95f6530b4fcd93bf0b56407eede7f6e466 |
| MD5 | a7499b1a65bcb7e840b8876e2364611a |
| BLAKE2b-256 | ad518047287722a07770dbac211d69446a964527ad3f714174a259f411df02a4 |