A commercial-grade MCP Server built on FastMCP that reads and extracts content from web pages and PDFs, including both text and images, and converts it to Markdown. It is purpose-built for long-term deployment in enterprise environments.

Negentropy Perceives

The Perception Engine for AI Agents · Enterprise-Grade MCP Server

Distilling web pages and PDFs into clean Markdown nectar, ready to be fed directly to your LLM.

6 MCP Tools · Pipeline Orchestration · 5-Engine PDF Decoding · LLM Smart Evaluation


✨ Why Negentropy Perceives?

In the vast ecosystem of AI agent projects, the "dirty work" of information perception often degenerates into fragile, unmaintainable chaos over time. Grounded in our core engineering philosophy of Orthogonal Decomposition and Entropy Reduction (Negentropy), we completely quarantine the mess of low-level network communications and format deconstruction. We only inject pure, undisputed certainty into your sandbox:

  • 🕵️ Web Page to Markdown: Facing heavily rendered SPAs and hardened anti-scraping defenses? The engine ships with a built-in 5-tier scraping mechanism, ranging from high-concurrency HTTP requests to stealth headless browser rotation. What you see is what you get, and waterfall layouts are handled just as easily.
  • 📑 PDF to Markdown: Stop compromising over misaligned tables and mangled characters. With our proprietary "Engine Arena" mechanism, Smart mode brings in an LLM as the referee. It coordinates 5 specialized engines (Docling, MinerU, Marker, PyMuPDF, and PyPDF) running concurrently to extract LaTeX formulas, complex tables, and deep layout structures.
  • 🦾 Heavy-Duty Infrastructure: No toy-grade SDK wrappers. The core has built-in retries with exponential backoff, multi-layered rate limiting with circuit breakers, and in-memory caching. Built on asyncio, it is designed to maximize the throughput of a single node.
  • 🔌 Native MCP Integration: We follow the Model Context Protocol specification. Over the standard HTTP / STDIO / SSE transports, it plugs into Claude Desktop or Cursor environments without redundant glue code.
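
To make the retry behavior mentioned in the infrastructure bullet concrete, here is a minimal sketch of exponential backoff with jitter on asyncio. It is an illustration only, not the project's actual implementation: `retry_with_backoff`, its parameters, and the `flaky` demo function are all hypothetical names.

```python
import asyncio
import random


async def retry_with_backoff(func, *, attempts=4, base_delay=0.5,
                             max_delay=8.0, sleep=asyncio.sleep):
    """Call an async function, retrying failures with exponential backoff."""
    for attempt in range(attempts):
        try:
            return await func()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: surface the last error
            # The ceiling doubles on every attempt and is capped at max_delay;
            # picking a random delay below it ("full jitter") spreads retries out.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            await sleep(random.uniform(0, ceiling))


async def demo():
    calls = {"n": 0}

    async def flaky():
        # Hypothetical operation that fails twice, then succeeds.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("transient failure")
        return "ok"

    async def no_sleep(_seconds):
        return None  # skip real waiting so the demo finishes instantly

    result = await retry_with_backoff(flaky, sleep=no_sleep)
    return result, calls["n"]


backoff_result = asyncio.run(demo())
print(backoff_result)
```

The `sleep` parameter is injectable purely so the demo and tests do not have to wait; production code would leave the default in place.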

Quick Start

1. Millisecond Loading

# We recommend using uv (Python 3.13+ required)
uv add negentropy-perceives

2. Ignite the Engine

uv run negentropy-perceives  # Defaults to listening on localhost:2992, HTTP mode

💡 Advanced Arsenal: Upon first launch, Negentropy Perceives will auto-generate its configuration at ~/.negentropy/perceives.config.yaml. Hidden inside are the switches for high-tier warfare.
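
If you want to check whether that file has been generated yet, a small sketch like the following may help. The path comes from the note above; `describe_config` is a hypothetical helper, and nothing here reads or interprets the file's contents.

```python
from pathlib import Path

# Path taken from the note above; it only exists after the first launch.
CONFIG_PATH = Path.home() / ".negentropy" / "perceives.config.yaml"


def describe_config(path: Path = CONFIG_PATH) -> str:
    """Return a one-line status for the generated configuration file."""
    if path.is_file():
        return f"found {path} ({path.stat().st_size} bytes)"
    return f"not found: {path} (start the server once to generate it)"


print(describe_config())
```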

3. Witness True Perception

import asyncio
from negentropy.perceives.sdk import NegentropyPerceivesClient

async def perceive_world():
    async with NegentropyPerceivesClient() as client:
        result = await client.parse_webpage_to_markdown(
            url="https://en.wikipedia.org/wiki/Entropy",
        )
        print("====== Pure Nectar Extracted ======")
        print(result.markdown_content[:250], "......\n")
        print(f"📊 Pure words retrieved from the noise: {result.word_count}")

asyncio.run(perceive_world())

4. Connect the MCP Client

Add the following to your claude_desktop_config.json in Claude Desktop:

{
  "mcpServers": {
    "negentropy-perceives": {
      "type": "http",
      "url": "http://localhost:2992/mcp"
    }
  }
}

Three transport modes are supported: STDIO (local development), HTTP (recommended for production), and SSE (compatibility mode). See the User Guide for the full configuration.


Core Capabilities

Toolkit Overview

| Tool | Function | Use Case |
|---|---|---|
| discover_links | Discover webpage links, supports domain filtering | Site map discovery, link audits |
| inspect_page | Inspect page metadata (status code, content type, etc.) | Target page pre-flight check |
| parse_webpage_to_markdown | Webpage to Markdown | Granular single-page extraction |
| parse_webpages_to_markdown | Batch Webpages to Markdown | Knowledge base building, site archives |
| parse_pdf_to_markdown | PDF to Markdown | Academic papers, financial reports |
| parse_pdfs_to_markdown | Batch PDFs to Markdown | Mass document digitization |
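
MCP tool invocations are JSON-RPC 2.0 messages, so a call to any of the tools above can be sketched as a `tools/call` request. The snippet below is a minimal illustration rather than the project's SDK: `build_tool_call` is a hypothetical helper, and the `url` argument name is an assumption carried over from the SDK example in Quick Start.

```python
import json

# Tool names as listed in the Toolkit Overview.
TOOLS = {
    "discover_links", "inspect_page",
    "parse_webpage_to_markdown", "parse_webpages_to_markdown",
    "parse_pdf_to_markdown", "parse_pdfs_to_markdown",
}


def build_tool_call(name: str, arguments: dict, request_id: int = 1) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# Assumption: the tool accepts a `url` argument, mirroring the SDK call.
payload = build_tool_call(
    "parse_webpage_to_markdown",
    {"url": "https://en.wikipedia.org/wiki/Entropy"},
)
print(payload)
```

In practice an MCP client library handles session setup and this framing for you; the sketch only shows what travels over the wire for a single call.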

⚠️ Warning: Comply with the target website's Terms of Service (ToS) and keep request frequencies reasonable. This tool is intended exclusively for legal and compliant data acquisition.

Web Scraping Strategies

| Method | Description |
|---|---|
| auto | Smart selection (Recommended) |
| simple | Standard HTTP request, ideal for static pages |
| selenium | Browser rendering, seamlessly executes JS |
| stealth_selenium | Covert Selenium, shatters anti-scraping blocks |
| stealth_playwright | Stealth Playwright, lightweight anti-detection |

PDF Engines

| Engine | Specialty | GPU Acceleration |
|---|---|---|
| Docling | AI layout analysis, table recognition | CUDA / MPS / XPU |
| MinerU | Deep learning structure analysis, LaTeX | CUDA / MLX |
| Marker | Academic documents, Nougat model | CUDA |
| PyMuPDF | Lightning-fast text extraction | - |
| PyPDF | Absolute baseline fallback | - |

In auto mode, the system cascades through a graceful degradation chain: Docling → MinerU → Marker → PyMuPDF → PyPDF. Activating smart mode enlists an LLM to orchestrate a competitive parallel run across engines, ultimately fusing the optimum output.
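
The degradation chain can be pictured as a simple ordered fallback loop. The sketch below assumes each engine is wrapped in a callable that takes a file path and returns Markdown. `convert_with_fallback` and the two stand-in engines are hypothetical, and the real pipeline presumably applies richer quality checks than "non-empty output".

```python
from typing import Callable

# Engine order taken from the auto-mode chain described above.
ENGINE_ORDER = ["docling", "mineru", "marker", "pymupdf", "pypdf"]


def convert_with_fallback(
    pdf_path: str, engines: dict[str, Callable[[str], str]]
) -> tuple[str, str]:
    """Try engines in order; return (engine_name, markdown) of the first success."""
    errors = []
    for name in ENGINE_ORDER:
        engine = engines.get(name)
        if engine is None:
            continue  # engine not installed or not configured
        try:
            markdown = engine(pdf_path)
        except Exception as exc:
            errors.append(f"{name}: {exc}")
            continue  # degrade to the next engine in the chain
        if markdown.strip():
            return name, markdown
        errors.append(f"{name}: empty output")
    raise RuntimeError("all engines failed: " + "; ".join(errors))


# Hypothetical stand-ins for real engine adapters.
def failing_docling(path: str) -> str:
    raise RuntimeError("model weights unavailable")


def fake_pymupdf(path: str) -> str:
    return f"# Extracted from {path}\n"


used_engine, markdown = convert_with_fallback(
    "report.pdf", {"docling": failing_docling, "pymupdf": fake_pymupdf}
)
print(used_engine)
```

Collecting the per-engine errors instead of discarding them keeps the final failure message useful when every engine in the chain gives up.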


Architectural Landscape

graph TD
    A["SDK Layer<br/>NegentropyPerceivesClient"] -.->|"HTTP Transport"| T["MCP Tool Layer<br/>6 Tools · @app.tool()"]
    T --> P["Pipeline Layer<br/>Stage Orchestration · Competition/Fallback"]
    T --> B["Processing Engine Layer<br/>Scraping · PDF · Markdown"]
    P --> B
    B --> C["Infrastructure Layer<br/>RateLimiter · Cache · Metrics · ErrorHandler · Retry"]
    C --> D["Configuration Layer<br/>pydantic-settings · Env Vars"]

    style A fill:#4c1d95,stroke:#a78bfa,color:#ffffff
    style T fill:#1e3a8a,stroke:#3b82f6,color:#ffffff
    style P fill:#b45309,stroke:#f59e0b,color:#ffffff
    style B fill:#166534,stroke:#22c55e,color:#ffffff
    style C fill:#134e4a,stroke:#14b8a6,color:#ffffff
    style D fill:#581c87,stroke:#9333ea,color:#ffffff

A 5-tier orthogonal architecture: SDK → MCP Tools → Pipeline Orchestration → Processing Engines → Infrastructure, with the Configuration Layer cutting across all of them. It features a 10-stage PDF pipeline and a 12-stage web page pipeline, both of which support fallback and competitive execution models.
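
The competitive execution model can be sketched as running every candidate concurrently and letting a judge pick a winner. This is a simplified illustration under stated assumptions: `run_competition`, the three toy engines, and `structure_score` are hypothetical, the scorer is a crude stand-in for the LLM referee, and the sketch selects a single best output rather than fusing several.

```python
import asyncio


async def run_competition(source, engines, score):
    """Run all engines concurrently; return (name, output) of the best result."""
    names = list(engines)
    results = await asyncio.gather(
        *(engines[name](source) for name in names),
        return_exceptions=True,  # one failing engine must not sink the others
    )
    candidates = [
        (name, output)
        for name, output in zip(names, results)
        if not isinstance(output, BaseException)
    ]
    if not candidates:
        raise RuntimeError("every engine failed")
    return max(candidates, key=lambda pair: score(pair[1]))


# Hypothetical engines and a toy scorer standing in for the LLM judge.
async def engine_fast(source):
    return "plain text"


async def engine_rich(source):
    return "# Title\n\n| a | b |\n|---|---|\n| 1 | 2 |"


async def engine_broken(source):
    raise RuntimeError("crashed")


def structure_score(markdown):
    # Reward headings and table cells; a real judge would be far more nuanced.
    return markdown.count("#") + markdown.count("|")


winner = asyncio.run(run_competition(
    "report.pdf",
    {"fast": engine_fast, "rich": engine_rich, "broken": engine_broken},
    structure_score,
))
print(winner[0])
```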


Documentation Navigator

| Document | Content | Who is it for |
|---|---|---|
| User Guide | Deep dive into 6 tools, MCP Server setup, SDK interfaces, advanced tweaks | All Users |
| Architecture Design | 5-tier architecture, Pipeline orchestration, engine fallbacks, Smart Mode | Architects / Contributors |
| Developer Guide | Environment setup, test framework, CI/CD, PR guidelines | Developers |
| Changelog | Release history and change logs | Everyone |

Community & Contributions

Beyond the World Wide Web and massive unstructured texts lies an abyss of noise. Only through relentless code evolution can we forge ahead steadily. If you hold the inspiration to pull chaos back into order, please do not hesitate to share:

  1. Before you start typing, read through the Developer Guide.
  2. Bring your paradigm-shifting ideas to our Issues, or submit a Pull Request directly.

MIT License, © 2026 ThreeFish-AI

Download files

Source Distribution

negentropy_perceives-0.2.0a3.tar.gz (222.9 kB)

Uploaded Source

Built Distribution

negentropy_perceives-0.2.0a3-py3-none-any.whl (299.1 kB)

Uploaded Python 3

File details

Details for the file negentropy_perceives-0.2.0a3.tar.gz.

File metadata

  • Download URL: negentropy_perceives-0.2.0a3.tar.gz
  • Upload date:
  • Size: 222.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for negentropy_perceives-0.2.0a3.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | f9df0026d447e77743d1eec903d896fb4840e003816e927e5f6d297d063032b6 |
| MD5 | 07024044494a2c65a4db336d96ebfc04 |
| BLAKE2b-256 | 5a19b9f0ae9fa3007655b319cac3863ab309cd14794d10a9a81df5d430300efd |
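
To check a downloaded archive against the published digest, something along these lines should work. It is a minimal sketch using only the standard library: `sha256_of` and `verify` are hypothetical helper names, and the file is assumed to sit in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for the source distribution.
EXPECTED_SDIST_SHA256 = (
    "f9df0026d447e77743d1eec903d896fb4840e003816e927e5f6d297d063032b6"
)


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str) -> bool:
    """Compare the file's SHA256 with an expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()


archive = Path("negentropy_perceives-0.2.0a3.tar.gz")
if archive.is_file():
    print("match" if verify(archive, EXPECTED_SDIST_SHA256) else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```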

Provenance

The following attestation bundles were made for negentropy_perceives-0.2.0a3.tar.gz:

Publisher: release.yml on ThreeFish-AI/negentropy-perceives

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file negentropy_perceives-0.2.0a3-py3-none-any.whl.

File hashes

Hashes for negentropy_perceives-0.2.0a3-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 9a8e77bd9b1e450594836e8a40c725553bcf6ce0d04ff8c0abbc5d9e5f12422f |
| MD5 | 61ce200c3d9aaed17ac056b1d8f39f98 |
| BLAKE2b-256 | 50d8729b3f1ef9d27e562c53d04d09d1df211b88e5e29e935151f55dcb286417 |

Provenance

The following attestation bundles were made for negentropy_perceives-0.2.0a3-py3-none-any.whl:

Publisher: release.yml on ThreeFish-AI/negentropy-perceives
