MCP server exposing raw envbert EDD classification to Claude, no LLM fallback, no Ollama/Azure dependency

envbert-mcp

MCP server exposing raw envbert (DistilBERT EDD classification) to Claude — standalone, no LLM fallback, no Ollama/Azure dependency.


Why raw envbert, not envbert-agent

envbert-agent adds an LLM fallback step for low-confidence classifications — useful when there's no other reasoning layer downstream (CLI usage, batch pipelines writing straight to a CSV). But Claude is a reasoning layer already sitting right there. Routing through a second, hidden LLM call inside the tool — adding ~10s and, if configured for Azure, real cost — is redundant when Claude can look at a low-confidence label directly and decide what to do with it.

This server calls envbert.due_diligence.envbert_predict() directly. No network hop, no Ollama, no Azure, no envbert-api. Sub-second responses.

If you need the LLM-fallback behaviour (e.g. you are building a pipeline with no LLM consumer downstream), use envbert-agent or envbert-api instead; see the cross-comparison below.


Architecture

Claude / Claude Code
        │ MCP (stdio)
        ▼
envbert_mcp/server.py     ←── this file, single process
        │ in-process call
        ▼
envbert.due_diligence.envbert_predict()
        │
        ▼
DistilBERT (d4data/environmental-due-diligence-model)

Everything runs in one process. The model loads once at startup (warmup, ~10-40s on a cold HuggingFace cache) and stays in memory for the life of the server.
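
The load-once, warm-up-in-the-background pattern described above can be sketched roughly as follows. This is a minimal illustration, not the server's actual code: `ModelHolder` and its `loader` argument are hypothetical names, and the only detail taken from this README is the `"loading"` key reported by the status check.

```python
import threading


class ModelHolder:
    """Loads a model once in a background thread and reports readiness.

    Hypothetical helper for illustration; not part of envbert-mcp.
    """

    def __init__(self, loader):
        self._loader = loader  # assumed: a callable that returns the loaded model
        self._model = None
        self._error = None
        self._ready = threading.Event()

    def start_warmup(self):
        # Daemon thread so a slow download never blocks process shutdown.
        thread = threading.Thread(target=self._load, daemon=True)
        thread.start()
        return thread

    def _load(self):
        try:
            self._model = self._loader()
        except Exception as exc:
            # Keep the failure so status() and get() can report it.
            self._error = repr(exc)
        finally:
            self._ready.set()

    def status(self):
        return {"loading": not self._ready.is_set(), "error": self._error}

    def get(self, timeout=None):
        # Block until warmup finishes, or raise if it takes longer than timeout.
        if not self._ready.wait(timeout):
            raise TimeoutError("model is still warming up")
        if self._error is not None:
            raise RuntimeError(self._error)
        return self._model
```

A status tool can return `holder.status()` immediately, while classification tools call `holder.get()` and simply wait if warmup has not finished yet.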


Tools

| Tool | Purpose |
| --- | --- |
| check_envbert_status | Confirm the model is loaded before relying on fast responses; the very first call in a session may be slow while warmup is still in progress |
| classify_environmental_text | Classify one sentence/paragraph: label + confidence, no LLM step |
| classify_environmental_document | Classify all paragraphs of a document concurrently, with category distribution and low-confidence flagging |
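
One plausible way to classify paragraphs concurrently around a blocking, in-process model call is sketched below. It is an assumption-laden illustration: `fake_predict` stands in for the real classifier, its `(label, confidence)` return shape is assumed, and `classify_paragraphs` and `max_concurrency` are hypothetical names rather than the server's API.

```python
import asyncio


def fake_predict(text):
    """Stand-in for the real classifier; assumed to return (label, confidence)."""
    if "shale" in text:
        return ("Geology", 0.9)
    return ("Contaminants", 0.42)


async def classify_paragraphs(paragraphs, predict=fake_predict, max_concurrency=4):
    # Cap the number of predictions running at once.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def classify_one(index, text):
        async with semaphore:
            # Run the blocking predict call in a worker thread so the
            # event loop stays responsive.
            label, confidence = await asyncio.to_thread(predict, text)
        return {"index": index, "label": label, "confidence": confidence}

    tasks = [classify_one(i, text) for i, text in enumerate(paragraphs)]
    # gather returns results in the same order as the input paragraphs.
    return await asyncio.gather(*tasks)


if __name__ == "__main__":
    print(asyncio.run(classify_paragraphs(["weathered shale", "benzene detected"])))
```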

Why low-confidence flagging matters here

Without an LLM fallback, a low envbert confidence score (e.g. 0.42) is the final answer — there's no second opinion baked in. classify_environmental_document surfaces a low_confidence_items list explicitly so Claude can apply its own judgement to exactly those paragraphs, rather than treating every result as equally reliable.

{
  "category_distribution": {"Geology": 4, "Contaminants": 2},
  "low_confidence_count": 1,
  "low_confidence_items": [
    {"index": 3, "label": "Remediation Standards", "confidence": 0.42}
  ],
  "results": [ ... ]
}
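
A summary of this shape could be derived from per-paragraph results along the following lines. This is a sketch under stated assumptions: the 0.6 threshold and the `summarize` function are illustrative, since the cut-off the server actually uses is not given here; only the output keys come from the example above.

```python
from collections import Counter

# Assumed cut-off for illustration; the real threshold is not documented here.
LOW_CONFIDENCE_THRESHOLD = 0.6


def summarize(results, threshold=LOW_CONFIDENCE_THRESHOLD):
    """Build a category distribution and a list of low-confidence items."""
    distribution = Counter(item["label"] for item in results)
    low = [item for item in results if item["confidence"] < threshold]
    return {
        "category_distribution": dict(distribution),
        "low_confidence_count": len(low),
        "low_confidence_items": low,
        "results": results,
    }


if __name__ == "__main__":
    sample = [
        {"index": 0, "label": "Geology", "confidence": 0.93},
        {"index": 1, "label": "Remediation Standards", "confidence": 0.42},
    ]
    print(summarize(sample)["low_confidence_items"])
```

Keeping the full `results` list alongside the flagged subset lets the caller inspect everything while focusing its own judgement on the uncertain paragraphs.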

Quickstart

pip install envbert envbert-mcp

Add to ~/.claude/mcp_config.json:

{
  "mcpServers": {
    "envbert": {
      "command": "envbert-mcp"
    }
  }
}

Restart Claude Code. The model warms up in the background on first launch — check_envbert_status will report "loading": true until ready.
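
If the config file already lists other MCP servers, the entry above needs to be merged in rather than pasted over them. A minimal sketch of such a merge, assuming the file is plain JSON with a top-level `mcpServers` object as shown above (`add_envbert_server` is a hypothetical helper, not something shipped with this package):

```python
import json
from pathlib import Path


def add_envbert_server(config_path):
    """Add the envbert entry to an MCP config file, keeping existing servers."""
    path = Path(config_path)
    # Start from the existing config if there is one, otherwise from scratch.
    config = json.loads(path.read_text()) if path.exists() else {}
    servers = config.setdefault("mcpServers", {})
    servers["envbert"] = {"command": "envbert-mcp"}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```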

Example prompts:

  • "Is envbert ready?"
  • "Classify this: 'weathered shale was encountered below the surface with fluvial deposits'"
  • "Here's a 12-paragraph site report — classify each section and flag anything you're not confident about."

envbert-mcp vs envbert-agent / envbert-api — which to use

| | This package (raw envbert) | envbert-agent / envbert-api |
| --- | --- | --- |
| Used by | Claude / MCP clients | CLI, pipelines, non-LLM consumers |
| LLM fallback | None | Yes: Ollama (local) or Azure |
| Typical latency | <1s always | <1s confident, ~10s on fallback |
| External dependencies | None beyond envbert | Ollama or Azure OpenAI |
| Confidence on ambiguous text | Raw model score only | LLM-resolved final label |
| Why this shape | Claude can reason over raw scores itself, so a second hidden LLM call is redundant | No reasoning layer downstream; the agent must resolve ambiguity itself |

Both are legitimate — they're solving for different consumers of the classification, not competing implementations of the same thing.


Configuration

| Variable | Default | Description |
| --- | --- | --- |
| LOG_LEVEL | INFO | Logging verbosity |

No other configuration needed — there's no backend URL, no LLM provider, no API keys. That's the point.
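
For illustration, a `LOG_LEVEL` variable with an `INFO` default is typically resolved along these lines. This is a sketch, not the server's actual startup code; `resolve_log_level` is a hypothetical helper and the fallback for unknown values is an assumption.

```python
import logging
import os


def resolve_log_level(environ=os.environ):
    """Map LOG_LEVEL to a logging level, falling back to INFO."""
    name = environ.get("LOG_LEVEL", "INFO").strip().upper()
    level = getattr(logging, name, None)
    # Only accept real level constants such as logging.DEBUG (an int).
    return level if isinstance(level, int) else logging.INFO


if __name__ == "__main__":
    logging.basicConfig(level=resolve_log_level())
    logging.getLogger("envbert_mcp").info("logging configured")
```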


Development

pip install -e ".[dev]"
pytest tests/ -v

Tests mock envbert_predict() directly — no real model download needed to run the suite.
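
The mocking approach can be sketched as below. To stay runnable without envbert installed, the sketch patches a stand-in module named `envbert_stub`; that module, the `classify` wrapper, and the `(label, confidence)` return shape are all assumptions for illustration. In the real suite the patch target would be wherever the server looks up `envbert_predict`.

```python
import sys
import types
from unittest import mock

# Stand-in module so this sketch runs without the real package or model.
stub = types.ModuleType("envbert_stub")


def _unmocked_predict(text):
    raise RuntimeError("would download and load the real model")


stub.envbert_predict = _unmocked_predict
sys.modules["envbert_stub"] = stub


def classify(text):
    """Hypothetical tool wrapper around the prediction function."""
    # Look the function up on the module at call time so patches take effect.
    label, confidence = stub.envbert_predict(text)
    return {"label": label, "confidence": confidence}


def test_classify_uses_mocked_prediction():
    with mock.patch(
        "envbert_stub.envbert_predict", return_value=("Geology", 0.93)
    ) as fake_predict:
        result = classify("weathered shale")
    fake_predict.assert_called_once_with("weathered shale")
    assert result == {"label": "Geology", "confidence": 0.93}
```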

License

MIT
