
Normalize LangChain, MCP, and multimodal content blocks into provider-ready text and image payloads.


langchain-content-normalizer


Normalize the messy content shapes produced by LangChain, MCP tools, Anthropic content blocks, and multimodal chat APIs.

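The package has no runtime dependencies. Instead of importing LangChain or MCP classes, it relies on duck typing: it inspects the shape of each value (strings, lists, dicts, and objects exposing attributes such as .text or .content).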
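To illustrate the duck-typing idea, here is a minimal sketch that classifies values by the attributes they expose rather than by their class. The dataclasses `FakeTextContent` and `FakeMessage` and the function `describe_shape` are hypothetical stand-ins, not part of lc_content_normalizer; they only borrow the `.text` and `.content` attribute names the README mentions.

```python
from dataclasses import dataclass


@dataclass
class FakeTextContent:
    """Hypothetical stand-in for an MCP TextContent object."""

    type: str
    text: str


@dataclass
class FakeMessage:
    """Hypothetical stand-in for a LangChain message; only .content matters."""

    content: object


def describe_shape(obj: object) -> str:
    """Classify a value by the attributes it exposes, not by its class."""
    if isinstance(obj, str):
        return "plain text"
    if hasattr(obj, "text"):
        return "text-bearing object"
    if hasattr(obj, "content"):
        return "message wrapper"
    return "unknown"
```

For example, `describe_shape(FakeMessage(content="hi"))` returns `"message wrapper"` even though nothing here imports LangChain. This is why a zero-dependency package can accept objects from libraries it never imports.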

What it solves

LLM agent stacks often receive content as one of many incompatible shapes:

Source | Example shape | Output
--- | --- | ---
Classic chat | "plain text" | "plain text"
Anthropic blocks | [{"type": "text", "text": "hi"}] | "hi"
OpenAI Responses text | [{"type": "output_text", "text": "hi"}] | "hi"
Tool calls | [{"type": "tool_use", ...}] | skipped by default
MCP tool results | [{"type": "tool_result", "content": [...]}] | flattened text
MCP objects | objects exposing .text | extracted text
Message wrappers | objects exposing .content | recursively normalized
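The rows of the table above can be read as a small recursive dispatch. The sketch below is a hypothetical `flatten` function written for illustration, not the package's implementation; it assumes that unknown dicts are kept via `str(...)`, that `tool_use` blocks are dropped unless `skip_tool_use=False`, and its default `separator` is an arbitrary choice.

```python
from typing import Any


def flatten(content: Any, skip_tool_use: bool = True, separator: str = "\n") -> str:
    """Hypothetical flattener for the content shapes listed in the table."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content  # classic chat: already plain text
    if isinstance(content, list):
        # Flatten each block, drop empty results, then join what is left.
        parts = [flatten(item, skip_tool_use, separator) for item in content]
        return separator.join(part for part in parts if part)
    if isinstance(content, dict):
        block_type = content.get("type")
        if block_type in ("text", "output_text"):
            return str(content.get("text", ""))
        if block_type == "tool_use":
            return "" if skip_tool_use else str(content)
        if block_type == "tool_result":
            # Nested content list: recurse into it.
            return flatten(content.get("content"), skip_tool_use, separator)
        return str(content)  # unknown dict: keep it visible rather than drop it
    if hasattr(content, "text"):
        return str(content.text)  # MCP-style object exposing .text
    if hasattr(content, "content"):
        # Message wrapper: normalize whatever it carries.
        return flatten(content.content, skip_tool_use, separator)
    return str(content)
```

Recursion handles both nested `tool_result` payloads and message wrappers with the same code path, so new wrapper types need no special casing as long as they expose `.content`.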

Install

uv add langchain-content-normalizer

Text normalization

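from lc_content_normalizer import extract_text_content, normalize_tool_output

content = [
    {"type": "text", "text": "Reading logs..."},
    {"type": "tool_use", "name": "tail_logs", "input": {"service": "api"}},
]

# tool_use blocks are skipped by default, so only the text survives.
assert extract_text_content(content) == "Reading logs..."
# Opt in to keep tool calls in the output.
assert "tail_logs" in extract_text_content(content, skip_tool_use=False)
# separator joins multiple text blocks; with a single text block it changes nothing.
assert extract_text_content(content, separator="\n") == "Reading logs..."

# Placeholder payload for illustration: any oversized tool result works here.
huge_tool_payload = [{"type": "text", "text": "log line\n" * 20_000}]
# max_chars limits the size of the normalized output.
safe_output = normalize_tool_output(huge_tool_payload, max_chars=50_000, separator="\n")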
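The README does not say how `max_chars` shortens oversized output. As one possible policy, the sketch below keeps the head of the text and appends a marker while never exceeding the limit. `truncate_for_context` and its marker string are hypothetical and may differ from what `normalize_tool_output` actually does.

```python
def truncate_for_context(text: str, max_chars: int, marker: str = "\n...[truncated]") -> str:
    """Hypothetical head-truncation policy; not the library's actual algorithm."""
    if max_chars <= 0:
        raise ValueError("max_chars must be positive")
    if len(text) <= max_chars:
        return text  # short enough: return unchanged
    # Reserve room for the marker so the result never exceeds max_chars.
    keep = max(max_chars - len(marker), 0)
    return (text[:keep] + marker)[:max_chars]
```

Keeping the head rather than the tail is a judgment call: for log tails or stack traces, the end of the output is often more useful, so a real policy might keep both ends.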

Vision format routing

from lc_content_normalizer import build_human_message_content, detect_vision_format

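# Choose the image block format for this provider/model pair.
vision_format = detect_vision_format("anthropic", "claude-3-5-sonnet")
# Build the user message content: the prompt text plus images in that format.
content = build_human_message_content(
    "Explain this alert screenshot",
    images=[{"data_url": "data:image/png;base64,...", "mime_type": "image/png"}],
    vision_format=vision_format,
)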
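The two image block shapes that vision routing chooses between look roughly like this. The helpers below are hypothetical and written for illustration; the dict layouts follow the public Anthropic Messages API (base64 image source) and OpenAI Chat Completions (`image_url` content part), but whether lc_content_normalizer emits exactly these keys is an assumption.

```python
from typing import Tuple


def parse_data_url(data_url: str) -> Tuple[str, str]:
    """Split 'data:<mime>;base64,<payload>' into (mime_type, base64_payload)."""
    header, sep, payload = data_url.partition(",")
    if not sep or not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64 data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, payload


def anthropic_image_block(data_url: str) -> dict:
    """Anthropic-style image block: media type and raw base64 live in 'source'."""
    mime_type, payload = parse_data_url(data_url)
    return {
        "type": "image",
        "source": {"type": "base64", "media_type": mime_type, "data": payload},
    }


def openai_image_block(data_url: str) -> dict:
    """OpenAI-compatible image_url block: the data URL is passed through as-is."""
    return {"type": "image_url", "image_url": {"url": data_url}}
```

The key difference is where the MIME type lives: Anthropic wants it split out as `media_type`, while OpenAI-compatible APIs read it from the data URL prefix.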

detect_vision_format() selects the image block format as follows:

Provider/model | Format
--- | ---
anthropic | native Anthropic image block with a base64 source
ollama with a known vision model marker (llava, bakllava, moondream, minicpm-v, qwen2-vl, llama3.2-vision, vision) | OpenAI-compatible image_url block
ollama text-only model | none; images are dropped
other OpenAI-compatible providers | OpenAI-compatible image_url block
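The routing table above amounts to a substring check on the model name. The sketch below mirrors it with a hypothetical `route_vision_format` function; its return labels (`"anthropic"`, `"openai"`, `None`) are illustrative and are not necessarily the values `detect_vision_format()` returns.

```python
from typing import Optional

# Model-name markers from the routing table; matched as case-insensitive substrings.
VISION_MARKERS = ("llava", "bakllava", "moondream", "minicpm-v", "qwen2-vl", "llama3.2-vision", "vision")


def route_vision_format(provider: str, model: str) -> Optional[str]:
    """Hypothetical mirror of the routing table; return labels are illustrative."""
    provider = provider.lower()
    model = model.lower()
    if provider == "anthropic":
        return "anthropic"
    if provider == "ollama":
        # Text-only Ollama models get no image format, so images are dropped.
        return "openai" if any(marker in model for marker in VISION_MARKERS) else None
    return "openai"  # every other provider is treated as OpenAI-compatible
```

Because matching is by substring, the generic `"vision"` marker also catches tags such as `llama3.2-vision:11b`, at the cost of possible false positives for any model whose name happens to contain that word.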

Examples

  • examples/normalize_mcp_output.py shows how MCP-style tool results are flattened.
  • examples/build_vision_content.py shows provider-aware image block generation.

Roadmap

  • Add provider-specific adapters as content formats evolve.
  • Keep runtime dependencies at zero.

Strict mode

By default, unknown non-empty content is preserved by converting it with str(...), so tool output is not silently lost. Use strict mode when unknown shapes should fail fast instead:

from lc_content_normalizer import UnknownContentBlockError, extract_text_content

try:
    extract_text_content([{"type": "custom", "payload": "..."}], strict=True)
except UnknownContentBlockError:
    # Handle the unexpected block shape here, e.g. log it and re-raise.
    ...
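The lenient-versus-strict trade-off can be sketched in a few lines. `UnknownBlockError`, `KNOWN_TEXT_TYPES`, and `block_to_text` are hypothetical names for illustration only; the sketch assumes, as the paragraph above states, that unknown non-empty blocks are kept via `str(...)` by default and rejected when `strict=True`.

```python
class UnknownBlockError(ValueError):
    """Hypothetical stand-in for the library's UnknownContentBlockError."""


KNOWN_TEXT_TYPES = {"text", "output_text"}


def block_to_text(block: dict, strict: bool = False) -> str:
    """Return a block's text; unknown non-empty blocks are str()-ed unless strict."""
    if block.get("type") in KNOWN_TEXT_TYPES:
        return str(block.get("text", ""))
    if not block:
        return ""  # empty block: nothing worth preserving
    if strict:
        raise UnknownBlockError(f"unknown content block type: {block.get('type')!r}")
    # Lenient default: keep the data visible instead of silently dropping it.
    return str(block)
```

The lenient path favors agents that must keep running with imperfect data; strict mode suits tests and pipelines where an unrecognized block signals an upstream format change worth surfacing.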

Development

uv sync --dev
uv run ruff check .
uv run pytest
uv run python scripts/smoke.py
uv build

License

MIT


Download files


Source Distribution

langchain_content_normalizer-0.1.8.tar.gz (15.7 kB)

Uploaded Source

Built Distribution


langchain_content_normalizer-0.1.8-py3-none-any.whl (7.1 kB)

Uploaded Python 3

File details

Details for the file langchain_content_normalizer-0.1.8.tar.gz.


File hashes

Hashes for langchain_content_normalizer-0.1.8.tar.gz
Algorithm Hash digest
SHA256 c767be702678a7f4826ab1d7873c0c41899480dda2c758539d8840adb311764a
MD5 bd15d1226445dd502a42bf922869318f
BLAKE2b-256 f56df5590f93ee8ac4f1c17a44cbaa3b047bd8cac481554afcefe5acbd072f8e


Provenance

The following attestation bundles were made for langchain_content_normalizer-0.1.8.tar.gz:

Publisher: publish.yml on BenjaminJornet/langchain-content-normalizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file langchain_content_normalizer-0.1.8-py3-none-any.whl.


File hashes

Hashes for langchain_content_normalizer-0.1.8-py3-none-any.whl
Algorithm Hash digest
SHA256 ebddbeeea9e9ce67d1da268632531ae10aea02690e7899a3c9957067aa62f096
MD5 f5067340bbff4a5f73f06b6f25cc3b84
BLAKE2b-256 b03e36ac431e7a26e11c1f315bcf6122284a497a31da7585b0f83146b7dbb74a


Provenance

The following attestation bundles were made for langchain_content_normalizer-0.1.8-py3-none-any.whl:

Publisher: publish.yml on BenjaminJornet/langchain-content-normalizer

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
