Normalize LangChain, MCP, and multimodal content blocks into provider-ready text and image payloads.
langchain-content-normalizer
Normalize the messy content shapes produced by LangChain, MCP tools, Anthropic content blocks, and multimodal chat APIs.
The package has no runtime dependencies. It works by duck typing instead of importing LangChain or MCP classes.
What it solves
LLM agent stacks often receive content as one of many incompatible shapes:
| Source | Example shape | Output |
|---|---|---|
| Classic chat | "plain text" | "plain text" |
| Anthropic blocks | [{"type": "text", "text": "hi"}] | "hi" |
| Tool calls | [{"type": "tool_use", ...}] | skipped by default |
| MCP tool results | [{"type": "tool_result", "content": [...]}] | flattened text |
| MCP objects | objects exposing .text | extracted text |
| Message wrappers | objects exposing .content | recursively normalized |
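To make the table concrete, here is a minimal sketch of how duck-typed flattening of these shapes could look. It is not the package's implementation: `sketch_extract_text`, the newline separator, and the `[tool_use: ...]` rendering are assumptions made for illustration only.

```python
from types import SimpleNamespace
from typing import Any


def sketch_extract_text(content: Any, skip_tool_use: bool = True) -> str:
    """Illustrative only: flatten common content shapes into plain text."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    if isinstance(content, dict):
        block_type = content.get("type")
        if block_type == "text":
            return str(content.get("text", ""))
        if block_type == "tool_use":
            # Tool calls are skipped unless the caller opts in.
            if skip_tool_use:
                return ""
            return f"[tool_use: {content.get('name', '')}]"
        if block_type == "tool_result":
            # Tool results carry nested content, so recurse into it.
            return sketch_extract_text(content.get("content"), skip_tool_use)
        return ""  # unknown block types are ignored in this sketch
    if isinstance(content, (list, tuple)):
        parts = [sketch_extract_text(item, skip_tool_use) for item in content]
        return "\n".join(part for part in parts if part)
    # Duck typing: no LangChain or MCP imports, only attribute checks.
    text = getattr(content, "text", None)
    if isinstance(text, str):
        return text
    inner = getattr(content, "content", None)
    if inner is not None:
        return sketch_extract_text(inner, skip_tool_use)
    return ""


# Stand-ins for an MCP object and a message wrapper (hypothetical objects).
mcp_like = SimpleNamespace(text="from an MCP object")
wrapper = SimpleNamespace(content=[{"type": "text", "text": "wrapped"}])

assert sketch_extract_text("plain text") == "plain text"
assert sketch_extract_text(mcp_like) == "from an MCP object"
assert sketch_extract_text(wrapper) == "wrapped"
```

Checking attributes with `getattr` rather than `isinstance` against framework classes is what lets this kind of code stay free of runtime dependencies.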
Install
uv add langchain-content-normalizer
Text normalization
from lc_content_normalizer import extract_text_content, normalize_tool_output
content = [
{"type": "text", "text": "Reading logs..."},
{"type": "tool_use", "name": "tail_logs", "input": {"service": "api"}},
]
assert extract_text_content(content) == "Reading logs..."
assert "tail_logs" in extract_text_content(content, skip_tool_use=False)
huge_tool_payload = "x" * 200_000  # placeholder standing in for a large tool result
safe_output = normalize_tool_output(huge_tool_payload, max_chars=50_000)
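The `max_chars` argument suggests that oversized tool output is capped before it reaches the model. The README does not say how the cut is made, so the following is only a sketch of one plausible approach; `sketch_truncate` and the `[truncated]` marker are hypothetical names, not part of the package.

```python
def sketch_truncate(text: str, max_chars: int = 50_000) -> str:
    """Illustrative only: cap text at max_chars and mark where it was cut."""
    if len(text) <= max_chars:
        return text
    marker = "\n[truncated]"
    # Reserve room for the marker so the result never exceeds max_chars.
    keep = max(max_chars - len(marker), 0)
    return (text[:keep] + marker)[:max_chars]


capped = sketch_truncate("x" * 100, max_chars=20)
assert len(capped) == 20
assert capped.endswith("[truncated]")
```

Leaving a visible marker tells the model that the output is incomplete, so a silent cut is not mistaken for the end of the data.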
Vision format routing
from lc_content_normalizer import build_human_message_content, detect_vision_format
vision_format = detect_vision_format("anthropic", "claude-3-5-sonnet")
content = build_human_message_content(
"Explain this alert screenshot",
images=[{"data_url": "data:image/png;base64,...", "mime_type": "image/png"}],
vision_format=vision_format,
)
detect_vision_format() returns:
| Provider/model | Format |
|---|---|
| anthropic | native Anthropic image block with source.base64 |
| ollama + llava/vision model name | OpenAI-compatible image_url block |
| ollama text-only model | none, images are dropped |
| OpenAI-compatible providers | OpenAI-compatible image_url block |
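The routing in the table can be sketched roughly as follows. This is an illustration under assumptions, not the package's code: the function names and the format labels `"anthropic"`, `"openai"`, and `"none"` are hypothetical, and the actual return values of `detect_vision_format()` may differ. The two block layouts follow the publicly documented Anthropic and OpenAI message formats.

```python
from typing import Any, Dict, Optional


def sketch_detect_format(provider: str, model: str) -> str:
    """Illustrative only: pick an image format label for a provider/model."""
    provider = provider.lower()
    if provider == "anthropic":
        return "anthropic"
    if provider == "ollama":
        # Only vision-capable Ollama models get images in this sketch.
        name = model.lower()
        return "openai" if ("llava" in name or "vision" in name) else "none"
    return "openai"  # assume every other provider is OpenAI-compatible


def sketch_image_block(
    data_url: str, mime_type: str, vision_format: str
) -> Optional[Dict[str, Any]]:
    """Illustrative only: build one image block for the given format label."""
    if vision_format == "anthropic":
        # Anthropic takes raw base64, without the "data:...;base64," prefix.
        _, _, b64_data = data_url.partition(",")
        return {
            "type": "image",
            "source": {"type": "base64", "media_type": mime_type, "data": b64_data},
        }
    if vision_format == "openai":
        return {"type": "image_url", "image_url": {"url": data_url}}
    return None  # text-only model: the image is dropped


fmt = sketch_detect_format("ollama", "llava:13b")
block = sketch_image_block("data:image/png;base64,AAAA", "image/png", fmt)
assert block == {"type": "image_url", "image_url": {"url": "data:image/png;base64,AAAA"}}
```

Splitting detection from block construction keeps the provider rules in one place, so a new provider only needs a new branch in the detection step.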
Examples
- examples/normalize_mcp_output.py shows how MCP-style tool results are flattened.
- examples/build_vision_content.py shows provider-aware image block generation.
Roadmap
- Add strict mode for unknown content blocks.
- Add more MCP fixture coverage.
- Add provider-specific adapters as content formats evolve.
- Keep runtime dependencies at zero.
Development
uv sync --dev
uv run ruff check .
uv run pytest
uv run python scripts/smoke.py
License
MIT