
Convert Markdown to Telegram plain text + MessageEntity pairs


telegramify-markdown


Effortlessly convert raw Markdown to Telegram plain text + MessageEntity pairs.

Say goodbye to MarkdownV2 escaping headaches! This library parses Markdown (including LLM output, GitHub READMEs, etc.) and produces (text, entities) tuples that can be sent directly via the Telegram Bot API — no parse_mode needed.

  • Handles Markdown of any format and length; long output can be split across multiple messages.
  • Entity offsets are measured in UTF-16 code units, exactly as Telegram requires.
  • We also support LaTeX-to-Unicode conversion, expandable block quotes, and Mermaid diagram rendering.
  • Built on pyromark (Rust pulldown-cmark bindings) for speed and correctness.

[!NOTE] v1.0.0 is a breaking change from 0.x. The output is now (str, list[MessageEntity]) instead of a MarkdownV2 string. The old markdownify() and standardize() functions have been removed.

It is currently a release candidate: install it with pip install telegramify-markdown --pre to try it. A plain pip install telegramify-markdown (without --pre) still installs the stable 0.5.x version.

👀 Use case

[Example output screenshots: convert(), convert(), telegramify()]

🪄 Quick Start

Install

Requires Python 3.10+. Because the package is currently a release candidate, pass your package manager's pre-release flag:

# uv (recommended)
uv add telegramify-markdown --prerelease=allow
uv add "telegramify-markdown[mermaid]" --prerelease=allow

# pip
pip install telegramify-markdown --pre
pip install "telegramify-markdown[mermaid]" --pre

# PDM
pdm add telegramify-markdown --prerelease
pdm add "telegramify-markdown[mermaid]" --prerelease

# Poetry
poetry add telegramify-markdown --allow-prereleases
poetry add "telegramify-markdown[mermaid]" --allow-prereleases

🤔 What do you want to do?

  • If you just want to send static text and don't want to worry about formatting → use convert()
  • If you are developing an LLM application or need to send potentially super-long text → use telegramify()
  • If you need to split convert() output manually → use split_entities()
  • If your API only supports parse_mode="MarkdownV2" (no entities parameter) → use entities_to_markdownv2()

convert() — single message

from telebot import TeleBot
from telegramify_markdown import convert

bot = TeleBot("YOUR_TOKEN")
chat_id = 123456789  # replace with the target chat ID

md = "**Bold**, _italic_, and `code`."
text, entities = convert(md)

bot.send_message(
    chat_id,
    text,
    entities=[e.to_dict() for e in entities],
)

No parse_mode parameter — Telegram reads the entities directly.
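If you are not using pyTelegramBotAPI, the same (text, entities) pair maps directly onto the Bot API's sendMessage parameters, where entities is documented as an alternative to parse_mode. The sketch below builds that request body by hand. build_send_message_payload is a hypothetical helper, and the entity dicts are written by hand in the Bot API's MessageEntity shape rather than produced by convert().

```python
import json

def build_send_message_payload(chat_id: int, text: str, entities: list[dict]) -> dict:
    """Hypothetical helper: assemble a Bot API sendMessage body from (text, entities)."""
    payload: dict = {"chat_id": chat_id, "text": text}
    if entities:
        payload["entities"] = entities  # array of MessageEntity objects
    # No "parse_mode" key: the entities already describe all formatting.
    return payload

text = "Bold, italic, and code."
# Hand-written entities; offsets are UTF-16 code units (equal to str indices for ASCII text).
entities = [
    {"type": "bold", "offset": 0, "length": 4},
    {"type": "italic", "offset": 6, "length": 6},
    {"type": "code", "offset": 18, "length": 4},
]

payload = build_send_message_payload(123456789, text, entities)
# This JSON body would be POSTed to https://api.telegram.org/bot<TOKEN>/sendMessage
body = json.dumps(payload, ensure_ascii=False)
```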

telegramify() — long messages, code files, diagrams

For LLM output or long documents, telegramify() splits text, extracts code blocks as files, and renders Mermaid diagrams as images:

import asyncio
from telebot import TeleBot
from telegramify_markdown import telegramify
from telegramify_markdown.content import ContentType

bot = TeleBot("YOUR_TOKEN")
chat_id = 123456789  # replace with the target chat ID

md = """
# Report

Here is some analysis with **bold** and _italic_ text.

```python
print("hello world")
```

And a diagram:

```mermaid
graph TD
    A-->B
```
"""

async def send():
    results = await telegramify(md, max_message_length=4090)
    for item in results:
        if item.content_type == ContentType.TEXT:
            bot.send_message(
                chat_id,
                item.text,
                entities=[e.to_dict() for e in item.entities],
            )
        elif item.content_type == ContentType.PHOTO:
            bot.send_photo(
                chat_id,
                (item.file_name, item.file_data),
                caption=item.caption_text or None,
                caption_entities=[e.to_dict() for e in item.caption_entities] or None,
            )
        elif item.content_type == ContentType.FILE:
            bot.send_document(
                chat_id,
                (item.file_name, item.file_data),
                caption=item.caption_text or None,
                caption_entities=[e.to_dict() for e in item.caption_entities] or None,
            )

asyncio.run(send())
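To make the "extracts code blocks as files" step concrete, here is a simplified, regex-based sketch that pulls fenced code blocks out of Markdown and turns each one into a (file_name, bytes) pair. It is not how the library does it (the library parses Markdown with pyromark): it treats Mermaid blocks like any other code and ignores ~~~ fences, indented fences, and nested blocks. extract_code_blocks, EXTENSIONS, and the code_N file names are all hypothetical.

```python
import re

TICKS = "`" * 3  # a Markdown code fence

# Hypothetical language-to-extension map; unknown languages reuse their own name.
EXTENSIONS = {"python": "py", "javascript": "js", "typescript": "ts", "rust": "rs"}

# A fence at the start of a line, an optional info string, the body, then a closing fence.
FENCE = re.compile(rf"^{TICKS}([\w+-]*)[ \t]*\n(.*?)^{TICKS}[ \t]*$", re.MULTILINE | re.DOTALL)

def extract_code_blocks(markdown: str) -> tuple[str, list[tuple[str, bytes]]]:
    """Replace each fenced block with a [file name] marker and collect its contents."""
    files: list[tuple[str, bytes]] = []

    def to_marker(match: re.Match) -> str:
        lang = match.group(1).lower()
        ext = EXTENSIONS.get(lang, lang) or "txt"  # no info string -> .txt
        name = f"code_{len(files) + 1}.{ext}"
        files.append((name, match.group(2).encode("utf-8")))
        return f"[{name}]"

    return FENCE.sub(to_marker, markdown), files

md = f'Intro\n\n{TICKS}python\nprint("hi")\n{TICKS}\n\nOutro\n'
remaining, files = extract_code_blocks(md)
print(remaining)  # the text part, with the code block replaced by [code_1.py]
```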

split_entities() — manual splitting

If you use convert() but need to split long output yourself:

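from telebot import TeleBot
from telegramify_markdown import convert, split_entities

bot = TeleBot("YOUR_TOKEN")
chat_id = 123456789  # replace with the target chat ID
# Example input long enough to exceed Telegram's 4096-unit message limit
long_markdown = "\n\n".join(f"Paragraph **{i}** of a long report." for i in range(500))

text, entities = convert(long_markdown)

for chunk_text, chunk_entities in split_entities(text, entities, max_utf16_len=4096):
    bot.send_message(
        chat_id,
        chunk_text,
        entities=[e.to_dict() for e in chunk_entities],
    )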

entities_to_markdownv2() — reverse conversion to MarkdownV2

If your middleware API does not support the entities parameter and only accepts parse_mode="MarkdownV2", you can convert the (text, entities) output back to a MarkdownV2 string:

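from telebot import TeleBot
from telegramify_markdown import convert, entities_to_markdownv2

bot = TeleBot("YOUR_TOKEN")
chat_id = 123456789  # replace with the target chat ID

text, entities = convert("**Bold** and `code`")
mdv2 = entities_to_markdownv2(text, entities)

bot.send_message(chat_id, mdv2, parse_mode="MarkdownV2")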

This handles all MarkdownV2 escaping rules correctly (different escaping for normal text, code/pre blocks, and URLs).
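For reference, the per-context escaping can be sketched from the Bot API's MarkdownV2 rules. In ordinary text the characters _ * [ ] ( ) ~ ` > # + - = | { } . ! (and the backslash itself) must be backslash-escaped; inside code and pre only ` and \ are escaped; inside a link URL only ) and \ are escaped. The helper names below are hypothetical, and this is not the library's implementation.

```python
import re

# Ordinary text: these characters (and the backslash itself) need a preceding backslash.
_NORMAL = re.compile(r"([_*\[\]()~`>#+\-=|{}.!\\])")
# Inside code and pre entities only backticks and backslashes are escaped.
_CODE = re.compile(r"([`\\])")
# Inside the (...) URL part of an inline link only ")" and backslashes are escaped.
_URL = re.compile(r"([)\\])")

def escape_text(s: str) -> str:
    return _NORMAL.sub(r"\\\1", s)

def escape_code(s: str) -> str:
    return _CODE.sub(r"\\\1", s)

def escape_url(s: str) -> str:
    return _URL.sub(r"\\\1", s)

def bold(s: str) -> str:
    return f"*{escape_text(s)}*"

def inline_code(s: str) -> str:
    return f"`{escape_code(s)}`"

def text_link(label: str, url: str) -> str:
    return f"[{escape_text(label)}]({escape_url(url)})"

print(bold("v1.0 released!"))  # *v1\.0 released\!*
```

Keeping one escaper per context mirrors the three rule sets in the Bot API documentation; applying the ordinary-text escaper inside a code span would add visible backslashes.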

⚙️ Configuration

Customize heading symbols, link symbols, and expandable citation behavior:

from telegramify_markdown.config import get_runtime_config

cfg = get_runtime_config()
cfg.markdown_symbol.heading_level_1 = "📌"
cfg.markdown_symbol.link = "🔗"
cfg.cite_expandable = True  # Long quotes become expandable_blockquote

# For clean output without emoji heading prefixes:
# cfg.markdown_symbol.heading_level_1 = ""
# cfg.markdown_symbol.heading_level_2 = ""
# cfg.markdown_symbol.heading_level_3 = ""
# cfg.markdown_symbol.heading_level_4 = ""

📖 API Reference

convert(markdown, *, latex_escape=True) -> tuple[str, list[MessageEntity]]

Synchronous. Converts a Markdown string to plain text and a list of MessageEntity objects.

  • markdown (str, required): raw Markdown text
  • latex_escape (bool, default True): convert LaTeX \(...\) and \[...\] to Unicode symbols

Returns (text, entities) where text is plain text and entities is a list of MessageEntity.

telegramify(content, *, max_message_length=4096, latex_escape=True) -> list[Text | File | Photo]

Async. Full pipeline: converts Markdown, splits long messages, extracts code blocks as files, renders Mermaid diagrams as images.

  • content (str, required): raw Markdown text
  • max_message_length (int, default 4096): maximum UTF-16 code units per text message
  • latex_escape (bool, default True): convert LaTeX to Unicode

Returns an ordered list of Text, File, or Photo objects.

split_entities(text, entities, max_utf16_len) -> list[tuple[str, list[MessageEntity]]]

Splits text + entities into chunks within a UTF-16 length limit. Splits happen at newline boundaries; an entity that spans a split point is clipped so that each chunk keeps the part of it that falls within that chunk.
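The splitting strategy described above can be sketched in plain Python: whole lines are packed into chunks until the next line would exceed the UTF-16 limit, then each entity is intersected with each chunk's range and re-based to that chunk's start. Entity and split_sketch are hypothetical stand-ins, not the library's split_entities(), and unlike a production splitter this sketch does not break up a single line that is itself longer than the limit.

```python
from dataclasses import dataclass, replace

@dataclass
class Entity:
    """Hypothetical stand-in for MessageEntity (offsets in UTF-16 code units)."""
    type: str
    offset: int
    length: int

def u16(s: str) -> int:
    return len(s.encode("utf-16-le")) // 2

def split_sketch(text: str, entities: list[Entity], limit: int) -> list[tuple[str, list[Entity]]]:
    # 1. Pack whole lines (line breaks kept) into chunks of at most `limit` UTF-16 units.
    chunks: list[tuple[int, str]] = []  # (UTF-16 start offset, chunk text)
    current, start, pos = "", 0, 0
    for line in text.splitlines(keepends=True):
        if current and u16(current) + u16(line) > limit:
            chunks.append((start, current))
            start, current = pos, ""
        current += line
        pos += u16(line)
    if current:
        chunks.append((start, current))

    # 2. Intersect every entity with each chunk's [start, end) range and re-base it.
    result = []
    for start, chunk in chunks:
        end = start + u16(chunk)
        clipped = []
        for e in entities:
            lo, hi = max(e.offset, start), min(e.offset + e.length, end)
            if lo < hi:  # the entity overlaps this chunk
                clipped.append(replace(e, offset=lo - start, length=hi - lo))
        result.append((chunk, clipped))
    return result

text = "line one\nline two\nline three"
parts = split_sketch(text, [Entity("bold", 5, 8)], limit=12)
for chunk, ents in parts:
    print(repr(chunk), ents)
```

With a 12-unit limit, the bold entity covering "one\nline" is clipped into Entity("bold", 5, 4) in the first chunk and Entity("bold", 0, 4) in the second.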

entities_to_markdownv2(text, entities=None) -> str

Reverse conversion: takes plain text and entities, returns a MarkdownV2 string with correct escaping. Useful when your API only supports parse_mode="MarkdownV2" and cannot pass entities directly.

  • text (str, required): plain text content
  • entities (list[MessageEntity] | None, default None): entity list with UTF-16 offsets

MessageEntity

@dataclasses.dataclass(slots=True)
class MessageEntity:
    type: str           # "bold", "italic", "code", "pre", "text_link", etc.
    offset: int         # Start position in UTF-16 code units
    length: int         # Length in UTF-16 code units
    url: str | None     # For "text_link" entities
    language: str | None       # For "pre" entities (code block language)
    custom_emoji_id: str | None  # For "custom_emoji" entities

    def to_dict(self) -> dict: ...

Content Types

  • Text (fields: text, entities, content_trace): a text message segment
  • File (fields: file_name, file_data, caption_text, caption_entities, content_trace): an extracted code block
  • Photo (fields: file_name, file_data, caption_text, caption_entities, content_trace): a rendered Mermaid diagram

utf16_len(text) -> int

Returns the length of a string in UTF-16 code units (what Telegram uses for offsets).
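UTF-16 lengths differ from Python's len() whenever text contains characters outside the Basic Multilingual Plane, such as most emoji, which take two code units (a surrogate pair). The sketch below reproduces that measurement with the standard library; utf16_len_sketch and utf16_offset are hypothetical names, not the library's utf16_len().

```python
def utf16_len_sketch(text: str) -> int:
    # Characters up to U+FFFF take one UTF-16 code unit; characters above it
    # (most emoji) take a surrogate pair, i.e. two code units.
    return len(text.encode("utf-16-le")) // 2

def utf16_offset(text: str, index: int) -> int:
    # Convert a Python str index (code points) into a UTF-16 offset.
    return utf16_len_sketch(text[:index])

family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"  # 3 emoji joined by 2 zero-width joiners
print(len(family), utf16_len_sketch(family))  # 5 8

text = "\U0001F600 bold"  # an emoji, a space, then "bold"
print(text.index("bold"), utf16_offset(text, text.index("bold")))  # 2 3
```

An entity marking "bold" here must use offset 3, not the Python index 2.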

🔨 Supported Markdown Features

  • Headings (Levels 1-6: H1-H2 bold+underline, H3-H4 bold, H5-H6 italic; H1-H4 with emoji prefix)
  • **Bold**, *Italic*, ~~Strikethrough~~
  • ||Spoiler||
  • [Links](url) and ![Images](url)
  • Telegram custom emoji ![emoji](tg://emoji?id=...)
  • Inline code and fenced code blocks
  • Block quotes > (with expandable citation support)
  • Tables (rendered as monospace pre blocks)
  • Ordered and unordered lists
  • Task lists - [x] / - [ ]
  • Horizontal rules ---
  • LaTeX math \(...\) and \[...\] (converted to Unicode)
  • Mermaid diagrams (rendered as images, requires [mermaid] extra)
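LaTeX-to-Unicode conversion is essentially a lookup from LaTeX commands to Unicode symbols, applied only inside math delimiters. The tiny sketch below illustrates the idea with a handful of symbols and superscript digits; the symbol table and latex_to_unicode_sketch are hypothetical and far less complete than the library's converter.

```python
import re

# A deliberately tiny, hypothetical symbol table.
SYMBOLS = {
    r"\alpha": "α", r"\beta": "β", r"\pi": "π",
    r"\le": "≤", r"\ge": "≥", r"\times": "×", r"\infty": "∞",
}
SUPERSCRIPTS = str.maketrans("0123456789+-", "⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻")

def _convert_math(match: re.Match) -> str:
    body = match.group(1) if match.group(1) is not None else match.group(2)
    # Replace whole command names only, so \le never matches the start of \leq.
    body = re.sub(r"\\[A-Za-z]+", lambda m: SYMBOLS.get(m.group(0), m.group(0)), body)
    # Turn ^2 or ^{10} into Unicode superscript digits.
    body = re.sub(r"\^\{?([0-9+-]+)\}?", lambda m: m.group(1).translate(SUPERSCRIPTS), body)
    return body.strip()

def latex_to_unicode_sketch(text: str) -> str:
    # Convert only inside \(...\) (inline) and \[...\] (display) delimiters.
    return re.sub(r"\\\((.+?)\\\)|\\\[(.+?)\\\]", _convert_math, text, flags=re.DOTALL)

print(latex_to_unicode_sketch(r"Area \(\pi r^2\), and \(\alpha \le \beta\)"))
# Area π r², and α ≤ β
```

Unknown commands are left untouched, so unsupported LaTeX degrades to its source text rather than disappearing.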

🤖 For AI Coding Assistants

Copy this block into your AI assistant's context (e.g. CLAUDE.md, Cursor Rules, etc.) to get accurate code generation for telegramify-markdown:

# telegramify-markdown integration guide

## Install
uv add telegramify-markdown --prerelease=allow  # or: pip install telegramify-markdown --pre

## API (v1.0.0+) — outputs plain text + MessageEntity, NOT MarkdownV2 strings

### convert() — sync, single message
from telegramify_markdown import convert
text, entities = convert("**bold** and _italic_")
bot.send_message(chat_id, text, entities=[e.to_dict() for e in entities])
# Do NOT set parse_mode — entities replace it entirely.

### telegramify() — async, auto-splits long text, extracts code blocks as files
from telegramify_markdown import telegramify
from telegramify_markdown.content import ContentType
results = await telegramify(md, max_message_length=4090)
for item in results:
    if item.content_type == ContentType.TEXT:
        bot.send_message(chat_id, item.text, entities=[e.to_dict() for e in item.entities])
    elif item.content_type == ContentType.FILE:
        bot.send_document(chat_id, (item.file_name, item.file_data))
    elif item.content_type == ContentType.PHOTO:
        bot.send_photo(chat_id, (item.file_name, item.file_data))

### split_entities() — manual splitting for convert() output
from telegramify_markdown import convert, split_entities
text, entities = convert(long_md)
for chunk_text, chunk_entities in split_entities(text, entities, max_utf16_len=4096):
    bot.send_message(chat_id, chunk_text, entities=[e.to_dict() for e in chunk_entities])

### entities_to_markdownv2() — reverse to MarkdownV2 string
from telegramify_markdown import convert, entities_to_markdownv2
text, entities = convert("**Bold** and `code`")
mdv2 = entities_to_markdownv2(text, entities)
bot.send_message(chat_id, mdv2, parse_mode="MarkdownV2")
# Use when your API only supports parse_mode, not entities parameter.

### Configuration
from telegramify_markdown.config import get_runtime_config
cfg = get_runtime_config()
cfg.markdown_symbol.heading_level_1 = "📌"
cfg.cite_expandable = True

## Critical rules
- entities must be passed as list[dict] via [e.to_dict() for e in entities], NEVER as JSON string
- NEVER set parse_mode when sending with entities — they are mutually exclusive
- All entity offsets are UTF-16 code units. Use utf16_len() to measure text length.
- Requires Python 3.10+

🧸 Acknowledgement

This library is inspired by npm:telegramify-markdown.

LaTeX escape is inspired by latex2unicode and @yym68686.

📜 License

This project is licensed under the MIT License — see the LICENSE file for details.



Download files


Source Distribution

telegramify_markdown-1.0.0.tar.gz (57.4 kB)

Uploaded Source

Built Distribution


telegramify_markdown-1.0.0-py3-none-any.whl (40.6 kB)

Uploaded Python 3

File details

Details for the file telegramify_markdown-1.0.0.tar.gz.

File metadata

  • Download URL: telegramify_markdown-1.0.0.tar.gz
  • Upload date:
  • Size: 57.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: pdm/2.26.6 CPython/3.12.3 Linux/6.14.0-1017-azure

File hashes

Hashes for telegramify_markdown-1.0.0.tar.gz
Algorithm Hash digest
SHA256 d4c266b7e93a74771ccc953a3efb4a0f933daa605e450d1970d8c39e62b70d55
MD5 5aa49800a80f083a882df45fd6d7383a
BLAKE2b-256 a6f425dbb8c01dea6f02d3ceac6095ef9aa34ddeaf2ad333b6ef5a573f000f43


File details

Details for the file telegramify_markdown-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: telegramify_markdown-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 40.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: pdm/2.26.6 CPython/3.12.3 Linux/6.14.0-1017-azure

File hashes

Hashes for telegramify_markdown-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 efb18bbce52ac309dcd9698d61924dec14ba41f293be4a3dcc1af4144b87ae7c
MD5 902cefa2515d8a519ad2f9fa0e02728f
BLAKE2b-256 54ebd5215f6f0299bfece60deba1c95c4bf67162d59b4c69db4df4aa29fc6059

