
HTML to Telegram entities converter


Sulguk - HTML to Telegram entities converter


Need to deliver formatted content to your bot clients? Having a hangover after trying to fit HTML into Telegram? Is BeautifulSoup too complicated and no help with messages?

Try sulguk (술국, a hangover soup) - delivered since the 1800s.

Problem

Telegram supports parse_mode="html", but:

  • Telegram does not collapse spaces and new lines the way a browser does, so we cannot indent or wrap the HTML source for readability.
  • The number of supported tags is very small.
  • It does not ignore additional attributes on supported tags.
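
The whitespace point can be illustrated without Telegram at all. The sketch below takes the text of a small indented HTML snippet and collapses whitespace roughly the way a browser does for inline content. `collapse_whitespace` and `SOURCE_TEXT` are hypothetical names used only for this illustration; they are not part of sulguk.

```python
import re

# Text content of an indented HTML source, with its line breaks and indentation kept.
SOURCE_TEXT = "This is a demo of Sulguk\n\n  Underlined\n  Italic\n  Bold\n"


def collapse_whitespace(text: str) -> str:
    """Replace each run of whitespace with one space, roughly as a browser does."""
    return re.sub(r"\s+", " ", text).strip()


browser_like = collapse_whitespace(SOURCE_TEXT)

print(repr(SOURCE_TEXT))   # whitespace preserved literally
print(repr(browser_like))  # 'This is a demo of Sulguk Underlined Italic Bold'
```

When whitespace is preserved literally, the blank line and the two-space indentation end up in the message, which is the difference between the two renderings described in this section.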

Let's imagine we have HTML like this:

<b>This is a demo of <a href="https://github.com/tishka17/sulguk">Sulguk</a></b>

  <u>Underlined</u>
  <i>Italic</i>
  <b>Bold</b>

This is how it is rendered in a browser (expected behavior):

But this is how it is rendered in Telegram with parse_mode="html":

To solve this, we can convert the HTML to Telegram entities with sulguk. This is how it looks then:
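
As a rough illustration of what "text plus entities" means, here is a minimal sketch built on the standard library's `html.parser`. It handles only `<b>`, `<i>` and `<u>` and is not sulguk's implementation; `TinyConverter`, `tiny_transform` and `utf16_len` are hypothetical names. Offsets are counted in UTF-16 code units, which is the unit Telegram uses for entity positions.

```python
from html.parser import HTMLParser

# Only three inline tags are handled in this sketch.
TAG_TO_ENTITY = {"b": "bold", "i": "italic", "u": "underline"}


def utf16_len(text: str) -> int:
    """Length of text in UTF-16 code units."""
    return len(text.encode("utf-16-le")) // 2


class TinyConverter(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.text = ""
        self.entities = []   # list of dicts: type, offset, length
        self._open = []      # stack of (tag, start offset)

    def handle_starttag(self, tag, attrs):
        if tag in TAG_TO_ENTITY:
            self._open.append((tag, utf16_len(self.text)))

    def handle_endtag(self, tag):
        # Close only a tag that matches the innermost open one.
        if self._open and self._open[-1][0] == tag:
            _, start = self._open.pop()
            self.entities.append({
                "type": TAG_TO_ENTITY[tag],
                "offset": start,
                "length": utf16_len(self.text) - start,
            })

    def handle_data(self, data):
        self.text += data


def tiny_transform(html: str):
    parser = TinyConverter()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text, parser.entities


text, entities = tiny_transform("Some <b>text</b> in a paragraph")
print(text)      # Some text in a paragraph
print(entities)  # [{'type': 'bold', 'offset': 5, 'length': 4}]
```

The tags disappear from the text and the formatting travels separately as ranges, so whitespace and unsupported markup can be normalized before anything is sent.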

Example

  1. Create your nice HTML:
<ol start="10">
    <li>some item</li>
    <li>other item</li>
</ol>
<p>Some <b>text</b> in a paragraph</p>
  2. Convert it into text and entities:
from sulguk import transform_html
result = transform_html(raw_html)
  3. Send it to Telegram.

Depending on your library, you may need to convert the entities from dicts into the proper type:

await bot.send_message(
    chat_id=CHAT_ID,
    text=result.text,
    entities=result.entities,
)
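
The dict-to-type conversion mentioned above could look like the sketch below. The `MessageEntity` dataclass is a stand-in defined here for illustration, and the dict keys (`type`, `offset`, `length`, `url`, `language`) are assumed to follow the Telegram Bot API field names. With a real library you would use its own entity class instead.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MessageEntity:
    """Stand-in for a bot library's entity type (hypothetical)."""
    type: str
    offset: int
    length: int
    url: Optional[str] = None
    language: Optional[str] = None


def to_typed(entities):
    """Build one typed entity per dict, passing the keys as keyword arguments."""
    return [MessageEntity(**entity) for entity in entities]


raw_entities = [
    {"type": "bold", "offset": 5, "length": 4},
    {"type": "text_link", "offset": 0, "length": 4, "url": "https://example.com"},
]
typed = to_typed(raw_entities)
print(typed[0].type, typed[1].url)  # bold https://example.com
```

For aiogram, the same pattern would presumably be `aiogram.types.MessageEntity(**entity)`, though the middleware shown in the next section avoids the manual step entirely.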

Example for aiogram users

  1. Add AiogramSulgukMiddleware to your bot:
from sulguk import AiogramSulgukMiddleware

bot.session.middleware(AiogramSulgukMiddleware())
  2. Create your nice HTML:
<ol start="10">
    <li>some item</li>
    <li>other item</li>
</ol>
<p>Some <b>text</b> in a paragraph</p>
  3. Send it using sulguk as a parse_mode:
from sulguk import SULGUK_PARSE_MODE

await bot.send_message(
    chat_id=CHAT_ID,
    text=raw_html,
    parse_mode=SULGUK_PARSE_MODE,
)

Supported tags:

Unknown attributes and unknown classes on supported tags are ignored. Unsupported tags raise an error.

Standard Telegram tags (with some changes):

  • <a> - a hyperlink with href attribute
  • <b>, <strong> - bold text
  • <i>, <em> - italic text
  • <s>, <strike>, <del> - strikethrough text
  • <u>, <ins> - underlined text
  • <span> - an inline element with optional attribute class="tg-spoiler" to make a spoiler
  • <tg-spoiler> - a Telegram spoiler
  • <pre> with optional class="language-<name>" - a preformatted block with code. <name> is sent to Telegram as the language attribute.
  • <code> - an inline preformatted element
  • <details> - rendered as an expandable blockquote
  • <summary> - treated as a paragraph, typically used as the first child of <details> for the heading

Note: In standard Telegram HTML, you can set the language of preformatted text by nesting <code class="language-<name>"> inside a <pre> tag. This works only when <code> is the only child: any additional symbol outside of <code> breaks it. Sulguk supports the same behavior. Alternatively, you can set the language on the <pre> tag itself.
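
To make the `language-<name>` convention concrete, the helper below pulls the language name out of a class attribute value. It is a hypothetical sketch of the naming rule only, not sulguk's code, and the name `language_from_class` is an assumption.

```python
from typing import Optional

LANGUAGE_PREFIX = "language-"


def language_from_class(class_attr: Optional[str]) -> Optional[str]:
    """Return <name> from the first `language-<name>` class, or None if absent."""
    for cls in (class_attr or "").split():
        if cls.startswith(LANGUAGE_PREFIX) and len(cls) > len(LANGUAGE_PREFIX):
            return cls[len(LANGUAGE_PREFIX):]
    return None


print(language_from_class("language-python"))      # python
print(language_from_class("fancy language-rust"))  # rust
print(language_from_class("fancy"))                # None
```

Other classes are skipped rather than rejected, which matches the rule that unknown classes on supported tags are ignored.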

Additional tags:

  • <br/> - new line
  • <hr/> - horizontal line
  • <wbr/> - word break opportunity
  • <ul> - unordered list
  • <ol> - ordered list with optional attributes
    • reversed - to reverse the numbering order
    • type (1/a/A/i/I) - to set the numbering style
    • start - to set the starting number
  • <li> - list item, with optional value attribute to change its number. Nested lists are indented
  • <div> - a block (not inline) element
  • <p> - a paragraph, emphasized with empty lines
  • <q> - a quoted text
  • <blockquote> - a block quote. Like a paragraph with indentation
  • <blockquote expandable> - an expandable block quote
  • <h1>-<h6> - text headers, styled using available Telegram options
  • <noscript> - contents are shown, since scripting is not supported
  • <cite>, <var> - italic
  • <progress>, <meter> - rendered using emoji (🟩🟩🟩🟨⬜️⬜️)
  • <kbd>, <samp> - preformatted text
  • <img> - rendered as a link preceded by a picture emoji. The alt text is used if provided.
  • <tt> - as italic text
  • <input> - rendered as ✅/⬜️ (checkbox), 🔘/⚪️ (radio), or _______/value in other cases
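
The `<ol>` attributes in the list above (`reversed`, `type`, `start`) follow the usual HTML numbering rules. The sketch below shows one way such markers could be computed; it assumes standard HTML semantics and is not taken from sulguk, and it leaves out the per-item `value` attribute. All function names are hypothetical.

```python
def to_alpha(n: int) -> str:
    """1 -> a, 26 -> z, 27 -> aa (bijective base 26)."""
    letters = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        letters = chr(ord("a") + rem) + letters
    return letters


ROMAN = [(1000, "m"), (900, "cm"), (500, "d"), (400, "cd"), (100, "c"),
         (90, "xc"), (50, "l"), (40, "xl"), (10, "x"), (9, "ix"),
         (5, "v"), (4, "iv"), (1, "i")]


def to_roman(n: int) -> str:
    out = ""
    for value, symbol in ROMAN:
        while n >= value:
            out += symbol
            n -= value
    return out


FORMATTERS = {
    "1": str,
    "a": to_alpha,
    "A": lambda n: to_alpha(n).upper(),
    "i": to_roman,
    "I": lambda n: to_roman(n).upper(),
}


def list_markers(count, type_="1", start=None, reversed_=False):
    """Return the marker for each of `count` items, without the trailing dot."""
    if start is None:
        # A reversed list without `start` counts down from the item count.
        start = count if reversed_ else 1
    step = -1 if reversed_ else 1
    fmt = FORMATTERS[type_]
    numbers = [start + step * i for i in range(count)]
    # Letters and Roman numerals cannot express values below 1: fall back to decimal.
    return [fmt(n) if n >= 1 else str(n) for n in numbers]


print(list_markers(2, start=10))                  # ['10', '11']
print(list_markers(3, type_="a"))                 # ['a', 'b', 'c']
print(list_markers(3, type_="I", reversed_=True)) # ['III', 'II', 'I']
```

The first call corresponds to the `<ol start="10">` example used earlier in this document, where the two items would be numbered 10 and 11.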

Tags which are treated as block elements (like <div>):

<footer>, <header>, <main>, <nav>, <section>

Tags which are treated as inline elements (like <span>):

<html>, <body>, <output>, <data>, <time>

Tags whose contents are ignored:

<head>, <link>, <meta>, <script>, <style>, <template>, <title>
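
The "contents ignored" rule can be sketched with the standard library's `html.parser`: keep a depth counter while inside an ignored tag and drop any text seen there. This is an assumption about one possible approach, not sulguk's implementation; `VisibleText` and `visible_text` are hypothetical names.

```python
from html.parser import HTMLParser

# Tag names copied from the list above.
IGNORED_TAGS = {"head", "link", "meta", "script", "style", "template", "title"}
# <link> and <meta> have no closing tag, so they never open an ignored region.
VOID_TAGS = {"link", "meta"}


class VisibleText(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.parts = []
        self._ignored_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in IGNORED_TAGS and tag not in VOID_TAGS:
            self._ignored_depth += 1

    def handle_endtag(self, tag):
        if tag in IGNORED_TAGS and tag not in VOID_TAGS and self._ignored_depth:
            self._ignored_depth -= 1

    def handle_data(self, data):
        # Text is kept only when no ignored tag is currently open.
        if self._ignored_depth == 0:
            self.parts.append(data)


def visible_text(html: str) -> str:
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


page = (
    "<html><head><title>T</title><meta charset='utf-8'>"
    "<style>b{}</style></head>"
    "<body><section>Hi <b>there</b></section></body></html>"
)
print(visible_text(page))  # Hi there
```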

Command line utility for channel management

  1. Install with the cli extra:
pip install 'sulguk[cli]'
  2. Set the BOT_TOKEN environment variable:
export BOT_TOKEN="your telegram token"
  3. Send an HTML file as a message to your channel. Additional files will be sent as comments to the first one. You can provide a channel name or a public link:
sulguk send @chat_id file.html
  4. If you want to, edit the message using its link, copied from the shell output or from your Telegram client. Editing comments is supported as well:
sulguk edit 'https://t.me/channel/1?comment=42' file.html
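
The link passed to `sulguk edit` carries three pieces of information: the channel, the message number, and optionally a comment number. The sketch below shows how such a link can be taken apart with `urllib.parse`. `parse_message_link` is a hypothetical helper written for this illustration and is not the CLI's own parser.

```python
from urllib.parse import parse_qs, urlparse


def parse_message_link(link: str) -> dict:
    """Split a t.me message link into channel, message id and optional comment id."""
    parsed = urlparse(link)
    channel, message_id = parsed.path.strip("/").split("/")
    comment = parse_qs(parsed.query).get("comment")
    return {
        "channel": channel,
        "message_id": int(message_id),
        "comment_id": int(comment[0]) if comment else None,
    }


print(parse_message_link("https://t.me/channel/1?comment=42"))
# {'channel': 'channel', 'message_id': 1, 'comment_id': 42}
```

Quoting the link in the shell, as in the command above, matters because `?` is a glob character in most shells.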
