Python bindings for MD4C
PyMD4C
Python bindings for the very fast MD4C Markdown parsing and rendering library.
The MD4C C library provides a SAX-like parser that uses callbacks to return the various blocks, inlines, and text it parses from the Markdown input. In addition, it provides an HTML renderer that wraps the generic parser to provide HTML output directly.
Accordingly, this Python module provides two classes:
md4c.GenericParser
- Wraps the generic SAX-like MD4C parser. Requires Python functions (or other callables) as callbacks.
md4c.HTMLRenderer
- Wraps the HTML renderer. Produces HTML output directly.
If other renderers are added to MD4C, they will get their own Python class as
well, similar to the HTMLRenderer.
Install from PyPI
PyMD4C is available on PyPI under the name pymd4c. Install it with pip like
this:
pip install pymd4c
This is the recommended method to obtain PyMD4C on Linux. It should work well on most distributions. Unfortunately, Windows and macOS packages are not currently built automatically, so users on those platforms will need to build from source. The instructions below should assist.
Note that some more esoteric distributions or non-x86/x86_64 architectures may not be supported by the manylinux packages. If either of those applies to your system, you may also need to build from source. (However, if you are running Linux on arm64, ppc64le, or s390x, consider opening a new GitHub issue, since it may be possible to add your architecture.)
Build and Install from Source
Prerequisites
This package depends on the MD4C library. It may be available through your package manager. Otherwise, it can be built from source as follows (note that the instructions below are for Unix-like systems, but theoretically there are ways to build on Windows as well):
- Download and extract the matching release from the releases page (e.g. for PyMD4C version W.X.Y.Z, download MD4C version W.X.Y).
- Inside the extracted directory, run the following:
mkdir build
cd build
cmake ..
make
# Do as root:
make install
The install step must be run as root. The library will install to /usr/local by default.
- You may need to rebuild the ldconfig cache (also as root):
ldconfig
In addition, the pkg-config tool and the Python pkgconfig package must be
available to build PyMD4C, but they are not required after that (i.e., they are
not a prerequisite for actually using PyMD4C). The pkg-config tool is
likely available on your system already, and the Python pkgconfig package
will be fetched automatically by setup.py.
Finally, note that since this package uses C extensions, development headers
for Python must be installed for the build to succeed. If you are using Linux,
some distributions split these off from the main Python package. Install
python-dev or python-devel to get them.
Build/Install
Build and install with setup.py as you would for any Python source
repository:
pip install .
Class GenericParser
import md4c
generic_parser = md4c.GenericParser(parser_flags)
Initialize a new GenericParser. Parameters:
parser_flags
- An int made up of some combination of the parser option flags or'd together, e.g. md4c.MD_FLAG_TABLES | md4c.MD_FLAG_STRIKETHROUGH. For the default options, use 0, which will parse according to the base CommonMark specification. See the "Module-Wide Constants" section below for a full list of parser option flags.
Note: If the end goal of parsing is to produce HTML, strongly consider
using an HTMLRenderer
instead. All rendering will be performed by native C
code, which will be much faster.
Parse Method
import md4c
generic_parser = md4c.GenericParser(...)
generic_parser.parse(input,
enter_block_callback,
leave_block_callback,
enter_span_callback,
leave_span_callback,
text_callback)
Parse Markdown text using the provided callbacks. Parameters:
input
- A str or bytes containing the Markdown document to parse. If a bytes, it must be UTF-8 encoded.
enter_block_callback
- A function (or other callable) to be called whenever the parser enters a new block element in the Markdown source.
leave_block_callback
- A function (or other callable) to be called whenever the parser leaves a block element in the Markdown source.
enter_span_callback
- A function (or other callable) to be called whenever the parser enters a new inline element in the Markdown source.
leave_span_callback
- A function (or other callable) to be called whenever the parser leaves an inline element in the Markdown source.
text_callback
- A function (or other callable) to be called whenever the parser has text to add to the current block or inline element.
The parse() method will raise md4c.ParseError in the event of a problem
during parsing, such as running out of memory. This does not signal invalid
syntax, as there is no such thing in Markdown. It can also propagate any
exception raised by any of the callbacks (except md4c.StopParsing, which is
caught and handled quietly).
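As a rough illustration of the call shape described above, the sketch below records every callback invocation as a tuple. The EventRecorder class, its method names, and record_events are hypothetical helpers, not part of PyMD4C. The import is guarded only so that the sketch can be read and exercised without the extension installed.

```python
try:
    import md4c
except ImportError:  # PyMD4C not installed; the real parse below is skipped
    md4c = None


class EventRecorder:
    """Hypothetical helper: stores each callback invocation as a tuple."""

    def __init__(self):
        self.events = []

    def enter_block(self, block_type, details):
        self.events.append(("enter_block", block_type.name, details))

    def leave_block(self, block_type, details):
        self.events.append(("leave_block", block_type.name, details))

    def enter_span(self, span_type, details):
        self.events.append(("enter_span", span_type.name, details))

    def leave_span(self, span_type, details):
        self.events.append(("leave_span", span_type.name, details))

    def text(self, text_type, text):
        self.events.append(("text", text_type.name, text))


def record_events(markdown, parser_flags=0):
    """Parse `markdown` with md4c.GenericParser and return the recorded events."""
    recorder = EventRecorder()
    parser = md4c.GenericParser(parser_flags)
    parser.parse(
        markdown,
        recorder.enter_block,
        recorder.leave_block,
        recorder.enter_span,
        recorder.leave_span,
        recorder.text,
    )
    return recorder.events


if md4c is not None:
    try:
        for event in record_events("# Hello *world*"):
            print(event)
    except md4c.ParseError as exc:
        # Signals a system-level problem (e.g. out of memory), not bad syntax.
        print("parsing failed:", exc)
```

Bound methods are passed here because any callable is accepted; plain functions or lambdas would work the same way.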
Callback Details
enter_block_callback, leave_block_callback, enter_span_callback, and
leave_span_callback all must accept two parameters:
type
- An md4c.BlockType or md4c.SpanType representing the type of block or span. See the "Enums" section for more info.
details
- A dict that contains extra information for certain types of blocks and spans, for example, the level of a heading. Keys are strs. Values are ints, single-character strs, or (for MD_ATTRIBUTE) lists of tuples.
See the MD_BLOCK_*_DETAIL and MD_SPAN_*_DETAIL structs in MD4C's md4c.h for information on exactly what this dict will contain.
Regarding MD_ATTRIBUTEs: These are used where a block or span can contain some associated text, such as link titles and code block language references. Such attributes may contain multiple text sub-elements (e.g. some regular text, an HTML entity, and then some more regular text). Thus, an MD_ATTRIBUTE value in details consists of a list of 2-tuples: (text_type, text) where text_type is an md4c.TextType (see "Enums" below) and text is the actual text as a str.
text_callback must also accept two parameters, but they are different:
type
- An md4c.TextType representing the type of text element. See the "Enums" section for more info.
text
- The actual text, as a str.
Callbacks need not return anything specific; their return values are ignored.
To cancel parsing, callbacks can raise md4c.StopParsing. This will be caught
by the parse() method and immediately halt parsing quietly. All other
exceptions raised by callbacks will abort parsing and will be propagated back
to the caller of parse().
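Because an MD_ATTRIBUTE value arrives as a list of (text_type, text) 2-tuples, callbacks usually need to flatten it before use. The following is a minimal sketch of one way to do that; attribute_to_str is a hypothetical helper, and the "href" key is an assumption based on the MD_SPAN_A_DETAIL struct in md4c.h rather than something this document lists.

```python
import html


def attribute_to_str(attribute):
    """Join an MD_ATTRIBUTE value, a list of (text_type, text) 2-tuples, into one str."""
    if not attribute:
        # Defensive: treat a missing or empty attribute as an empty string.
        return ""
    parts = []
    for text_type, text in attribute:
        if text_type.name == "ENTITY":
            # Decode HTML entities such as "&amp;"; other pieces are kept verbatim.
            parts.append(html.unescape(text))
        else:
            parts.append(text)
    return "".join(parts)


def enter_span(span_type, details):
    # Assumption: for links, `details` mirrors MD_SPAN_A_DETAIL in md4c.h,
    # so the link target is expected under the "href" key.
    if span_type.name == "A":
        print("link to", attribute_to_str(details.get("href")))
```

Comparing on the enum member's name keeps the helper independent of the md4c import, which makes it easy to exercise with stand-in values.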
Class HTMLRenderer
import md4c
html_renderer = md4c.HTMLRenderer(parser_flags, renderer_flags)
Initialize a new HTMLRenderer. Parameters:
parser_flags
- An int made up of some combination of the parser option flags or'd together, e.g. md4c.MD_FLAG_TABLES | md4c.MD_FLAG_STRIKETHROUGH. For the default options, use 0, which will parse according to the base CommonMark standard. See the "Module-Wide Constants" section below for a full list of parser option flags.
renderer_flags
- An int made up of some combination of the HTML renderer option flags or'd together. These are also listed in the "Module-Wide Constants" section below.
Parse Method
import md4c
html_renderer = md4c.HTMLRenderer(...)
html_renderer.parse(input)
Parse Markdown text and return a str with rendered HTML. Parameters:
input
- A str or bytes containing the Markdown document to parse. If a bytes, it must be UTF-8 encoded.
This method will raise md4c.ParseError
in the event of a problem during
parsing, such as running out of memory. This does not signal invalid syntax, as
there is no such thing in Markdown.
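Putting the constructor and parse() together, a minimal rendering sketch might look like the following. The render_markdown function and its xhtml parameter are hypothetical conveniences; the flag names are the module-level constants described in this document. The import is guarded only so the sketch can be loaded without PyMD4C installed.

```python
try:
    import md4c
except ImportError:  # PyMD4C not installed; the demo call below is skipped
    md4c = None


def render_markdown(markdown, xhtml=False):
    """Render Markdown to an HTML str with tables and strikethrough enabled."""
    parser_flags = md4c.MD_FLAG_TABLES | md4c.MD_FLAG_STRIKETHROUGH
    renderer_flags = md4c.MD_HTML_FLAG_XHTML if xhtml else 0
    renderer = md4c.HTMLRenderer(parser_flags, renderer_flags)
    # parse() accepts a str or UTF-8 encoded bytes and returns a str.
    return renderer.parse(markdown)


if md4c is not None:
    print(render_markdown("# Hello\n\nSome ~~struck~~ text."))
```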
Module-Wide Constants
The MD4C library provides various option flags for parsers and renderers as named constants. These are made available as module-level constants in PyMD4C.
Parser Option Flags
Basic option flags:
md4c.MD_FLAG_COLLAPSEWHITESPACE
- In normal text, collapse non-trivial whitespace into a single space.
md4c.MD_FLAG_PERMISSIVEATXHEADERS
- Do not require a space in ATX headers (e.g. ###Header).
md4c.MD_FLAG_PERMISSIVEURLAUTOLINKS
- Convert URLs to links even without < and >.
md4c.MD_FLAG_PERMISSIVEEMAILAUTOLINKS
- Convert email addresses to links even without <, >, and mailto:.
md4c.MD_FLAG_NOINDENTEDCODEBLOCKS
- Disable indented code blocks. (Only allow fenced code blocks.)
md4c.MD_FLAG_NOHTMLBLOCKS
- Disable raw HTML blocks.
md4c.MD_FLAG_NOHTMLSPANS
- Disable raw HTML inlines.
md4c.MD_FLAG_TABLES
- Enable tables extension.
md4c.MD_FLAG_STRIKETHROUGH
- Enable strikethrough extension.
md4c.MD_FLAG_PERMISSIVEWWWAUTOLINKS
- Enable www autolinks (even without any scheme prefix, as long as they begin with www.).
md4c.MD_FLAG_TASKLISTS
- Enable task lists extension.
md4c.MD_FLAG_LATEXMATHSPANS
- Enable $ and $$ containing LaTeX equations.
md4c.MD_FLAG_WIKILINKS
- Enable wiki links extension.
md4c.MD_FLAG_UNDERLINE
- Enable underline extension (and disable _ for regular emphasis).
Combination option flags:
md4c.MD_FLAG_PERMISSIVEAUTOLINKS
- Enables all varieties of autolinks: MD_FLAG_PERMISSIVEURLAUTOLINKS, MD_FLAG_PERMISSIVEEMAILAUTOLINKS, and MD_FLAG_PERMISSIVEWWWAUTOLINKS
md4c.MD_FLAG_NOHTML
- Disables all raw HTML tags: MD_FLAG_NOHTMLBLOCKS and MD_FLAG_NOHTMLSPANS
Dialect option flags (note that not all features of a dialect may be supported, but these flags will cause MD4C to parse as many features of the dialect as it supports):
md4c.MD_DIALECT_COMMONMARK
- This is the default behavior of MD4C, so no additional flags are enabled.
md4c.MD_DIALECT_GITHUB
- Parse GitHub-Flavored Markdown, which enables the following flags:
MD_FLAG_PERMISSIVEAUTOLINKS
MD_FLAG_TABLES
MD_FLAG_STRIKETHROUGH
MD_FLAG_TASKLISTS
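Since the option flags are plain ints combined with bitwise OR, they can be built up and inspected with ordinary integer operations. The sketch below shows one way to do this; combine_flags and has_flag are hypothetical helpers, and the comparison against MD_DIALECT_GITHUB simply restates the flag list above. The import is guarded only so the helpers can be tried without PyMD4C installed.

```python
from functools import reduce
from operator import or_

try:
    import md4c
except ImportError:  # PyMD4C not installed; the demo below is skipped
    md4c = None


def combine_flags(*flags):
    """OR any number of int option flags together; no flags gives 0 (the default)."""
    return reduce(or_, flags, 0)


def has_flag(parser_flags, flag):
    """Return True if every bit of `flag` is set in `parser_flags`."""
    return (parser_flags & flag) == flag


if md4c is not None:
    github = combine_flags(
        md4c.MD_FLAG_PERMISSIVEAUTOLINKS,
        md4c.MD_FLAG_TABLES,
        md4c.MD_FLAG_STRIKETHROUGH,
        md4c.MD_FLAG_TASKLISTS,
    )
    # Per the list above, this is expected to match the dialect constant.
    print(github == md4c.MD_DIALECT_GITHUB)
    print(has_flag(md4c.MD_DIALECT_GITHUB, md4c.MD_FLAG_TABLES))
```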
HTML Renderer Option Flags
md4c.MD_HTML_FLAG_DEBUG
- For development use, send MD4C debug output to stderr.
md4c.MD_HTML_FLAG_VERBATIM_ENTITIES
- Do not replace HTML entities with the actual character (e.g. &copy; with ©).
md4c.MD_HTML_FLAG_SKIP_UTF8_BOM
- Omit BOM from start of UTF-8 input.
md4c.MD_HTML_FLAG_XHTML
- Generate XHTML instead of HTML.
Enums
The MD4C library uses various enums to provide data to callbacks. PyMD4C uses
IntEnums to encapsulate these. See md4c.h from the MD4C project for more
information on these enums and associated types.
Block Types - class BlockType
md4c.BlockType.DOC
- Document
md4c.BlockType.QUOTE
- Block quote
md4c.BlockType.UL
- Unordered list
md4c.BlockType.OL
- Ordered list
md4c.BlockType.LI
- List item
md4c.BlockType.HR
- Horizontal rule
md4c.BlockType.H
- Heading
md4c.BlockType.CODE
- Code block
md4c.BlockType.HTML
- Raw HTML block
md4c.BlockType.P
- Paragraph
md4c.BlockType.TABLE
- Table
md4c.BlockType.THEAD
- Table header row
md4c.BlockType.TBODY
- Table body
md4c.BlockType.TR
- Table row
md4c.BlockType.TH
- Table header cell
md4c.BlockType.TD
- Table cell
Span Types - class SpanType
md4c.SpanType.EM
- Emphasis
md4c.SpanType.STRONG
- Strong
md4c.SpanType.A
- Link
md4c.SpanType.IMG
- Image
md4c.SpanType.CODE
- Inline code
md4c.SpanType.DEL
- Strikethrough
md4c.SpanType.LATEXMATH
- Inline math
md4c.SpanType.LATEXMATH_DISPLAY
- Display math
md4c.SpanType.WIKILINK
- Wiki link
md4c.SpanType.U
- Underline
Text Types - class TextType
md4c.TextType.NORMAL
- Normal text
md4c.TextType.NULLCHAR
- NULL character
md4c.TextType.BR
- Line break
md4c.TextType.SOFTBR
- Soft line break
md4c.TextType.ENTITY
- HTML Entity
md4c.TextType.CODE
- Text inside a code block or inline code
md4c.TextType.HTML
- Raw HTML (inside an HTML block or simply inline HTML)
md4c.TextType.LATEXMATH
- Text inside an equation
Table Alignments - class Align
md4c.Align.DEFAULT
md4c.Align.LEFT
md4c.Align.CENTER
md4c.Align.RIGHT
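Because these enums are IntEnums, each member exposes a name that callbacks can dispatch on. As a small sketch, the helper below maps a table alignment to a CSS declaration. align_to_css is hypothetical, and the "align" key in details is an assumption based on the MD_BLOCK_TD_DETAIL struct in md4c.h; check that header for the exact contents.

```python
def align_to_css(align):
    """Map an md4c.Align member to a CSS declaration ('' for DEFAULT or None)."""
    if align is None or align.name == "DEFAULT":
        return ""
    return "text-align: {};".format(align.name.lower())


def enter_block(block_type, details):
    # Assumption: for TH and TD blocks, `details` carries an "align" entry,
    # mirroring the `align` field of MD_BLOCK_TD_DETAIL in md4c.h.
    if block_type.name in ("TH", "TD"):
        print(block_type.name, repr(align_to_css(details.get("align"))))
```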
Exceptions
md4c.ParseError
- Raised by one of the parse() methods when there is an error during parsing, such as running out of memory. There is no such thing as invalid syntax in Markdown, so this really only signals some sort of system error.
md4c.StopParsing
- A callback can raise this to stop parsing early. GenericParser's parse() method will catch it and abort quietly.
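To illustrate early termination, the sketch below collects the text of the first heading and then raises StopParsing. FirstHeadingFinder and first_heading are hypothetical helpers. The stand-in StopParsing class exists only so the callbacks can be exercised when PyMD4C is not installed; with the real module, md4c.StopParsing is used.

```python
try:
    import md4c
    StopParsing = md4c.StopParsing
except ImportError:
    md4c = None

    class StopParsing(Exception):
        """Stand-in so the sketch can be exercised without PyMD4C installed."""


class FirstHeadingFinder:
    """Hypothetical helper: collects the text of the first heading, then stops."""

    def __init__(self):
        self.in_heading = False
        self.parts = []

    def enter_block(self, block_type, details):
        if block_type.name == "H":
            self.in_heading = True

    def leave_block(self, block_type, details):
        if block_type.name == "H":
            # The first heading is complete, so cancel the rest of the parse.
            raise StopParsing()

    def span(self, span_type, details):
        pass  # Inline markup is ignored; only the text is collected.

    def text(self, text_type, text):
        if self.in_heading:
            self.parts.append(text)

    @property
    def heading(self):
        return "".join(self.parts)


def first_heading(markdown):
    """Return the text of the first heading in `markdown` ('' if there is none)."""
    finder = FirstHeadingFinder()
    # StopParsing is caught inside parse(), so no try/except is needed for it.
    md4c.GenericParser(0).parse(
        markdown,
        finder.enter_block,
        finder.leave_block,
        finder.span,
        finder.span,
        finder.text,
    )
    return finder.heading
```

Any other exception raised in a callback would propagate to the caller of first_heading, as would md4c.ParseError.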
License
This project is licensed under the MIT license. See the LICENSE.md file for
details.