Markdown parser
🤔 CompreheMD
CompreheMD is a Python package for parsing Markdown documents.
Installation
CompreheMD requires Python 3.8 or later.
Install CompreheMD via pip:
pip install comprehemd
MarkdownParser class
Parsing a stream
The Markdown document parsed in this example is example.md.
To read an entire text stream, call .read(reader: IO[str]). The method yields blocks until the stream ends.
from comprehemd import MarkdownParser
with open("docs/example.md", "r") as fp:
    for block in MarkdownParser().read(fp):
        print(block)
HeadingBlock (1): An Example Document
EmptyBlock
HeadingBlock (2): Introduction
EmptyBlock
Block: This is just a short example document.
EmptyBlock
HeadingBlock (2): Block examples
EmptyBlock
Block: Here's some backtick-fenced code:
EmptyBlock
CodeBlock (python): print("Hello, world!")
EmptyBlock
Block: Here's some tilde-fenced code:
EmptyBlock
CodeBlock (python): print("Hello, galaxy!")
EmptyBlock
Block: Here's some indented code:
EmptyBlock
CodeBlock (<None>): print("Hello, multiverse!")
EmptyBlock
Block: That's your lot!
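Because .read() yields one block at a time, its result can be handed to ordinary iterator tooling. The sketch below tallies blocks by class name. count_block_types is a hypothetical helper, not part of CompreheMD, and the two stand-in classes exist only so the snippet runs without the package installed.

```python
from collections import Counter
from typing import Iterable


def count_block_types(blocks: Iterable[object]) -> Counter:
    """Tally blocks by class name (hypothetical helper, not part of CompreheMD)."""
    return Counter(type(block).__name__ for block in blocks)


# Stand-in classes so the sketch runs without CompreheMD installed.
class HeadingBlock: ...
class EmptyBlock: ...


sample = [HeadingBlock(), EmptyBlock(), HeadingBlock(), EmptyBlock()]
counts = count_block_types(sample)
print(counts["HeadingBlock"], counts["EmptyBlock"])

# With CompreheMD installed, the same helper should accept the parser's output:
#     with open("docs/example.md", "r") as fp:
#         counts = count_block_types(MarkdownParser().read(fp))
```

The helper consumes the iterable once, so it never needs the whole document in memory.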
Parsing chunks
The parser can be fed ad-hoc chunks of Markdown. The .feed(chunk: str) method yields all the blocks that the chunk completed.
After feeding the final chunk, you must call .close() to flush and yield any buffered blocks.
from comprehemd import MarkdownParser
def tease(chunk: str) -> None:
    escaped = chunk.replace("\n", "\\n")
    yielded = False
    for block in parser.feed(chunk):
        yielded = True
        print(f'After "{escaped}", the parser yielded:')
        print(block)
        print()
    # Report an empty feed only when the loop body never ran. A for/else
    # clause would also run after every loop that ends without a break.
    if not yielded:
        print(f'After "{escaped}", the parser did not yield.')
        print()
parser = MarkdownParser()
tease("# Feeding exam")
tease("ple\n\nThis de")
tease("monstrates chu")
tease("nked feeding.")
for block in parser.close():
    print("After closing, the parser yielded:")
    print(block)
    print()

After "# Feeding exam", the parser did not yield.
After "ple\n\nThis de", the parser yielded:
HeadingBlock (1): Feeding example
After "ple\n\nThis de", the parser yielded:
EmptyBlock
After "monstrates chu", the parser did not yield.
After "nked feeding.", the parser did not yield.
After closing, the parser yielded:
Block: This demonstrates chunked feeding.
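To see why the final paragraph only appears after .close(), it may help to look at a toy incremental reader. The sketch below is not CompreheMD's implementation. LineFeeder is a hypothetical class that buffers text until a newline completes a line, which is the same general pattern of feed-then-flush.

```python
from typing import Iterator, List


class LineFeeder:
    """Toy incremental reader: buffers text until a full line is available."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, chunk: str) -> Iterator[str]:
        """Yield every line that this chunk completed."""
        self._buffer += chunk
        while "\n" in self._buffer:
            line, self._buffer = self._buffer.split("\n", 1)
            yield line

    def close(self) -> Iterator[str]:
        """Flush whatever is still buffered as a final, unterminated line."""
        if self._buffer:
            line, self._buffer = self._buffer, ""
            yield line


feeder = LineFeeder()
completed: List[str] = []
for chunk in ["# Feeding exam", "ple\n\nThis de", "monstrates chu", "nked feeding."]:
    completed.extend(feeder.feed(chunk))
flushed = list(feeder.close())

print(completed)  # lines finished by a newline during feeding
print(flushed)    # the trailing text that only close() can release
```

The last chunk never ends with a newline, so the toy reader cannot know the line is finished until it is told that no more input is coming.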
Outline class
Generating an outline from a stream
The Markdown document parsed in this example is example.md.
The Outline class keeps track of headings to generate an outline of a Markdown document.
The simplest way to generate an outline is to pass a text stream into the read_outline() function:
from comprehemd import read_outline, OutlineItem
with open("docs/example.md", "r") as fp:
    outline = read_outline(fp)
def log(indent: int, item: OutlineItem) -> None:
    indent_str = " " * indent
    print(f"{indent_str}{item.block}")
    for child in item.children:
        log(indent + 1, child)
for item in outline.root:
    log(0, item)
HeadingBlock (1): An Example Document
 HeadingBlock (2): Introduction
 HeadingBlock (2): Block examples
Generating an outline via a MarkdownParser
The Markdown document parsed in this example is example.md.
If you're already parsing a document and would prefer to build the outline as you go rather than read the document a second time, you can add headings manually:
from comprehemd import (
    HeadingBlock,
    MarkdownParser,
    Outline,
    OutlineItem,
)
outline = Outline()
with open("docs/example.md", "r") as fp:
    for block in MarkdownParser().read(fp):
        if isinstance(block, HeadingBlock):
            outline.add(block)
def log(indent: int, item: OutlineItem) -> None:
    indent_str = " " * indent
    print(f"{indent_str}{item.block}")
    for child in item.children:
        log(indent + 1, child)
for item in outline.root:
    log(0, item)
HeadingBlock (1): An Example Document
 HeadingBlock (2): Introduction
 HeadingBlock (2): Block examples
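The nesting shown in the output can be reproduced with a small stack of open headings. The sketch below is only an illustration of that idea and makes no claim about how Outline works internally. Node and build_outline are hypothetical names, and headings are given as plain (level, text) pairs.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """Hypothetical stand-in for an outline item: a heading plus its children."""
    level: int
    text: str
    children: List["Node"] = field(default_factory=list)


def build_outline(headings: List[Tuple[int, str]]) -> List[Node]:
    """Nest (level, text) pairs so that deeper headings become children."""
    root: List[Node] = []
    stack: List[Node] = []  # chain of currently open ancestors
    for level, text in headings:
        node = Node(level, text)
        # Close every open heading that is at the same level or deeper.
        while stack and stack[-1].level >= level:
            stack.pop()
        (stack[-1].children if stack else root).append(node)
        stack.append(node)
    return root


outline = build_outline([
    (1, "An Example Document"),
    (2, "Introduction"),
    (2, "Block examples"),
])
print(len(outline), [child.text for child in outline[0].children])
```

Each heading is pushed and popped at most once, so the whole outline is built in a single pass over the headings.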
Rendering an outline
An outline can be rendered to Markdown by either treating the instance as a string or by calling .render(writer: IO[str]).
from comprehemd import read_outline
with open("docs/example.md", "r") as fp:
    outline = read_outline(fp)
print(outline)
- [An Example Document](#an-example-document)
  - [Introduction](#introduction)
  - [Block examples](#block-examples)
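A rendered outline of this kind is a nested Markdown list of links, and writing one to a text stream takes only a short recursive function. The sketch below is a rough illustration, not the library's render() method. render_items and the (title, anchor, children) tuples are hypothetical, and the two-space indent per level is an assumption.

```python
import io
from typing import IO, List, Tuple

# (title, anchor, children): a hypothetical stand-in for an outline item.
Item = Tuple[str, str, list]


def render_items(items: List[Item], writer: IO[str], depth: int = 0) -> None:
    """Write a nested Markdown list of links, two spaces per nesting level."""
    for title, anchor, children in items:
        writer.write(f"{'  ' * depth}- [{title}](#{anchor})\n")
        render_items(children, writer, depth + 1)


items: List[Item] = [
    ("An Example Document", "an-example-document", [
        ("Introduction", "introduction", []),
        ("Block examples", "block-examples", []),
    ]),
]
buffer = io.StringIO()
render_items(items, buffer)
rendered = buffer.getvalue()
print(rendered, end="")
```

Writing to an IO[str] rather than returning a string means the same function can target a file, standard output or an in-memory buffer.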
Block classes
Block
The Block class is the base of all blocks.
- source returns the original Markdown source for the block.
- text returns the meaningful text representation of the block.
CodeBlock
The CodeBlock class represents a code block.
- language returns the language hint if one was specified.
- The block can be rendered back to Markdown by calling render(writer: IO[str], fence: Fence).
EmptyBlock
EmptyBlock represents an empty line.
HeadingBlock
The HeadingBlock class represents a heading.
- anchor returns the heading's anchor.
- level returns the heading's level (i.e. 1 for the top-most heading, down to 6 for the lowest).
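The anchors in the rendered outline above (for example #an-example-document) look like the familiar lower-case, hyphenated slugs. The sketch below approximates that convention. It is an assumption about the general style, not CompreheMD's actual algorithm, and guess_anchor is a hypothetical name.

```python
import re


def guess_anchor(heading_text: str) -> str:
    """Approximate a slug-style anchor (an assumption, not CompreheMD's code)."""
    slug = heading_text.strip().lower()
    slug = re.sub(r"[^\w\s-]", "", slug)  # drop punctuation
    return re.sub(r"\s+", "-", slug)      # runs of whitespace become one hyphen


print(guess_anchor("An Example Document"))
print(guess_anchor("Block examples"))
```

For real documents, prefer the anchor property itself, since edge cases such as duplicate headings may be handled differently by the library.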
Project
Contributing
To contribute a bug report, enhancement or feature request, please raise an issue at github.com/cariad/comprehemd/issues.
If you want to contribute a code change, please raise an issue first so we can chat about the direction you want to take.
Licence
CompreheMD is released at github.com/cariad/comprehemd under the MIT Licence.
See LICENSE for more information.
Author
Hello! 👋 I'm Cariad Eccleston and I'm a freelance DevOps and backend engineer. My contact details are available on my personal wiki at cariad.earth.
Please consider supporting my open source projects by sponsoring me on GitHub.
Acknowledgements
- Epic ❤️ to John Gruber for developing the original Markdown specification.
- This documentation was pressed by Edition.