A fast, extensible Markdown parser in pure Python.
mistletoe-ebp
This is a version of mistletoe maintained by the Executable Book Project (EBP). It tracks the myst branch of ExecutableBookProject/mistletoe, which, it is hoped, will eventually be merged into mistletoe itself.
mistletoe is a Markdown parser in pure Python, designed to be fast, spec-compliant and fully customizable.
Apart from being the fastest CommonMark-compliant Markdown parser implementation in pure Python, mistletoe also makes it easy to define custom tokens. Because Markdown is parsed into an abstract syntax tree, renderers for different output formats can be swapped in without touching any of the core components.
Unfortunately, mistletoe is not currently being actively maintained (as of June 8th 2019), so this fork has been created to provide a deployed release that EBP can use. Here is a working list of 'up-streamable' changes that would be desirable for mistletoe, and which this version has begun to implement:
- Move testing from unittest to pytest: pytest is now the de facto testing framework and vastly improves the usability/flexibility of testing.
- Introduce pre-commit code linting and formatting: this standardizes the code style across the package, and ensures that new commits and Pull Requests also conform to it.
- Introduce ReadTheDocs documentation
- Add a conda-forge distribution of the package
- Improve the AST API and documentation: I view panflute's implementation of the pandoc API in Python as the gold standard for how a Pythonic AST API should be written and documented. Some tweaks to the current token classes, together with auto-generated ReadTheDocs documentation, could achieve this.
- Storage of source line/column ranges: Language Server Protocol (LSP) support and good reporting of warnings/errors during rendering both require access to the source line and column ranges of each parsed token.
- Asynchronous parsing: LSP requires documents to be parsed asynchronously. Currently, mistletoe contains a number of global state objects, which make parsing inherently not thread-safe. A simple solution is to store these items as threading.local objects. A related but more complete solution is to introduce the idea of a 'scoped session', similar to that used by SQLAlchemy for database access (see its Contextual/Thread-local Sessions).
- Improve extensibility of block tokens: A Markdown parser is inherently a finite-state machine plus memory (a.k.a. a pushdown automaton, PDA), with parsing tokens as states (for a good example of a Python state machine, see pytransitions). The problem with extensibility is that the states are inherently interdependent: when introducing a new state/token, you must provide logic to all the other tokens about when to transition to it. Currently, MyST-Parser subclasses nearly all of the mistletoe block tokens to implement the extensions it requires, but ideally there would be a more systematic approach.
- Improve extensibility of span tokens: mistletoe does allow span token extensions to be added, at least in a simple way. However, as with block tokens above, span tokens are often interdependent, especially when nesting is involved. As of 7cc2c92, MyST-Parser overrides some of mistletoe's core logic to achieve correct parsing of Math tokens, but if possible this should be made more general.
- Improve rendering logic: Currently, the mistletoe BaseRenderer has no concept of recursive walk-throughs or 'visitor' patterns, which are a better approach for rendering tree-like structures (as used by docutils/panflute). Also, the current token instantiation (within context managers) needs improvement (see miyuchina/mistletoe#56).
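For the source line/column ranges item, the core need is to turn character offsets recorded during parsing into (line, column) pairs. The sketch below assumes tokens record start/end character offsets; Position and SourceMap are hypothetical names, not part of mistletoe.

```python
import bisect
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Position:
    """Hypothetical source range: 1-based lines, 0-based columns, end exclusive."""
    line_start: int
    col_start: int
    line_end: int
    col_end: int


class SourceMap:
    """Convert character offsets in a source string into (line, column) pairs."""

    def __init__(self, text: str):
        # Offset at which each line begins; line 1 begins at offset 0.
        self.line_starts: List[int] = [0]
        for index, char in enumerate(text):
            if char == "\n":
                self.line_starts.append(index + 1)

    def line_col(self, offset: int) -> Tuple[int, int]:
        # Binary search for the last line that starts at or before `offset`.
        line_index = bisect.bisect_right(self.line_starts, offset) - 1
        return line_index + 1, offset - self.line_starts[line_index]

    def position(self, start: int, end: int) -> Position:
        return Position(*self.line_col(start), *self.line_col(end))


source = "# Title\n\nSome **bold** text\n"
source_map = SourceMap(source)
start = source.index("**bold**")
print(source_map.position(start, start + len("**bold**")))
# Position(line_start=3, col_start=5, line_end=3, col_end=13)
```

Precomputing line starts once keeps each lookup at O(log n), which matters when every token in a large document carries a range.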
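For the block-token extensibility item, one possible (purely hypothetical) systematic approach is for each token to declare its own transition rule, here whether it may interrupt an open paragraph, so that the paragraph consults a registry instead of hard-coding its successors. BlockToken, MathBlock and the registry are illustrative names, not mistletoe or MyST-Parser APIs, and the rules are heavily simplified relative to CommonMark.

```python
import re
from typing import List, Sequence, Tuple


class BlockToken:
    """Hypothetical block token (NOT mistletoe's API)."""

    # Declarative transition rule: may this token end an open paragraph?
    interrupts_paragraph = False

    @classmethod
    def start(cls, line: str) -> bool:
        raise NotImplementedError

    @classmethod
    def read(cls, lines: List[str], i: int, registry: Sequence[type]) -> int:
        """Consume the block starting at lines[i]; return the next index."""
        return i + 1  # default: a single-line block


class Heading(BlockToken):
    interrupts_paragraph = True

    @classmethod
    def start(cls, line: str) -> bool:
        return re.match(r"#{1,6}(\s|$)", line) is not None


class Paragraph(BlockToken):
    @classmethod
    def start(cls, line: str) -> bool:
        return line.strip() != ""

    @classmethod
    def read(cls, lines, i, registry):
        i += 1
        # Ask the registry which tokens may interrupt, instead of hard-coding them.
        while i < len(lines) and lines[i].strip() and not any(
            token.interrupts_paragraph and token.start(lines[i]) for token in registry
        ):
            i += 1
        return i


def parse(lines: List[str], registry: Sequence[type]) -> List[Tuple[str, List[str]]]:
    """Split lines into blocks; Paragraph must come last as it matches any text."""
    blocks, i = [], 0
    while i < len(lines):
        if not lines[i].strip():
            i += 1
            continue
        token = next(t for t in registry if t.start(lines[i]))
        end = token.read(lines, i, registry)
        blocks.append((token.__name__, lines[i:end]))
        i = end
    return blocks


class MathBlock(BlockToken):
    """A new token, added without editing or subclassing Paragraph."""

    interrupts_paragraph = True

    @classmethod
    def start(cls, line: str) -> bool:
        return line.startswith("$$")

    @classmethod
    def read(cls, lines, i, registry):
        i += 1
        while i < len(lines) and not lines[i].startswith("$$"):
            i += 1
        return min(i + 1, len(lines))  # include the closing $$ if present


text = ["Some text", "$$", "a^2 + b^2", "$$", "# Heading"]
print(parse(text, [Heading, Paragraph]))
# [('Paragraph', ['Some text', '$$', 'a^2 + b^2', '$$']), ('Heading', ['# Heading'])]
print(parse(text, [Heading, MathBlock, Paragraph]))
# [('Paragraph', ['Some text']), ('MathBlock', ['$$', 'a^2 + b^2', '$$']), ('Heading', ['# Heading'])]
```

Registering MathBlock changes where the paragraph ends without touching Paragraph. A real parser also needs container blocks (lists, block quotes) tracked on a stack of open blocks, which is the 'memory' part of the pushdown automaton.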
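For the span-token item, the interdependence shows up as precedence: an emphasis rule that knows nothing about math will happily match asterisks inside a formula. The regex-based sketch below is a simplified, hypothetical illustration of that problem, not how mistletoe or MyST-Parser actually tokenize spans.

```python
import re

# Simplified, hypothetical patterns; real CommonMark emphasis rules are far richer.
MATH = re.compile(r"\$([^$]+)\$")
EMPHASIS = re.compile(r"\*([^*]+)\*")


def emphasis_only(text: str) -> str:
    # Emphasis parsed with no knowledge of math: can corrupt formulas.
    return EMPHASIS.sub(r"<em>\1</em>", text)


def math_aware(text: str) -> str:
    # Math takes precedence: only text outside $...$ is scanned for emphasis.
    parts, pos = [], 0
    for match in MATH.finditer(text):
        parts.append(emphasis_only(text[pos:match.start()]))
        parts.append(match.group(0))  # math source is left untouched
        pos = match.end()
    parts.append(emphasis_only(text[pos:]))
    return "".join(parts)


text = "*x* and $a*b*c$"
print(emphasis_only(text))  # <em>x</em> and $a<em>b</em>c$
print(math_aware(text))     # <em>x</em> and $a*b*c$
```

The fix lives in the caller, which must know that math outranks emphasis; a more general design would let each span token declare its precedence.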
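For the rendering item, a visitor separates tree traversal from what is done at each node, so one walk can serve several outputs. The sketch below is only in the spirit of docutils' visit_/depart_ methods; Node, Visitor and the two subclasses are illustrative names, not mistletoe, docutils or panflute APIs.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical AST node (NOT mistletoe's token class)."""
    kind: str
    children: List["Node"] = field(default_factory=list)
    text: str = ""


class Visitor:
    """Recursive walk calling visit_<kind> before and depart_<kind> after children."""

    def walk(self, node: Node) -> None:
        getattr(self, f"visit_{node.kind}", self.default_visit)(node)
        for child in node.children:
            self.walk(child)
        getattr(self, f"depart_{node.kind}", self.default_depart)(node)

    def default_visit(self, node: Node) -> None:
        """Node kinds without a visit_ method are simply passed through."""

    def default_depart(self, node: Node) -> None:
        """Node kinds without a depart_ method are simply passed through."""


class HTMLTranslator(Visitor):
    def __init__(self) -> None:
        self.parts: List[str] = []

    def visit_paragraph(self, node: Node) -> None:
        self.parts.append("<p>")

    def depart_paragraph(self, node: Node) -> None:
        self.parts.append("</p>")

    def visit_strong(self, node: Node) -> None:
        self.parts.append("<strong>")

    def depart_strong(self, node: Node) -> None:
        self.parts.append("</strong>")

    def visit_text(self, node: Node) -> None:
        self.parts.append(node.text)


class WordCounter(Visitor):
    """A second visitor reusing the same walk, e.g. for document statistics."""

    def __init__(self) -> None:
        self.count = 0

    def visit_text(self, node: Node) -> None:
        self.count += len(node.text.split())


tree = Node("paragraph", [Node("text", text="Hello "),
                          Node("strong", [Node("text", text="big world")])])
html = HTMLTranslator()
html.walk(tree)
print("".join(html.parts))  # <p>Hello <strong>big world</strong></p>
counter = WordCounter()
counter.walk(tree)
print(counter.count)  # 3
```

Because unknown node kinds fall through to the defaults, a visitor only implements the nodes it cares about, which also makes custom tokens cheap to support.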
Hashes for mistletoe_ebp-0.10.0a2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | afec90ed2166863f3cc35497a0f36275cb8ae6809e51db9a8c211c67a4998b49
MD5 | 748fba9f1bc9385ada725df0be246e92
BLAKE2b-256 | 3146d3fa1ebb066f84026ea7e856fe880110b3bd94949d373322363a779c2e0a