A customized set of pandoc filters that can be used to generate a useful pandoc Python filter.
pandoc-filter
This project is a customized set of pandoc filters that can be used to generate a useful pandoc Python filter. Currently, it supports only some features of markdown-to-markdown (normalizing markdown files) and markdown-to-html (generating web pages). More features will be added later, driven by my own use cases and by user feedback.
Background
I usually take notes in markdown, using clean markdown syntax, and then post these notes on my site as web pages. So, I need to convert markdown to html. Many tools can do this conversion, and I eventually chose pandoc because of its powerful features.
But sometimes I need many more features when converting from md to html, and that is where pandoc filters come in. I have written some pandoc Python filters with advanced features, using panflute and many other tools. Now I think it is time to gather these filters into a combined toolset, which is this project.
Please see Main Features for the concrete features.
Please see Usage for the recommended usage.
Main Features
The main goal is converting markdown to html. I divide this conversion into two stages: markdown-to-markdown (normalizing markdown files) and markdown-to-html (generating web pages).
markdown-to-markdown supports:
- math filter
  - Adapt the AMS rule for math formulas. (Auto-number markdown formulas within \begin{equation} \end{equation}, as in Typora.)
  - Allow multiple tags, but only take the first one.
  - Allow multiple labels, but only take the first one.
- figure filter
  - Manage local pictures, sync them to Aliyun OSS, and replace the original src with the new one.
- footnote filter
  - Normalize footnotes. (Remove \n in the footnote content.)
- internal link filter
  - Normalize internal links with a very special rule. (Decode the URL-encoded links.)
markdown-to-html supports:
- anchor filter
  - Normalize anchors with a very special rule. (Replace each anchor's id with its hash, as Notion does, and number it with -x.)
- internal link recorder and filter
  - Globally manage and normalize internal links. (Make them match the behavior of the anchor filter.)
- link like filter
  - Process a string that may be like a link. (Make it a link.)
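The anchor rule above (hash as id, numbered with -x) can be sketched roughly as follows. This is only an illustration: the class `AnchorNumberer`, the method `make_id`, and the choice of SHA-256 over the heading text are assumptions, since this description does not say which hash the real anchor filter uses. The `anchor_count` name mirrors the attribute read by the usage example in the Usage section.

```python
import hashlib
from collections import Counter


class AnchorNumberer:
    """Hypothetical helper: assign `<hash>-<n>` ids to headings, counting repeats."""

    def __init__(self):
        # Maps each hash to how many headings have produced it so far.
        self.anchor_count = Counter()

    def make_id(self, heading_text: str) -> str:
        # Assumption: the hash is SHA-256 of the UTF-8 heading text.
        digest = hashlib.sha256(heading_text.encode("utf-8")).hexdigest()
        self.anchor_count[digest] += 1
        # The n-th heading with the same text gets the suffix -n.
        return f"{digest}-{self.anchor_count[digest]}"


numberer = AnchorNumberer()
first = numberer.make_id("Usage")
second = numberer.make_id("Usage")   # same text, so same hash with suffix -2
print(first.endswith("-1"), second.endswith("-2"))
```

Numbering every id, even the first occurrence, keeps duplicate headings distinguishable without having to revisit ids that were already emitted.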
Note: The division of filters reflects only my opinion on code organization. It does not mean that a filter can only be used in one class of conversion. As long as the user understands a filter's effect, any filter can be used in any scenario. So, it is recommended to read a filter's source code directly before using it.
Installation
pip install -i https://pypi.org/simple/ --pre -U pandoc-filter
Usage
Here are two basic examples.
Convert markdown to markdown (Normalization)
Normalize internal link
- Inputs (./input.md): refer to test_md2md_internal_link.md.

    ## 带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%¥¥¥¥跳转测试 空格

    ### aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz [xx] (yy)

    [带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%¥¥¥¥跳转测试 空格](#####带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%¥¥¥¥跳转测试 空格)

    [aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz [xx] (yy)](#####aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz [xx] (yy))

    <a href="###带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%¥¥¥¥跳转测试 空格">带空格 和`特殊字符`...</a>

    <a href="#aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz [xx] (yy)">aAa-b...</a>
- Coding:

    import pathlib
    import logging
    import panflute as pf
    from pandoc_filter.utils import TracingLogger
    from pandoc_filter.md2md_filters import internal_link_filter

    # Prepare a log directory and a logger that the filter writes to.
    pathlib.Path("./logs").mkdir(parents=True, exist_ok=True)
    tracing_logger = TracingLogger(name="./logs/pf_log",level=logging.INFO)

    file_path = pathlib.Path("./input.md")
    with open(file_path,'r',encoding='utf-8') as f:
        markdown_content = f.read()
    output_path = pathlib.Path("./output.md")

    # Parse the markdown into a panflute document, run the filter, and write it back as GFM.
    doc = pf.convert_text(markdown_content,input_format='markdown',output_format='panflute',standalone=True)
    doc = pf.run_filter(action=internal_link_filter,doc=doc,tracing_logger=tracing_logger)
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(pf.convert_text(doc,input_format='panflute',output_format='gfm',standalone=True))
- Outputs (./output.md): refer to test_md2md_internal_link.md.

    ## 带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%¥¥¥¥跳转测试 空格

    ### aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz \[xx\] (yy)

    [带空格 和`特殊字符` \[链接\](http://typora.io) 用于%%%%¥¥¥¥跳转测试 空格](#带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%¥¥¥¥跳转测试 空格)

    [aAa-b cC `Dd`, a#%&\[xxx\](yyy) Zzz \[xx\] (yy)](#aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz \[xx\] (yy))

    <a href="#带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%¥¥¥¥跳转测试 空格">带空格 和`特殊字符`…</a>

    <a href="#aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz \[xx\] (yy)">aAa-b…</a>
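The effect on the link targets in this example (runs of `#` collapsed to a single `#`, and URL-encoded links decoded, as listed under Main Features) can be approximated on plain strings. The sketch below is not the filter's implementation: `normalize_internal_link` is a hypothetical helper, and the real filter works on panflute elements rather than raw strings.

```python
import re
from urllib.parse import unquote


def normalize_internal_link(url: str) -> str:
    """Hypothetical helper: decode a URL-encoded fragment and collapse leading '#'s."""
    if not url.startswith("#"):
        return url  # not an internal link, leave it untouched
    decoded = unquote(url)  # invalid escapes such as "%%" are kept as-is
    return re.sub(r"^#+", "#", decoded)


print(normalize_internal_link("#####aAa-b%20cC"))
```

Links that do not start with `#` are returned unchanged, so external URLs such as http://typora.io are not affected.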
Normalize footnotes
- Inputs (./input.md): refer to test_md2md_footnote.md.

    which1.[^1] which2.[^2] which3.[^3]

    [^1]: Deep Learning with Intel® AVX-512 and Intel® DL Boost https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html www.intel.cn

    [^2]: Deep Learning with Intel® AVX-512222 and Intel® DL Boost https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html www.intel.cn

    [^3]: Deep Learning with Intel® AVX-512 and Intel® DL Boost https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html www.intel.cn
- Coding:

    import pathlib
    import logging
    import panflute as pf
    from pandoc_filter.utils import TracingLogger
    from pandoc_filter.md2md_filters import footnote_filter

    # Prepare a log directory and a logger that the filter writes to.
    pathlib.Path("./logs").mkdir(parents=True, exist_ok=True)
    tracing_logger = TracingLogger(name="./logs/pf_log",level=logging.INFO)

    file_path = pathlib.Path("./input.md")
    with open(file_path,'r',encoding='utf-8') as f:
        markdown_content = f.read()
    output_path = pathlib.Path("./output.md")

    # Parse the markdown into a panflute document, run the filter, and write it back as GFM.
    doc = pf.convert_text(markdown_content,input_format='markdown',output_format='panflute',standalone=True)
    doc = pf.run_filter(action=footnote_filter,doc=doc,tracing_logger=tracing_logger)
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(pf.convert_text(doc,input_format='panflute',output_format='gfm',standalone=True))
- Outputs (./output.md): refer to test_md2md_footnote.md.

    which1.[^1] which2.[^2] which3.[^3]

    [^1]: Deep Learning with Intel® AVX-512 and Intel® DL Boost https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html www.intel.cn

    [^2]: Deep Learning with Intel® AVX-512222 and Intel® DL Boost https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html www.intel.cn

    [^3]: Deep Learning with Intel® AVX-512 and Intel® DL Boost https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html www.intel.cn
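The footnote rule (remove \n in the footnote content) amounts to joining a multi-line footnote body into a single line. A minimal string-level sketch is shown below. `normalize_footnote_text` is a hypothetical name, and replacing each line break with one space is an assumption, since the real filter operates on the parsed document rather than on raw text.

```python
import re


def normalize_footnote_text(text: str) -> str:
    """Hypothetical helper: join a multi-line footnote body into one line."""
    # Replace each line break, plus any indentation around it, with one space.
    return re.sub(r"\s*\n\s*", " ", text).strip()


raw = "Deep Learning with Intel® AVX-512\n    and Intel® DL Boost\n"
print(normalize_footnote_text(raw))
```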
Adapt AMS rule for math formula
- Inputs (./input.md): refer to test_md2md_math.md.

    $$
    \begin{equation}\tag{abcd}\label{lalla}
    e=mc^2
    \end{equation}
    $$

    $$
    \begin{equation}
    e=mc^2
    \end{equation}
    $$

    $$
    e=mc^2
    $$

    $$
    \begin{equation}\label{eq1}
    e=mc^2
    \end{equation}
    $$
- Coding:

    import pathlib
    import logging
    import panflute as pf
    from pandoc_filter.utils import TracingLogger
    from pandoc_filter.md2md_filters import math_filter

    # Prepare a log directory and a logger that the filter writes to.
    pathlib.Path("./logs").mkdir(parents=True, exist_ok=True)
    tracing_logger = TracingLogger(name="./logs/pf_log",level=logging.INFO)

    file_path = pathlib.Path("./input.md")
    with open(file_path,'r',encoding='utf-8') as f:
        markdown_content = f.read()
    output_path = pathlib.Path("./output.md")

    # Parse the markdown into a panflute document, run the filter, and write it back as GFM.
    doc = pf.convert_text(markdown_content,input_format='markdown',output_format='panflute',standalone=True)
    doc = pf.run_filter(action=math_filter,doc=doc,tracing_logger=tracing_logger)
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(pf.convert_text(doc,input_format='panflute',output_format='gfm',standalone=True))
- Outputs (./output.md): refer to test_md2md_math.md.

    $$
    \begin{equation}\label{lalla}\tag{abcd}
    e=mc^2
    \end{equation}
    $$

    $$
    \begin{equation}\tag{1}
    e=mc^2
    \end{equation}
    $$

    $$
    e=mc^2
    $$

    $$
    \begin{equation}\label{eq1}\tag{2}
    e=mc^2
    \end{equation}
    $$
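The numbering behavior visible in this example can be reproduced with a small regex-based sketch: an explicit \tag is kept, untagged equation environments are numbered 1, 2, ... in order, only the first tag and first label survive, and math outside an equation environment is left alone. This is an illustration built from the input/output pair above, not the source of `math_filter`. The function `number_equations` and its regular expressions are hypothetical.

```python
import re

EQUATION = re.compile(r"\\begin\{equation\}(.*?)\\end\{equation\}", re.DOTALL)
TAG = re.compile(r"\\tag\{([^}]*)\}")
LABEL = re.compile(r"\\label\{([^}]*)\}")


def number_equations(text: str) -> str:
    """Hypothetical helper: add \\tag{n} to untagged equation environments."""
    counter = 0

    def rewrite(match):
        nonlocal counter
        body = match.group(1)
        tags = TAG.findall(body)
        labels = LABEL.findall(body)
        # Strip every tag and label; the first of each is re-emitted below.
        body = LABEL.sub("", TAG.sub("", body))
        if tags:
            tag = tags[0]          # an explicit tag does not consume a number
        else:
            counter += 1
            tag = str(counter)
        label = f"\\label{{{labels[0]}}}" if labels else ""
        return f"\\begin{{equation}}{label}\\tag{{{tag}}}{body}\\end{{equation}}"

    return EQUATION.sub(rewrite, text)


print(number_equations(r"\begin{equation} e=mc^2 \end{equation}"))
```

Passing a function as the replacement avoids having to escape the backslashes that `re.sub` would otherwise interpret in a replacement string.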
Sync local images to Aliyun OSS
- Prerequisites:
  - Assume the bucket domain is raw.little-train.com.
  - Assume the following environment variables have been set:
    - OSS_ENDPOINT_NAME = "oss-cn-taiwan.aliyuncs.com"
    - OSS_BUCKET_NAME = "test"
    - OSS_ACCESS_KEY_ID = "123456781234567812345678"
    - OSS_ACCESS_KEY_SECRET = "123456123456123456123456123456"
  - Assume the images are located in ./input.assets/.
- Inputs (./input.md): refer to test_md2md_figure.md.

    ![自定义头像](./input.assets/自定义头像.png)

    ![Level-of-concepts](./input.assets/Level-of-concepts.svg)
- Coding:

    import os
    import pathlib
    import logging
    import panflute as pf
    from pandoc_filter.utils import TracingLogger
    from pandoc_filter.utils import OssHelper
    from pandoc_filter.md2md_filters import figure_filter

    # Prepare a log directory and a logger that the filter writes to.
    pathlib.Path("./logs").mkdir(parents=True, exist_ok=True)
    tracing_logger = TracingLogger(name="./logs/pf_log",level=logging.INFO)

    file_path = pathlib.Path("./input.md")
    with open(file_path,'r',encoding='utf-8') as f:
        markdown_content = f.read()
    output_path = pathlib.Path("./output.md")

    # Read the OSS settings from the environment; the access key variables must also be set.
    oss_endpoint_name = os.environ['OSS_ENDPOINT_NAME']
    oss_bucket_name = os.environ['OSS_BUCKET_NAME']
    assert os.environ['OSS_ACCESS_KEY_ID']
    assert os.environ['OSS_ACCESS_KEY_SECRET']
    oss_helper = OssHelper(oss_endpoint_name,oss_bucket_name)

    doc = pf.convert_text(markdown_content,input_format='markdown',output_format='panflute',standalone=True)
    doc.doc_path = file_path  # set the path of the source document on the doc for the filter
    doc = pf.run_filter(action=figure_filter,doc=doc,tracing_logger=tracing_logger,oss_helper=oss_helper)
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(pf.convert_text(doc,input_format='panflute',output_format='gfm',standalone=True))
- Outputs (./output.md): refer to test_md2md_figure.md.

    <figure>
    <img src="https://raw.little-train.com/111199e36daf608352089b12cec935fc5cbda5e3dcba395026d0b8751a013d1d.png" alt="自定义头像" />
    <figcaption aria-hidden="true">自定义头像</figcaption>
    </figure>

    <figure>
    <img src="https://raw.little-train.com/20061af9ba13d3b92969dc615b9ba91abb4c32c695f532a70a6159d7b806241c.svg" alt="Level-of-concepts" />
    <figcaption aria-hidden="true">Level-of-concepts</figcaption>
    </figure>
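The new src values in this output are 64 hexadecimal characters followed by the original file extension, served from the bucket domain. One way such names could be produced is sketched below, purely as an assumption: SHA-256 over the file bytes is a guess based on the length of the names, and `remote_image_url` is a hypothetical helper, not part of `OssHelper` or `figure_filter`. The upload step itself is not shown.

```python
import hashlib


def remote_image_url(data: bytes, suffix: str, bucket_domain: str) -> str:
    """Hypothetical helper: build a content-addressed URL for an image."""
    # Assumption: the object name is the SHA-256 hex digest of the file bytes.
    digest = hashlib.sha256(data).hexdigest()
    return f"https://{bucket_domain}/{digest}{suffix}"


# In practice `data` would come from pathlib.Path(...).read_bytes().
print(remote_image_url(b"example image bytes", ".png", "raw.little-train.com"))
```

Naming objects by content hash means that re-running the conversion on an unchanged image yields the same URL, so the same file does not need a new name on every sync.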
Convert markdown to html
Normalize anchors, internal links and link-like strings
- Inputs (./input.md): refer to test_md2html_anchor_and_link.md.
- Coding:

    import pathlib
    import logging
    import functools
    import panflute as pf
    from pandoc_filter.utils import TracingLogger
    from pandoc_filter.md2html_filters import anchor_filter,internal_link_recorder,link_like_filter
    from pandoc_filter.md2md_filters import internal_link_filter

    pathlib.Path("./logs").mkdir(parents=True, exist_ok=True)
    tracing_logger = TracingLogger(name="./logs/pf_log",level=logging.INFO)

    def finalize(doc:pf.Doc,**kwargs):
        tracing_logger = kwargs['tracing_logger']
        # Collect every numbered anchor id recorded by anchor_filter.
        id_set = set()
        for k,v in doc.anchor_count.items():
            for i in range(1,v+1):
                id_set.add(f"{k}-{i}")
        # Point each recorded internal link at an existing anchor, or warn if none matches.
        for patched_elem,url,guessed_url_with_num in doc.internal_link_record:
            if f"{url}-1" in id_set:
                patched_elem.sub(f"{url}-1",tracing_logger)
            elif guessed_url_with_num in id_set: # None is not in id_set
                patched_elem.sub(f"{guessed_url_with_num}",tracing_logger)
            else:
                tracing_logger.logger.warning(f"{patched_elem.elem}")
                tracing_logger.logger.warning(f"The internal link `{url}` is invalid and will not be changed because no target header is found.")

    file_path = pathlib.Path("./input.md")
    with open(file_path,'r',encoding='utf-8') as f:
        markdown_content = f.read()
    output_path = pathlib.Path("./output.html")

    doc = pf.convert_text(markdown_content,input_format='markdown',output_format='panflute',standalone=True)
    # First normalize internal links, then run the html-oriented filters with the finalize step.
    doc = pf.run_filter(action=internal_link_filter,doc=doc,tracing_logger=tracing_logger)
    _finalize = functools.partial(finalize,tracing_logger=tracing_logger)
    doc = pf.run_filters(actions=[anchor_filter,internal_link_recorder,link_like_filter],doc=doc,finalize=_finalize,tracing_logger=tracing_logger)
    with open(output_path, "w", encoding="utf-8") as f:
        f.write(pf.convert_text(doc,input_format='panflute',output_format='html',standalone=True))
- Outputs (./output.html): refer to test_md2html_anchor_and_link.html.
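The resolution rule inside `finalize` can be tried in isolation with plain data structures. The sketch below restates that rule without panflute. `resolve_internal_link` is a hypothetical function, and the sample keys stand in for whatever ids `anchor_filter` stores in `doc.anchor_count`.

```python
def resolve_internal_link(url, guessed_url_with_num, anchor_count):
    """Return the numbered anchor id a link should point to, or None if no target exists."""
    # Every anchor key with count n contributes the ids key-1 ... key-n.
    id_set = {
        f"{key}-{i}"
        for key, count in anchor_count.items()
        for i in range(1, count + 1)
    }
    if f"{url}-1" in id_set:
        return f"{url}-1"            # prefer the first anchor with this id
    if guessed_url_with_num in id_set:
        return guessed_url_with_num  # the link already named a numbered anchor
    return None                      # no target header found


anchor_count = {"abc": 2}  # placeholder key, seen on two headings
print(resolve_internal_link("abc", None, anchor_count))
print(resolve_internal_link("zzz", "abc-2", anchor_count))
print(resolve_internal_link("zzz", None, anchor_count))
```

Returning None for the last case corresponds to the branch in `finalize` that logs a warning and leaves the link unchanged.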
Contribution
Contributions are welcome. However, the introduction and documentation are not yet complete, so please bear with me for a while.
A simple way to contribute is to open an issue to report bugs or request new features.