A customized set of pandoc filters that can be used to build useful pandoc Python filters.

pandoc-filter

This project is a customized set of pandoc filters that can be used to build useful pandoc Python filters. Currently, it supports only some features for markdown-to-markdown (normalizing markdown files) and markdown-to-html (generating web pages) conversion. More features will be added later, driven by my own use cases and user feedback.

Background

I usually take notes in markdown and like to keep the markdown syntax clean. I then post these notes on my site as web pages, so I need to convert markdown to HTML. Many tools can do this conversion; I eventually chose pandoc for its powerful features.

However, I sometimes need more than pandoc offers out of the box when converting from markdown to HTML, which is where pandoc filters come in. I have written several pandoc Python filters with advanced features, using panflute and other tools. This project gathers those filters into a single toolset.

Please see Main Features for the concrete features.

Please see Usage for the recommended usage.

Main Features

The conversion from markdown to HTML is divided into two stages: markdown-to-markdown (normalizing markdown files) and markdown-to-html (generating web pages).

markdown-to-markdown supports:
- math filter
  - Adapt the AMS rule for math formulas. (Automatically number equations written within \begin{equation} \end{equation}, as Typora does.)
  - Allow multiple tags, but use only the first one.
  - Allow multiple labels, but use only the first one.
- figure filter
  - Manage local pictures, sync them to Aliyun OSS, and replace the original src with the new one.
- footnote filter
  - Normalize footnotes. (Remove \n from the footnote content.)
- internal link filter
  - Normalize internal links with a very special rule. (Decode URL-encoded links.)

markdown-to-html supports:
- anchor filter
  - Normalize anchors with a very special rule. (Replace each anchor's id with its hash, as Notion does, and number it with a -x suffix.)
- internal link recorder and filter
  - Globally manage and normalize internal links. (Make them match the behavior of the anchor filter.)
- link like filter
  - Process a string that looks like a link. (Turn it into a link.)

Note: The division of filters reflects only my view of code organization; it does not mean a filter can be used only for one class of conversion. As long as the user understands a filter's effect, any filter may be used in any scenario. It is therefore recommended to read a filter's source code directly before using it.
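
To make the "link like filter" entry above more concrete, here is a minimal standard-library sketch of the general idea: find bare strings that look like URLs and wrap them as markdown links. This is not the package's implementation. The regular expression and the function name `linkify` are assumptions made for illustration, and the real filter works on panflute elements rather than on raw text.

```python
import re

# Hypothetical pattern: a bare http(s) URL or a "www." host, up to the next
# whitespace or bracket. The lookbehind skips text that already sits inside
# a word, a URL path, or an existing markdown/HTML link.
LINK_LIKE = re.compile(r"(?<![\w(<\[/])((?:https?://|www\.)[^\s<>()\[\]]+)")


def linkify(text: str) -> str:
    """Wrap link-like substrings of `text` as markdown links."""

    def repl(match: re.Match) -> str:
        raw = match.group(1)
        # Assume https for scheme-less "www." strings.
        href = raw if raw.startswith("http") else "https://" + raw
        return f"[{raw}]({href})"

    return LINK_LIKE.sub(repl, text)


print(linkify("see www.intel.cn for details"))
```

Strings that are already part of a markdown link, such as `[x](https://a.b)`, are left unchanged by this sketch.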

Installation

pip install -i https://pypi.org/simple/ --pre -U pandoc-filter

Usage

Here are basic examples for the two conversion stages.

Convert markdown to markdown (Normalization)

Normalize internal link

Inputs(./input.md): refer to test_md2md_internal_link.md.

## 带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%￥￥￥￥跳转测试        空格

### aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz [xx]  (yy)

[带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%￥￥￥￥跳转测试        空格](#####带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%￥￥￥￥跳转测试        空格)

[aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz [xx]  (yy)](#####aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz [xx]  (yy))

<a href="###带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%￥￥￥￥跳转测试        空格">带空格 和`特殊字符`...</a>

<a href="#aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz [xx]  (yy)">aAa-b...</a>

Coding:

import pathlib
import logging
import panflute as pf

from pandoc_filter.utils import TracingLogger
from pandoc_filter.md2md_filters import internal_link_filter

pathlib.Path("./logs").mkdir(parents=True, exist_ok=True)
tracing_logger = TracingLogger(name="./logs/pf_log",level=logging.INFO)

file_path = pathlib.Path("./input.md")
with open(file_path,'r',encoding='utf-8') as f:
    markdown_content = f.read()
output_path = pathlib.Path("./output.md")

doc = pf.convert_text(markdown_content,input_format='markdown',output_format='panflute',standalone=True)
doc = pf.run_filter(action=internal_link_filter,doc=doc,tracing_logger=tracing_logger)

with open(output_path, "w", encoding="utf-8") as f:
    f.write(pf.convert_text(doc,input_format='panflute',output_format='gfm',standalone=True))

Outputs(./output.md): refer to test_md2md_internal_link.md.

## 带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%￥￥￥￥跳转测试 空格

### aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz \[xx\] (yy)

[带空格 和`特殊字符` \[链接\](http://typora.io) 用于%%%%￥￥￥￥跳转测试
空格](#带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%￥￥￥￥跳转测试 空格)

[aAa-b cC `Dd`, a#%&\[xxx\](yyy) Zzz \[xx\]
(yy)](#aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz \[xx\] (yy))

<a href="#带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%￥￥￥￥跳转测试 空格">带空格
和`特殊字符`…</a>

<a href="#aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz \[xx\] (yy)">aAa-b…</a>
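
The core of the rule shown above (decode URL-encoded targets and keep a single leading `#`) can be sketched with the standard library alone. This is a simplified illustration that operates on a plain URL string; the function name `normalize_internal_link` is an assumption, and the package's `internal_link_filter` itself works on panflute elements.

```python
from urllib.parse import unquote


def normalize_internal_link(url: str) -> str:
    """Decode a URL-encoded internal link and keep one leading '#'."""
    if not url.startswith("#"):
        # Not an internal link (e.g. http://typora.io); leave it untouched.
        return url
    # Strip every leading '#', percent-decode the rest, and add one '#' back.
    return "#" + unquote(url.lstrip("#"))


print(normalize_internal_link("###a%20b"))  # → #a b
```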

Normalize footnotes

Inputs(./input.md): refer to test_md2md_footnote.md.

which1.[^1]

which2.[^2]

which3.[^3]

[^1]: Deep Learning with Intel® AVX-512 and Intel® DL Boost
https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html
www.intel.cn

[^2]: Deep Learning with Intel® AVX-512222 and Intel® DL Boost https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html www.intel.cn

[^3]: Deep Learning with Intel®     AVX-512 and Intel® DL Boost https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html www.intel.cn

Coding:

import pathlib
import logging
import panflute as pf

from pandoc_filter.utils import TracingLogger
from pandoc_filter.md2md_filters import footnote_filter

pathlib.Path("./logs").mkdir(parents=True, exist_ok=True)
tracing_logger = TracingLogger(name="./logs/pf_log",level=logging.INFO)

file_path = pathlib.Path("./input.md")
with open(file_path,'r',encoding='utf-8') as f:
    markdown_content = f.read()
output_path = pathlib.Path("./output.md")

doc = pf.convert_text(markdown_content,input_format='markdown',output_format='panflute',standalone=True)
doc = pf.run_filter(action=footnote_filter,doc=doc,tracing_logger=tracing_logger)

with open(output_path, "w", encoding="utf-8") as f:
    f.write(pf.convert_text(doc,input_format='panflute',output_format='gfm',standalone=True))

Outputs(./output.md): refer to test_md2md_footnote.md.

which1.[^1]

which2.[^2]

which3.[^3]

[^1]: Deep Learning with Intel® AVX-512 and Intel® DL Boost https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html www.intel.cn

[^2]: Deep Learning with Intel® AVX-512222 and Intel® DL Boost https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html www.intel.cn

[^3]: Deep Learning with Intel® AVX-512 and Intel® DL Boost https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html www.intel.cn
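
The effect visible in the output above, where each footnote ends up on a single line, can be approximated on plain text as follows. This is a minimal sketch, not the package's `footnote_filter`; the function name `normalize_footnote_text` is hypothetical. Note that it also collapses runs of spaces, which happens to mirror the third footnote in the sample output.

```python
def normalize_footnote_text(content: str) -> str:
    """Join the lines of a footnote's content into a single line."""
    # str.split() with no arguments splits on any whitespace run, including
    # "\n", so rejoining with single spaces removes all line breaks.
    return " ".join(content.split())


footnote = (
    "Deep Learning with Intel® AVX-512 and Intel® DL Boost\n"
    "https://www.intel.cn/content/www/cn/zh/developer/articles/guide/"
    "deep-learning-with-avx512-and-dl-boost.html\n"
    "www.intel.cn"
)
print(normalize_footnote_text(footnote))
```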

Adapt AMS rule for math formula

Inputs(./input.md): refer to test_md2md_math.md.

$$
\begin{equation}\tag{abcd}\label{lalla}
e=mc^2
\end{equation}
$$

$$
\begin{equation}
e=mc^2
\end{equation}
$$

$$
e=mc^2
$$

$$
\begin{equation}\label{eq1}
e=mc^2
\end{equation}
$$

Coding:

import pathlib
import logging
import panflute as pf

from pandoc_filter.utils import TracingLogger
from pandoc_filter.md2md_filters import math_filter


pathlib.Path("./logs").mkdir(parents=True, exist_ok=True)
tracing_logger = TracingLogger(name="./logs/pf_log",level=logging.INFO)

file_path = pathlib.Path("./input.md")
with open(file_path,'r',encoding='utf-8') as f:
    markdown_content = f.read()
output_path = pathlib.Path("./output.md")

doc = pf.convert_text(markdown_content,input_format='markdown',output_format='panflute',standalone=True)
doc = pf.run_filter(action=math_filter,doc=doc,tracing_logger=tracing_logger)

with open(output_path, "w", encoding="utf-8") as f:
    f.write(pf.convert_text(doc,input_format='panflute',output_format='gfm',standalone=True))

Outputs(./output.md): refer to test_md2md_math.md.

$$
\begin{equation}\label{lalla}\tag{abcd}
e=mc^2
\end{equation}
$$

$$
\begin{equation}\tag{1}
e=mc^2
\end{equation}
$$

$$
e=mc^2
$$

$$
\begin{equation}\label{eq1}\tag{2}
e=mc^2
\end{equation}
$$
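
The numbering behavior in the example above can be summarized as: an equation environment keeps its first `\tag` if it has one, otherwise it receives the next number; the first `\label`, if any, is placed before the tag; formulas outside `\begin{equation}` are left alone. The following regex-based sketch reproduces that behavior on raw text. It is an illustration under those assumptions, not the package's `math_filter`, and the name `number_equations` is hypothetical.

```python
import re

EQUATION = re.compile(r"\\begin\{equation\}(.*?)\\end\{equation\}", re.DOTALL)
TAG = re.compile(r"\\tag\{([^}]*)\}")
LABEL = re.compile(r"\\label\{([^}]*)\}")


def number_equations(text: str) -> str:
    """Give every equation environment exactly one \\tag (and at most one \\label)."""
    counter = 0

    def repl(match: re.Match) -> str:
        nonlocal counter
        body = match.group(1)
        tags = TAG.findall(body)
        labels = LABEL.findall(body)
        # Remove all existing tags and labels; only the first of each is re-added.
        body = LABEL.sub("", TAG.sub("", body))
        if tags:
            tag = tags[0]
        else:
            # Untagged equations are numbered in order of appearance.
            counter += 1
            tag = str(counter)
        head = (f"\\label{{{labels[0]}}}" if labels else "") + f"\\tag{{{tag}}}"
        return f"\\begin{{equation}}{head}{body}\\end{{equation}}"

    return EQUATION.sub(repl, text)


sample = r"""\begin{equation}\tag{abcd}\label{lalla}
e=mc^2
\end{equation}
\begin{equation}
e=mc^2
\end{equation}
\begin{equation}\label{eq1}
e=mc^2
\end{equation}"""
print(number_equations(sample))
```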

Sync local images to `Aliyun OSS`

Prerequisites:
- Assume the bucket domain is raw.little-train.com.
- Assume the following environment variables are set:
  - OSS_ENDPOINT_NAME = "oss-cn-taiwan.aliyuncs.com"
  - OSS_BUCKET_NAME = "test"
  - OSS_ACCESS_KEY_ID = "123456781234567812345678"
  - OSS_ACCESS_KEY_SECRET = "123456123456123456123456123456"
- Assume the images are located in ./input.assets/.

Inputs(./input.md): refer to test_md2md_figure.md.

![自定义头像](./input.assets/自定义头像.png)

![Level-of-concepts](./input.assets/Level-of-concepts.svg)

Coding:

import os
import pathlib
import logging
import panflute as pf

from pandoc_filter.utils import TracingLogger
from pandoc_filter.utils import OssHelper
from pandoc_filter.md2md_filters import figure_filter

pathlib.Path("./logs").mkdir(parents=True, exist_ok=True)
tracing_logger = TracingLogger(name="./logs/pf_log",level=logging.INFO)

file_path = pathlib.Path("./input.md")
with open(file_path,'r',encoding='utf-8') as f:
    markdown_content = f.read()
output_path = pathlib.Path("./output.md")

# Read the OSS settings from the environment variables listed in the prerequisites.
oss_endpoint_name = os.environ['OSS_ENDPOINT_NAME']
oss_bucket_name = os.environ['OSS_BUCKET_NAME']
# The credentials are only checked for presence here; they are not passed to OssHelper explicitly.
assert os.environ['OSS_ACCESS_KEY_ID']
assert os.environ['OSS_ACCESS_KEY_SECRET']
oss_helper = OssHelper(oss_endpoint_name,oss_bucket_name)

doc = pf.convert_text(markdown_content,input_format='markdown',output_format='panflute',standalone=True)
doc.doc_path = file_path
doc = pf.run_filter(action=figure_filter,doc=doc,tracing_logger=tracing_logger,oss_helper=oss_helper)

with open(output_path, "w", encoding="utf-8") as f:
    f.write(pf.convert_text(doc,input_format='panflute',output_format='gfm',standalone=True))

Outputs(./output.md): refer to test_md2md_figure.md.

<figure>
<img
src="https://raw.little-train.com/111199e36daf608352089b12cec935fc5cbda5e3dcba395026d0b8751a013d1d.png"
alt="自定义头像" />
<figcaption aria-hidden="true">自定义头像</figcaption>
</figure>

<figure>
<img
src="https://raw.little-train.com/20061af9ba13d3b92969dc615b9ba91abb4c32c695f532a70a6159d7b806241c.svg"
alt="Level-of-concepts" />
<figcaption aria-hidden="true">Level-of-concepts</figcaption>
</figure>
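
In the output above, each image is renamed to a 64-character hexadecimal string plus the original file extension. One plausible way to produce such names is to hash the file content, so that identical images map to the same remote object. The sketch below assumes SHA-256 for this purpose; the source does not confirm which hash the package uses, and the function names are hypothetical. No upload is performed here.

```python
import hashlib


def hashed_object_name(data: bytes, suffix: str) -> str:
    """Build an object name from the content hash (SHA-256 is an assumption)."""
    return f"{hashlib.sha256(data).hexdigest()}{suffix}"


def remote_url(data: bytes, suffix: str, bucket_domain: str = "raw.little-train.com") -> str:
    """Build the URL that would replace the local src of an image."""
    return f"https://{bucket_domain}/{hashed_object_name(data, suffix)}"


# Example with in-memory bytes standing in for the content of a local image.
print(remote_url(b"fake image bytes", ".png"))
```

A content-derived name has the side effect that re-uploading an unchanged image yields the same URL, so links in already published pages stay valid.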

Convert markdown to html

Normalize anchors, internal links and link-like strings

Inputs(./input.md):

Refer to test_md2html_anchor_and_link.md.

Coding:

import pathlib
import logging
import functools
import panflute as pf

from pandoc_filter.utils import TracingLogger
from pandoc_filter.md2html_filters import anchor_filter,internal_link_recorder,link_like_filter
from pandoc_filter.md2md_filters import internal_link_filter

pathlib.Path("./logs").mkdir(parents=True, exist_ok=True)
tracing_logger = TracingLogger(name="./logs/pf_log",level=logging.INFO)

def finalize(doc:pf.Doc,**kwargs):
    tracing_logger = kwargs['tracing_logger']
    # Collect every numbered anchor id recorded in doc.anchor_count: "<key>-1" ... "<key>-n".
    id_set = set()
    for k,v in doc.anchor_count.items():
        for i in range(1,v+1):
            id_set.add(f"{k}-{i}")
    # Point each recorded internal link at an existing anchor id.
    for patched_elem,url,guessed_url_with_num in doc.internal_link_record:
        if f"{url}-1" in id_set:
            # Prefer the first anchor that carries this id.
            patched_elem.sub(f"{url}-1",tracing_logger)
        elif guessed_url_with_num in id_set: # None is not in id_set
            patched_elem.sub(guessed_url_with_num,tracing_logger)
        else:
            # No matching anchor: leave the link unchanged and warn.
            tracing_logger.logger.warning(f"{patched_elem.elem}")
            tracing_logger.logger.warning(f"The internal link `{url}` is invalid and will not be changed because no target header is found.")

file_path = pathlib.Path("./input.md")
with open(file_path,'r',encoding='utf-8') as f:
    markdown_content = f.read()
output_path = pathlib.Path("./output.html")

doc = pf.convert_text(markdown_content,input_format='markdown',output_format='panflute',standalone=True)
doc = pf.run_filter(action=internal_link_filter,doc=doc,tracing_logger=tracing_logger)

_finalize = functools.partial(finalize,tracing_logger=tracing_logger)
doc = pf.run_filters(actions=[anchor_filter,internal_link_recorder,link_like_filter],doc=doc,finalize=_finalize,tracing_logger=tracing_logger)

with open(output_path, "w", encoding="utf-8") as f:
    f.write(pf.convert_text(doc,input_format='panflute',output_format='html',standalone=True))

Outputs(./output.html):

Refer to test_md2html_anchor_and_link.html.
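
The interplay between anchor numbering and the `finalize` step in the example above can be illustrated without pandoc. In the sketch below, every header gets an id of the form `<hash>-<n>`, where `n` counts repeated headers, and an internal link is resolved to `<hash>-1` when that id exists, otherwise to a guessed numbered id. The use of SHA-256 over the header text and the names `anchor_ids` and `resolve` are assumptions for illustration; the package's actual hashing rule is not stated in this document.

```python
import hashlib
from collections import Counter


def anchor_ids(headers):
    """Assign '<hash>-<n>' ids; n counts headers that share the same hash."""
    counts = Counter()
    ids = []
    for text in headers:
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()  # assumed hash
        counts[key] += 1
        ids.append(f"{key}-{counts[key]}")
    return ids, counts


def resolve(url, guessed_url_with_num, counts):
    """Return the anchor id a link should point to, or None if there is none."""
    id_set = {f"{k}-{i}" for k, v in counts.items() for i in range(1, v + 1)}
    if f"{url}-1" in id_set:
        return f"{url}-1"
    if guessed_url_with_num in id_set:  # None is never in id_set
        return guessed_url_with_num
    return None


ids, counts = anchor_ids(["Intro", "Usage", "Intro"])
print(ids[0][-2:], ids[2][-2:])  # → -1 -2
```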

Contribution

Contributions are welcome. However, the introduction and documentation are not yet complete, so please wait a while.

A simple way to contribute is to open an issue to report bugs or request new features.

Release history

0.2.16

May 5, 2025

0.2.15

May 24, 2024

0.2.14

Apr 29, 2024

0.2.13

Mar 14, 2024

0.2.12

Feb 28, 2024

0.2.11

Feb 28, 2024

0.2.10

Feb 27, 2024

0.2.9

Feb 25, 2024

0.2.8

Feb 25, 2024

0.2.7

Feb 25, 2024

0.2.6

Feb 25, 2024

0.2.5

Jan 25, 2024

0.2.4

Jan 24, 2024

0.2.3

Jan 24, 2024

0.2.2

Jan 24, 2024

0.2.1

Jan 24, 2024

0.2.0

Jan 24, 2024

0.2.0b1 pre-release

Jan 24, 2024

0.2.0b0 pre-release

Jan 24, 2024

0.1.0b1 pre-release

Jan 19, 2024

0.1.0b0 pre-release

Jan 18, 2024

This version

0.0.1

Jan 18, 2024

0.0.1b2 pre-release

Jan 18, 2024

0.0.1b1 pre-release

Jan 18, 2024

0.0.1b0 pre-release

Jan 18, 2024

Download files

Source Distribution

pandoc-filter-0.0.1.tar.gz (32.9 kB)

Uploaded Jan 18, 2024 Source

Built Distribution

pandoc_filter-0.0.1-py3-none-any.whl (31.3 kB)

Uploaded Jan 18, 2024 Python 3

File details

Details for the file pandoc-filter-0.0.1.tar.gz.

File metadata

Download URL: pandoc-filter-0.0.1.tar.gz
Upload date: Jan 18, 2024
Size: 32.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.12.0

File hashes

Hashes for pandoc-filter-0.0.1.tar.gz
Algorithm	Hash digest
SHA256	`7bff2fbecda6231cc82fe6abccd29e27414d33f061897e44536c3be2ec98c5c0`
MD5	`4176b1b79e1b4a490e4658920a93fc24`
BLAKE2b-256	`6ed5c71dcfefba261f4cc246e9dabf136cf55f66d31e0428d08540e91ceb5371`

File details

Details for the file pandoc_filter-0.0.1-py3-none-any.whl.

File metadata

Download URL: pandoc_filter-0.0.1-py3-none-any.whl
Upload date: Jan 18, 2024
Size: 31.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.12.0

File hashes

Hashes for pandoc_filter-0.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2e0815dc13c8ad165e2e026beb784a6e6882c1b35585838dc5ae5234d6da98c7`
MD5	`eb1660a8df8dce4c7fbdc76726847662`
BLAKE2b-256	`a66ee85d28c916814319a6dfb18260778cc4001ff58b237976dfb540b56def5a`

