
A set of customized pandoc filters that can be used to build useful pandoc Python filters.


pandoc-filter

This project provides some useful, highly customized pandoc Python filters based on panflute. They can meet some special requirements when using pandoc to

  • convert files from markdown to gfm
  • convert files from markdown to html
  • convert other formats (planned for the future)

Please see Main Features for the concrete features.

Please see Samples for the recommended usage.

Background

I'm used to taking notes in markdown with clean markdown syntax, and I usually post these notes on my site as web pages, so I need to convert markdown to html. There are many tools for this conversion, and I eventually chose pandoc for its powerful features.

But sometimes I need more features when converting from markdown to html, and that is where pandoc filters come in. I have written some pandoc Python filters with advanced features, using panflute and many other tools. Now I think it's time to gather these filters into a combined toolset, which is this project.

Installation

pip install -i https://pypi.org/simple/ -U pandoc-filter

Main Features

Two usage modes are supported:

  • command-line-mode: use non-parametric filters on the command line with pandoc.
  • python-mode: use the run_filters_pyio function in Python.

For example, md2md_enhance_equation_filter in enhance_equation.py is a filter function in the sense of the panflute user guide, and its registered command-line script is md2md-enhance-equation-filter.

  • So, after the installation, one can use it in command-line-mode:

    pandoc ./input.md -o ./output.md -f markdown -t gfm -s --filter md2md-enhance-equation-filter
    
  • Or, use it in python-mode:

    import pathlib
    import pandoc_filter
    file_path = pathlib.Path("./input.md")
    output_path = pathlib.Path("./output.md")
    # Convert markdown to gfm, applying the equation filter along the way.
    pandoc_filter.run_filters_pyio(file_path,output_path,'markdown','gfm',[pandoc_filter.md2md_enhance_equation_filter])
    

Runtime status can be recorded. In python-mode, every filter function returns the resulting panflute Doc. Some filter functions also attach an instance attribute, the dict runtime_dict, to the returned Doc as a record of runtime status, which may be very useful for advanced users. For example, md2md_enhance_equation_filter adds runtime_dict to the returned Doc, and it may contain the mapping {'math':True} if there is any math element in the Doc.

All filters, with their registered command-line scripts, specific features, and recorded runtime status, are listed in the following table:
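As a rough illustration of how a caller might consume this record, here is a minimal sketch. It uses a stand-in object rather than a real panflute Doc, and the helper name summarize_runtime_status is hypothetical; only the keys 'math' and 'equations_count' come from the table in this README.

```python
from types import SimpleNamespace


def summarize_runtime_status(doc):
    """Return (has_math, equations_count) from a Doc-like object.

    Filters that record nothing may leave `runtime_dict` unset, so fall
    back to an empty mapping instead of raising AttributeError.
    """
    runtime_dict = getattr(doc, "runtime_dict", None) or {}
    has_math = bool(runtime_dict.get("math", False))
    equations_count = int(runtime_dict.get("equations_count", 0))
    return has_math, equations_count


# Stand-ins for the Doc returned by a filter (not real panflute.Doc objects).
doc_with_math = SimpleNamespace(runtime_dict={"math": True, "equations_count": 2})
doc_without_status = SimpleNamespace()

print(summarize_runtime_status(doc_with_math))       # (True, 2)
print(summarize_runtime_status(doc_without_status))  # (False, 0)
```

A site generator could use the first value, for instance, to decide whether a page needs a math rendering script at all.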

[!NOTE]

Since some filters need additional arguments, not all filter functions support command-line-mode, although all of them support python-mode.

All filters support cascaded invoking.

| Filter Functions | Command Line | Additional Arguments | Features | Runtime status (doc.runtime_dict) |
| --- | --- | --- | --- | --- |
| md2md_enhance_equation_filter | md2md-enhance-equation-filter | - | Enhance math equations. Specifically, this filter will: Adapt AMS rule for math formula. Auto numbering markdown formulations within \begin{equation} \end{equation}, as in Typora. Allow multiple tags, but only take the first one. Allow multiple labels, but only take the first one. | {'math':<bool>,'equations_count':<some_number>} |
| md2md_norm_footnote_filter | md2md-norm-footnote-filter | - | Normalize the footnotes. Remove unnecessary \n in the footnote content. | - |
| md2md_norm_internal_link_filter | md2md-norm-internal-link-filter | - | Normalize internal links' URLs. Decode the URL if it is URL-encoded. | - |
| md2md_upload_figure_to_aliyun_filter | - | doc_path | Auto upload local pictures to Aliyun OSS. Replace the original src with the new one. The following environment variables should be given in advance: $Env:OSS_ENDPOINT_NAME, $Env:OSS_BUCKET_NAME, $Env:OSS_ACCESS_KEY_ID, and $Env:OSS_ACCESS_KEY_SECRET. The doc_path should be given in advance. | {'doc_path':<doc_path>,'oss_helper':<Oss_Helper>} |
| md2html_centralize_figure_filter | md2html-centralize-figure-filter | - | Deprecated | - |
| md2html_enhance_link_like_filter | md2html-enhance-link-like-filter | - | Enhance the link-like string to a link element. | - |
| md2html_hash_anchor_and_internal_link_filter | md2html-hash-anchor-and-internal-link-filter | - | Hash both the anchor's id and the internal-link's url simultaneously. | {'anchor_count':<anchor_count_dict>,'internal_link_record':<internal_link_record_list>} |
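To illustrate cascading in command-line-mode, the sketch below assembles a pandoc invocation with several `--filter` options. The helper build_pandoc_command is hypothetical and not part of this package; the pandoc options (`-f`, `-t`, `-s`, `-o`, `--filter`) are the same ones used in the commands in this README, and pandoc runs filters in the order they appear on the command line.

```python
def build_pandoc_command(input_path, output_path, to_format, filter_scripts,
                         from_format="markdown"):
    """Build a pandoc argument list that cascades the given filter scripts."""
    command = ["pandoc", str(input_path), "-o", str(output_path),
               "-f", from_format, "-t", to_format, "-s"]
    for script in filter_scripts:
        # One --filter option per script, kept in the requested order.
        command += ["--filter", script]
    return command


command = build_pandoc_command(
    "./input.md", "./output.md", "gfm",
    ["md2md-norm-footnote-filter", "md2md-norm-internal-link-filter"])
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` on a machine where pandoc and this package are installed.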

Samples

The basic examples below fall into two groups.

Convert markdown to markdown (Normalization)

Normalize internal link

  • Inputs(./input.md): refer to test_md2md_internal_link.md.

    ## 带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%¥¥¥¥跳转测试        空格
    
    ### aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz [xx]  (yy)
    
    [带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%¥¥¥¥跳转测试        空格](#####带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%¥¥¥¥跳转测试        空格)
    
    [aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz [xx]  (yy)](#####aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz [xx]  (yy))
    
    <a href="###带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%¥¥¥¥跳转测试        空格">带空格 和`特殊字符`...</a>
    
    <a href="#aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz [xx]  (yy)">aAa-b...</a>
    
  • Coding:

    pandoc ./input.md -o ./output.md -f markdown -t gfm -s --filter md2md-norm-internal-link-filter
    
  • Outputs(./output.md): refer to test_md2md_internal_link.md.

    ## 带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%¥¥¥¥跳转测试 空格
    
    ### aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz \[xx\] (yy)
    
    [带空格 和`特殊字符` \[链接\](http://typora.io) 用于%%%%¥¥¥¥跳转测试
    空格](#带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%¥¥¥¥跳转测试 空格)
    
    [aAa-b cC `Dd`, a#%&\[xxx\](yyy) Zzz \[xx\]
    (yy)](#aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz \[xx\] (yy))
    
    <a href="#带空格 和`特殊字符` [链接](http://typora.io) 用于%%%%¥¥¥¥跳转测试 空格">带空格
    和`特殊字符`…</a>
    
    <a href="#aAa-b cC `Dd`, a#%&[xxx](yyy) Zzz \[xx\] (yy)">aAa-b…</a>
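The core idea of this normalization can be sketched with the standard library. This is only an approximation under stated assumptions, not the filter's actual code: the function name decode_internal_link is hypothetical, and collapsing repeated leading `#` characters is inferred from the sample output above rather than from the filter's documentation.

```python
from urllib.parse import unquote


def decode_internal_link(url):
    """Decode a percent-encoded internal link; leave other URLs untouched."""
    if not url.startswith("#"):
        return url
    # Assumption: repeated leading '#' collapse to a single one.
    fragment = url.lstrip("#")
    return "#" + unquote(fragment)


print(decode_internal_link("#a%20b"))             # #a b
print(decode_internal_link("#####a%20b"))         # #a b
print(decode_internal_link("http://typora.io"))   # http://typora.io
```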
    

Normalize footnotes

  • Inputs(./input.md): refer to test_md2md_footnote.md.

    which1.[^1]
    
    which2.[^2]
    
    which3.[^3]
    
    [^1]: Deep Learning with Intel® AVX-512 and Intel® DL Boost
    https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html
    www.intel.cn
    
    [^2]: Deep Learning with Intel® AVX-512222 and Intel® DL Boost https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html www.intel.cn
    
    [^3]: Deep Learning with Intel®     AVX-512 and Intel® DL Boost https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html www.intel.cn
    
  • Coding:

    pandoc ./input.md -o ./output.md -f markdown -t gfm -s --filter md2md-norm-footnote-filter
    
  • Outputs(./output.md): refer to test_md2md_footnote.md.

    which1.[^1]
    
    which2.[^2]
    
    which3.[^3]
    
    [^1]: Deep Learning with Intel® AVX-512 and Intel® DL Boost https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html www.intel.cn
    
    [^2]: Deep Learning with Intel® AVX-512222 and Intel® DL Boost https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html www.intel.cn
    
    [^3]: Deep Learning with Intel® AVX-512 and Intel® DL Boost https://www.intel.cn/content/www/cn/zh/developer/articles/guide/deep-learning-with-avx512-and-dl-boost.html www.intel.cn
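The effect on the footnote text can be approximated in a few lines. This is a minimal sketch on plain strings, assuming that collapsing every run of whitespace to one space is acceptable; the real filter works on panflute elements, and the name normalize_footnote_text is hypothetical.

```python
def normalize_footnote_text(text):
    """Join a multi-line footnote into one line with single spaces."""
    # str.split() with no arguments splits on any whitespace run,
    # so newlines and repeated spaces are both collapsed.
    return " ".join(text.split())


raw_footnote = (
    "Deep Learning with Intel®     AVX-512 and Intel® DL Boost\n"
    "https://www.intel.cn/content/www/cn/zh/developer/articles/guide/"
    "deep-learning-with-avx512-and-dl-boost.html\n"
    "www.intel.cn"
)
print(normalize_footnote_text(raw_footnote))
```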
    

Adapt AMS rule for math formula

  • Inputs(./input.md): refer to test_md2md_math.md.

    $$
    \begin{equation}\tag{abcd}\label{lalla}
    e=mc^2
    \end{equation}
    $$
    
    $$
    \begin{equation}
    e=mc^2
    \end{equation}
    $$
    
    $$
    e=mc^2
    $$
    
    $$
    \begin{equation}\label{eq1}
    e=mc^2
    \end{equation}
    $$
    
  • Coding:

    pandoc ./input.md -o ./output.md -f markdown -t gfm -s --filter md2md-enhance-equation-filter
    
  • Outputs(./output.md): refer to test_md2md_math.md.

    $$
    \begin{equation}\label{lalla}\tag{abcd}
    e=mc^2
    \end{equation}
    $$
    
    $$
    \begin{equation}\tag{1}
    e=mc^2
    \end{equation}
    $$
    
    $$
    e=mc^2
    $$
    
    $$
    \begin{equation}\label{eq1}\tag{2}
    e=mc^2
    \end{equation}
    $$
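The rewriting shown above can be imitated with regular expressions. The sketch below is an assumption-laden approximation that works on raw text, not the filter's implementation (which operates on panflute math elements); the name number_equations is hypothetical. It keeps the first `\label` and the first `\tag`, and gives untagged equations consecutive numbers.

```python
import re

EQUATION_RE = re.compile(r"\\begin\{equation\}(.*?)\\end\{equation\}", re.DOTALL)
TAG_RE = re.compile(r"\\tag\{([^{}]*)\}")
LABEL_RE = re.compile(r"\\label\{([^{}]*)\}")


def number_equations(markdown_text):
    """Put \\label then \\tag right after \\begin{equation}, numbering untagged ones."""
    counter = 0

    def rewrite(match):
        nonlocal counter
        body = match.group(1)
        tags = TAG_RE.findall(body)
        labels = LABEL_RE.findall(body)
        # Drop every \tag and \label from the body; only the first of each is re-added.
        body = LABEL_RE.sub("", TAG_RE.sub("", body))
        if tags:
            tag = tags[0]
        else:
            counter += 1
            tag = str(counter)
        header = r"\begin{equation}"
        if labels:
            header += r"\label{%s}" % labels[0]
        header += r"\tag{%s}" % tag
        return header + body + r"\end{equation}"

    return EQUATION_RE.sub(rewrite, markdown_text)


sample = r"""$$
\begin{equation}\tag{abcd}\label{lalla}
e=mc^2
\end{equation}
$$

$$
\begin{equation}
e=mc^2
\end{equation}
$$

$$
\begin{equation}\label{eq1}
e=mc^2
\end{equation}
$$
"""
numbered = number_equations(sample)
print(numbered)
```

Note that an equation with an explicit tag does not consume a number here, which matches the sample output above.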
    

Sync local images to Aliyun OSS

  • Prerequisites:

    • Assume the bucket domain is raw.little-train.com.

    • Assume the following environment variables have been set:

      • OSS_ENDPOINT_NAME = "oss-cn-taiwan.aliyuncs.com"

      • OSS_BUCKET_NAME = "test"

      • OSS_ACCESS_KEY_ID = "123456781234567812345678"

      • OSS_ACCESS_KEY_SECRET = "123456123456123456123456123456"

    • Assume the images are located in ./input.assets/.

  • Inputs(./input.md): refer to test_md2md_figure.md.

    ![自定义头像](./input.assets/自定义头像.png)
    
    ![Level-of-concepts](./input.assets/Level-of-concepts.svg)
    
  • Coding:

    import pathlib
    
    import pandoc_filter
    
    file_path = pathlib.Path("./input.md")
    output_path = pathlib.Path("./output.md")
    # The upload filter needs doc_path as an additional argument.
    pandoc_filter.run_filters_pyio(
        file_path,output_path,'markdown','gfm',
        [pandoc_filter.md2md_upload_figure_to_aliyun_filter],doc_path=file_path)
    
  • Outputs(./output.md): refer to test_md2md_figure.md.

    <figure>
    <img
    src="https://raw.little-train.com/111199e36daf608352089b12cec935fc5cbda5e3dcba395026d0b8751a013d1d.png"
    alt="自定义头像" />
    <figcaption aria-hidden="true">自定义头像</figcaption>
    </figure>
    
    <figure>
    <img
    src="https://raw.little-train.com/20061af9ba13d3b92969dc615b9ba91abb4c32c695f532a70a6159d7b806241c.svg"
    alt="Level-of-concepts" />
    <figcaption aria-hidden="true">Level-of-concepts</figcaption>
    </figure>
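Because the upload filter depends on four environment variables, it can help to check them before running it. The following is a small sketch using only the standard library; the helper missing_oss_variables is hypothetical and not part of this package, while the variable names are the ones listed in the prerequisites.

```python
import os

REQUIRED_OSS_VARIABLES = (
    "OSS_ENDPOINT_NAME",
    "OSS_BUCKET_NAME",
    "OSS_ACCESS_KEY_ID",
    "OSS_ACCESS_KEY_SECRET",
)


def missing_oss_variables(environ=None):
    """Return the names of required variables that are unset or empty."""
    environ = os.environ if environ is None else environ
    return [name for name in REQUIRED_OSS_VARIABLES if not environ.get(name)]


example_environ = {
    "OSS_ENDPOINT_NAME": "oss-cn-taiwan.aliyuncs.com",
    "OSS_BUCKET_NAME": "test",
}
print(missing_oss_variables(example_environ))
# ['OSS_ACCESS_KEY_ID', 'OSS_ACCESS_KEY_SECRET']
```

Calling `missing_oss_variables()` with no argument checks the real process environment.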
    

Convert markdown to html

Normalize anchors, internal links and link-like strings

  • Inputs(./input.md):

    Refer to test_md2html_anchor_and_link.md.

  • Coding:

    pandoc ./input.md -o ./output.html -f markdown -t html -s --filter md2md-norm-internal-link-filter --filter md2html-hash-anchor-and-internal-link-filter --filter md2html-enhance-link-like-filter
    
  • Outputs(./output.html):

    Refer to test_md2html_anchor_and_link.html.
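Why the anchor id and the internal link must be hashed together can be shown with a short sketch. The hashing scheme of the actual filter is not documented here, so SHA-256 is purely an assumption, and both function names are hypothetical.

```python
import hashlib


def hash_identifier(identifier):
    """Map an arbitrary heading id to a fixed-length hexadecimal string."""
    # Assumption: SHA-256 stands in for whatever hash the filter really uses.
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


def hash_internal_link(url):
    """Hash the fragment of an internal link; leave external URLs untouched."""
    if not url.startswith("#"):
        return url
    return "#" + hash_identifier(url[1:])


heading_id = "aAa-b cC Dd"
anchor_id = hash_identifier(heading_id)
link_url = hash_internal_link("#" + heading_id)

# The link still resolves because both sides went through the same function.
print(link_url == "#" + anchor_id)  # True
```

Hashing only one side would break every internal link, which is why the filter handles anchors and internal links in the same pass.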

Contribution

Contributions are welcome. However, the introduction and documentation are not yet complete, so please be patient for a while.

A simple way to contribute is to open an issue to report bugs or request new features.
