A Python library to find HTML in strings
markdown-html-finder
A Python library to locate HTML spans in markdown text. This library is written in Rust with bindings for Python.
why?
For a separate project I needed to locate HTML comments in markdown documents. Sadly the markdown parsers I found for Python didn't provide span information for nodes.
While it wouldn't be too hard to add some features to existing Python markdown parsers, I thought it would be interesting to see how Rust can be used from Python. The excellent pulldown-cmark crate provides span information for HTML elements, so that's what we use here.
pyo3 and maturin do the hard work of providing bindings to Python and building wheels to distribute on PyPI.
install
# poetry
poetry add markdown-html-finder
# pip
pip install markdown-html-finder
usage
from markdown_html_finder import find_html_positions
DOCUMENT = """\
# example markdown document
Amet nobis et numquam qui. Animi perferendis quia qui ut aut expedita. Ut eveniet quia quaerat.
<!-- hello world -->
Quisquam et et velit soluta quia.
"""
# NOTE: find_html_positions raises a ValueError if passed carriage returns `\r`
stripped_document = DOCUMENT.replace('\r', '')
html_positions = find_html_positions(stripped_document)
assert html_positions == [(125, 145)]
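The returned pairs can be turned back into text by slicing. The sketch below assumes each tuple is a half-open `(start, end)` offset pair into the string that was passed in, which the example above suggests but does not spell out. `normalize` and `extract_spans` are hypothetical helper names, not part of the library, and the positions are built by hand so the snippet runs without the library installed. Whether offsets count characters or UTF-8 bytes is not verified here, so the sample sticks to ASCII.

```python
from typing import List, Tuple


def normalize(document: str) -> str:
    """Drop carriage returns, mirroring the replace() call in the usage example."""
    return document.replace("\r", "")


def extract_spans(document: str, positions: List[Tuple[int, int]]) -> List[str]:
    """Slice out the text for each (start, end) pair.

    Assumption: the pairs are half-open offsets into `document`.
    """
    return [document[start:end] for start, end in positions]


# Hand-built positions stand in for the output of find_html_positions.
raw = "intro\r\n<!-- note -->\r\noutro\r\n"
clean = normalize(raw)
start = clean.index("<!--")
end = clean.index("-->") + len("-->")
snippets = extract_spans(clean, [(start, end)])
```

Because the carriage returns are removed before searching, any offsets refer to the stripped string, not to the original one, so slice the stripped string when mapping spans back to text.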
dev
# install build dependencies
poetry install
# build for python development
poetry run maturin develop
building wheels
We need one wheel per Python version and platform. To support Python 3.7, 3.8, and 3.9, each of those versions must be installed on both macOS and Linux. On macOS we can use pyenv. On Linux we can use a Docker container.
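To see which Python version and platform a given wheel covers, its filename can be split into tags. This is a minimal sketch based on the standard wheel naming convention (`name-version-python-abi-platform.whl`); `WheelTags` and `parse_wheel_filename` are hypothetical names, and filenames carrying the optional build tag are rejected rather than handled.

```python
from typing import List, NamedTuple


class WheelTags(NamedTuple):
    distribution: str
    version: str
    python_tag: str
    abi_tag: str
    platform_tags: List[str]


def parse_wheel_filename(filename: str) -> WheelTags:
    """Split a wheel filename into its tags (no build-tag support)."""
    suffix = ".whl"
    if not filename.endswith(suffix):
        raise ValueError(f"not a wheel filename: {filename!r}")
    parts = filename[: -len(suffix)].split("-")
    if len(parts) != 5:
        raise ValueError("expected name-version-python-abi-platform")
    distribution, version, python_tag, abi_tag, platform = parts
    # A compressed platform tag lists several platforms separated by dots.
    return WheelTags(distribution, version, python_tag, abi_tag, platform.split("."))


mac = parse_wheel_filename(
    "markdown_html_finder-0.2.5-cp39-cp39-macosx_11_0_arm64.whl"
)
linux = parse_wheel_filename(
    "markdown_html_finder-0.2.5-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl"
)
```

Here `mac.python_tag` and `linux.python_tag` are both `cp39`, while the platform tags differ, which is why each supported Python version needs a separate build on each platform.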
macos
- install pyenv
- install each Python version we want to support via `pyenv install`. Use `pyenv install --list` to see the available options.
- add your new Python installs globally via `pyenv global 3.8.7 3.9.0`
- configure your $PATH with the .pyenv Python versions. Use `pyenv shims` to find the binary paths and add them, like `PATH=/Users/chris/.pyenv/shims/:$PATH`
- verify your Python versions are accessible via `python3.9` and verify maturin can find your Python versions via `./.venv/bin/maturin list-python`
- build the macOS wheels via `./.venv/bin/maturin build`
- upload wheels to PyPI via `./.venv/bin/twine upload --skip-existing target/wheels/*`
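The verification step above can also be scripted. The following is a rough sketch, not part of the project: `find_interpreters` is a hypothetical helper that looks up `pythonX.Y` executables on the current `PATH` with the standard library, which is a quick check before asking maturin to build.

```python
import shutil
from typing import Dict, Iterable, Optional


def find_interpreters(versions: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each version (e.g. "3.9") to the path of pythonX.Y on PATH, or None."""
    return {version: shutil.which(f"python{version}") for version in versions}


found = find_interpreters(["3.7", "3.8", "3.9"])
# Versions with no matching executable on PATH still need installing or a PATH fix.
missing = [version for version, path in found.items() if path is None]
```

An empty `missing` list only shows that the executables are on `PATH`. `maturin list-python` remains the authoritative check for what maturin itself can see.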
linux
- use the Docker container to build all the Linux Python wheels via `docker run --rm -v $(pwd):/io cdignam/markdown-html-finder-builder:0.3.0 build --release`
- upload wheels to PyPI via `./.venv/bin/twine upload --skip-existing target/wheels/*`
markdown-html-finder-builder
This container extends the quay.io/pypa/manylinux2014_x86_64 Docker image and is based on the konstin2/maturin image, with Python 2 support removed.
This image is built and uploaded manually to Docker Hub when necessary.
# build and publish a new version
VERSION='0.2.0'
docker build -f build.Dockerfile . --tag cdignam/markdown-html-finder-builder:$VERSION
docker push cdignam/markdown-html-finder-builder:$VERSION
Hashes for markdown_html_finder-0.2.5.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 12188883422f6342a78f8acf60e2c753e09946d00696b1f9ba86d8f532192f1c
MD5 | 184c8ee4e817665d41d72c3ecad82973
BLAKE2b-256 | 9e73d82a6056090900c05cd31f9a532742a63c365345beb836c9b719d270129b
Hashes for markdown_html_finder-0.2.5-cp311-cp311-manylinux_2_5_x86_64.manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 0a2637763a23d4bad98ad40e1d6563e28cea19faa9508dfaa69c9b8cf0ddda78
MD5 | 9290c78ea91ab1522dacdf2e120d4d4c
BLAKE2b-256 | 524605443089368a6739a9a56b8ecb2e21a129c7b2596a12325732326532e7d2
Hashes for markdown_html_finder-0.2.5-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 6ac8de530f5209de88a056287c3836266db02f2ad27e5f666e65dcb624852371
MD5 | 08b0cab31f391216aee35600e099e362
BLAKE2b-256 | 345119408d31f835486299d48e7f3f13097a2d3ef5cec52dd3fc231b69545439
Hashes for markdown_html_finder-0.2.5-cp310-cp310-macosx_11_0_arm64.whl
Algorithm | Hash digest
---|---
SHA256 | c7b893f56650d0013ed1061cc883b4f67185b16113d02a3b96a688d47db34016
MD5 | 8f5768e611e9ce447efb0b70e93925b2
BLAKE2b-256 | d22e214cd7959802bdf91be9efdc95818453b1454ac0025737794c0b676fec7c
Hashes for markdown_html_finder-0.2.5-cp39-cp39-manylinux_2_5_x86_64.manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 86afbb199c139e7eef27e9ad134c0781867b54124461721db261bdb3bb82f6ce
MD5 | 4e9d309d9b85a2e4f9bf18de57088dc1
BLAKE2b-256 | 8866a61fb3c61b3cb1f15451fbf583a4651aaa5164846fc34d8cf4d08831bc5e
Hashes for markdown_html_finder-0.2.5-cp39-cp39-macosx_11_0_arm64.whl
Algorithm | Hash digest
---|---
SHA256 | fbecdd05ac013b27a1ed130cdbc9d0ff1fc367a9f242697f4c3897854440b877
MD5 | 6e427ee66c72d3a90ce8eddef503a5ae
BLAKE2b-256 | dcadb2e165ac819614be3cd3ada316462a5e1628b112c3ae616473f4b2324eff
Hashes for markdown_html_finder-0.2.5-cp38-cp38-manylinux_2_5_x86_64.manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | 2ea98e46c67747190620afeb5ccfffef3b8090df281465917d8d4a025a2c39cc
MD5 | 1928ac99bc5e40527bbd81fe30c79c44
BLAKE2b-256 | 4b96e12602d01057126eff0550309dff00293fc810111019141c5e9b6cc3a10c
Hashes for markdown_html_finder-0.2.5-cp38-cp38-macosx_11_0_arm64.whl
Algorithm | Hash digest
---|---
SHA256 | dfeca5ae00b1e5e19c1c4b93e63976c8279c98aca74d525f0cdf85c5501c3c03
MD5 | 63a19cf3ae41d9b2edbeb9bbc84ff6db
BLAKE2b-256 | adec46557380115ed67cfe194af7b7b6884719f8b927c133171cb5fa8156f5e0
Hashes for markdown_html_finder-0.2.5-cp37-cp37m-manylinux_2_5_x86_64.manylinux1_x86_64.whl
Algorithm | Hash digest
---|---
SHA256 | a24e093b21dc699c638283a58598cbbb2489e029bcc8d90fddc608ddd8e78f9a
MD5 | 127644f1d79770906b351c73be603e1a
BLAKE2b-256 | 6c69b18e9743323836afc49751e38257dc2f01cb764bdd2d6bf6e21f0c92e1b6
Hashes for markdown_html_finder-0.2.5-cp37-cp37m-macosx_11_0_arm64.whl
Algorithm | Hash digest
---|---
SHA256 | 5588c0dcba92c8425d5e7d17a57fe6db22a94d78313094b701b6b3a8ce1d2c23
MD5 | 4115d911c5008e113426754fe642426f
BLAKE2b-256 | f13df5865fa11736a62dd93db7c81965953dc792eabbfa3d81b3e9db36a7dd94
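A downloaded file can be checked against the digests listed above with the standard library. This is a minimal sketch: `file_digests` and `matches` are hypothetical helper names, and BLAKE2b-256 is taken to mean BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path
from typing import Dict


def file_digests(path: Path, chunk_size: int = 1 << 16) -> Dict[str, str]:
    """Compute SHA256, MD5, and BLAKE2b-256 hex digests in one pass over the file."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with path.open("rb") as handle:
        # Read in chunks so large wheels are never loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches(path: Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 against an expected hex digest, ignoring case."""
    return file_digests(path)["SHA256"] == expected_sha256.strip().lower()
```

For example, `matches(Path("markdown_html_finder-0.2.5.tar.gz"), "<SHA256 from the table>")` should return `True` for an intact download of the source distribution.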