Terry toolkit tkitreadability
Project description
A library for extracting the main body text from HTML.
from tkitreadability import tkitReadability
html = """
<div class="full-component-wrapper">
<div class="component component--text-image image-position--right" data-id="45290" data-type="c_sideimagetext_ttt">
<div class="text-image--component-wrapper twb-container">
<div class="text-image--content-wrapper row">
<div class="text-image--image col-12 col-xl-7 order-2 order-xl-3">
<div class="field field--name-field-c-image field--type-entity-reference field--label-hidden field__item">
<picture>
<source srcset="/sites/default/files/styles/ttt_image_690/public/2021-07/border-collie.webp?itok=1oyChjVg 2x" media="all and (min-width: 1140px)" type="image/webp">
<source srcset="/sites/default/files/styles/ttt_image_930/public/2021-07/border-collie.webp?itok=QxWrubxE 1x" media="all and (min-width: 992px)" type="image/webp">
<source srcset="/sites/default/files/styles/ttt_image_690/public/2021-07/border-collie.webp?itok=1oyChjVg 1x" media="all and (min-width: 768px)" type="image/webp">
<source srcset="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.webp?itok=jhilnwqZ 1x" media="all and (min-width: 576px)" type="image/webp">
<source srcset="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.webp?itok=jhilnwqZ 1x" type="image/webp">
<source srcset="/sites/default/files/styles/ttt_image_690/public/2021-07/border-collie.jpg?itok=1oyChjVg 2x" media="all and (min-width: 1140px)" type="image/jpeg">
<source srcset="/sites/default/files/styles/ttt_image_930/public/2021-07/border-collie.jpg?itok=QxWrubxE 1x" media="all and (min-width: 992px)" type="image/jpeg">
<source srcset="/sites/default/files/styles/ttt_image_690/public/2021-07/border-collie.jpg?itok=1oyChjVg 1x" media="all and (min-width: 768px)" type="image/jpeg">
<source srcset="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.jpg?itok=jhilnwqZ 1x" media="all and (min-width: 576px)" type="image/jpeg">
<source srcset="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.jpg?itok=jhilnwqZ 1x" type="image/jpeg">
<img src="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.jpg?itok=jhilnwqZ" alt="Border Collie" typeof="foaf:Image" loading="lazy">
</picture>
</div>
</div>
<img src="/sites/default/files/styles/ttt_image_510/public/2021-07/border-collie.jpg?itok=jhilnwqZ" alt="Border Collie" typeof="foaf:Image" loading="lazy">
<div class="text-image--text-wrapper col-12 col-xl-5 order-3 order-xl-2">
<div class="text-image--text">
<div class="clearfix text-formatted field field--name-field-c-sideimagetext-summary field--type-text-long field--label-hidden field__item"><h2>Pet Card</h2>
<ul>
<li><strong>Living Considerations:</strong> Not hypoallergenic, suitable for apartment living, good with older children</li>
<li><strong>Size:</strong> Medium</li>
<li><strong>Height:</strong> Males - 48 to 56 centimetres at the withers, Females - 45 to 53 centimetres at the withers</li>
<li><strong>Weight:</strong> Males -13 to 20 kilograms, Females - 12 to 19 kilograms</li>
<li><strong>Coat:</strong> Medium/Long</li>
<li><strong>Energy:</strong> High</li>
<li><strong>Colour:</strong> All colours or colour combinations</li>
<li><strong>Activities:</strong> Agility, Conformation, Herding, Obedience, Rally Obedience, Tracking</li>
<li><strong>Indoor/Outdoor:</strong> Both</li>
</ul>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
"""
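# Create the extractor (lowercase name, so the instance is not mistaken for a class)
readability = tkitReadability()
# Extract the main text from the HTML
content = readability.html2text(html)
print(content)
# Output as HTML
print(readability.markdown2Html(content))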
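The internals of `html2text` are not shown on this page. As a rough illustration of the general idea (dropping markup and keeping the readable text), here is a minimal standard-library sketch. The names `TextExtractor` and `extract_text` and the choice of block-level tags are assumptions made for this example. They are not part of tkitreadability, and the library's real output may differ.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect visible text, one line per block element."""

    # Tags treated as line breaks (an illustrative choice, not the library's).
    BLOCK_TAGS = {"div", "p", "li", "ul", "ol", "br",
                  "h1", "h2", "h3", "h4", "h5", "h6"}
    # Tags whose contents are never shown to a reader.
    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif tag in self.BLOCK_TAGS:
            self._chunks.append("\n")

    def handle_data(self, data):
        # Keep text only when we are not inside <script> or <style>.
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace and drop empty lines.
        lines = (" ".join(line.split())
                 for line in "".join(self._chunks).splitlines())
        return "\n".join(line for line in lines if line)


def extract_text(html):
    """Return the visible text of an HTML fragment as plain lines."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = ("<div><h2>Pet Card</h2><ul>"
          "<li><strong>Size:</strong> Medium</li>"
          "<li><strong>Energy:</strong> High</li>"
          "</ul></div>")
print(extract_text(sample))
```

A real main-content extractor also has to decide which parts of a page are the article and which are navigation or advertising. This sketch does not attempt that.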
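Updates
version: '0.0.0.4'
Added Markdown-to-HTML conversion.
Documentation: https://docs.terrychan.org/tkitreadability/
Quick upload
The script automatically finds the dependencies and then uploads the package: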
sh upload.sh
Download files
Source Distribution
tkitreadability-0.0.0.5.3.tar.gz (10.0 kB)
Built Distribution
tkitreadability-0.0.0.5.3-py2.py3-none-any.whl (9.8 kB)
File details
Details for the file tkitreadability-0.0.0.5.3.tar.gz.
File metadata
- Download URL: tkitreadability-0.0.0.5.3.tar.gz
- Upload date:
- Size: 10.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | fd6054f63eb1d89a05ed662f778d386a2115e624a6603a4d2776708b4b151e21
MD5 | 575b26760b07214b9f758681e18b2c3b
BLAKE2b-256 | 5b9110092029365fc555acce42221f7e023a015de8c6b1ab2a732200b4f33902
File details
Details for the file tkitreadability-0.0.0.5.3-py2.py3-none-any.whl.
File metadata
- Download URL: tkitreadability-0.0.0.5.3-py2.py3-none-any.whl
- Upload date:
- Size: 9.8 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | ac5a3f1d52d7dc28e24752055c07ad6d4305a08578508fd49d55a64de333748b
MD5 | 64430dff81c33c63039a812c822d083e
BLAKE2b-256 | 1a373aec2a21014bda20fb09efa337b3302a3cb68fdcd981edfcbe4bacb777ed
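The published digests can be used to check that a downloaded file arrived intact. Below is a minimal sketch using Python's standard `hashlib`. The helper names `sha256_of_file` and `matches_published_hash` are made up for this example, and the code assumes the source distribution was saved in the current directory under its original file name.

```python
import hashlib
import os


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # File name and digest are copied from the listing above.
    sdist = "tkitreadability-0.0.0.5.3.tar.gz"
    expected = "fd6054f63eb1d89a05ed662f778d386a2115e624a6603a4d2776708b4b151e21"
    if os.path.exists(sdist):
        print("hash OK" if matches_published_hash(sdist, expected) else "hash MISMATCH")
    else:
        print(f"{sdist} not found; download it first")
```

SHA256 is used here because MD5 is no longer considered collision-resistant. The same pattern works for the wheel by swapping in its file name and digest.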