A Chinese-language HTML parser and extractor for Python!
Project description
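EasyHTML: a Chinese-language HTML parser and extractor for Python!
If you find a bug, please report it to 534047068@qq.com.
Type the first letters of a name and press Tab to complete it, so there is no need to switch input methods.
(I use PyCharm.)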
from Easy_HTML_2023 import EH
# Suppose we have the following HTML
html = '''
<html>
<body>
<div class="container">
<h1>标题</h1>
<p>段落文本</p>
<a href="https://example.com">链接1</a>
<a href="https://example.com">链接2</a>
</div>
<div class="box">
<h2>副标题</h2>
<p>其他文本</p>
<a href="https://example.com">链接3</a>
</div>
</body>
</html>
'''
# Create the soup object
soup = EH.EH(html)
# Get the content of every div tag
div内容 = soup.查找标签('div')
for 内容 in div内容:
    文本内容 = soup.获得文本(内容)
    print(文本内容)
# Get the content of elements with the class name "container"
container内容 = soup.查找类名('container')
for 内容 in container内容:
    文本内容 = soup.获得文本(内容)
    print(文本内容)
# Get all links and their URLs
链接列表 = soup.查找链接()
Running the code above produces the following output:
标题 段落文本 链接1 链接2
标题 段落文本 链接1 链接2
链接1
链接2
https://example.com
https://example.com
Usage is similar to BeautifulSoup, with a few differences (see the source code for details).
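For readers who want to compare against another parser, the link lookup shown above can be approximated with only the standard library. This is a hypothetical sketch built on `html.parser`, not EasyHTML's or BeautifulSoup's implementation; `LinkCollector` and `collect_links` are illustrative names that do not appear in either library.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect (text, href) pairs for every <a> element (illustrative helper)."""

    def __init__(self):
        super().__init__()
        self.links = []        # finished (text, href) pairs
        self._in_link = False  # True while an <a> element is open
        self._href = None      # href of the currently open <a>, if any
        self._text = []        # text fragments seen inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._in_link:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self.links.append(("".join(self._text).strip(), self._href))
            self._in_link = False


def collect_links(html):
    """Return a list of (link text, href) tuples found in the HTML string."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


sample = (
    '<div><a href="https://example.com">链接1</a>'
    '<a href="https://example.com">链接2</a></div>'
)
for text, href in collect_links(sample):
    print(text, href)
```

Returning text and URL together as tuples is only one possible design; the structure that `查找链接` actually returns should be checked in the EasyHTML source.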
Download files
Source Distribution
EasyHTML2023-1.0.0.1.tar.gz (2.1 kB)
Built Distribution
EasyHTML2023-1.0.0.1-py3-none-any.whl (2.5 kB)
File details
Details for the file EasyHTML2023-1.0.0.1.tar.gz.
File metadata
- Download URL: EasyHTML2023-1.0.0.1.tar.gz
- Upload date:
- Size: 2.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.7.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2168abbafba909353bf4fdd718c95828e20f0f7cabe5ebad959e4e93733360d7
MD5 | fe6199ef72421e59af93a3e95335f640
BLAKE2b-256 | 273b702b74c998dd659594553cbb345573baca43af6fef707cbd869f1772395f
File details
Details for the file EasyHTML2023-1.0.0.1-py3-none-any.whl.
File metadata
- Download URL: EasyHTML2023-1.0.0.1-py3-none-any.whl
- Upload date:
- Size: 2.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.7.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 600ca73a5f9cb8173f50f900981e9b50ebde33b250077924749bdb87eae3a69c
MD5 | a73f81dee82c7eace66d5e76668d85e0
BLAKE2b-256 | db0d554743d4b1f154017b290e3eeff8f3aa101c17ae1661ad759cfce6d83c34
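The SHA256 digests listed above can be used to check a downloaded file before installing it. The following is a minimal sketch using the standard-library `hashlib` module; the helper names `sha256_of` and `matches` are hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, assuming the source distribution was saved to the current directory, `matches("EasyHTML2023-1.0.0.1.tar.gz", "2168abbafba909353bf4fdd718c95828e20f0f7cabe5ebad959e4e93733360d7")` should return `True` for an intact download.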