HTML data parsing wrapper
Project description
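Table parsing tool
Provides HTML table parsing: look up a table row by an item it contains, and get that row back as a list of its cell values.
Installation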
pip install html-parser
or
pip install git+https://github.com/lvyunze/html_parse.git
Getting the data of a single table row
import urllib.request

from table_parser import HtmlTableParser


def url_get_contents(url):
    """Open a URL and return its raw response body (bytes)."""
    req = urllib.request.Request(url=url)
    with urllib.request.urlopen(req) as f:
        return f.read()


def main():
    url = 'http://www.stats.gov.cn/tjsj/zxfb/202105/t20210517_1817510.html'
    xhtml = url_get_contents(url).decode('utf-8')
    p = HtmlTableParser()
    p.feed(xhtml)

    # Look up a single row by keyword ("采矿" = mining).
    # Note: the method name is spelled seach_item in this package.
    print(p.seach_item("采矿"))
    # ['采矿业', '…', '3.2', '…', '8.4']

    # Getting several rows at once
    item_list = ["采矿业", "制造业", "产品销售率"]
    item_data = [p.seach_item(data) for data in item_list]
    print(item_data)
    # [['采矿业', '…', '3.2', '…', '8.4'], ['制造业', '…', '10.3', '…', '22.2'],
    #  ['产品销售率(%)', '98.3', '0.4 ( 百分点 )', '97.9', '0.9 ( 百分点 )']]


if __name__ == '__main__':
    main()
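To illustrate the idea behind keyword-based row lookup, here is a minimal sketch built on the standard library's html.parser. It is not the package's actual implementation: the class name RowSearchParser, the method search_item, and the rule "match the keyword against the first cell of each row" are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class RowSearchParser(HTMLParser):
    """Hypothetical stand-in for HtmlTableParser (not the package's code)."""

    def __init__(self):
        super().__init__()
        self.rows = []     # every completed row, as a list of cell texts
        self._row = None   # cells of the row currently being parsed
        self._cell = None  # text fragments of the cell currently being parsed

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace so multi-line cells compare cleanly.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None

    def search_item(self, keyword):
        """Return the first row whose first cell contains keyword, else None."""
        for row in self.rows:
            if row and keyword in row[0]:
                return row
        return None


# Made-up sample table, used only to exercise the sketch.
sample = """
<table>
  <tr><th>Indicator</th><th>April</th><th>Jan-Apr</th></tr>
  <tr><td>Mining</td><td>3.2</td><td>8.4</td></tr>
  <tr><td>Manufacturing</td><td>10.3</td><td>22.2</td></tr>
</table>
"""

parser = RowSearchParser()
parser.feed(sample)
print(parser.search_item("Mining"))  # ['Mining', '3.2', '8.4']
```

Matching on a substring of the first cell is what lets a short keyword find a longer row label, in the same way that "采矿" finds the "采矿业" row in the example above.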
Download files
Source Distributions
No source distribution files are available for this release.
Built Distribution
File details
Details for the file html_parse-0.0.3-py3-none-any.whl.
File metadata
- Download URL: html_parse-0.0.3-py3-none-any.whl
- Upload date:
- Size: 1.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.7.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | 065756da20107fe06f4ceea3409420eac50aba83bb1a2ac94939d18c8f5bf078
MD5 | f10e7354e0ba3623092e637a22b7fdf9
BLAKE2b-256 | 3048bba4ec3ac9a707aa7d98d4a619a27bed3fc86a2052bcc0ac5d97c04a329a
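The digests above can be used to check a downloaded copy of the wheel. The following is a minimal sketch using the standard library's hashlib; the helper names sha256_of_file and matches_published_hash are hypothetical, and the local file path in the commented usage line assumes the wheel was saved to the current directory.

```python
import hashlib

# SHA256 digest as listed in the file hashes table.
EXPECTED_SHA256 = "065756da20107fe06f4ceea3409420eac50aba83bb1a2ac94939d18c8f5bf078"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the wheel is in the current directory):
# print(matches_published_hash("html_parse-0.0.3-py3-none-any.whl", EXPECTED_SHA256))
```

SHA256 is used here because MD5 is no longer considered collision-resistant; the same pattern works with hashlib.blake2b(digest_size=32) for the BLAKE2b-256 digest.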