
HTML data parsing wrapper


Table parsing tool

Provides HTML table parsing: you can look up a table row by its content and get that row back as a list of cell values.
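The following is a minimal sketch of this kind of row lookup. It uses only the standard library's `html.parser.HTMLParser` and is not the package's implementation. `SimpleTableParser` and `find_row` are hypothetical names. The substring match is an assumption based on the example further down, where searching for "采矿" returns the "采矿业" row. The sketch also assumes simple, non-nested tables.

```python
from html.parser import HTMLParser


class SimpleTableParser(HTMLParser):
    """Hypothetical sketch: collect every <tr> as a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the <tr> currently open, if any
        self._cell = None  # text fragments of the <td>/<th> currently open

    def _close_cell(self):
        # Join the fragments and collapse runs of whitespace into one space.
        if self._cell is not None and self._row is not None:
            self._row.append(" ".join("".join(self._cell).split()))
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate an omitted </td>
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr" and self._row is not None:
            self._close_cell()  # tolerate an omitted </td> before </tr>
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def find_row(self, keyword):
        """Return the first row with a cell containing `keyword`, else None."""
        for row in self.rows:
            if any(keyword in cell for cell in row):
                return row
        return None


# Usage on a small hand-written table (toy data, not from a real page)
p = SimpleTableParser()
p.feed("<table><tr><th>指标</th><th>本月</th></tr>"
       "<tr><td>采矿业</td><td>3.2</td></tr></table>")
p.close()
print(p.find_row("采矿"))  # ['采矿业', '3.2']
```

Matching on any cell, rather than only the first one, keeps the sketch close to the README's behaviour without assuming which column holds the row label.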

Installation

pip install html-parser
or
pip install git+https://github.com/lvyunze/html_parse.git

Fetching table row data

import urllib.request

from table_parser import HtmlTableParser


def url_get_contents(url):
    """Open a web page and return its raw bytes (the HTTP response body)."""
    req = urllib.request.Request(url=url)
    with urllib.request.urlopen(req) as f:  # closes the connection when done
        return f.read()


def main():
    url = 'http://www.stats.gov.cn/tjsj/zxfb/202105/t20210517_1817510.html'
    xhtml = url_get_contents(url).decode('utf-8')
    p = HtmlTableParser()
    p.feed(xhtml)  # parse the whole page, including its tables

    # Single row: find the row matching "采矿" (mining).
    # `seach_item` is the method name exactly as the library spells it.
    print(p.seach_item("采矿"))
    # ['采矿业', '…', '3.2', '…', '8.4']

    # Multiple rows: look up several items and collect the results
    item_list = ["采矿业", "制造业", "产品销售率"]
    item_data = [p.seach_item(data) for data in item_list]
    print(item_data)
    # [['采矿业', '…', '3.2', '…', '8.4'], ['制造业', '…', '10.3', '…', '22.2'],
    #  ['产品销售率(%)', '98.3', '0.4 ( 百分点 )', '97.9', '0.9 ( 百分点 )']]


if __name__ == "__main__":
    main()



Download files


Source distributions: none are available for this release.

Built distribution: html_parse-0.0.3-py3-none-any.whl (1.9 kB, Python 3)
