HTML data parsing wrapper

Project description

Table parsing tool

Parses HTML tables. Given text that appears in a table row, it finds that row and returns its cells as a list.
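The row lookup described above can be illustrated with the standard library alone. This is a minimal sketch of the general technique, not the package's actual implementation: `SketchTableParser`, its `search_item` method, and the sample HTML are hypothetical names and data made up for illustration.

```python
from html.parser import HTMLParser


class SketchTableParser(HTMLParser):
    """Hypothetical stand-in: collects every <tr> as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []      # completed rows
        self._row = None    # cells of the row being read, None outside <tr>
        self._cell = None   # text fragments of the current cell, None outside a cell

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only text inside a cell is kept; whitespace between tags is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # guard against a missing </td>

    def search_item(self, keyword):
        """Return the first row with a cell containing keyword, else None."""
        for row in self.rows:
            if any(keyword in cell for cell in row):
                return row
        return None


# Made-up sample table, not taken from any real page.
sample = """
<table>
  <tr><th>Item</th><th>April</th><th>Jan-Apr</th></tr>
  <tr><td>Mining</td><td>3.2</td><td>8.4</td></tr>
  <tr><td>Manufacturing</td><td>10.3</td><td>22.2</td></tr>
</table>
"""

parser = SketchTableParser()
parser.feed(sample)
print(parser.search_item("Mining"))  # ['Mining', '3.2', '8.4']
```

The sketch matches on substrings, so a short keyword is enough to find a row, at the cost of returning only the first match when several rows contain it.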

Installation

pip install html-parser
or
pip install git+https://github.com/lvyunze/html_parse.git

Getting the data of a single table row

import urllib.request
from table_parser import HtmlTableParser


def url_get_contents(url):
    """Open a URL and return the raw bytes of the HTTP response body."""
    req = urllib.request.Request(url=url)
    # Close the connection as soon as the body has been read.
    with urllib.request.urlopen(req) as f:
        return f.read()


def main():
    url = 'http://www.stats.gov.cn/tjsj/zxfb/202105/t20210517_1817510.html'
    xhtml = url_get_contents(url).decode('utf-8')
    p = HtmlTableParser()
    p.feed(xhtml)
    # Look up a single row by text it contains.
    # (`seach_item` is the method name as spelled in the original example.)
    print(p.seach_item("采矿"))
    # ['采矿业', '…', '3.2', '…', '8.4']

    # Fetch several rows at once.
    item_list = ["采矿业", "制造业", "产品销售率"]
    item_data = [p.seach_item(data) for data in item_list]
    print(item_data)
    # [['采矿业', '…', '3.2', '…', '8.4'], ['制造业', '…', '10.3', '…', '22.2'],
    #  ['产品销售率(%)', '98.3', '0.4 ( 百分点 )', '97.9', '0.9 ( 百分点 )']]


if __name__ == "__main__":
    main()

Source Distributions

No source distribution files available for this release.

Built Distribution

html_parse-0.0.3-py3-none-any.whl (1.9 kB view details)

Uploaded Python 3

File details

Details for the file html_parse-0.0.3-py3-none-any.whl.

File metadata

  • Download URL: html_parse-0.0.3-py3-none-any.whl
  • Size: 1.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.7.3

File hashes

Hashes for html_parse-0.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 065756da20107fe06f4ceea3409420eac50aba83bb1a2ac94939d18c8f5bf078
MD5 f10e7354e0ba3623092e637a22b7fdf9
BLAKE2b-256 3048bba4ec3ac9a707aa7d98d4a619a27bed3fc86a2052bcc0ac5d97c04a329a
