A small and simple HTML table parser not requiring any external dependency.

Project description

html-table-parser-python3

This module consists of a single small class. Its purpose is to parse HTML tables without the help of external modules. Everything it uses is part of the Python 3 standard library.

Installation

pip install html-table-parser-python3

How to use

Example Usage:

import urllib.request
from pprint import pprint
from html_table_parser_python3 import HTMLTableParser


def url_get_contents(url):
    """Open a URL and return its raw binary contents (the HTTP response body)."""
    req = urllib.request.Request(url=url)
    # The context manager closes the connection once the body has been read.
    with urllib.request.urlopen(req) as f:
        return f.read()


def main():
    url = 'http://www.twitter.com'
    xhtml = url_get_contents(url).decode('utf-8')

    p = HTMLTableParser()
    p.feed(xhtml)
    pprint(p.tables)


if __name__ == '__main__':
    main()

The parser exposes its result as p.tables, a nested list: a list of tables, each containing rows, each containing cells as strings. Tags inside cells are stripped and their text content is joined. The console output for parsing all tables on the Twitter home page looks like this:

>>> 
[[['', 'Anmelden']],
 [['Land', 'Code', 'Für Kunden von'],
  ['Vereinigte Staaten', '40404', '(beliebig)'],
  ['Kanada', '21212', '(beliebig)'],
  ...
  ['3424486444', 'Vodafone'],
  ['Zeige SMS-Kurzwahlen für andere Länder']]]
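
To make the tables / rows / cells nesting concrete without fetching a live page, the following is a minimal sketch built only on the standard library's html.parser. The class name MiniTableParser and the sample HTML are hypothetical, and this is not the package's actual HTMLTableParser; it only assumes the behaviour described above (tags stripped, text fragments of a cell joined, here with a single space).

```python
from html.parser import HTMLParser


class MiniTableParser(HTMLParser):
    """Hypothetical stand-in for illustration, not the package's class."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []   # list of tables -> list of rows -> cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Text inside nested tags such as <b> also arrives here, so the
        # tags themselves are dropped and only their text is kept.
        if self._cell is not None:
            self._cell.append(data.strip())

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Assumption: non-empty fragments are joined with one space.
            self._row.append(" ".join(part for part in self._cell if part))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


# Hypothetical sample input: one table with a header row and one data row.
html = """
<table>
  <tr><th>Country</th><th>Code</th></tr>
  <tr><td><b>Canada</b></td><td>21212</td></tr>
</table>
"""

parser = MiniTableParser()
parser.feed(html)
tables = parser.tables
print(tables)
```

Indexing follows the nesting: tables[0] is the first table, tables[0][1] its second row, and tables[0][1][0] the first cell of that row. This sketch does not handle nested tables.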

Credit

All credit goes to Josua Schmid (schmijos). This is entirely his work; I only uploaded it to PyPI. The original repository can be found at:

https://github.com/schmijos/html-table-parser-python3

License

GNU GPL v3

Download files

Source Distribution

html-table-parser-python3-0.1.3.tar.gz (3.1 kB)

Uploaded Source

Built Distribution

html_table_parser_python3-0.1.3-py3-none-any.whl (3.6 kB)

Uploaded Python 3

File details

Details for the file html-table-parser-python3-0.1.3.tar.gz.

File metadata

  • Download URL: html-table-parser-python3-0.1.3.tar.gz
  • Upload date:
  • Size: 3.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/40.0.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.6.5

File hashes

Hashes for html-table-parser-python3-0.1.3.tar.gz
Algorithm Hash digest
SHA256 621b7beec3d168dba87e9da064096fde2a7d62f92b7de3449f0c0727a88ee417
MD5 d0e746e53e5ce1e609905ea3c0351bb5
BLAKE2b-256 1f8fe5990b18b02f2e0febc87dc2bdba49c6b93ce8ae520598f9db69f3ff0ab2

File details

Details for the file html_table_parser_python3-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: html_table_parser_python3-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 3.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.18.4 setuptools/40.0.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.6.5

File hashes

Hashes for html_table_parser_python3-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 6528bc49abf399320f92a230cbd351fe663f97b74e52095ef7f905feb49bb684
MD5 f87966ff30fd5d0ce2d315917c3e0ed5
BLAKE2b-256 e367966e5b64c87b275c94dc9e1bb34ca41db926cca534a12d33aa37aa83b3e8
