
A small and simple HTML table parser not requiring any external dependency.


html-table-parser-python3.5+

This module consists of just one small class. Its purpose is to parse HTML tables without the help of external modules; everything it uses ships with Python 3. Instead of installing this module, you can simply copy the class located in parse.py into your own code.

How to use

Usage is probably best shown by example, using pyenv for convenience:

pyenv local
python ./example_of_usage.py

The parser returns a nested list: tables containing rows, which in turn contain cells as strings. Tags inside cells are stripped and their text content is joined. The console output for parsing all tables on the Twitter home page looks like this:

>>> 
[[['', 'Anmelden']],
 [['Land', 'Code', 'Für Kunden von'],
  ['Vereinigte Staaten', '40404', '(beliebig)'],
  ['Kanada', '21212', '(beliebig)'],
  ...
  ['3424486444', 'Vodafone'],
  ['Zeige SMS-Kurzwahlen für andere Länder']]]
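
To make the shape of that result concrete, here is a minimal, self-contained sketch of the same idea built only on the standard library's html.parser. It is not the module's own class: the names SketchTableParser and parse_tables are hypothetical, joining text fragments with a single space is an assumption, and nested tables are not handled.

```python
from html.parser import HTMLParser


class SketchTableParser(HTMLParser):
    """Hypothetical stand-in for illustration, not the module's own class."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tables = []   # finished tables: lists of rows of cell strings
        self._row = None   # row currently being filled, or None
        self._cell = None  # text fragments of the current cell, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        # Any other tag inside a cell is ignored, which strips it.

    def handle_data(self, data):
        # Collect text only while inside a cell.
        if self._cell is not None:
            self._cell.append(data.strip())

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the non-empty fragments (assumption: one space between them).
            self._row.append(" ".join(part for part in self._cell if part))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None
            self._cell = None  # drop a cell left open by malformed markup


def parse_tables(html):
    """Return all tables in html as nested lists: tables > rows > cells."""
    parser = SketchTableParser()
    parser.feed(html)
    parser.close()
    return parser.tables
```

Calling parse_tables on a page's HTML yields one list per table, so the first cell of the first row of the first table is parse_tables(html)[0][0][0].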

CLI

There is also a command line interface which you can use directly to generate a CSV:

./html_table_converter -u http://web.archive.org/web/20180524092138/http://metal-train.de/index.php/fahrplan.html -o metaltrain

For help with the supported parameters, append -h:

./html_table_converter -h
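
If you would rather produce CSV from your own code, the nested-list result maps directly onto Python's built-in csv module. The sketch below is an illustration only and does not show how html_table_converter is implemented; the helper name table_to_csv is hypothetical, and it serializes a single table (a list of rows of cell strings).

```python
import csv
import io


def table_to_csv(rows):
    """Serialize one table (a list of rows of cell strings) to CSV text."""
    buffer = io.StringIO()
    # csv.writer quotes cells containing commas, quotes, or newlines.
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()
```

To write a file instead of building a string, open it with newline="" and pass the file object to csv.writer, as the csv documentation recommends.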

Tests

A set of rudimentary tests has been implemented using Python's built-in unittest framework. The tests must be run on Python 3. To run them, use the following command:

python -m unittest
