A small and simple HTML table parser not requiring any external dependency.
Project description
html-table-parser-python3.5+
This module consists of just one small class. Its purpose is to parse HTML tables without the help of external modules. Everything I use is part of Python 3. Instead of installing this module, you can simply copy the class located in parse.py into your own code.
How to use
Probably best shown by example using pyenv for convenience:
pyenv local
python ./example_of_usage.py
The parser returns a nested list: tables containing rows, and rows containing cells as strings. Tags inside cells are stripped and their text content is joined. The console output for parsing all tables on the Twitter home page looks like this:
>>>
[[['', 'Anmelden']],
[['Land', 'Code', 'Für Kunden von'],
['Vereinigte Staaten', '40404', '(beliebig)'],
['Kanada', '21212', '(beliebig)'],
...
['3424486444', 'Vodafone'],
['Zeige SMS-Kurzwahlen für andere Länder']]]
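To make the nested-list shape concrete, here is a minimal sketch built on the standard library's html.parser. It is not the module's own class: the names SketchTableParser and parse_tables are hypothetical, and joining a cell's text fragments with a single space is an assumption made for the illustration.

```python
from html.parser import HTMLParser


class SketchTableParser(HTMLParser):
    """Hypothetical stand-in for illustration, not the module's own class."""

    def __init__(self):
        super().__init__()
        self.tables = []    # finished tables: list of rows of cell strings
        self._table = None  # rows of the table currently being read
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._table = []
        elif tag == "tr" and self._table is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []
        # Any other tag (e.g. <b>, <a>) is ignored, which strips it.

    def handle_data(self, data):
        # Only text inside an open cell is collected.
        if self._cell is not None:
            self._cell.append(data.strip())

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Assumption: non-empty fragments are joined with one space.
            self._row.append(" ".join(part for part in self._cell if part))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self._table.append(self._row)
            self._row = None
        elif tag == "table" and self._table is not None:
            self.tables.append(self._table)
            self._table = None


def parse_tables(html):
    """Return all tables in html as nested lists of strings."""
    parser = SketchTableParser()
    parser.feed(html)
    parser.close()
    return parser.tables


SAMPLE = (
    "<table>"
    "<tr><th>Land</th><th>Code</th></tr>"
    "<tr><td><b>Kanada</b></td><td>21212</td></tr>"
    "</table>"
)
print(parse_tables(SAMPLE))  # [[['Land', 'Code'], ['Kanada', '21212']]]
```

The sketch does not handle nested tables or unclosed cells; it only shows how tables, rows, and cells map onto three levels of lists.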
CLI
There is also a command line interface which you can use directly to generate a CSV:
./html_table_converter -u http://web.archive.org/web/20180524092138/http://metal-train.de/index.php/fahrplan.html -o metaltrain
If you need help with the supported parameters, append -h:
./html_table_converter -h
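As a rough illustration of the conversion step, a single table in the nested-list shape can be serialized with Python's csv module. This is a sketch under assumptions, not the code behind html_table_converter: table_to_csv is a hypothetical helper, and how the real tool names or splits its output files is not covered here.

```python
import csv
import io


def table_to_csv(rows):
    """Serialize one table (a list of rows of strings) to CSV text."""
    buffer = io.StringIO()
    # Default dialect: comma-separated, quotes added only where needed,
    # lines terminated with \r\n.
    writer = csv.writer(buffer)
    writer.writerows(rows)
    return buffer.getvalue()


table = [["Land", "Code"], ["Vereinigte Staaten", "40404"]]
print(table_to_csv(table), end="")
```

Writing to an in-memory buffer keeps the helper independent of any file naming scheme; the returned text can be saved wherever it is needed.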
Tests
A set of rudimentary tests has been implemented using Python's built-in unittest framework. Tests must be run on Python 3.x. To run them, use the following command:
python -m unittest
Project details
Download files
Hashes for html_table_parser_python3-0.3.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 8e33c436c1011501b8d3ef95114587580e8e923ff59ee04903646654608369f4
MD5 | 2f5db9717eb5c4073dbdf279e6e4d9ab
BLAKE2b-256 | 8594a6760c2f347bf8b19acf330d37fce9ba00572948d672edc74ecc388244a8
Hashes for html_table_parser_python3-0.3.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | f909c25fdb592d0ab9a535405712ab8aa4a0c0e7402583bd7650dd94b7fddb4e
MD5 | c97162a9740336ddd4a05ac073a6dd40
BLAKE2b-256 | 92b63d57fae8305d0f8ec2f8b378262eb31162acebd66d41dce295601a8c4076