Extract data from an HTML table and store the results in a CSV file.
Project description
A simple script for downloading HTML tables as CSV.
Installation
pip install -U table2csv
Usage
table2csv http://en.wikipedia.org/wiki/List_of_Super_Bowl_champions > dump.txt
Features
- Accepts a URL
- Identifies all the tables on the page
- Merges tables that share the same structure (e.g. tables with the same column headers are combined)
- Figures out which table is the biggest
- Extracts text
- Extracts links
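The features above can be pictured as a small pipeline. What follows is a rough sketch using only the Python standard library, not the code table2csv actually ships. The names TableCollector, fetch, biggest_merged_table and to_csv are hypothetical, as is the sample markup. It assumes the first row of each table is its header, measures "biggest" by row count, and does not handle nested tables.

```python
import csv
import io
import urllib.request
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect each <table> as a list of rows; a cell is (text, [hrefs])."""

    def __init__(self):
        super().__init__()
        self.tables = []   # finished tables
        self._rows = None  # rows of the table being parsed
        self._row = None   # cells of the row being parsed
        self._cell = None  # (text_parts, hrefs) of the cell being parsed

    def _close_cell(self):
        if self._cell is not None and self._row is not None:
            text = " ".join("".join(self._cell[0]).split())
            self._row.append((text, self._cell[1]))
        self._cell = None

    def _close_row(self):
        self._close_cell()
        if self._row and self._rows is not None:
            self._rows.append(self._row)
        self._row = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._close_row()
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._close_row()  # tolerate a missing </tr>
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._close_cell()  # tolerate a missing </td> or </th>
            self._cell = ([], [])
        elif tag == "a" and self._cell is not None:
            href = dict(attrs).get("href")
            if href:
                self._cell[1].append(href)

    def handle_data(self, data):
        if self._cell is not None:
            self._cell[0].append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._close_cell()
        elif tag == "tr":
            self._close_row()
        elif tag == "table" and self._rows is not None:
            self._close_row()
            self.tables.append(self._rows)
            self._rows = None


def fetch(url):
    """Download a page and decode it (UTF-8 if no charset is declared)."""
    with urllib.request.urlopen(url) as resp:
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


def biggest_merged_table(html):
    """Merge tables with identical header rows, then return the longest."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    merged = {}  # header texts -> header row followed by all data rows
    for rows in parser.tables:
        if rows:
            key = tuple(text for text, _ in rows[0])
            merged.setdefault(key, [rows[0]]).extend(rows[1:])
    return max(merged.values(), key=len, default=[])


def to_csv(rows, links=False):
    """Write cell text, or each cell's first link when links=True."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in rows:
        writer.writerow(
            [(hrefs[0] if hrefs else "") if links else text for text, hrefs in row]
        )
    return out.getvalue()


# Made-up markup: two tables share a header, one does not.
SAMPLE = """
<table><tr><th>Name</th><th>Page</th></tr>
<tr><td>alpha</td><td><a href="/alpha">Alpha page</a></td></tr></table>
<table><tr><th>Other</th></tr><tr><td>x</td></tr></table>
<table><tr><th>Name</th><th>Page</th></tr>
<tr><td>beta</td><td><a href="/beta">Beta page</a></td></tr></table>
"""

print(to_csv(biggest_merged_table(SAMPLE)))
```

To run the same steps against a live page, pass fetch(url) to biggest_merged_table in place of SAMPLE.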
TODO
- add the ability to specify which table on the page you would like to download (not just the biggest one)
- add support for columns that do not use proper <th> tags for headers (i.e. imperfect HTML tables) [DONE]
- detect the data types found within each column
- add support for tables with hierarchical indices on the rows and/or columns
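As an illustration of the data type detection item, here is one way it might be approached. This is only a sketch under assumptions and not part of table2csv: guess_type is a hypothetical helper, the set of types (int, float, date, str) is arbitrary, and dates are only recognized in YYYY-MM-DD form.

```python
from datetime import datetime


def guess_type(values):
    """Return 'int', 'float', 'date' or 'str' for a column of cell strings."""
    cleaned = [v.strip() for v in values if v.strip()]  # ignore empty cells
    if not cleaned:
        return "str"

    def all_parse(parse):
        # True only if every non-empty cell is accepted by the parser.
        for value in cleaned:
            try:
                parse(value)
            except ValueError:
                return False
        return True

    # Try the narrowest type first and fall back to plain text.
    if all_parse(int):
        return "int"
    if all_parse(float):
        return "float"
    if all_parse(lambda v: datetime.strptime(v, "%Y-%m-%d")):
        return "date"
    return "str"


print(guess_type(["1", "2", ""]))
print(guess_type(["1.5", "2"]))
print(guess_type(["2020-01-31"]))
```

A single cell that fails to parse makes the whole column fall back to 'str', which keeps the guess conservative for messy tables.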
Download files
Source Distribution
table2csv-0.1.3.tar.gz (4.2 kB)