Extract data from an HTML table and store the results in a CSV file.
Project description
Simple script for downloading HTML tables as CSV.
Installation
pip install -U table2csv
Usage
table2csv http://en.wikipedia.org/wiki/List_of_Super_Bowl_champions > dump.txt
Features
Accepts a URL
Identifies all the tables on the page
Merges tables that share the same structure (e.g. tables with the same column headers are merged)
Figures out which table is the biggest
Extracts text
Extracts links
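The steps listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the actual table2csv implementation: the names `TableCollector`, `biggest_table`, and `to_csv` are hypothetical, "biggest" is assumed to mean "most data rows after merging", links are assumed to be appended to the cell text in square brackets, and the markup is assumed to be well formed (closed `<td>`/`<tr>` tags, no nested tables).

```python
import csv
import io
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tables = []    # finished tables, each a list of rows
        self._rows = None   # rows of the table currently being parsed
        self._row = None    # cells of the row currently being parsed
        self._cell = None   # text fragments of the current cell
        self._hrefs = []    # link targets found in the current cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self._rows = []
        elif tag == "tr" and self._rows is not None:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell, self._hrefs = [], []
        elif tag == "a" and self._cell is not None:
            href = dict(attrs).get("href")
            if href:
                self._hrefs.append(href)

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse whitespace, then append any links (assumed format).
            text = " ".join("".join(self._cell).split())
            if self._hrefs:
                text += " [" + ", ".join(self._hrefs) + "]"
            self._row.append(text)
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self._rows.append(self._row)
            self._row, self._cell = None, None
        elif tag == "table" and self._rows is not None:
            if self._rows:
                self.tables.append(self._rows)
            self._rows, self._row, self._cell = None, None, None


def biggest_table(html):
    """Merge tables with identical header rows and return the largest one."""
    parser = TableCollector()
    parser.feed(html)
    parser.close()
    merged = {}  # header tuple -> data rows from every table with that header
    for rows in parser.tables:
        merged.setdefault(tuple(rows[0]), []).extend(rows[1:])
    if not merged:
        return []
    header, body = max(merged.items(), key=lambda item: len(item[1]))
    return [list(header)] + body


def to_csv(rows):
    """Serialize rows (header first) to a CSV string."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()
```

Calling `to_csv(biggest_table(html))` on a page's HTML string yields CSV text for the largest merged table. Treating the first row of each table as its header keeps the merge step simple, at the cost of mishandling tables that have no header row.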
TODO
add the ability to specify which table on the page you would like to download (not just the biggest one)
add support for columns that do not use proper <th> tags for headers (i.e. imperfect HTML tables) [DONE]
detect the data types found within each column
add support for tables with hierarchical indices on the rows and/or columns
Source Distribution
File details
Details for the file table2csv-0.1.3.tar.gz.
File metadata
- Download URL: table2csv-0.1.3.tar.gz
- Upload date:
- Size: 4.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | bcd0fa21474cbc3f1cc49c8bdd986729387cb0f90f269dfaac060241da78f4be
MD5 | f4dfd4a5160d4f84e93e7ee56aec31c9
BLAKE2b-256 | 6d8b526d29891ed2c41bc52426791a8b2d1d18c9549c5f226421206f3e9074fb
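A downloaded copy of the source distribution can be checked against the SHA256 digest listed above. The following is a minimal sketch using Python's standard `hashlib` module; the helper names `sha256_of` and `matches_expected` are hypothetical, and the file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest copied from the file hashes listed above.
EXPECTED_SHA256 = "bcd0fa21474cbc3f1cc49c8bdd986729387cb0f90f269dfaac060241da78f4be"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare a file's digest with the expected one, ignoring hex case."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("table2csv-0.1.3.tar.gz")` should return `True` only if the local file is byte-for-byte identical to the published archive.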