Extract data from an HTML table and store the results in a CSV file.

Project description

A simple script for downloading HTML tables as CSV.

Installation

pip install -U table2csv

Usage

table2csv http://en.wikipedia.org/wiki/List_of_Super_Bowl_champions > dump.txt

Features

  • Accepts a URL

  • Identifies all the tables on the page

  • Merges tables that share the same structure (e.g. tables with the same column headers are merged)

  • Figures out which table is the biggest

  • Extracts text

  • Extracts links
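
The steps listed above can be illustrated with a minimal sketch. This is not the package's actual implementation; it only uses the Python standard library, and the names `TableCollector`, `merge_same_header`, and `biggest_table_as_csv` are hypothetical. It assumes well-formed, non-nested tables, treats the first row of each table as its header, and defines "biggest" as "most rows", all of which are assumptions. Fetching the URL and extracting links are left out for brevity.

```python
import csv
import io
from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect every <table> as a list of rows; each row is a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen so far
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("th", "td") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.tables[-1].append(self._row)
            self._row = None


def merge_same_header(tables):
    """Merge tables whose first row (assumed to be the header) is identical."""
    merged = {}
    for table in tables:
        if not table:
            continue
        header, body = tuple(table[0]), table[1:]
        merged.setdefault(header, []).extend(body)
    return [[list(header)] + rows for header, rows in merged.items()]


def biggest_table_as_csv(html):
    """Return the largest (by row count) merged table as CSV text, or "" if none."""
    parser = TableCollector()
    parser.feed(html)
    candidates = merge_same_header(parser.tables)
    if not candidates:
        return ""
    biggest = max(candidates, key=len)
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(biggest)
    return out.getvalue()
```

Calling `biggest_table_as_csv(page_source)` on a page with two tables that share a header row and a third unrelated table would return the two matching tables combined under a single header.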

TODO

  • add the ability to specify which table on the page you would like to download (not just the biggest one)

  • add support for columns that do not use proper <th> tags for headers (i.e. imperfect HTML tables) [DONE]

  • detect the data types found within each column

  • add support for tables with hierarchical indices on the rows and/or columns

Download files

Source Distribution

table2csv-0.1.3.tar.gz (4.2 kB view details)

Uploaded Source

File details

Details for the file table2csv-0.1.3.tar.gz.

File metadata

  • Download URL: table2csv-0.1.3.tar.gz
  • Upload date:
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for table2csv-0.1.3.tar.gz
Algorithm    Hash digest
SHA256       bcd0fa21474cbc3f1cc49c8bdd986729387cb0f90f269dfaac060241da78f4be
MD5          f4dfd4a5160d4f84e93e7ee56aec31c9
BLAKE2b-256  6d8b526d29891ed2c41bc52426791a8b2d1d18c9549c5f226421206f3e9074fb
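
One way to check a downloaded archive against the SHA256 digest listed above is sketched below. It uses only the standard `hashlib` module; the helper names `sha256_of` and `matches_expected` are hypothetical, and the file path passed in is assumed to be wherever the archive was saved.

```python
import hashlib

# SHA256 digest copied from the hash listing for table2csv-0.1.3.tar.gz.
EXPECTED_SHA256 = "bcd0fa21474cbc3f1cc49c8bdd986729387cb0f90f269dfaac060241da78f4be"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected`."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_expected("table2csv-0.1.3.tar.gz")` should return `True` only if the file in the current directory is byte-for-byte the published archive.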
