Finds and extracts tables from Wikipedia.

Project Description

Overview

Finds tables in the raw HTML of a Wikipedia page and converts each one into a clean list of dictionaries (one dictionary per row, keyed by column header), suitable for easy processing in Python.
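The conversion described above can be sketched with Python 3's standard-library `html.parser`. This is a minimal illustration, not wptablefinder's actual implementation. `TableExtractor` and `tables_to_dicts` are hypothetical names. The sketch assumes that the first row of each table holds the headers and that cells are explicitly closed. It does not handle nested tables, `rowspan`, or `colspan`.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Hypothetical helper: collect each <table> as a list of rows of cell text."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one list of rows per <table> seen
        self._row = None   # cells of the <tr> currently open
        self._cell = None  # text fragments of the <td>/<th> currently open

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments (e.g. text split by <sup>) and collapse whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.tables[-1].append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def tables_to_dicts(html):
    """Return one list of dicts per table, keyed by the cells of its first row."""
    parser = TableExtractor()
    parser.feed(html)
    result = []
    for rows in parser.tables:
        if not rows:
            continue  # skip tables with no rows
        headers, body = rows[0], rows[1:]
        result.append([dict(zip(headers, row)) for row in body])
    return result
```

For example, `tables_to_dicts("<table><tr><th>Rank</th><th>Country</th></tr><tr><td>1</td><td>China <sup>[2]</sup></td></tr></table>")` returns `[[{'Rank': '1', 'Country': 'China [2]'}]]`.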

Note: this is a convenience tool for one-off processing of specific Wikipedia pages, where downloading an entire Wikipedia snapshot would be impractical. It is inefficient and will not scale to bulk use. If you need to process a large number of Wikipedia pages, download and process a Wikipedia snapshot instead.

Installation

Install with pip:

sudo pip install wptablefinder

Usage

>>> from wptablefinder import Table
>>> # from_url returns the tables found on the page; [0] selects the first one
>>> table = Table.from_url('https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population')[0]
>>> print(table.headers)
[u'Rank', u'Country (or dependent territory)', u'Population', u'Date', u'% of world population', u'Source']
>>> for row in table:
...     print(row)
{u'% of world population': u'18.9%', u'Rank': u'1', u'Source': u'Official population clock', u'Country (or dependent territory)': u'China [ Note 2 ]', u'Date': datetime.datetime(2015, 8, 15, 0, 0), u'Population': u'1,371,520,000'}
...

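Each row is a plain dictionary, so ordinary Python can handle further cleanup. The sketch below uses Python 3 and the standard library only. It strips bracketed footnote markers such as "[ Note 2 ]", parses comma-grouped population strings into integers, and writes rows to CSV in header order.

The helper names (`strip_footnotes`, `population`, `rows_to_csv`) and the hard-coded `rows` sample are hypothetical. The sample is trimmed from the first row printed above. In practice, `headers` and `rows` would come from `table.headers` and `list(table)`.

```python
import csv
import io
import re

# Hypothetical sample mirroring the first row printed above, trimmed to three
# columns; real data would come from table.headers and list(table).
headers = ["Rank", "Country (or dependent territory)", "Population"]
rows = [
    {"Rank": "1",
     "Country (or dependent territory)": "China [ Note 2 ]",
     "Population": "1,371,520,000"},
]

COUNTRY = "Country (or dependent territory)"
FOOTNOTE = re.compile(r"\s*\[[^\]]*\]")  # matches markers like " [ Note 2 ]"


def strip_footnotes(text):
    """Remove bracketed footnote markers from a cell value."""
    return FOOTNOTE.sub("", text).strip()


def population(row):
    """Parse a comma-grouped population string such as '1,371,520,000'."""
    return int(row["Population"].replace(",", ""))


def rows_to_csv(headers, rows):
    """Serialize list-of-dict rows to CSV text, with columns in header order."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=headers)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


# Copy each row, replacing the country cell with its footnote-free version.
cleaned = [{**row, COUNTRY: strip_footnotes(row[COUNTRY])} for row in rows]

print(population(cleaned[0]))  # 1371520000
print(rows_to_csv(headers, cleaned), end="")
# Rank,Country (or dependent territory),Population
# 1,China,"1,371,520,000"
```

The CSV writer quotes the population value automatically because it contains commas. This means the original string round-trips safely even without parsing it to an integer first.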
Release History

0.0.3 (this version)
0.0.2
0.0.1

Download Files

wptablefinder-0.0.3.tar.gz (5.1 kB), source distribution, uploaded Mar 30, 2017
