
Finds and extracts tables from Wikipedia.


Overview

Finds tables in the raw HTML of a Wikipedia page and converts them to a clean list-of-dictionaries, suitable for easy processing in Python.
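The general idea (scan the page HTML for `<table>` elements, then turn each row into a dictionary keyed by the header row) can be sketched with only the Python standard library. This is not wptablefinder's implementation. It is a minimal illustration that assumes well-formed markup with explicit closing tags and a first row that holds the headers. It ignores nested tables, `rowspan`/`colspan` and date parsing. `TableExtractor` and `tables_to_dicts` are hypothetical names.

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect every <table> in an HTML string as a list of rows (lists of cell text)."""

    def __init__(self):
        super().__init__()
        self.tables = []   # one entry per <table>; each entry is a list of rows
        self._row = None   # cells of the row currently being read, or None
        self._cell = None  # text fragments of the cell currently being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
            self._cell = None
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments (text inside <sup>, <a>, etc.) and collapse whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.tables[-1].append(self._row)
            self._row = None
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


def tables_to_dicts(html):
    """Return one list of {header: cell} dicts per table, using the first row as headers."""
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    result = []
    for rows in parser.tables:
        if not rows:
            result.append([])
            continue
        headers, body = rows[0], rows[1:]
        result.append([dict(zip(headers, row)) for row in body])
    return result
```

Real Wikipedia tables often use merged cells and nested markup, which is part of why a dedicated tool is more convenient than a hand-rolled parser like this one.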

Note: this is a convenience tool for one-off processing of specific Wikipedia pages, where downloading an entire Wikipedia snapshot would be impractical. It is inefficient and will not scale well for bulk use. If you need to process a large number of Wikipedia pages, please download and process a Wikipedia snapshot.

Installation

Install using pip via:

sudo pip install wptablefinder

Usage

>>> from wptablefinder import Table
>>> table = Table.from_url('https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population')[0]
>>> print(table.headers)
[u'Rank', u'Country (or dependent territory)', u'Population', u'Date', u'% of world population', u'Source']
>>> for row in table:
...     print(row)
{u'% of world population': u'18.9%', u'Rank': u'1', u'Source': u'Official population clock', u'Country (or dependent territory)': u'China [ Note 2 ]', u'Date': datetime.datetime(2015, 8, 15, 0, 0), u'Population': u'1,371,520,000'}
...
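A common next step is tidying the row dictionaries. The sketch below assumes rows shaped like the sample output above: footnote markers such as `[ Note 2 ]` inside cell text, and `Population` as a comma-grouped string. `clean_row` and `FOOTNOTE` are hypothetical helpers, not part of wptablefinder. The code is written for Python 3; on Python 2 the `isinstance` check would need `unicode` instead of `str`.

```python
import re

# Bracketed footnote markers as they appear in the sample output, e.g. "[ Note 2 ]".
FOOTNOTE = re.compile(r"\s*\[[^\]]*\]")


def clean_row(row, numeric_keys=("Population",)):
    """Return a copy of a row dict with footnote markers stripped from text cells
    and comma-grouped numbers in numeric_keys converted to int."""
    cleaned = {}
    for header, value in row.items():
        if isinstance(value, str):
            value = FOOTNOTE.sub("", value).strip()
            if header in numeric_keys:
                value = int(value.replace(",", ""))
        # Non-text values (such as the parsed datetime in "Date") pass through unchanged.
        cleaned[header] = value
    return cleaned
```

With the table from the session above, this would be applied as `rows = [clean_row(row) for row in table]`. Cells whose `Population` is not a plain comma-grouped integer would raise `ValueError`, so check your target page's format first.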



Download files


Source Distribution

wptablefinder-0.0.3.tar.gz (5.1 kB)


File hashes for wptablefinder-0.0.3.tar.gz

Algorithm     Hash digest
SHA256        724e8995b13ac090432b56240bb162b1d1dc4d0719e29c3d49be46df23031b99
MD5           e1ef93c1699a0fbae26cf821ce673838
BLAKE2b-256   8988b9cd86e6b74c9ce5aa1cd508a3c83320c92cadc2781e666cda99ddbf3c53
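To check a downloaded archive against the SHA256 digest listed above, a short standard-library script is enough. This is a generic sketch using `hashlib`; `file_digest` and `verify` are illustrative names, and the archive path you pass in is up to you.

```python
import hashlib

# SHA256 digest published for wptablefinder-0.0.3.tar.gz (from the table above).
EXPECTED_SHA256 = "724e8995b13ac090432b56240bb162b1d1dc4d0719e29c3d49be46df23031b99"


def file_digest(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest matches the expected hex digest."""
    return file_digest(path) == expected.lower()
```

pip can also enforce the digest at install time: a requirements-file line such as `wptablefinder==0.0.3 --hash=sha256:724e8995b13ac090432b56240bb162b1d1dc4d0719e29c3d49be46df23031b99` puts pip into hash-checking mode, and it refuses any downloaded file whose hash differs.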

