Finds and extracts tables from Wikipedia.

Project description

Overview

Finds tables in the raw HTML of a Wikipedia page and converts them to a clean list-of-dictionaries, suitable for easy processing in Python.
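
As a rough illustration of the list-of-dictionaries shape, the sketch below builds two rows by hand and serializes them to CSV with the standard library. It targets Python 3 and does not call wptablefinder. The `rows_to_csv` helper, the shortened `Country` key, and the second row are assumptions made up for this example.

```python
import csv
import io

# Rows in the list-of-dictionaries shape described above.
# The first row reuses values shown in the Usage session; the second
# row is a made-up placeholder, not real data.
rows = [
    {"Rank": "1", "Country": "China", "Population": "1,371,520,000"},
    {"Rank": "2", "Country": "Exampleland", "Population": "1,000"},
]


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dictionaries to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(rows, ["Rank", "Country", "Population"])
```

Because every row is a plain dictionary keyed by column header, standard tools such as `csv.DictWriter` can consume the rows without any adapter code.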

Note: this is a convenience tool for one-off processing of specific Wikipedia pages, where downloading an entire Wikipedia snapshot would be impractical. It is inefficient and will not scale to bulk use. If you need to process a large number of Wikipedia pages, please download and process a Wikipedia snapshot instead.

Installation

Install using pip via:

sudo pip install wptablefinder

Usage

>>> from wptablefinder import Table
>>> table = Table.from_url('https://en.wikipedia.org/wiki/List_of_countries_and_dependencies_by_population')[0]
>>> print table.headers
[u'Rank', u'Country (or dependent territory)', u'Population', u'Date', u'% of world population', u'Source']
>>> for row in table:
...  print row
{u'% of world population': u'18.9%', u'Rank': u'1', u'Source': u'Official population clock', u'Country (or dependent territory)': u'China [ Note 2 ]', u'Date': datetime.datetime(2015, 8, 15, 0, 0), u'Population': u'1,371,520,000'}
...
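
The row printed above still carries presentation details such as thousands separators, a percent sign, and a footnote marker. The following is a minimal sketch of cleaning those values after extraction, assuming rows shaped like that output. It is written for Python 3, and the helper names (`strip_footnotes`, `parse_int`, `parse_percent`) are hypothetical and not part of wptablefinder.

```python
import re

# A row shaped like the one printed above (values copied from that output,
# with the Date and Source fields left out for brevity).
row = {
    "Rank": "1",
    "Country (or dependent territory)": "China [ Note 2 ]",
    "Population": "1,371,520,000",
    "% of world population": "18.9%",
}


def strip_footnotes(text):
    """Remove bracketed footnote markers such as '[ Note 2 ]'."""
    return re.sub(r"\s*\[[^\]]*\]", "", text).strip()


def parse_int(text):
    """Convert a digit string with thousands separators to an int."""
    return int(text.replace(",", ""))


def parse_percent(text):
    """Convert a string such as '18.9%' to a fraction between 0 and 1."""
    return float(text.rstrip("%")) / 100.0


cleaned = {
    "rank": parse_int(row["Rank"]),
    "country": strip_footnotes(row["Country (or dependent territory)"]),
    "population": parse_int(row["Population"]),
    "world_share": parse_percent(row["% of world population"]),
}
```

Keeping the cleaning step separate from extraction means the raw strings stay available if a page uses a format these helpers do not expect.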

Download files

Source Distribution

wptablefinder-0.0.3.tar.gz (5.1 kB)

Uploaded Source

File details

Details for the file wptablefinder-0.0.3.tar.gz.

File metadata

  • Download URL: wptablefinder-0.0.3.tar.gz
  • Size: 5.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for wptablefinder-0.0.3.tar.gz
Algorithm    Hash digest
SHA256       724e8995b13ac090432b56240bb162b1d1dc4d0719e29c3d49be46df23031b99
MD5          e1ef93c1699a0fbae26cf821ce673838
BLAKE2b-256  8988b9cd86e6b74c9ce5aa1cd508a3c83320c92cadc2781e666cda99ddbf3c53
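
A downloaded archive can be checked against the SHA256 digest listed above before it is installed. The sketch below is one way to do that with Python 3's standard `hashlib` module. The function names and the local file path in the usage note are assumptions for illustration.

```python
import hashlib

# SHA256 digest copied from the hash listing above.
EXPECTED_SHA256 = "724e8995b13ac090432b56240bb162b1d1dc4d0719e29c3d49be46df23031b99"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected digest."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("wptablefinder-0.0.3.tar.gz")` would be called on the archive after downloading it to the current directory (a hypothetical location).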
