Easily get Wikipedia article info and Wikidata via MediaWiki APIs.
Project description
Wikipedia tools (for Humans)
Easily get Wikipedia article info and Wikidata via MediaWiki APIs.
get an HTML or plain text “extract” (lead or summary)
get a representative image (or thumbnail)
get an Infobox as a python dictionary
get selected Wikidata properties
get a Wikidata item by title
get random info
These are purpose-built methods intended to make it as easy as possible to get common data from MediaWiki instances, expose more Wikidata, and extend or augment MediaWiki API functions. We say “(for Humans)” because that is a goal in the same spirit as @kennethreitz requests. Contributions are welcome!
Install
$ pip install wptools
Usage
>>> import wptools
>>> a = wptools.wptools('a')
a (en)
{
lang: en
title: a
}
The methods below may yield the attributes noted for a given instance.
get(self) Tries all get_s below, filling available attributes get_parse(self) MediaWiki:API action=parse infobox - Infobox data as python dictionary links - inter-wiki links (iwlinks) pageid - Wikipedia database ID parse - parse {query, request, response} data parsetree - parse tree wikibase - wikibase_item (id) or wikidata URL wikitext - raw wikitext get_query(self) MediaWiki:API action=query extext - plain text extract (from extract) extract - HTML extract, see Extension:TextExtract images - {image, pageimages, thumbnail} pageid - Wikipedia database ID pageimage - see Extension:PageImages query - query {query, request, response} data random - a random article title with every request! url - the canonical wiki URL urlraw - raw wikitext URL get_wikidata(self) Wikidata:API action=wbgetentities image - Wikidata Property:P18, like DBPedia foaf:depiction description - Wikidata description label - Wikidata label
API request data can be found in the following attributes:
g_parse: <dict(3)> {info, query, response} g_query: <dict(3)> {info, query, response} g_wikidata: <dict(3)> {info, query, response}
Examples
Get a random article:
>>> import wptools
>>> wptools.wptools()
en.wikipedia.org (action=random) None
Borg_Island (en)
{
lang: en
pageid: 29332964
title: Borg_Island
}
Get a random Japanese article:
>>> wptools.wptools(lang='jp')
jp.wikipedia.org (action=random) None
土御門殿 (jp)
{
lang: jp
pageid: 526200
title: 土御門殿
}
Get a text extract:
>>> a = wptools.wptools('aardvark')
>>> a.get_query()
en.wikipedia.org (action=query) aardvark
>>> print t.extext
The **aardvark** (/ˈɑːrd.vɑːrk/ _**ARD**-vark_; _Orycteropus afer_) is a
medium-sized, burrowing, nocturnal mammal native to Africa. It is the only
living species of the order Tubulidentata, although other prehistoric species
and genera of Tubulidentata are known. Unlike other insectivores, it has a
long pig-like snout, which is used to sniff out food. It roams over most of
the southern two-thirds of the African continent, avoiding areas that are
mainly rocky. A nocturnal feeder, it subsists on ants and termites, which it
will dig out of their hills using its sharp claws and powerful legs. It also
digs to create burrows in which to live and rear its young. It receives a
"least concern" rating from the IUCN, although its numbers seem to be
decreasing.
Get an infobox and some wikidata:
>>> n = wptools.wptools('Napoleon', lang='fr')
>>> n.get_parse().get_wikidata()
fr.wikipedia.org (action=parse) Napoleon
www.wikidata.org (action=wikidata) Q517
Napoléon_Ier (fr)
{
Description: chef d'État français
Image: https://upload.wikimedia.org/wikipedia/commons/b/b5/Jacques-Louis_David_-_The_Emperor_Napoleon_in_His_Study_at_the_Tuileries_-_Google_Art_Project_2.jpg
Label: Napoléon Ier
...
infobox: <dict(64)> {charte, conjoint, couronnement 1, date de déc...
}
Get wikidata directly using wikibase item:
>>> q = wptools.wptools(wikibase='Q42')
>>> q.get_wikidata()
www.wikidata.org (action=wikidata) Q42
https://www.wikidata.org/wiki/Q42 (en)
{
Description: English writer and humorist
Image: https://upload.wikimedia.org/wikipedia/commons/c/c0/Douglas_adams_portrait_cropped.jpg
Label: Douglas Adams
g_wikidata: <dict(3)> {info, query, response}
lang: en
wikibase: https://www.wikidata.org/wiki/Q42
}
Get everything available all at once!
>>> w = wptools.wptools('Shakespeare')
>>> w.get()
en.wikipedia.org (action=query) Shakespeare
en.wikipedia.org (action=parse) William_Shakespeare
www.wikidata.org (action=wikidata) Q692
William_Shakespeare (en)
{
Description: English playwright and poet
Image: https://upload.wikimedia.org/wikipedia/commons/2/2a/Hw-shakespeare.png
Label: William Shakespeare
extext: <str(2572)> **William Shakespeare** (/ˈʃeɪkspɪər/; 26...
extract: <str(2985)> <p><b>William Shakespeare</b> (<span><span>/<...
g_parse: <dict(3)> {info, query, response}
g_query: <dict(3)> {info, query, response}
g_wikidata: <dict(3)> {info, query, response}
images: <dict(3)> {Image, pageimage, thumbnail}
infobox: <dict(14)> {birth_date, birth_place, caption, children, d...
lang: en
links: <list(8)>
pageid: 32897
pageimage: https://upload.wikimedia.org/wikipedia/commons/a/a2/Shakespeare.jpg
parsetree: <str(185585)> <root><template><title>About</title><part...
random: MobiasBanca
thumbnail: https://upload.wikimedia.org/wikipedia/commons/thumb/a/a2/Shakespeare.jpg/39px-Shakespeare.jpg
title: William_Shakespeare
url: https://en.wikipedia.org/wiki/William_Shakespeare
urlraw: https://en.wikipedia.org/wiki/William_Shakespeare?action=raw
wikibase: https://www.wikidata.org/wiki/Q692
wikitext: <str(100349)> {{About|the poet and playwright|other pers...
}
Mix languages!
>>> t = wptools.wptools(title='Tolkien', lang='zh')
>>> t.get()
zh.wikipedia.org (action=query) Tolkien
zh.wikipedia.org (action=parse) J·R·R·托爾金
www.wikidata.org (action=wikidata) Q892
J·R·R·托爾金 (zh)
{
Description: 英国作家
Image: https://upload.wikimedia.org/wikipedia/commons/b/b4/Tolkien_1916.jpg
Label: J·R·R·托尔金
extext: <str(1704)> **約翰·羅納德·魯埃爾·托爾金**,...
extract: <str(2067)> <p><b>約翰·羅納德·魯埃爾·托爾金...
...
infobox: <dict(16)> {birth_name, birthdate, birthplace, caption, d...
lang: zh
...
wikibase: https://www.wikidata.org/wiki/Q892
}
@siznax
Release History
0.0.5 (2016-08-23)
Major re-write.
Exposed core.WPTools as entrypoint.
Added get_parse(), get_query(), and get_wikidata().
Added get(self) to query all APIs.
Added show(self) method to display fetched attrs.
Show instance attributes after each request.
Ignore requests if attrs will not be updated.
Enabled language support across APIs.
Gets random article if no arguments.
CLI scripts and tests disabled pending update.
0.0.4 (2016-08-16)
Added wptools.lead.
Added safe_exit() to CLI scripts.
Removed a fair amount of unused code.
0.0.3 (2016-08-12)
Implemented wptools.image choices.
Added wptools.api to simplify python i/f and CLI scripts.
Merged @0x9900’s CLI dist fixes.
A little more test coverage.
0.0.2 (2016-08-02)
Starting to look like a legit module.
0.0.1 (2015)
Still better than alternatives for working with articles.
0.0.0 (2012)
It seems to work!
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for wptools-0.0.5-py2.py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | 861b071255df24c3a26f474148eebd95fc0a6f2763b6c64ba9ccad2c7ee8943e |
|
MD5 | cae87d22614e1dcf3ef91eb0d77c9bff |
|
BLAKE2b-256 | 7a23a605db2934b604fa19722a0e4ac4738c69fd72eb9bbbf242a2ee4f905dcc |