Python Wrapper for Wikipedia
Project description
Wikipedia API
Wikipedia-API is an easy-to-use Python wrapper for Wikipedia's API. It supports extracting texts, sections, links, categories, translations, etc. from Wikipedia. The documentation provides code snippets for the most common use cases.
Installation
pip3 install wikipedia-api
Usage
The goal of Wikipedia-API is to provide a simple, easy-to-use API for retrieving information from Wikipedia. Below are examples of common use cases.
Importing
import wikipediaapi
How To Get Single Page
Getting a single page is straightforward. You have to initialize a Wikipedia object and ask for a page by its name. Its language parameter has to be one of the supported languages.
wiki_wiki = wikipediaapi.Wikipedia('en')
page_py = wiki_wiki.page('Python_(programming_language)')
How To Check If Wiki Page Exists
To check whether a page exists, you can use the function exists.
page_py = wiki_wiki.page('Python_(programming_language)')
print("Page - Exists: %s" % page_py.exists())
# Page - Exists: True
page_missing = wiki_wiki.page('NonExistingPageWithStrangeName')
print("Page - Exists: %s" % page_missing.exists())
# Page - Exists: False
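When a page may live under one of several titles, the exists check can be wrapped in a small helper. The following is a minimal sketch, not part of Wikipedia-API: first_existing, FakePage and FakeWiki are hypothetical names, and the fake classes only stand in for the real objects so the example runs offline. The helper relies only on page() and exists() as shown above.

```python
def first_existing(wiki, titles):
    """Return the first page among `titles` that exists, or None.

    `wiki` is any object with a `page(title)` method whose result
    offers `exists()`, such as the `wiki_wiki` object created above.
    """
    for title in titles:
        page = wiki.page(title)
        if page.exists():
            return page
    return None


# Hypothetical offline stand-ins, for illustration only.
class FakePage:
    def __init__(self, title, present):
        self.title = title
        self._present = present

    def exists(self):
        return self._present


class FakeWiki:
    def __init__(self, known_titles):
        self._known = set(known_titles)

    def page(self, title):
        return FakePage(title, title in self._known)


fake_wiki = FakeWiki({"Python_(programming_language)"})
found = first_existing(
    fake_wiki,
    ["NonExistingPageWithStrangeName", "Python_(programming_language)"],
)
print(found.title if found else None)
```

With a real Wikipedia object, each candidate title would presumably cost one API request, so the list of candidates is best kept short.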
How To Get Page Summary
Class WikipediaPage has a property summary, which returns the description of the Wiki page.
print("Page - Title: %s" % page_py.title)
# Page - Title: Python (programming language)
print("Page - Summary: %s" % page_py.summary[0:60])
# Page - Summary: Python is a widely used high-level programming language for
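Slicing with [0:60] can cut a word in half. As an alternative, the standard library's textwrap.shorten truncates on whitespace. This is a minimal sketch: short_summary is a hypothetical helper, and the SimpleNamespace object with its summary text is an invented stand-in for a real page so the example runs offline.

```python
import textwrap
from types import SimpleNamespace


def short_summary(page, width=60):
    """Return page.summary shortened to at most `width` characters,
    cut at a word boundary and ending in '...' when truncated."""
    return textwrap.shorten(page.summary, width=width, placeholder="...")


# Hypothetical stand-in for a WikipediaPage; only `summary` is used.
fake_page = SimpleNamespace(
    summary=(
        "Python is a widely used high-level programming language for "
        "general-purpose programming and scripting."
    )
)

teaser = short_summary(fake_page)
print("Page - Summary: %s" % teaser)
```

Note that textwrap.shorten also collapses runs of whitespace, so line breaks inside the summary are lost.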
How To Get Page URL
WikipediaPage has two properties with the URL of the page: fullurl and canonicalurl.
print(page_py.fullurl)
# https://en.wikipedia.org/wiki/Python_(programming_language)
print(page_py.canonicalurl)
# https://en.wikipedia.org/wiki/Python_(programming_language)
How To Get Full Text
To get the full text of a Wikipedia page, use the property text, which constructs the text of the page as a concatenation of the summary and the sections with their titles and texts.
wiki_wiki = wikipediaapi.Wikipedia(
    language='en',
    extract_format=wikipediaapi.ExtractFormat.WIKI
)
p_wiki = wiki_wiki.page("Test 1")
print(p_wiki.text)
# Summary
# Section 1
# Text of section 1
# Section 1.1
# Text of section 1.1
# ...
wiki_html = wikipediaapi.Wikipedia(
    language='en',
    extract_format=wikipediaapi.ExtractFormat.HTML
)
p_html = wiki_html.page("Test 1")
print(p_html.text)
# <p>Summary</p>
# <h2>Section 1</h2>
# <p>Text of section 1</p>
# <h3>Section 1.1</h3>
# <p>Text of section 1.1</p>
# ...
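The concatenation described above can be pictured with a small sketch. This is not the library's implementation: Section and build_text are hypothetical names, the newline separator is an assumption, and the sketch covers only the plain-text (WIKI) layout shown in the first output above.

```python
class Section:
    """Hypothetical stand-in for WikipediaPageSection."""

    def __init__(self, title, text, sections=None):
        self.title = title
        self.text = text
        self.sections = sections or []


def build_text(summary, sections):
    """Join the summary with every section title and text, depth first."""
    parts = [summary]

    def visit(section):
        parts.append(section.title)
        if section.text:
            parts.append(section.text)
        for child in section.sections:
            visit(child)

    for section in sections:
        visit(section)
    return "\n".join(parts)


tree = [
    Section(
        "Section 1",
        "Text of section 1",
        [Section("Section 1.1", "Text of section 1.1")],
    )
]
full_text = build_text("Summary", tree)
print(full_text)
```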
How To Get Page Sections
To get all top-level sections of a page, use the property sections. It returns a list of WikipediaPageSection objects, so you have to use recursion to get all subsections.
def print_sections(sections, level=0):
    for s in sections:
        print("%s: %s - %s" % ("*" * (level + 1), s.title, s.text[0:40]))
        print_sections(s.sections, level + 1)
print_sections(page_py.sections)
# *: History - Python was conceived in the late 1980s,
# *: Features and philosophy - Python is a multi-paradigm programming l
# *: Syntax and semantics - Python is meant to be an easily readable
# **: Indentation - Python uses whitespace indentation, rath
# **: Statements and control flow - Python's statements include (among other
# **: Expressions - Some Python expressions are similar to l
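If the sections are needed as data rather than printed, the same recursion can be written as a generator. A minimal sketch: walk_sections is a hypothetical helper, and the SimpleNamespace objects are stand-ins that carry only the title, text and sections attributes used above.

```python
from types import SimpleNamespace


def walk_sections(sections, level=0):
    """Yield (level, section) pairs for all sections, depth first."""
    for section in sections:
        yield level, section
        yield from walk_sections(section.sections, level + 1)


def make_section(title, children=()):
    # Hypothetical stand-in for WikipediaPageSection.
    return SimpleNamespace(title=title, text="", sections=list(children))


fake_sections = [
    make_section("History"),
    make_section(
        "Syntax and semantics",
        [make_section("Indentation"), make_section("Expressions")],
    ),
]

outline = [(level, s.title) for level, s in walk_sections(fake_sections)]
print(outline)
```

With a real page, the call would be walk_sections(page_py.sections).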
How To Get Page In Other Languages
If you want to get other translations of a given page, use the property langlinks. It is a map, where the key is the language code and the value is a WikipediaPage.
def print_langlinks(page):
    langlinks = page.langlinks
    for k in sorted(langlinks.keys()):
        v = langlinks[k]
        print("%s: %s - %s: %s" % (k, v.language, v.title, v.fullurl))
print_langlinks(page_py)
# af: af - Python (programmeertaal): https://af.wikipedia.org/wiki/Python_(programmeertaal)
# als: als - Python (Programmiersprache): https://als.wikipedia.org/wiki/Python_(Programmiersprache)
# an: an - Python: https://an.wikipedia.org/wiki/Python
# ar: ar - بايثون: https://ar.wikipedia.org/wiki/%D8%A8%D8%A7%D9%8A%D8%AB%D9%88%D9%86
# as: as - পাইথন: https://as.wikipedia.org/wiki/%E0%A6%AA%E0%A6%BE%E0%A6%87%E0%A6%A5%E0%A6%A8
page_py_cs = page_py.langlinks['cs']
print("Page - Summary: %s" % page_py_cs.summary[0:60])
# Page - Summary: Python (anglická výslovnost [ˈpaiθtən]) je vysokoúrovňový sk
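Indexing langlinks directly raises KeyError when a translation is missing. A fallback can be sketched as below; pick_translation is a hypothetical helper and the SimpleNamespace objects are offline stand-ins that carry only the language, title and langlinks attributes used above.

```python
from types import SimpleNamespace


def pick_translation(page, preferred):
    """Return the first available translation from `preferred`
    (a sequence of language codes), or `page` itself if none exists."""
    langlinks = page.langlinks
    for code in preferred:
        if code in langlinks:
            return langlinks[code]
    return page


# Hypothetical stand-ins for WikipediaPage objects.
fake_cs = SimpleNamespace(language="cs", title="Python", langlinks={})
fake_en = SimpleNamespace(
    language="en",
    title="Python (programming language)",
    langlinks={"cs": fake_cs},
)

chosen = pick_translation(fake_en, ["sk", "cs"])
print("%s: %s" % (chosen.language, chosen.title))
```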
How To Get Links To Other Pages
If you want to get all links to other wiki pages from a given page, use the property links. It is a map, where the key is the page title and the value is a WikipediaPage.
def print_links(page):
    links = page.links
    for title in sorted(links.keys()):
        print("%s: %s" % (title, links[title]))
print_links(page_py)
# 3ds Max: 3ds Max (id: ??, ns: 0)
# ?:: ?: (id: ??, ns: 0)
# ABC (programming language): ABC (programming language) (id: ??, ns: 0)
# ALGOL 68: ALGOL 68 (id: ??, ns: 0)
# Abaqus: Abaqus (id: ??, ns: 0)
# ...
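Because links is keyed by page title, comparing two pages reduces to set operations on the keys. A minimal sketch: common_link_titles is a hypothetical helper, and the two SimpleNamespace objects are invented stand-ins with a handful of made-up entries so the example runs offline.

```python
from types import SimpleNamespace


def common_link_titles(page_a, page_b):
    """Return the sorted titles that both pages link to."""
    return sorted(set(page_a.links) & set(page_b.links))


# Hypothetical stand-ins; only the keys of `links` matter here.
fake_a = SimpleNamespace(
    links={"ABC (programming language)": None, "ALGOL 68": None, "Abaqus": None}
)
fake_b = SimpleNamespace(
    links={"ALGOL 68": None, "ABC (programming language)": None, "3ds Max": None}
)

shared = common_link_titles(fake_a, fake_b)
print(shared)
```

On real pages, reading links presumably triggers API requests, so it may be worth storing the result if it is used more than once.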
How To Get Page Categories
If you want to get all categories to which a page belongs, use the property categories. It is a map, where the key is the category title and the value is a WikipediaPage.
def print_categories(page):
    categories = page.categories
    for title in sorted(categories.keys()):
        print("%s: %s" % (title, categories[title]))
print("Categories")
print_categories(page_py)
# Category:All articles containing potentially dated statements: ...
# Category:All articles with unsourced statements: ...
# Category:Articles containing potentially dated statements from August 2016: ...
# Category:Articles containing potentially dated statements from March 2017: ...
# Category:Articles containing potentially dated statements from September 2017: ...
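As the output above suggests, many categories are maintenance categories rather than topical ones. They can be filtered by title prefix. A minimal sketch: content_categories and MAINTENANCE_PREFIXES are hypothetical names, the prefix list is an assumption based only on the titles shown above, and "Category:Programming languages" is an illustrative entry in the offline stand-in.

```python
from types import SimpleNamespace

# Assumed prefixes, taken from the sample output above; extend as needed.
MAINTENANCE_PREFIXES = (
    "Category:All articles",
    "Category:Articles containing",
)


def content_categories(page, skip_prefixes=MAINTENANCE_PREFIXES):
    """Return sorted category titles that do not start with a skipped prefix."""
    return sorted(
        title
        for title in page.categories
        if not title.startswith(tuple(skip_prefixes))
    )


# Hypothetical stand-in; only the keys of `categories` are used.
fake_page = SimpleNamespace(
    categories={
        "Category:All articles with unsourced statements": None,
        "Category:Articles containing potentially dated statements "
        "from March 2017": None,
        "Category:Programming languages": None,
    }
)

topical = content_categories(fake_page)
print(topical)
```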
How To Get All Pages From Category
To get all pages from a given category, use the property categorymembers. It returns all members of the given category. You have to implement recursion and deduplication yourself.
def print_categorymembers(categorymembers, level=0, max_level=2):
    for c in categorymembers.values():
        print("%s: %s (ns: %d)" % ("*" * (level + 1), c.title, c.ns))
        if c.ns == wikipediaapi.Namespace.CATEGORY and level <= max_level:
            print_categorymembers(c.categorymembers, level + 1, max_level)
cat = wiki_wiki.page("Category:Physics")
print("Category members: Category:Physics")
print_categorymembers(cat.categorymembers)
# Category members: Category:Physics
# *: Statistical mechanics (ns: 0)
# *: Category:Physical quantities (ns: 14)
# **: Refractive index (ns: 0)
# **: Vapor quality (ns: 0)
# **: Electric susceptibility (ns: 0)
# **: Specific weight (ns: 0)
# **: Category:Viscosity (ns: 14)
# ***: Brookfield Engineering (ns: 0)
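The recursion with deduplication mentioned above can be sketched with a shared set of visited titles, which also protects against categories that contain each other. This is one possible approach, not part of the library: collect_members and FakeMember are hypothetical names, and the constant 14 is the category namespace number shown in the output above (the value that wikipediaapi.Namespace.CATEGORY is compared against).

```python
CATEGORY_NS = 14  # namespace number printed for categories in the output above


def collect_members(categorymembers, max_level=1, level=0, seen=None):
    """Return member titles in visiting order, each title only once."""
    if seen is None:
        seen = set()
    titles = []
    for member in categorymembers.values():
        if member.title in seen:
            continue  # already visited: skip duplicates and cycles
        seen.add(member.title)
        titles.append(member.title)
        if member.ns == CATEGORY_NS and level < max_level:
            titles.extend(
                collect_members(member.categorymembers, max_level, level + 1, seen)
            )
    return titles


class FakeMember:
    """Hypothetical offline stand-in for a WikipediaPage."""

    def __init__(self, title, ns):
        self.title = title
        self.ns = ns
        self.categorymembers = {}


index = FakeMember("Refractive index", 0)
quantities = FakeMember("Category:Physical quantities", CATEGORY_NS)
viscosity = FakeMember("Category:Viscosity", CATEGORY_NS)
quantities.categorymembers = {index.title: index, viscosity.title: viscosity}
# Deliberate cycle and duplicate to exercise the deduplication.
viscosity.categorymembers = {quantities.title: quantities, index.title: index}

top = {
    "Statistical mechanics": FakeMember("Statistical mechanics", 0),
    quantities.title: quantities,
}
members = collect_members(top, max_level=2)
print(members)
```

With a real category page, the call would be collect_members(cat.categorymembers); each nested category presumably costs additional API requests, so max_level is best kept small.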
Changelog
0.4.1
Uses code style enforced by flake8
Increased code coverage
0.4.0
Uses type annotations => minimal requirement is now Python 3.5
Adds possibility to use more parameters for request. For example:
api = wikipediaapi.Wikipedia(
    language='en',
    proxies={'http': 'http://localhost:1234'}
)
Extended documentation
0.3.4
Adds support for property Categorymembers
Adds property text for retrieving complete text of the page
0.3.3
Added support for request timeout
Add header: Accept-Encoding: gzip
0.3.2
Added support for property Categories
0.3.1
Removing WikipediaLangLink
Page keeps track of its own language, so it’s easier to jump between different translations of the same page
0.3.0
Rename directory from wikipedia to wikipediaapi to avoid collisions
0.2.4
Handle redirects properly
0.2.3
Use method page instead of article in Wikipedia
0.2.2
Added support for property Links
0.2.1
Added support for property Langlinks
0.2.0
Use properties instead of functions
Added support for property Info
0.1.6
Support for extracting texts with HTML markup
Added initial version of unit tests
0.1.4
It’s possible to extract summary and sections of the page
Added support for property Extracts
Download files
Source Distribution
File details
Details for the file Wikipedia-API-0.4.1.tar.gz.
File metadata
- Download URL: Wikipedia-API-0.4.1.tar.gz
- Upload date:
- Size: 16.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.28.1 CPython/3.6.3
File hashes
Algorithm | Hash digest
---|---
SHA256 | b871a6a30457f251d20afa7929bbca9ee3cdcbe40d2a7acebd5b1862564ddf99
MD5 | 0fdf9bf33766982fd2146e060481d827
BLAKE2b-256 | 5fe5b730d3cc1ed139fe14c8cd5f5cf42f78003457a1a5531954ddf443c8c494