Python Wrapper for Wikipedia
Project description
Wikipedia API
Wikipedia-API is an easy-to-use Python wrapper for Wikipedia's API. It supports extracting texts, sections, links, categories, translations, etc. from Wikipedia. The documentation provides code snippets for the most common use cases.
Installation
This package requires at least Python 3.4, because it uses IntEnum.
pip3 install wikipedia-api
Usage
The goal of Wikipedia-API is to provide a simple and easy-to-use API for retrieving information from Wikipedia. Below are examples of common use cases.
Importing
import wikipediaapi
How To Get Single Page
Getting a single page is straightforward. You have to initialize a Wikipedia object and ask for a page by its name. To initialize it, you have to provide:
user_agent to identify your project. Please follow the recommended format.
language to specify the language mutation. It has to be one of the supported languages.
import wikipediaapi
wiki_wiki = wikipediaapi.Wikipedia('MyProjectName (merlin@example.com)', 'en')
page_py = wiki_wiki.page('Python_(programming_language)')
How To Check If Wiki Page Exists
To check whether a page exists, you can use the function exists.
page_py = wiki_wiki.page('Python_(programming_language)')
print("Page - Exists: %s" % page_py.exists())
# Page - Exists: True
page_missing = wiki_wiki.page('NonExistingPageWithStrangeName')
print("Page - Exists: %s" % page_missing.exists())
# Page - Exists: False
How To Get Page Summary
The class WikipediaPage has the property summary, which returns a description of the Wiki page.
import wikipediaapi
wiki_wiki = wikipediaapi.Wikipedia('MyProjectName (merlin@example.com)', 'en')
page_py = wiki_wiki.page('Python_(programming_language)')
print("Page - Title: %s" % page_py.title)
# Page - Title: Python (programming language)
print("Page - Summary: %s" % page_py.summary[0:60])
# Page - Summary: Python is a widely used high-level programming language for
How To Get Page URL
WikipediaPage has two properties with the URL of the page: fullurl and canonicalurl.
print(page_py.fullurl)
# https://en.wikipedia.org/wiki/Python_(programming_language)
print(page_py.canonicalurl)
# https://en.wikipedia.org/wiki/Python_(programming_language)
How To Get Full Text
To get the full text of a Wikipedia page, you should use the property text, which constructs the text of the page as a concatenation of the summary and the sections with their titles and texts.
wiki_wiki = wikipediaapi.Wikipedia(
user_agent='MyProjectName (merlin@example.com)',
language='en',
extract_format=wikipediaapi.ExtractFormat.WIKI
)
p_wiki = wiki_wiki.page("Test 1")
print(p_wiki.text)
# Summary
# Section 1
# Text of section 1
# Section 1.1
# Text of section 1.1
# ...
wiki_html = wikipediaapi.Wikipedia(
user_agent='MyProjectName (merlin@example.com)',
language='en',
extract_format=wikipediaapi.ExtractFormat.HTML
)
p_html = wiki_html.page("Test 1")
print(p_html.text)
# <p>Summary</p>
# <h2>Section 1</h2>
# <p>Text of section 1</p>
# <h3>Section 1.1</h3>
# <p>Text of section 1.1</p>
# ...
How To Get Page Sections
To get all top-level sections of a page, you have to use the property sections. It returns a list of WikipediaPageSection, so you have to use recursion to get all subsections.
def print_sections(sections, level=0):
    for s in sections:
        print("%s: %s - %s" % ("*" * (level + 1), s.title, s.text[0:40]))
        print_sections(s.sections, level + 1)
print_sections(page_py.sections)
# *: History - Python was conceived in the late 1980s,
# *: Features and philosophy - Python is a multi-paradigm programming l
# *: Syntax and semantics - Python is meant to be an easily readable
# **: Indentation - Python uses whitespace indentation, rath
# **: Statements and control flow - Python's statements include (among other
# **: Expressions - Some Python expressions are similar to l
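The same recursive traversal can return data instead of printing it. A minimal sketch (using only the title and sections attributes shown above) that flattens the section tree into a list of titles:

```python
def all_section_titles(sections):
    # Depth-first walk over the section tree, collecting every title,
    # with subsection titles immediately after their parent's.
    titles = []
    for s in sections:
        titles.append(s.title)
        titles.extend(all_section_titles(s.sections))
    return titles
```

Called as all_section_titles(page_py.sections), this would yield the titles in the same order as the printer above, without the nesting markers.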
How To Get Page Section By Title
To get the last section of a page with a given title, you have to use the function section_by_title. It returns the last WikipediaPageSection with this title.
section_history = page_py.section_by_title('History')
print("%s - %s" % (section_history.title, section_history.text[0:40]))
# History - Python was conceived in the late 1980s b
How To Get All Page Sections By Title
To get all sections of a page with a given title, you have to use the function sections_by_title. It returns all WikipediaPageSections with this title.
page_1920 = wiki_wiki.page('1920')
sections_january = page_1920.sections_by_title('January')
for s in sections_january:
print("* %s - %s" % (s.title, s.text[0:40]))
# * January - January 1
# Polish–Soviet War in 1920: The
# * January - January 2
# Isaac Asimov, American author
# * January - January 1 – Zygmunt Gorazdowski, Polish
How To Get Page In Other Languages
If you want to get other translations of a given page, you should use the property langlinks. It is a map, where the key is a language code and the value is a WikipediaPage.
def print_langlinks(page):
    langlinks = page.langlinks
    for k in sorted(langlinks.keys()):
        v = langlinks[k]
        print("%s: %s - %s: %s" % (k, v.language, v.title, v.fullurl))
print_langlinks(page_py)
# af: af - Python (programmeertaal): https://af.wikipedia.org/wiki/Python_(programmeertaal)
# als: als - Python (Programmiersprache): https://als.wikipedia.org/wiki/Python_(Programmiersprache)
# an: an - Python: https://an.wikipedia.org/wiki/Python
# ar: ar - بايثون: https://ar.wikipedia.org/wiki/%D8%A8%D8%A7%D9%8A%D8%AB%D9%88%D9%86
# as: as - পাইথন: https://as.wikipedia.org/wiki/%E0%A6%AA%E0%A6%BE%E0%A6%87%E0%A6%A5%E0%A6%A8
page_py_cs = page_py.langlinks['cs']
print("Page - Summary: %s" % page_py_cs.summary[0:60])
# Page - Summary: Python (anglická výslovnost [ˈpaiθtən]) je vysokoúrovňový sk
How To Get Links To Other Pages
If you want to get all links to other wiki pages from a given page, you need to use the property links. It is a map, where the key is a page title and the value is a WikipediaPage.
def print_links(page):
    links = page.links
    for title in sorted(links.keys()):
        print("%s: %s" % (title, links[title]))
print_links(page_py)
# 3ds Max: 3ds Max (id: ??, ns: 0)
# ?:: ?: (id: ??, ns: 0)
# ABC (programming language): ABC (programming language) (id: ??, ns: 0)
# ALGOL 68: ALGOL 68 (id: ??, ns: 0)
# Abaqus: Abaqus (id: ??, ns: 0)
# ...
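Since each value in links is a WikipediaPage carrying its namespace (the ns: 0 in the output above), the map can be filtered; for example, to keep only main-namespace articles. A small sketch (the helper name is made up for illustration):

```python
def main_namespace_links(page):
    # The main (article) namespace is ns == 0; categories, templates,
    # and other special pages live in other namespaces.
    return [title for title, linked in sorted(page.links.items())
            if linked.ns == 0]
```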
How To Get Page Categories
If you want to get all categories a page belongs to, you should use the property categories. It is a map, where the key is a category title and the value is a WikipediaPage.
def print_categories(page):
    categories = page.categories
    for title in sorted(categories.keys()):
        print("%s: %s" % (title, categories[title]))
print("Categories")
print_categories(page_py)
# Category:All articles containing potentially dated statements: ...
# Category:All articles with unsourced statements: ...
# Category:Articles containing potentially dated statements from August 2016: ...
# Category:Articles containing potentially dated statements from March 2017: ...
# Category:Articles containing potentially dated statements from September 2017: ...
How To Get All Pages From Category
To get all pages from a given category, you should use the property categorymembers. It returns all members of the given category. You have to implement recursion and deduplication by yourself.
def print_categorymembers(categorymembers, level=0, max_level=1):
    for c in categorymembers.values():
        print("%s: %s (ns: %d)" % ("*" * (level + 1), c.title, c.ns))
        if c.ns == wikipediaapi.Namespace.CATEGORY and level < max_level:
            print_categorymembers(c.categorymembers, level=level + 1, max_level=max_level)
cat = wiki_wiki.page("Category:Physics")
print("Category members: Category:Physics")
print_categorymembers(cat.categorymembers)
# Category members: Category:Physics
# * Statistical mechanics (ns: 0)
# * Category:Physical quantities (ns: 14)
# ** Refractive index (ns: 0)
# ** Vapor quality (ns: 0)
# ** Electric susceptibility (ns: 0)
# ** Specific weight (ns: 0)
# ** Category:Viscosity (ns: 14)
# *** Brookfield Engineering (ns: 0)
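As noted above, deduplication is left to the caller: the same page can be reachable through several subcategories. A sketch of a collector with a visited set (using the literal 14 for the category namespace, which corresponds to wikipediaapi.Namespace.CATEGORY):

```python
def collect_unique_members(categorymembers, visited=None, level=0, max_level=1):
    # Pages may appear in several subcategories; the shared visited set
    # ensures each title is collected at most once.
    if visited is None:
        visited = set()
    titles = []
    for member in categorymembers.values():
        if member.title in visited:
            continue
        visited.add(member.title)
        if member.ns == 14 and level < max_level:  # 14 == Namespace.CATEGORY
            titles.extend(collect_unique_members(
                member.categorymembers, visited, level + 1, max_level))
        else:
            titles.append(member.title)
    return titles
```

Called as collect_unique_members(cat.categorymembers), this would return each article title once, omitting the category pages themselves.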
How To See Underlying API Call
If you have problems with retrieving data, you can get the URL of the underlying API call. This will help you determine whether the problem is in the library or somewhere else.
import wikipediaapi
import sys
wikipediaapi.log.setLevel(level=wikipediaapi.logging.DEBUG)
# Set handler if you use Python in interactive mode
out_hdlr = wikipediaapi.logging.StreamHandler(sys.stderr)
out_hdlr.setFormatter(wikipediaapi.logging.Formatter('%(asctime)s %(message)s'))
out_hdlr.setLevel(wikipediaapi.logging.DEBUG)
wikipediaapi.log.addHandler(out_hdlr)
wiki = wikipediaapi.Wikipedia(user_agent='MyProjectName (merlin@example.com)', language='en')
page_ostrava = wiki.page('Ostrava')
print(page_ostrava.summary)
# logger prints out: Request URL: http://en.wikipedia.org/w/api.php?action=query&prop=extracts&titles=Ostrava&explaintext=1&exsectionformat=wiki
Changelog
0.5.8
Adds support for retrieving all sections with given name - Issue 39
0.5.4
Namespace could be arbitrary integer - Issue 29
0.5.3
Adds persistent HTTP connection - Issue 26
Downloading 50 pages went from 13s to 8s, roughly a 40% speed-up
0.5.2
Adds namespaces 102 - 105 - Issue 24
0.5.1
Adds tox for testing different Python versions
0.5.0
0.4.5
Handles missing sections correctly
Fixes Issue 20
0.4.4
Uses HTTPS directly instead of HTTP to avoid redirect
0.4.3
Correctly extracts text from pages without sections
Adds support for quoted page titles
api = wikipediaapi.Wikipedia(
    language='hi',
)
python = api.article(
    title='%E0%A4%AA%E0%A4%BE%E0%A4%87%E0%A4%A5%E0%A4%A8',
    unquote=True,
)
print(python.summary)
0.4.2
Adds support for Python 3.4 by not using f-strings
0.4.1
Uses code style enforced by flake8
Increased code coverage
0.4.0
Uses type annotations => minimal requirement is now Python 3.5
Adds possibility to use more parameters for request. For example:
api = wikipediaapi.Wikipedia(
    language='en',
    proxies={'http': 'http://localhost:1234'}
)
Extends documentation
0.3.4
Adds support for property Categorymembers
Adds property text for retrieving complete text of the page
0.3.3
Added support for request timeout
Add header: Accept-Encoding: gzip
0.3.2
Added support for property Categories
0.3.1
Removing WikipediaLangLink
Page keeps track of its own language, so it’s easier to jump between different translations of the same page
0.3.0
Rename directory from wikipedia to wikipediaapi to avoid collisions
0.2.4
Handle redirects properly
0.2.3
Uses method page instead of article in Wikipedia
0.2.2
Added support for property Links
0.2.1
Added support for property Langlinks
0.2.0
Use properties instead of functions
Added support for property Info
0.1.6
Support for extracting texts with HTML markup
Added initial version of unit tests
0.1.4
It’s possible to extract summary and sections of the page
Added support for property Extracts
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file Wikipedia-API-0.6.0.tar.gz.
File metadata
- Download URL: Wikipedia-API-0.6.0.tar.gz
- Upload date:
- Size: 17.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 61e94921cca9ec68e92aa5f258261d6a88b7baa960f9acfcb0c9c2c525dcb3ff
MD5 | 6229d48ae640305a6c7b57c108bdb92d
BLAKE2b-256 | 0ba5ae546250aaec1c6b5b4bab7cc97f07a47587f09c942f086d614ecdeeb422
File details
Details for the file Wikipedia_API-0.6.0-py3-none-any.whl.
File metadata
- Download URL: Wikipedia_API-0.6.0-py3-none-any.whl
- Upload date:
- Size: 14.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6dfd6b3b680e342a3843fe954049c5784c1a67fadc0060f9d1696d1d0e41ecfb
MD5 | 2f3ff55aa91350d4ccf378c43d82ab2c
BLAKE2b-256 | 2f3f919727b460d88c899d110f98d1a0c415264b5d8ad8176f14ce7ad9db0e3b