Python Wrapper for Wikipedia

Wikipedia API

This package provides a Python API for accessing Wikipedia.

Installation

pip3 install wikipedia-api

Usage

import wikipediaapi

# Extract data in Wiki format
wiki_wiki = wikipediaapi.Wikipedia('en')

page_py = wiki_wiki.page('Python_(programming_language)')

print("Page - Exists: %s" % page_py.exists())
# Page - Exists: True

print("Page - Id: %s" % page_py.pageid)
# Page - Id: 23862

print("Page - Title: %s" % page_py.title)
# Page - Title: Python (programming language)

print("Page - Summary: %s" % page_py.summary[0:60])
# Page - Summary: Python is a widely used high-level programming language for

def print_sections(sections, level=0):
    for s in sections:
        print("%s: %s - %s" % ("*" * (level + 1), s.title, s.text[0:40]))
        print_sections(s.sections, level + 1)


print_sections(page_py.sections)
# *: History - Python was conceived in the late 1980s,
# *: Features and philosophy - Python is a multi-paradigm programming l
# *: Syntax and semantics - Python is meant to be an easily readable
# **: Indentation - Python uses whitespace indentation, rath
# **: Statements and control flow - Python's statements include (among other
# **: Expressions - Some Python expressions are similar to l
# ...

def print_langlinks(page):
    langlinks = page.langlinks
    for k in sorted(langlinks.keys()):
        v = langlinks[k]
        print("%s: %s - %s: %s" % (k, v.language, v.title, v.fullurl))

print_langlinks(page_py)
# af: af - Python (programmeertaal): https://af.wikipedia.org/wiki/Python_(programmeertaal)
# als: als - Python (Programmiersprache): https://als.wikipedia.org/wiki/Python_(Programmiersprache)
# an: an - Python: https://an.wikipedia.org/wiki/Python
# ar: ar - بايثون: https://ar.wikipedia.org/wiki/%D8%A8%D8%A7%D9%8A%D8%AB%D9%88%D9%86
# as: as - পাইথন: https://as.wikipedia.org/wiki/%E0%A6%AA%E0%A6%BE%E0%A6%87%E0%A6%A5%E0%A6%A8
# ...

def print_links(page):
    links = page.links
    for title in sorted(links.keys()):
        print("%s: %s" % (title, links[title]))

print_links(page_py)
# 3ds Max: 3ds Max (id: ??, ns: 0)
# ?:: ?: (id: ??, ns: 0)
# ABC (programming language): ABC (programming language) (id: ??, ns: 0)
# ALGOL 68: ALGOL 68 (id: ??, ns: 0)
# Abaqus: Abaqus (id: ??, ns: 0)
# ...

def print_categories(page):
    categories = page.categories
    for title in sorted(categories.keys()):
        print("%s: %s" % (title, categories[title]))


print("Categories")
print_categories(page_py)
# Category:All articles containing potentially dated statements: ...
# Category:All articles with unsourced statements: ...
# Category:Articles containing potentially dated statements from August 2016: ...
# Category:Articles containing potentially dated statements from March 2017: ...
# Category:Articles containing potentially dated statements from September 2017: ...
# ...

section_py = page_py.section_by_title('Features and philosophy')
print("Section - Title: %s" % section_py.title)
# Section - Title: Features and philosophy

print("Section - Text: %s" % section_py.text[0:60])
# Section - Text: Python is a multi-paradigm programming language. Object-orie

# Now let's extract text with HTML tags
wiki_html = wikipediaapi.Wikipedia(
    language='cs',
    extract_format=wikipediaapi.ExtractFormat.HTML
)

page_ostrava = wiki_html.page('Ostrava')
print("Page - Summary: %s" % page_ostrava.summary[0:60])
# Page - Summary: <p><b>Ostrava</b> (polsky <span lang="pl" title="polština" x

page_nonexisting = wiki_wiki.page('Wikipedia-API-FooBar')
print("Page - Exists: %s" % page_nonexisting.exists())
# Page - Exists: False

print("Page - Id: %s" % page_nonexisting.pageid)
# Page - Id: -1

# Create a Wikipedia object for the German-language edition
wiki_de = wikipediaapi.Wikipedia('de')
de_page = wiki_de.page('Deutsche Sprache')
print(de_page.title + ": " + de_page.fullurl)
# Deutsche Sprache: https://de.wikipedia.org/wiki/Deutsche_Sprache
print(de_page.summary[0:60])
# Die deutsche Sprache bzw. Deutsch [dɔʏ̯t͡ʃ], abgekürzt Dt. o

# But you can still fetch data from the English version
en_page = de_page.langlinks['en']
print(en_page.title + ": " + en_page.fullurl)
# German language: https://en.wikipedia.org/wiki/German_language
print(en_page.summary[0:60])
# German (Deutsch [ˈdɔʏt͡ʃ] ( listen)) is a West Germanic lang
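
The recursive walk in print_sections above can also be turned into reusable helpers. The following is a minimal offline sketch, not part of the package: StubSection, iter_sections and find_section are hypothetical names introduced here, and the only assumption about real section objects is that they expose the title, text and sections attributes used in the examples above.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class StubSection:
    """Hypothetical stand-in for a section object; NOT part of wikipediaapi."""
    title: str
    text: str = ""
    sections: List["StubSection"] = field(default_factory=list)


def iter_sections(sections, level=1) -> Iterator[Tuple[int, str]]:
    """Yield (depth, title) depth-first, each parent before its children."""
    for s in sections:
        yield level, s.title
        yield from iter_sections(s.sections, level + 1)


def find_section(sections, title):
    """Return the first section at any depth with a matching title, else None."""
    for s in sections:
        if s.title == title:
            return s
        found = find_section(s.sections, title)
        if found is not None:
            return found
    return None


# Offline demo on stub data shaped like the section output shown above.
toc = [
    StubSection("History"),
    StubSection("Syntax and semantics", sections=[
        StubSection("Indentation", text="Python uses whitespace indentation"),
        StubSection("Expressions"),
    ]),
]

outline = list(iter_sections(toc))
for depth, title in outline:
    print("%s %s" % ("*" * depth, title))
```

With a real page, the same helpers should accept page_py.sections in place of toc, provided the section objects behave as assumed.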

Changelog

0.3.3

0.3.2

0.3.1

  • Removing WikipediaLangLink

  • Page keeps track of its own language, so it’s easier to jump between different translations of the same page
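
Because every page carries its own language, a small helper can jump to a given translation and fall back gracefully when none exists. This is a hedged sketch rather than package code: in_language is a hypothetical name, and the stub objects only mimic the language, title and langlinks attributes that appear in the usage examples.

```python
from types import SimpleNamespace


def in_language(page, lang, default=None):
    """Return `page` translated into `lang`, or `default` if no translation exists.

    Assumes `page.language` is the page's own language code and
    `page.langlinks` maps language codes to page objects.
    """
    if page.language == lang:
        return page
    links = page.langlinks
    return links[lang] if lang in links.keys() else default


# Offline demo with stub objects (hypothetical data, no network access).
en = SimpleNamespace(language="en", title="German language", langlinks={})
de = SimpleNamespace(language="de", title="Deutsche Sprache", langlinks={"en": en})

print(in_language(de, "en").title)
print(in_language(de, "de").title)
print(in_language(de, "fr"))
```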

0.3.0

  • Rename directory from wikipedia to wikipediaapi to avoid collisions

0.2.4

  • Handle redirects properly

0.2.3

  • Use method page instead of article in Wikipedia

0.2.2

0.2.1

0.2.0

  • Use properties instead of functions

  • Added support for property Info

0.1.6

  • Support for extracting texts with HTML markup

  • Added initial version of unit tests

0.1.4

  • It’s possible to extract summary and sections of the page

  • Added support for property Extracts
