Python Wrapper for Wikipedia

Wikipedia API

This package provides a Python API for accessing Wikipedia.

Installation

pip3 install wikipedia-api
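Note that the pip distribution name (`wikipedia-api`) differs from the module name imported in the examples (`wikipedia`). Those examples are written for this 0.1.x release and use method calls such as `page_py.id()`; other releases may expose a different API, so it can help to confirm what pip actually installed. A minimal sketch using only the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical, and the default distribution name `Wikipedia-API` is assumed from the source archive name listed under the download files.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str = "Wikipedia-API") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    `installed_version` is a hypothetical helper, not part of wikipedia-api.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    # Prints the installed version string, or None if the package is missing.
    print(installed_version())
```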

Usage

import wikipedia

# Extract data in Wiki format
wiki_wiki = wikipedia.Wikipedia('en')

page_py = wiki_wiki.article('Python_(programming_language)')

print("Page - Id: %s" % page_py.id())
# Page - Id: 23862

print("Page - Title: %s" % page_py.title())
# Page - Title: Python (programming language)

print("Page - Summary: %s" % page_py.summary()[0:60])
# Page - Summary: Python is a widely used high-level programming language for


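def print_sections(sections, level=0):
    # Recursively print each section's title and the first 40 characters of
    # its text; the number of asterisks shows the nesting depth.
    for s in sections:
        print("%s: %s - %s" % ("*" * (level + 1), s.title(), s.text()[0:40]))
        print_sections(s.sections(), level + 1)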


print_sections(page_py.sections())
# *: History - Python was conceived in the late 1980s,
# *: Features and philosophy - Python is a multi-paradigm programming l
# *: Syntax and semantics - Python is meant to be an easily readable
# **: Indentation - Python uses whitespace indentation, rath
# **: Statements and control flow - Python's statements include (among other
# **: Expressions - Some Python expressions are similar to l
# ...

section_py = page_py.section_by_title('Features and philosophy')
print("Section - Title: %s" % section_py.title())
# Section - Title: Features and philosophy

print("Section - Text: %s" % section_py.text()[0:60])
# Section - Text: Python is a multi-paradigm programming language. Object-orie

# Now let's extract text with HTML tags (from Czech Wikipedia)
wiki_html = wikipedia.Wikipedia(
    language='cs',
    extract_format=wikipedia.ExtractFormat.HTML
)

page_ostrava = wiki_html.article('Ostrava')
print("Page - Id: %s" % page_ostrava.id())
# Page - Id: 7667

print("Page - Title: %s" % page_ostrava.title())
# Page - Title: Ostrava

print("Page - Summary: %s" % page_ostrava.summary()[0:60])
# Page - Summary: <p><b>Ostrava</b> (polsky <span lang="pl" title="polština" x

print_sections(page_ostrava.sections())
# *: Znak a logo -
# **: Heraldický znak - <p>Městský znak je blasonován: <i>V modr
# **: Marketingové logo - <p>V roce 2008 bylo představeno nové log
# *: Historie - <dl><dd><i>Související informace nalezne
# **: Zemské hranice - <p>Zemské hranice zde tvoří řeky Odra a
# *: Obyvatelstvo - <ul class="gallery mw-gallery-traditiona

section_ostrava = page_ostrava.section_by_title('Heraldický znak')
print("Section - Title: %s" % section_ostrava.title())
# Section - Title: Heraldický znak

print("Section - Text: %s" % section_ostrava.text()[0:60])
# Section - Text: <p>Městský znak je blasonován: <i>V modrém štítě na zeleném
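The `print_sections` helper above walks the section tree depth-first, using `level` to decide how many asterisks to print. The same traversal can be tried without network access on stand-in objects. In this sketch, `FakeSection`, `format_sections`, and `find_section` are hypothetical names: the stand-in only mimics the method-style `title()`, `text()`, and `sections()` calls used in the examples, and `find_section` is one plausible depth-first lookup by title, not necessarily how the package implements `section_by_title`.

```python
from typing import List, Optional


class FakeSection:
    """Hypothetical stand-in mimicking the title()/text()/sections() calls above."""

    def __init__(self, title: str, text: str,
                 children: Optional[List["FakeSection"]] = None):
        self._title = title
        self._text = text
        self._children = children or []

    def title(self) -> str:
        return self._title

    def text(self) -> str:
        return self._text

    def sections(self) -> List["FakeSection"]:
        return self._children


def format_sections(sections, level: int = 0, width: int = 40) -> List[str]:
    """Depth-first walk producing the same line format as print_sections."""
    lines = []
    for s in sections:
        lines.append("%s: %s - %s" % ("*" * (level + 1), s.title(), s.text()[0:width]))
        # Subsections are formatted one level deeper (one more asterisk).
        lines.extend(format_sections(s.sections(), level + 1, width))
    return lines


def find_section(sections, title: str):
    """Return the first section (depth-first) whose title matches, or None."""
    for s in sections:
        if s.title() == title:
            return s
        found = find_section(s.sections(), title)
        if found is not None:
            return found
    return None


tree = [
    FakeSection("History", "Python was conceived in the late 1980s"),
    FakeSection("Syntax and semantics", "Python is meant to be easily readable", [
        FakeSection("Indentation", "Python uses whitespace indentation"),
    ]),
]

for line in format_sections(tree):
    print(line)
# *: History - Python was conceived in the late 1980s
# *: Syntax and semantics - Python is meant to be easily readable
# **: Indentation - Python uses whitespace indentation
```

Returning a list of lines instead of printing directly makes the traversal easy to test or to write to a file.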

Changelog

0.1.6

  • Support for extracting text with HTML markup

  • Added initial version of unit tests

0.1.4

  • Support for extracting the summary and sections of a page
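With the HTML extract format from 0.1.6, summaries and section texts contain markup such as `<p>` and `<b>`, as in the Ostrava example in the usage section. If plain text is needed from such a string afterwards, a minimal sketch using the standard library's `html.parser` could look like this. `TagStripper` and `strip_tags` are hypothetical helpers, not part of this package, and the input is a made-up fragment in the style of that output.

```python
from html.parser import HTMLParser


class TagStripper(HTMLParser):
    """Hypothetical helper: collects text content and discards all tags."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        # Called for text between tags; tags themselves are simply ignored.
        self.parts.append(data)


def strip_tags(html_text: str) -> str:
    parser = TagStripper()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.parts)


print(strip_tags("<p><b>Ostrava</b> is a city.</p>"))
# Ostrava is a city.
```

This keeps the text of every element, including the contents of any `<script>` or `<style>` tags, so it is a rough conversion rather than a full HTML-to-text renderer.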

Download files

Source Distribution

Wikipedia-API-0.1.6.tar.gz (7.0 kB)

Hashes for Wikipedia-API-0.1.6.tar.gz
Algorithm     Hash digest
SHA256        38863b7cda3e09eee644e67bb1191967de0f2835cbd5d3b8ea093f79f5faee52
MD5           e90a8fef337119f8c01feb43a1b4cd6b
BLAKE2b-256   c2fef6f5768de1ab353c3dda6dc853134d57f4d6c155aaec845b386146b0f533
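The digests above can be used to check a downloaded archive before installing it. A minimal sketch with the standard library's `hashlib`; `sha256_of` is a hypothetical helper, and the archive location (the current directory) is an assumption.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for Wikipedia-API-0.1.6.tar.gz.
EXPECTED_SHA256 = "38863b7cda3e09eee644e67bb1191967de0f2835cbd5d3b8ea093f79f5faee52"


def sha256_of(path, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    archive = Path("Wikipedia-API-0.1.6.tar.gz")  # assumed download location
    if archive.exists():
        ok = sha256_of(archive) == EXPECTED_SHA256
        print("hash matches" if ok else "HASH MISMATCH: do not install")
```

pip can also enforce this at install time through its hash-checking mode, by adding a `--hash=sha256:...` option to the package's line in a requirements file.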
