Tools for getting data from MediaWiki websites

Project description

MediaWiki Tools

A high-level library containing a set of tools for filtering pages using the rich data available in MediaWiki sites, such as categories and infoboxes. It uses both web scraping and API methods (where available and feasible) to gather information.
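For context on the API route mentioned above, the sketch below builds a category-listing request for the standard MediaWiki Action API (`action=query`, `list=categorymembers`) and pulls page titles out of a response. It is not taken from this library: the helper names `category_members_url` and `titles_from_response` are hypothetical, the `/w/api.php` path is the one Wikipedia uses and may differ on other wikis, and the sample response is hand-written in the API's response shape rather than fetched.

```python
from urllib.parse import urlencode


def category_members_url(host, category, limit=50):
    """Build a MediaWiki Action API URL listing the pages in a category.

    Assumes the API is served from /w/api.php, as on Wikipedia.
    """
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmlimit": limit,
        "format": "json",
    }
    return f"https://{host}/w/api.php?{urlencode(params)}"


def titles_from_response(data):
    """Extract page titles from a decoded categorymembers response."""
    members = data.get("query", {}).get("categorymembers", [])
    return [member["title"] for member in members]


url = category_members_url("en.wikipedia.org", "Countries in Asia")

# Hand-written sample in the shape the API returns (not real fetched data).
sample = {
    "query": {
        "categorymembers": [
            {"pageid": 1, "ns": 0, "title": "India"},
            {"pageid": 2, "ns": 0, "title": "Pakistan"},
        ]
    }
}
print(titles_from_response(sample))
```

Large categories are returned in batches, so a real client would also need to follow the API's `cmcontinue` continuation token; that is left out here to keep the sketch short.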

Goals

  • Generate useful data (and datasets) from a wiki.
  • Work on any MediaWiki site (including fandom.com), with or without an API.
  • Get arbitrary subsets of pages based on categories and template parameters (todo).
  • Be very robust to variations and inconsistencies in user input.
  • Be efficient.

Installation

Install it using pip.

pip install mediawiki-tools

Requires Python >3.8 because I like the walrus operator.

Usage

Check out the basic usage guide and detailed API documentation.

Example

Question: Which countries in Asia use English as a spoken language?

Answer:

# Assumes MediaWikiTools is exported at the top level of the mwtools package.
from mwtools import MediaWikiTools

wiki = MediaWikiTools('en.wikipedia.org')

wiki.get_set(['Countries in Asia', 
              'English-speaking countries and territories'], 
             'and')
# ['Philippines', 'Pakistan', 'Bahrain', 'Singapore', 'Brunei', 'India']

Question: Which countries in Asia or Europe use English as a spoken language?

Answer:

wiki.get_set(['Countries in Asia', 'Countries in Europe',
              'English-speaking countries and territories'], 
             ['or','and'])
# ['Philippines',
#  'United Kingdom',
#  'Brunei',
#  'Malta',
#  'India',
#  'Pakistan',
#  'Scotland',
#  'Republic of Ireland',
#  'Singapore',
#  'Bahrain']

Question: Which of these countries are not island nations?

Answer:

wiki.get_set(['Countries in Asia', 'Countries in Europe',
              'English-speaking countries and territories',
              'Island countries'], 
             ['or', 'and', 'not'])
# ['Pakistan', 'India']
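The outputs above suggest that the operators are applied left to right, each one combining the running result with the next category. Below is a minimal sketch of that reading using plain Python sets. `combine_sets` is a hypothetical helper, not part of the library, and the small category sets are made up for illustration.

```python
def combine_sets(sets, ops):
    """Fold a list of sets left to right using 'and', 'or' and 'not'.

    A single string operator is reused between every pair of sets,
    mirroring the 'and' call in the first example (an assumption about
    how get_set treats a string argument).
    """
    if isinstance(ops, str):
        ops = [ops] * (len(sets) - 1)
    if len(ops) != len(sets) - 1:
        raise ValueError("need exactly one operator between each pair of sets")

    result = set(sets[0])
    for op, next_set in zip(ops, sets[1:]):
        if op == "and":
            result &= next_set   # keep pages present in both
        elif op == "or":
            result |= next_set   # keep pages present in either
        elif op == "not":
            result -= next_set   # drop pages present in the next set
        else:
            raise ValueError(f"unknown operator: {op!r}")
    return result


# Toy data, not real category contents.
asia = {"India", "Singapore", "Japan"}
europe = {"Malta", "France"}
english = {"India", "Singapore", "Malta"}
islands = {"Singapore", "Malta", "Japan"}

print(sorted(combine_sets([asia, europe, english, islands], ["or", "and", "not"])))
# ['India']
```

Under this reading, `['or', 'and', 'not']` means ((Asia or Europe) and English-speaking) minus island countries, which matches the order of the categories in the last example.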

Download files

Source Distribution

MediaWiki-Tools-0.1.1.tar.gz (9.7 kB)

Uploaded Source

Built Distribution

MediaWiki_Tools-0.1.1-py3-none-any.whl (9.1 kB)

Uploaded Python 3

File details

Details for the file MediaWiki-Tools-0.1.1.tar.gz.

File metadata

  • Download URL: MediaWiki-Tools-0.1.1.tar.gz
  • Upload date:
  • Size: 9.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.0 importlib-metadata/4.11.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.10.2

File hashes

Hashes for MediaWiki-Tools-0.1.1.tar.gz
Algorithm    Hash digest
SHA256       9c7007e26ca797c45959cf1a7a0a110b4d4bb159f70b995fd88489a82c786174
MD5          4a6a66fb29dd68dc8b0dff7e2f34eefc
BLAKE2b-256  dfa8afcef432f503adac8acb326371f08a8b968200566391e86b4689e7f5f79c

File details

Details for the file MediaWiki_Tools-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: MediaWiki_Tools-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 9.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/34.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.63.0 importlib-metadata/4.11.2 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.10.2

File hashes

Hashes for MediaWiki_Tools-0.1.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       4abf205563656e4ddda0cf655ca5f9d8d272eb264a0fdb853fdd67e6b19c904c
MD5          78e3e39c18869501e766c44f8b42f504
BLAKE2b-256  4a6d06ed7dc5e429363d39d78c1994da4d17c5e82faa8cf0a92eb9af27e4b563
