Tools for getting data from MediaWiki websites
MediaWiki Tools
A high-level library containing a set of tools for filtering pages using the rich data available in MediaWiki sites, such as categories and infoboxes. It uses both web scraping and the MediaWiki API (where available and feasible) to gather information.
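On the API side, a MediaWiki site exposes category membership through the Action API's `list=categorymembers` query, returning results in batches with continuation values. The sketch below is not taken from mwtools. It assumes the wiki's `api.php` endpoint is reachable (for English Wikipedia, `https://en.wikipedia.org/w/api.php`), and the function names `category_members_url`, `parse_batch`, `fetch_category_members` and the User-Agent string are all hypothetical.

```python
import json
import urllib.parse
import urllib.request


def category_members_url(api_url, category, cont=None):
    """Build the URL for one batch of a list=categorymembers query."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": f"Category:{category}",
        "cmtype": "page",  # articles only, no subcategories or files
        "cmlimit": "max",
        "format": "json",
    }
    if cont:
        # Echo back the continuation values from the previous response.
        params.update(cont)
    return f"{api_url}?{urllib.parse.urlencode(params)}"


def parse_batch(data):
    """Return (titles, continuation dict or None) for one API response."""
    members = data.get("query", {}).get("categorymembers", [])
    return [m["title"] for m in members], data.get("continue")


def fetch_category_members(api_url, category):
    """Follow continuation until every page in the category is collected."""
    titles, cont = [], None
    while True:
        req = urllib.request.Request(
            category_members_url(api_url, category, cont),
            # Hypothetical User-Agent; many wikis ask clients to identify themselves.
            headers={"User-Agent": "category-sketch/0.1 (hypothetical)"},
        )
        with urllib.request.urlopen(req) as resp:
            batch, cont = parse_batch(json.load(resp))
        titles.extend(batch)
        if not cont:
            return titles
```

Calling `fetch_category_members("https://en.wikipedia.org/w/api.php", "Island countries")` would return the page titles in that category as a list. Splitting URL building and response parsing out of the network loop keeps those parts testable offline; the web-scraping fallback for wikis without API access is not shown here.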
Goals
- Generate useful data (and datasets) from a wiki.
- Work on any MediaWiki site (including fandom.com), with or without an API.
- Get arbitrary subsets of pages based on categories and template parameters (todo).
- Be very robust to variations and inconsistencies in user input.
- Be efficient.
Installation
Install it using pip.
pip install mediawiki-tools
Requires Python >= 3.8, because I like the walrus operator.
Usage
Check out the basic usage guide and detailed API documentation.
Example
Question: Which countries in Asia use English as a spoken language?
Answer:
from mwtools import MediaWikiTools
wiki = MediaWikiTools('en.wikipedia.org')
wiki.get_set(['Countries in Asia',
              'English-speaking countries and territories'],
             'and')
# ['Philippines', 'Pakistan', 'Bahrain', 'Singapore', 'Brunei', 'India']
Question: Which countries in Asia or Europe use English as a spoken language?
Answer:
wiki.get_set(['Countries in Asia', 'Countries in Europe',
              'English-speaking countries and territories'],
             ['or', 'and'])
# ['Philippines',
# 'United Kingdom',
# 'Brunei',
# 'Malta',
# 'India',
# 'Pakistan',
# 'Scotland',
# 'Republic of Ireland',
# 'Singapore',
# 'Bahrain']
Question: Which of these countries are not island nations?
Answer:
wiki.get_set(['Countries in Asia', 'Countries in Europe',
              'English-speaking countries and territories',
              'Island countries'],
             ['or', 'and', 'not'])
# ['Pakistan', 'India']
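The examples suggest that `get_set` applies the operators left to right, one between each consecutive pair of categories, so `['or', 'and', 'not']` reads as ((Asia or Europe) and English-speaking) not Island. That reading is inferred from the outputs, not from the library's source. A minimal sketch of those semantics with plain Python sets, where `combine` and `OPS` are hypothetical names and each input set stands for the page titles of one category fetched beforehand:

```python
# Hypothetical mapping from operator names to set operations.
OPS = {
    "and": set.intersection,
    "or": set.union,
    "not": set.difference,
}


def combine(sets, ops):
    """Fold category sets left to right: ((s0 op0 s1) op1 s2) ..."""
    if isinstance(ops, str):
        ops = [ops]  # a single operator may be passed as a plain string
    if len(ops) != len(sets) - 1:
        raise ValueError("need exactly one operator between each pair of sets")
    result = set(sets[0])
    for op, members in zip(ops, sets[1:]):
        result = OPS[op](result, set(members))
    return result


# Toy membership sets for illustration, not data pulled from Wikipedia.
asia = {"India", "Japan", "Pakistan"}
europe = {"France", "Malta"}
english = {"India", "Malta", "Pakistan"}
islands = {"Japan", "Malta"}
print(sorted(combine([asia, europe, english, islands], ["or", "and", "not"])))
# ['India', 'Pakistan']
```

A left fold keeps the call shape flat (n categories, n - 1 operators) at the cost of not expressing arbitrary grouping; a query like A or (B and C) would need nesting or a different interface.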
Download files
Built Distribution
Hashes for MediaWiki_Tools-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 7c3e75fbbe5c250a5c36013ebbf2c4e5171a46c5ba3f599572bb698ed38e825e
MD5 | c0de48be8f3d0ab3e6dbbfeb6e27965d
BLAKE2b-256 | 425d33caafe7e1f3e393294b4cd5231426b447cf75695ed897dc18018bb290c7