Python MediaWiki Bot Framework
Pywikibot
The Pywikibot framework is a Python library that interfaces with the MediaWiki API version 1.23 or higher.
Also included are various general function scripts that can be adapted for different tasks.
For further information about the library (excluding scripts), see the full code documentation.
Quick start
    pip install requests
    git clone https://gerrit.wikimedia.org/r/pywikibot/core.git
    cd core
    git submodule update --init
    python pwb.py script_name
Or, to install using PyPI (excluding scripts):
    pip install -U setuptools
    pip install pywikibot
    pwb <scriptname>
In addition, a MediaWiki markup parser is required. Please install one of the following:
pip install mwparserfromhell
or
pip install wikitextparser
Our installation guide has more details for advanced usage.
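To confirm that one of the two parsers is actually importable before running a script, a check along the following lines may help. This is a minimal sketch using only the standard library; the helper name `available_wikitext_parser` and the order in which the modules are tried are assumptions of this example, not part of Pywikibot.

```python
import importlib.util

# Parser packages named in the install instructions above.
# The order of preference is an assumption of this sketch.
PARSER_MODULES = ('mwparserfromhell', 'wikitextparser')


def available_wikitext_parser(candidates=PARSER_MODULES):
    """Return the name of the first importable module, or None."""
    for name in candidates:
        # find_spec() returns None when a top-level module is not installed.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


if __name__ == '__main__':
    parser = available_wikitext_parser()
    if parser is None:
        print('No parser found; install mwparserfromhell or wikitextparser.')
    else:
        print('Found parser module: ' + parser)
```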
Basic Usage
If you wish to write your own script it’s very easy to get started:
    import pywikibot

    site = pywikibot.Site('en', 'wikipedia')  # The site we want to run our bot on
    page = pywikibot.Page(site, 'Wikipedia:Sandbox')
    page.text = page.text.replace('foo', 'bar')
    page.save('Replacing "foo" with "bar"')  # Saves the page
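When adapting the snippet above, it can be useful to know whether the replacement changed anything before deciding to save. The sketch below shows that check on a plain string standing in for `page.text`; the helper `replace_text` is hypothetical and not a Pywikibot function.

```python
def replace_text(text, old, new):
    """Return (new_text, changed) after replacing every `old` with `new`."""
    new_text = text.replace(old, new)
    return new_text, new_text != text


if __name__ == '__main__':
    # A plain string stands in for page.text in this sketch.
    sample = 'foo is here, and foo again'
    updated, changed = replace_text(sample, 'foo', 'bar')
    if changed:
        # In a real script one would assign page.text = updated and call
        # page.save() with an edit summary at this point.
        print(updated)
    else:
        print('Nothing to replace.')
```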
Wikibase Usage
Wikibase is a flexible knowledge base software that drives Wikidata. A sample pywikibot script for getting data from Wikibase:
    import pywikibot

    site = pywikibot.Site('wikipedia:en')
    repo = site.data_repository()  # the Wikibase repository for given site
    page = repo.page_from_repository('Q91')  # create a local page for the given item
    item = pywikibot.ItemPage(repo, 'Q91')  # a repository item
    data = item.get()  # get all item data from repository for this item
Script example
Pywikibot provides bot classes to develop your own script easily:
    import pywikibot
    from pywikibot import pagegenerators
    from pywikibot.bot import ExistingPageBot


    class MyBot(ExistingPageBot):

        update_options = {
            'text': 'This is a test text',
            'summary': 'Bot: a bot test edit with Pywikibot.'
        }

        def treat_page(self):
            """Load the given page, do some changes, and save it."""
            text = self.current_page.text
            text += '\n' + self.opt.text
            self.put_current(text, summary=self.opt.summary)


    def main(*args: str):
        """Parse command line arguments and invoke bot."""
        options = {}
        gen_factory = pagegenerators.GeneratorFactory()
        # Option parsing
        local_args = pywikibot.handle_args(args)  # global options
        local_args = gen_factory.handle_args(local_args)  # generators options
        for arg in local_args:
            opt, sep, value = arg.partition(':')
            if opt in ('-summary', '-text'):
                options[opt[1:]] = value
        MyBot(generator=gen_factory.getCombinedGenerator(), **options).run()


    if __name__ == '__main__':
        main()
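The option-parsing loop in `main()` can be tried on its own, without a wiki connection. The sketch below isolates it as a standalone function; the name `parse_local_args` and its `known` parameter are hypothetical and only mirror the loop from the script example.

```python
def parse_local_args(local_args, known=('-summary', '-text')):
    """Turn arguments such as '-text:Hello' into an options dict."""
    options = {}
    for arg in local_args:
        # partition() splits at the first ':' only, so values may
        # themselves contain colons.
        opt, sep, value = arg.partition(':')
        if opt in known:
            options[opt[1:]] = value  # strip the leading '-'
    return options


if __name__ == '__main__':
    print(parse_local_args(['-summary:Bot: test edit', '-text:Hello']))
```

Because only the first colon is used as the separator, a summary such as `Bot: test edit` survives intact. Arguments that are not listed in `known` are ignored.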
For more documentation on Pywikibot see our docs.
Required external programs
Pywikibot may require the following programs to function properly:
7za: To extract 7z files
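Whether `7za` is reachable can be checked from Python before processing 7z files. This is a small sketch using the standard library's `shutil.which`; the helper name `find_7za` is hypothetical and not part of Pywikibot.

```python
import shutil


def find_7za(program='7za'):
    """Return the full path of the executable, or None if not on PATH."""
    return shutil.which(program)


if __name__ == '__main__':
    path = find_7za()
    if path is None:
        print('7za was not found on PATH; 7z files cannot be extracted.')
    else:
        print('Using 7za at ' + path)
```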
Roadmap
Current release 7.3.0
Add support for kcgwiki (T305282)
Raise InvalidTitleError instead of unspecific ValueError in ProofreadPage (T308016)
Preload pages if GeneratorFactory.articlenotfilter_list is not empty; also set attribute is_preloading.
ClaimCollection.toJSON() should not ignore new claim (T308245)
Use linktrail via siteinfo and remove update_linktrails maintenance script
Print counter statistic for all counters (T307834)
Use proofreadpagesinindex query module
Prioritize -namespaces options in pagegenerators.handle_args (T222519)
Remove ThreadList.stop_all() method (T307830)
L10N updates
Improve get_charset_from_content_type function (T307760)
A tiny cache wrapper was added to hold results of parameterless methods and properties
Increase workers in preload_sites.py
Close logging handlers before deleting them (T91375, T286127)
Clear _sites cache if called with pwb wrapper (T225594)
Enable short creation of a site if family name is equal to site code
Use exc_info=True with pywikibot.exception() by default (T306762)
Make IndexPage more robust when getting links in Page ns (T307280)
Do not print log header twice in log files (T264235)
Do not delegate logging output to the root logger (T281643)
Add get_charset_from_content_type to extract the charset from the content-type response header
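To illustrate what extracting the charset from a content-type response header involves, here is a minimal standard-library sketch. It is not Pywikibot's implementation of `get_charset_from_content_type`; the function name `charset_from_content_type` is hypothetical and the parsing is delegated to `email.message.Message`.

```python
from email.message import Message


def charset_from_content_type(content_type):
    """Return the lower-cased charset parameter of the header, or None."""
    msg = Message()
    msg['Content-Type'] = content_type
    # get_content_charset() parses the header parameters and returns
    # None when no charset parameter is present.
    return msg.get_content_charset()


if __name__ == '__main__':
    print(charset_from_content_type('text/html; charset=UTF-8'))
    print(charset_from_content_type('application/json'))
```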
Deprecations
7.3.0: Python 3.5 support will be dropped with Pywikibot 8 (T301908)
7.2.0: XMLDumpOldPageGenerator is deprecated in favour of a content parameter (T306134)
7.2.0: RedirectPageBot and NoRedirectPageBot bot classes are deprecated in favour of use_redirects attribute
7.2.0: tools.formatter.color_format is deprecated and will be removed
7.1.0: win32_unicode.py will be removed with Pywikibot 8
7.1.0: Unused get_redirect parameter of Page.getOldVersion() will be removed
7.1.0: APISite._simple_request() will be removed in favour of APISite.simple_request()
7.0.0: The i18n identifier ‘cosmetic_changes-append’ will be removed in favour of ‘pywikibot-cosmetic-changes’
7.0.0: User.isBlocked() method is renamed to is_blocked for consistency
7.0.0: Require mysql >= 0.7.11 (T216741)
7.0.0: Private BaseBot counters _treat_counter, _save_counter, _skip_counter will be removed in favour of collections.Counter counter attribute
7.0.0: A boolean watch parameter in Page.save() is deprecated and will be desupported
7.0.0: baserevid parameter of editSource(), editQualifier(), removeClaims(), removeSources(), remove_qualifiers() DataSite methods will be removed
7.0.0: Values of APISite.allpages() parameter filterredir other than True, False and None are deprecated
6.5.0: OutputOption.output() method will be removed in favour of OutputOption.out property
6.5.0: Infinite rotating file handler with logfilecount of -1 is deprecated
6.4.0: ‘allow_duplicates’ parameter of tools.intersect_generators as positional argument is deprecated, use keyword argument instead
6.4.0: ‘iterables’ of tools.intersect_generators given as a list or tuple is deprecated, either use consecutive iterables or use ‘*’ to unpack
6.2.0: outputter of OutputProxyOption without out property is deprecated
6.2.0: ContextOption.output_range() and HighlightContextOption.output_range() are deprecated
6.2.0: Error messages with ‘%’ style are deprecated in favour of str.format() style
6.2.0: page.url2unicode() function is deprecated in favour of tools.chars.url2string()
6.2.0: Throttle.multiplydelay attribute is deprecated
6.2.0: SequenceOutputter.format_list() is deprecated in favour of ‘out’ property
6.0.0: config.register_family_file() is deprecated
5.5.0: APISite.redirectRegex() is deprecated in favour of APISite.redirect_regex() and will be removed with Pywikibot 8
4.0.0: Revision.parent_id is deprecated in favour of Revision.parentid and will be removed with Pywikibot 8
4.0.0: Revision.content_model is deprecated in favour of Revision.contentmodel and will be removed with Pywikibot 8
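Two of the deprecations above correspond to plain Python idioms: a single `collections.Counter` in place of several private counter attributes, and `str.format()` in place of ‘%’ style messages. The sketch below shows both in isolation. It does not use Pywikibot, and the key names `read`, `write` and `skip` as well as the function `summary_line` are assumptions chosen for illustration.

```python
from collections import Counter


def summary_line(counter):
    """Format counter values with str.format() rather than '%' style."""
    return '{read} pages read, {write} pages written, {skip} pages skipped'.format(
        read=counter['read'], write=counter['write'], skip=counter['skip'])


if __name__ == '__main__':
    # One Counter holds all statistics; missing keys count as 0.
    counter = Counter()
    for action in ('read', 'read', 'write'):
        counter[action] += 1
    print(summary_line(counter))
```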
Release history
See https://github.com/wikimedia/pywikibot/blob/stable/HISTORY.rst
Contributing
Our code is maintained on Wikimedia’s Gerrit installation; learn how to get started.
Code of Conduct
The development of this software is covered by a Code of Conduct.
Download files
Source Distribution
File details
Details for the file pywikibot-7.3.0.tar.gz.
File metadata
- Download URL: pywikibot-7.3.0.tar.gz
- Upload date:
- Size: 569.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.10.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 69e804be46c9f79b350cbd63a519ff26ba08e26e35ddf9da9eb7cfe8e3a0cf14
MD5 | 80404992a04fdfe5443dfbff4a403390
BLAKE2b-256 | d2b3d8bd97140f17845a953da8b37ab16b463ec836c9615ae621b613ce39fc1a