Python MediaWiki Bot Framework
Pywikibot
The Pywikibot framework is a Python library that interfaces with the MediaWiki API version 1.31 or higher.
Also included are various general function scripts that can be adapted for different tasks.
For further information about the library excluding scripts see the full code documentation.
Quick start
git clone https://gerrit.wikimedia.org/r/pywikibot/core.git
cd core
git submodule update --init
pip install -r requirements.txt
python pwb.py <script_name>
Or, to install from PyPI (excluding scripts):
pip install pywikibot
pwb <scriptname>
Our installation guide has more details for advanced usage.
Basic Usage
If you wish to write your own script it’s very easy to get started:
import pywikibot
site = pywikibot.Site('en', 'wikipedia') # The site we want to run our bot on
page = pywikibot.Page(site, 'Wikipedia:Sandbox')
page.text = page.text.replace('foo', 'bar')
page.save('Replacing "foo" with "bar"') # Saves the page
Wikibase Usage
Wikibase is a flexible knowledge base software that drives Wikidata. A sample pywikibot script for getting data from Wikibase:
import pywikibot
site = pywikibot.Site('wikipedia:en')
repo = site.data_repository() # the Wikibase repository for given site
page = repo.page_from_repository('Q91') # create a local page for the given item
item = pywikibot.ItemPage(repo, 'Q91') # a repository item
data = item.get() # get all item data from repository for this item
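Once the item data has been fetched, individual values can be read from the returned mapping. The following is a minimal sketch, assuming the mapping has a 'labels' entry keyed by language code; label_for is a hypothetical helper and the sample dict is made-up data standing in for item.get():

```python
from typing import Any, Mapping, Optional


def label_for(data: Mapping[str, Any], lang: str = 'en') -> Optional[str]:
    """Return the label for ``lang`` or None if it is missing.

    Assumption: ``data`` has the shape returned by ``item.get()``,
    with a 'labels' entry that maps language codes to label strings.
    """
    labels = data.get('labels') or {}
    return labels.get(lang)


# Made-up sample data; a real script would pass the result of item.get()
sample = {'labels': {'en': 'Example item', 'de': 'Beispiel'}}
print(label_for(sample, 'de'))
```

Returning None for a missing language keeps the caller in charge of choosing a fallback.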
Script example
Pywikibot provides bot classes to develop your own script easily:
import pywikibot
from pywikibot import pagegenerators
from pywikibot.bot import ExistingPageBot


class MyBot(ExistingPageBot):

    update_options = {
        'text': 'This is a test text',
        'summary': 'Bot: a bot test edit with Pywikibot.'
    }

    def treat_page(self):
        """Load the given page, do some changes, and save it."""
        text = self.current_page.text
        text += '\n' + self.opt.text
        self.put_current(text, summary=self.opt.summary)


def main(*args):
    """Parse command line arguments and invoke bot."""
    options = {}
    gen_factory = pagegenerators.GeneratorFactory()
    # Option parsing
    local_args = pywikibot.handle_args(args)  # global options
    local_args = gen_factory.handle_args(local_args)  # generators options
    for arg in local_args:
        opt, sep, value = arg.partition(':')
        if opt in ('-summary', '-text'):
            options[opt[1:]] = value
    MyBot(generator=gen_factory.getCombinedGenerator(), **options).run()


if __name__ == '__main__':
    main()
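The option-parsing loop in the script above can be tried on its own, without a wiki connection. This sketch pulls it into a hypothetical helper, parse_script_options, which is not part of Pywikibot; it only mirrors the loop shown in the example:

```python
from typing import Dict, Iterable


def parse_script_options(local_args: Iterable[str]) -> Dict[str, str]:
    """Collect '-summary:...' and '-text:...' arguments into a dict.

    Hypothetical helper for illustration; arguments with any other
    option name are ignored.
    """
    options: Dict[str, str] = {}
    for arg in local_args:
        # Split at the first colon only, so values may contain colons
        opt, _sep, value = arg.partition(':')
        if opt in ('-summary', '-text'):
            options[opt[1:]] = value
    return options


print(parse_script_options(['-summary:Bot: test edit', '-text:Hello']))
```

Because partition() splits at the first colon, a summary such as "Bot: test edit" keeps its own colon.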
For more documentation on Pywikibot see our docs.
Roadmap
Release 11 (in development)
Improvements
Use the URL of the bot’s wiki page in user_agent_format due to the Foundation UA Policy. (T414173, T414201)
Show Pywikibot version in deprecation warnings for config variables.
config.pickle_protocol was updated from version 2 to 5. Older pickle files are still readable.
Enhance throttle.Throttle.waittime for read requests. (T415891)
config.minthrottle may be a float. (T414170, T416145)
Implement the Site.abuselog() site generator for AbuseLog and the page.User.last_activity method. (T396297, T396298)
Use explicit UTF-8 encoding with GraphSavingThread.graph.write in interwiki_graph. (T415891)
Optimize pickle file storage of WikiWho with subdirectory structure (T414087)
Make textlib.TimeStripper more resilient for itwiki. (T415880)
Add WikiWhoAPI support. (T414071)
Never use None as key in WeakKeyDictionary within proofreadpage.TagAttrDesc. Class-level access returns the descriptor itself. (T413563)
text_a and text_b of diff.PatchManager are positional-only parameters. by_letter and replace_invisible are keyword-only parameters.
Optimize pagegenerators.SubCategoriesPageGenerator
Consider retry_after in delay calculation of throttle.Throttle.get_delay. (T414354)
Remove protocol swapping in data.api.Request. (T414369)
Use environment variables PYWIKIBOT_USERNAME or PWB_USERNAME for User-Agent username if username isn’t set in user-config.py for a given site. (T414201)
Add support for beta site in families.meta_family.Family (T413060)
Add user agent to data.api.Request error log (T414170)
Increase performance of delegation for BaseSite methods to family.Family methods (T413398)
Use queue.shutdown() for the async_manager queue
Use backports.RLock instead of Queue to signal async_manager activity (T147178)
Add User.is_partial_blocked() and APISite.is_partial_blocked() methods to detect partial blocks. (T412613)
Add get_block_info() method to pywikibot.User class to retrieve detailed block information including block ID, reason, expiry, and restrictions (T412613)
Java-based GraalPy is supported but Pillow cannot be used (T412739)
Free threading Python is supported with some restrictions. (T408131, T412605, T412624)
i18n updates.
Provide a security policy with Pywikibot. (T410753)
Show a friendly install message with the pwb wrapper when mandatory packages are missing. (T409662)
Update tools._unidata.__category_cf dict for tools.chars.contains_invisible and tools.chars.replace_invisible to Unicode version 17.0.0.
Update Docker files to Python 3.12. (T408997)
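The environment-variable fallback for the User-Agent username listed above can be pictured as follows. This is only a sketch, not Pywikibot's implementation: user_agent_username is a hypothetical helper, and checking PYWIKIBOT_USERNAME before PWB_USERNAME is an assumption, since the list does not state an order:

```python
import os
from typing import Mapping, Optional


def user_agent_username(configured: Optional[str],
                        environ: Mapping[str, str] = os.environ
                        ) -> Optional[str]:
    """Pick a username for the User-Agent string.

    Hypothetical helper: a configured username wins; otherwise the
    two environment variables are tried. The lookup order between
    them is an assumption made for this sketch.
    """
    if configured:
        return configured
    for name in ('PYWIKIBOT_USERNAME', 'PWB_USERNAME'):
        value = environ.get(name)
        if value:
            return value
    return None


# A plain dict stands in for the process environment
print(user_agent_username(None, {'PWB_USERNAME': 'ExampleBot'}))
```

Passing the environment as a parameter keeps the lookup easy to test without touching os.environ.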
Bugfixes
Remove invisible chars from textlib.Section.heading. (T411307)
Do not raise exceptions.UnknownExtensionError within APISite.page_from_repository() on non-Wikibase sites (T414068)
Handle retry-after value gracefully if it is a float instead of an int (T414197)
Handle limit value gracefully if it is an int instead of a str (T414168)
Handle lockmanager-fail-conflict API error in data.api.Request.submit as retryable (T396984)
Prevent login loop in data.superset with unsupported auth methods (T408287)
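The retry-after fix listed above is about accepting a value that may arrive as an int or a float. One way to handle that is sketched below; retry_after_seconds and its default of 5 seconds are hypothetical and not taken from Pywikibot:

```python
from typing import Union


def retry_after_seconds(value: Union[str, int, float, None],
                        default: float = 5.0) -> float:
    """Convert a retry-after value to a non-negative float.

    Hypothetical helper: accepts ints, floats and numeric strings.
    Anything that cannot be parsed as a number (for example an HTTP
    date) falls back to ``default``, which is an arbitrary choice.
    """
    if value is None:
        return default
    try:
        seconds = float(value)
    except (TypeError, ValueError):
        return default
    return max(seconds, 0.0)


print(retry_after_seconds('1.5'))
```

Converting through float() covers both the integer and the fractional case with a single code path.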
Code cleanups
The {httplib2} variable of user_agent_format is no longer supported. (T98439)
The undocumented page_put_queue_busy was removed without a deprecation period.
Dysfunctional APISite.alllinks() was removed. (T359427, T407708)
The inheritance of the exceptions.NoSiteLinkError exception from exceptions.NoPageError was removed
The dropdelay and releasepid attributes of the throttle.Throttle class were removed in favour of the expiry class attribute.
The regex attributes ptimeR, ptimeznR, pyearR, pmonthR, and pdayR of the textlib.TimeStripper class were removed in favour of the patterns attribute, which is a textlib.TimeStripperPatterns object.
The groups attribute of textlib.TimeStripper was removed in favour of the textlib.TIMEGROUPS constant.
The addOnly parameter in textlib.replaceLanguageLinks and textlib.replaceCategoryLinks was dropped in favour of add_only.
The load_tokens method of TokenWallet was removed; the clear method can be used instead.
No longer support legacy API tokens of MediaWiki 1.23 and older. (270380, 306637)
use_hard_category_redirect Site and Family properties were removed. (T348953)
The all parameter of APISite.get_tokens() was removed; use an empty string instead.
APISite.validate_tokens() method was removed.
APISite.messages() method was removed in favour of the userinfo['messages'] attribute
Page.editTime() method was removed; Page.latest_revision.timestamp attribute can be used instead
data.api.QueryGenerator.continuekey was removed in favour of data.api.QueryGenerator.modules
The Timestamp.clone() method was removed in favour of the Timestamp.replace() method
The tools.itertools.itergroup function was removed in favour of the backports.batched or itertools.batched function.
The get_login_token() method of login.ClientLoginManager was removed and can be replaced by login.LoginManager.site.tokens['login']
The family.Family.maximum_GET_length method was removed in favour of the config.maximum_GET_length configuration option (T325957)
The exceptions.Server414Error exception was replaced by the exceptions.Client414Error exception
The modules_only_mode parameter in the data.api.ParamInfo class, its paraminfo_keys class attribute, and its preloaded_modules property were removed
The data.api.LoginManager() constructor was removed in favour of the login.ClientLoginManager class
The normalize parameter was removed from the pywikibot.WbTime.toTimestr and pywikibot.WbTime.toWikibase methods in Pywikibot 8.2. Since Pywikibot 11, passing normalize as an argument raises an error, because support for legacy arguments was removed.
Several typing types were removed from backports.
The cache decorator was removed from backports. The @functools.cache decorator can be used instead. (T401802)
The functions removeprefix and removesuffix were removed from backports. The stdlib methods can be used instead. (T401802)
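For scripts that imported these helpers from backports, the standard-library replacements named above are available from Python 3.9 on. A short sketch of the replacements in use; normalized_title is a hypothetical function invented for this example:

```python
import functools


@functools.cache  # stdlib replacement for the removed backports cache
def normalized_title(title: str) -> str:
    """Strip a 'Talk:' prefix and a '/doc' suffix from a title.

    Hypothetical helper; it uses the str.removeprefix and
    str.removesuffix methods instead of the removed backports
    functions.
    """
    return title.removeprefix('Talk:').removesuffix('/doc')


print(normalized_title('Talk:Example/doc'))
```

Note that functools.cache is applied without parentheses.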
Other breaking changes
Set minthrottle to 0.1 due to Wikimedia Bot Policy. (T414170)
Clean up the user_agent_format string. Replace the first occurrence of “family”, “code”, or “lang” with “site”. The “lang” variable never worked properly. All of these can be replaced with “site”, which is recognized by Wikimedia traffic management. Also replace “script_product” by “script” and “version” by “revision”. Replace {script_product} with {username}/{script} in user_agent_format. (T414201)
Use global -code instead of -lang to determine a site. The old -lang option is kept for backward compatibility.
Protocol swapping in data.api.Request was removed. Family files should provide the correct protocol. (T414369)
Package requirements were updated (beautifulsoup4, fake-useragent, mwoauth, mwparserfromhell, packaging, Pillow, pydot, PyMySQL, python-stdnum, requests, requests-sse, wikitextparser)
Python 3.8 support was dropped. (T401802)
Remove predefined yu-tld fix in fixes. (T402088)
Deprecations
This section lists features, methods, parameters, or attributes that are deprecated and scheduled for removal in future Pywikibot releases.
Deprecated items may still work in the current release but are no longer recommended for use. Users should update their code according to the recommended alternatives.
Pywikibot follows a clear deprecation policy: features are typically deprecated in one release and removed in the third subsequent major release, remaining available for the two releases in between.
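As a worked example of this policy, the expected removal release can be computed from the version in which an item was deprecated. The helper below, expected_removal_major, is hypothetical and only encodes the "third subsequent major release" rule; since the policy says "typically", individual items may differ:

```python
def expected_removal_major(deprecated_in: str) -> int:
    """Return the major release in which removal is expected.

    Hypothetical helper: takes a version string such as '9.6.0' and
    adds three to its major number, following the stated policy.
    """
    major = int(deprecated_in.split('.')[0])
    return major + 3


for version in ('9.6.0', '10.3.0', '11.0.0'):
    print(version, '->', expected_removal_major(version))
```

This matches the grouping used in this section: items deprecated in 9.x are pending removal in Pywikibot 12, and items deprecated in 10.x in Pywikibot 13.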
Pending removal in Pywikibot 12
9.6.0: BaseSite.languages() will be removed in favour of BaseSite.codes
9.5.0: DataSite.getPropertyType() will be removed in favour of DataSite.get_property_type()
9.3.0: page.BasePage.userName and page.BasePage.isIpEdit are deprecated in favour of the user or anon attributes of the page.BasePage.latest_revision property
9.3.0: botflag parameter of Page.save(), Page.put() and Page.set_redirect_target() was renamed to bot
9.2.0: All parameters of Page.templates and Page.itertemplates() must be given as keyword arguments
9.2.0: Imports of logging functions from the bot module are deprecated and will be desupported
9.2.0: total argument in -logevents pagegenerators option is deprecated; use -limit instead (T128981)
9.0.0: The content parameter of proofreadpage.IndexPage.page_gen is deprecated and will be ignored (T358635)
9.0.0: next parameter of userinterfaces.transliteration.Transliterator.transliterate was renamed to succ
9.0.0: userinterfaces.transliteration.transliterator object was renamed to Transliterator
9.0.0: The type parameter of site.APISite.protectedpages() was renamed to protect_type
9.0.0: The all parameter of site.APISite.namespace() was renamed to all_ns
9.0.0: filter parameter of date.dh was renamed to filter_func
9.0.0: dict parameter of data.api.OptionSet was renamed to data
9.0.0: pywikibot.version.get_toolforge_hostname is deprecated with no replacement
9.0.0: allrevisions parameter of xmlreader.XmlDump is deprecated, use revisions instead (T340804)
9.0.0: iteritems method of data.api.Request will be removed in favour of items
9.0.0: SequenceOutputter.output() is deprecated in favour of the tools.formatter.SequenceOutputter.out property
Pending removal in Pywikibot 13
10.6.0: The old (type, value, traceback) signature in tools.collections.GeneratorWrapper.throw will be removed in Pywikibot 13, or earlier if it is dropped from a future Python release. (T340641)
10.6.0: Family.isPublic() will be removed (T407049)
10.6.0: Family.interwiki_replacements is deprecated; use Family.code_aliases instead.
Keyword argument for char parameter of Transliterator.transliterate and positional arguments for prev and succ parameters are deprecated.
10.6.0: Positional arguments of daemonize() are deprecated and must be given as keyword arguments.
10.5.0: Accessing the fallback ‘*’ keys in the ‘languages’, ‘namespaces’, ‘namespacealiases’, and ‘skins’ properties of APISite.siteinfo is deprecated and will be removed.
10.5.0: The methods APISite.protection_types() and APISite.protection_levels() are deprecated. APISite.restrictions should be used instead.
10.4.0: Require all parameters of Site.allpages() except start to be keyword arguments.
10.4.0: Positional arguments of pywikibot.Coordinate are deprecated and must be given as keyword arguments.
10.3.0: throttle.Throttle.getDelay and throttle.Throttle.setDelays were renamed to get_delay() and set_delays(); the old methods will be removed (T289318)
10.3.0: throttle.Throttle.next_multiplicity attribute is unused and will be removed (T289318)
10.3.0: requestsize parameter of throttle.Throttle call is deprecated and will be dropped (T289318)
10.3.0: textlib.to_latin_digits will be removed in favour of textlib.to_ascii_digits; NON_LATIN_DIGITS of userinterfaces.transliteration will be removed in favour of NON_ASCII_DIGITS (T398146#10958283)
10.2.0: tools.threading.RLock is deprecated and moved to the backports module. The backports.RLock.count method is also deprecated. For Python 3.14+ use RLock from the Python library threading instead. (T395182)
10.1.0: revid and date parameters of Page.authorship() were dropped
10.0.0: last_id of comms.eventstreams.EventStreams was renamed to last_event_id (T309380)
10.0.0: ‘millenia’ argument for precision parameter of pywikibot.WbTime is deprecated; ‘millennium’ must be used instead
10.0.0: includeredirects parameter of pagegenerators.AllpagesPageGenerator and pagegenerators.PrefixingPageGenerator is deprecated and should be replaced by filterredir
Pending removal in Pywikibot 14
Keyword parameters for text_a and text_b of diff.PatchManager are deprecated. Positional parameters for by_letter and replace_invisible are deprecated.
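The calling convention this deprecation points toward can be expressed with Python's positional-only (/) and keyword-only (*) markers. The sketch below uses a hypothetical function, compare_texts, that borrows the parameter names from the item above; it is not the diff.PatchManager signature itself, and it shows the strict form in which the old call styles raise TypeError rather than a deprecation warning:

```python
def compare_texts(text_a: str, text_b: str, /, *,
                  by_letter: bool = False,
                  replace_invisible: bool = False) -> dict:
    """Hypothetical stand-in that only summarizes how it was called.

    Parameters before '/' must be passed by position; parameters
    after '*' must be passed by keyword.
    """
    return {
        'same': text_a == text_b,
        'by_letter': by_letter,
        'replace_invisible': replace_invisible,
    }


# Accepted call style: texts by position, flags by keyword
print(compare_texts('foo', 'bar', by_letter=True))

# Rejected call style in this strict sketch: text_a passed by keyword
try:
    compare_texts(text_a='foo', text_b='bar')
except TypeError as error:
    print('TypeError:', error)
```

Writing calls in the accepted style today keeps scripts working once the deprecated forms are removed.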
Release history
See https://github.com/wikimedia/pywikibot/blob/stable/HISTORY.rst
Contributing
Our code is maintained on Wikimedia’s Gerrit installation; learn how to get started.
Code of Conduct
The development of this software is covered by a Code of Conduct.
File details
Details for the file pywikibot-11.0.0.tar.gz.
File metadata
- Download URL: pywikibot-11.0.0.tar.gz
- Upload date:
- Size: 641.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.15.0a1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 376677d70f770e4c811dd4f0a117574febd77888fbc85426fd2e9ef2ff777ad3 |
| MD5 | 610611a585fd8d4ab0e54d6db1b91717 |
| BLAKE2b-256 | 92bc7fe6f39f262652e0ec8e93f15567634a8bde046248b11e139710151814f7 |
File details
Details for the file pywikibot-11.0.0-py3-none-any.whl.
File metadata
- Download URL: pywikibot-11.0.0-py3-none-any.whl
- Upload date:
- Size: 740.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.15.0a1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a03b488408a681ffe215da87f152e8b8bcfb5e1216a604ace1225c18b59202f9 |
| MD5 | 43339004c859bc80f5e6e0a91b997132 |
| BLAKE2b-256 | 43dfb90070c9150d426aeba5d4d3e6af0352ca8b90926316b27374947f5f64b5 |