A set of utilities for processing MediaWiki XML dump data.

# MediaWiki XML

This library provides utilities for efficiently processing MediaWiki’s XML database dumps. It addresses two concerns of streaming XML parsing: complexity and performance. A simple [iterator](https://pythonhosted.org/mwxml/iteration.html) strategy enables memory-efficient stream processing of XML dumps, and a distributed processing strategy (see [map()](https://pythonhosted.org/mwxml/map.html)) enables parallel processing of many XML dump files at the same time.
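To make the streaming idea concrete, here is a minimal sketch that uses only the Python standard library. It is not how mwxml is implemented; it only illustrates reading a dump element by element and discarding each element once it has been handled. The helper names (`local_name`, `iter_revision_ids`) and the tiny `SAMPLE` document are hypothetical and exist only for this illustration.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def iter_revision_ids(xml_file):
    """Yield (page_title, revision_id) pairs while streaming the file."""
    title = None
    for _, elem in ET.iterparse(xml_file, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            # Only direct children are inspected, so a nested
            # <contributor><id> is never mistaken for the revision ID.
            rev_id = next(
                (c.text for c in elem if local_name(c.tag) == "id"), None
            )
            if rev_id is not None:
                yield title, int(rev_id)
            # Drop the revision's text and children. Empty element shells
            # stay attached to their parent; a production parser would
            # prune those as well.
            elem.clear()
        elif name == "page":
            elem.clear()


# Hypothetical two-page dump, made up for this sketch.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Foo</title>
    <id>10</id>
    <revision><id>1</id><text>a</text></revision>
    <revision><id>2</id><text>b</text></revision>
  </page>
  <page>
    <title>Bar</title>
    <id>11</id>
    <revision><id>3</id><text>c</text></revision>
  </page>
</mediawiki>"""

for page_title, revision_id in iter_revision_ids(io.BytesIO(SAMPLE)):
    print(page_title, revision_id)
```

Because the function is a generator, the caller sees one revision at a time and the revision text is released as soon as the loop moves on.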

## Example

>>> import mwxml
>>>
>>> dump = mwxml.Dump.from_file(open("dump.xml"))
>>> print(dump.site_info.name, dump.site_info.dbname)
Wikipedia enwiki
>>>
>>> for page in dump:
...     for revision in page:
...         print(revision.id)
...
1
2
3
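
The loop above handles a single dump file. As a rough illustration of processing many files at once, the sketch below fans a per-file function out over several files with a standard-library thread pool. This is a swapped-in technique, not mwxml's `map()`, and the names `count_revisions`, `count_revisions_in_all`, and `write_sample` are hypothetical. For CPU-bound parsing, a process pool is likely a better fit than threads.

```python
import tempfile
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def count_revisions(path):
    """Stream one dump file and return (file name, number of revisions)."""
    count = 0
    for _, elem in ET.iterparse(path, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] == "revision":
            count += 1
            elem.clear()  # release the revision once it is counted
    return Path(path).name, count


def count_revisions_in_all(paths, workers=2):
    """Apply count_revisions to every file and collect results in a dict."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(count_revisions, paths))


def write_sample(directory, name, n_revisions):
    """Write a tiny made-up dump with one page and n_revisions revisions."""
    revisions = "".join(
        f"<revision><id>{i}</id></revision>" for i in range(n_revisions)
    )
    path = Path(directory) / name
    path.write_text(
        f"<mediawiki><page><title>T</title>{revisions}</page></mediawiki>"
    )
    return str(path)


with tempfile.TemporaryDirectory() as tmp:
    sample_paths = [
        write_sample(tmp, "a.xml", 2),
        write_sample(tmp, "b.xml", 3),
    ]
    RESULT = count_revisions_in_all(sample_paths)

print(RESULT)
```

Each worker streams its own file independently, so no file has to be fully loaded into memory while the others are processed.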

## Author

* Aaron Halfaker: https://github.com/halfak

## See also

* http://dumps.wikimedia.org/
* http://community.wikia.com/wiki/Help:Database_download

Download files

Source distribution: mwxml-0.3.4.tar.gz (18.2 kB)

Built distribution: mwxml-0.3.4-py2.py3-none-any.whl (27.5 kB, Python 2 and Python 3)

File details

Details for the file mwxml-0.3.4.tar.gz.

File metadata

  • Download URL: mwxml-0.3.4.tar.gz
  • Size: 18.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for mwxml-0.3.4.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 7a37f745f770704a7419efbde9d391b874b9071dbc192b3b1f81c3d4b52775ee |
| MD5 | 93b2430b466dca644003f79612a3d5c3 |
| BLAKE2b-256 | f44506b0018fcb876174e0ef996d936c114a5375e23c7121c6eb84ddfc3c5543 |

File details

Details for the file mwxml-0.3.4-py2.py3-none-any.whl.

File metadata

  • Download URL: mwxml-0.3.4-py2.py3-none-any.whl
  • Size: 27.5 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for mwxml-0.3.4-py2.py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | f109225a47f629a1ddf73826c462efb9dc6fa6df7c546c6ac452424cc6034d52 |
| MD5 | 0dec28f32120d2772976cb465a3ebc62 |
| BLAKE2b-256 | 2a3bdab72fc52e0b89034b2e1bb191024e5415fa9afab93afa1e997295d44c4b |
