A set of utilities for processing MediaWiki XML dump data.

# MediaWiki XML

This library contains a collection of utilities for efficiently processing MediaWiki’s XML database dumps. It addresses two concerns: the complexity and the performance of streaming XML parsing. It enables memory-efficient stream processing of XML dumps with a simple [iterator](https://pythonhosted.org/mwxml/iteration.html) strategy. It also implements a distributed processing strategy (see [map()](https://pythonhosted.org/mwxml/map.html)) that processes many XML dump files in parallel.
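To make the idea of memory-efficient streaming concrete, here is a minimal sketch that uses only the Python standard library. It is not mwxml's implementation; the helper names `local_name` and `iter_revision_ids` and the tiny `SAMPLE` document are hypothetical and exist only for illustration. It shows the general technique: read the XML incrementally, yield one item at a time, and release each parsed subtree once it has been handled.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{http://...}page'."""
    return tag.rsplit("}", 1)[-1]


def iter_revision_ids(xml_file):
    """Yield (page_title, revision_id) pairs one at a time.

    Hypothetical helper: illustrates streaming, not mwxml's own parser.
    """
    title = None
    for _event, elem in ET.iterparse(xml_file, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "revision":
            # Only look at direct children, so a contributor's <id> is ignored.
            for child in elem:
                if local_name(child.tag) == "id":
                    yield title, int(child.text)
                    break
            elem.clear()  # release the revision's text and children
        elif name == "page":
            elem.clear()  # release whatever is left of the finished page


# A tiny stand-in for a real dump file (assumed structure, heavily simplified).
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page>
    <title>Foo</title>
    <id>10</id>
    <revision>
      <id>1</id>
      <contributor><username>A</username><id>99</id></contributor>
      <text>first</text>
    </revision>
    <revision>
      <id>2</id>
      <text>second</text>
    </revision>
  </page>
  <page>
    <title>Bar</title>
    <id>11</id>
    <revision>
      <id>3</id>
      <text>third</text>
    </revision>
  </page>
</mediawiki>"""

pairs = list(iter_revision_ids(io.BytesIO(SAMPLE)))
print(pairs)
```

Because the function is a generator, a caller can stop early or process each pair as it arrives without holding the whole dump in memory.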

## Example

>>> import mwxml
>>>
>>> dump = mwxml.Dump.from_file(open("dump.xml"))
>>> print(dump.site_info.name, dump.site_info.dbname)
Wikipedia enwiki
>>>
>>> for page in dump:
...     for revision in page:
...         print(revision.id)
...
1
2
3
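The description above also mentions processing many dump files at the same time. The following sketch illustrates that pattern with the standard library only. It is an assumption-laden illustration, not mwxml's `map()`: the names `count_revisions`, `map_files`, and `write_sample` are hypothetical, and threads are used only to keep the example short (CPU-bound parsing generally benefits from separate processes).

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def count_revisions(path):
    """Stream one dump-like file and return (path, number of <revision> elements)."""
    count = 0
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag.rsplit("}", 1)[-1] == "revision":
            count += 1
            elem.clear()  # keep memory use flat while streaming
    return path, count


def map_files(process, paths, workers=2):
    """Apply `process` to every path concurrently and yield results in input order.

    Hypothetical helper, not the mwxml API.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        yield from pool.map(process, paths)


def write_sample(directory, name, n_revisions):
    """Write a tiny stand-in dump file with `n_revisions` revisions."""
    revisions = "".join(
        "<revision><id>{}</id></revision>".format(i) for i in range(n_revisions)
    )
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write(
            "<mediawiki><page><title>T</title>{}</page></mediawiki>".format(revisions)
        )
    return path


with tempfile.TemporaryDirectory() as tmp:
    paths = [write_sample(tmp, "a.xml", 2), write_sample(tmp, "b.xml", 3)]
    counts = {
        os.path.basename(p): n for p, n in map_files(count_revisions, paths)
    }

print(counts)
```

Each worker streams a single file, so memory use scales with the number of workers rather than with the total size of the dumps.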

## Author

* Aaron Halfaker – https://github.com/halfak

## See also

* http://dumps.wikimedia.org/
* http://community.wikia.com/wiki/Help:Database_download

Download files

Source Distribution

mwxml-0.3.6.tar.gz (18.5 kB view details)

Uploaded Source

Built Distribution

mwxml-0.3.6-py2.py3-none-any.whl (33.2 kB view details)

Uploaded Python 2, Python 3

File details

Details for the file mwxml-0.3.6.tar.gz.

File metadata

  • Download URL: mwxml-0.3.6.tar.gz
  • Upload date:
  • Size: 18.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.10

File hashes

Hashes for mwxml-0.3.6.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 5a53181d302152ad03ec513ff186a89e0c7e1fcc50c78330452a0872caa63935 |
| MD5 | 8de2c5fccf366a4eaa990a01b54aa37e |
| BLAKE2b-256 | a8d378e0b7d2ac9a8e5e4af1157e30d3ae575edc0cdb618706ea4eb8649cc099 |

File details

Details for the file mwxml-0.3.6-py2.py3-none-any.whl.

File metadata

  • Download URL: mwxml-0.3.6-py2.py3-none-any.whl
  • Upload date:
  • Size: 33.2 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.8.10

File hashes

Hashes for mwxml-0.3.6-py2.py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | f5e0cde46c7d4b0d1d921f8f0aa14d691b6eaa6532b901dacc6c7407be26c70a |
| MD5 | 66f7acd05b591b47c82f0228a0397d57 |
| BLAKE2b-256 | 3be8cf48d7e707faf85e5bcaaa08d02d5ee9dd4b64c45a6e4c11c5d71f798d86 |
