
A set of utilities for processing MediaWiki XML dump data.


# MediaWiki XML

This library contains a collection of utilities for efficiently processing MediaWiki’s XML database dumps. It addresses two main concerns of streaming XML parsing: complexity and performance. It enables memory-efficient stream processing of XML dumps with a simple [iterator](https://pythonhosted.org/mwxml/iteration.html) strategy. It also implements a distributed processing strategy (see [map()](https://pythonhosted.org/mwxml/map.html)) that processes many XML dump files in parallel.
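To make the streaming idea concrete, here is a minimal sketch that uses only the Python standard library. It is not how mwxml is implemented. The sample document, the un-namespaced tag names, and the helper `iter_revision_ids` are assumptions made for illustration; real dumps carry an XML namespace and many more fields.

```python
import io
import xml.etree.ElementTree as ET

# A tiny stand-in for a dump (assumed layout, no XML namespace).
SAMPLE = """<mediawiki>
  <page>
    <title>Foo</title>
    <revision><id>1</id></revision>
    <revision><id>2</id></revision>
  </page>
  <page>
    <title>Bar</title>
    <revision><id>3</id></revision>
  </page>
</mediawiki>"""


def iter_revision_ids(fileobj):
    """Yield (page title, revision id) pairs while reading the XML incrementally."""
    title = None
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(fileobj, events=("end",)):
        if elem.tag == "title":
            title = elem.text
        elif elem.tag == "revision":
            yield title, int(elem.findtext("id"))
            elem.clear()  # release the revision's children once consumed
        elif elem.tag == "page":
            elem.clear()  # release the finished page's contents


revisions = list(iter_revision_ids(io.BytesIO(SAMPLE.encode("utf-8"))))
print(revisions)  # [('Foo', 1), ('Foo', 2), ('Bar', 3)]
```

Because the generator yields each revision as soon as its closing tag is seen and then clears it, the caller never needs the whole document in memory at once.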

## Example

>>> import mwxml
>>>
>>> dump = mwxml.Dump.from_file(open("dump.xml"))
>>> print(dump.site_info.name, dump.site_info.dbname)
Wikipedia enwiki
>>>
>>> for page in dump:
...     for revision in page:
...         print(revision.id)
...
1
2
3
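The parallel strategy mentioned in the introduction can be pictured with the standard library as well. The sketch below is an assumption-laden illustration, not the library's `map()`: the helpers `count_revisions` and `write_sample`, the file names, and the simplified XML layout are all hypothetical. It uses threads to keep the example short; a real CPU-bound workload would more likely use separate processes.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def count_revisions(path):
    """Stream one file and return (path, number of <revision> elements)."""
    count = 0
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "revision":
            count += 1
            elem.clear()  # release the revision's children once counted
    return path, count


def write_sample(directory, name, n_revisions):
    """Write a tiny stand-in dump file (hypothetical layout, no XML namespace)."""
    path = os.path.join(directory, name)
    revisions = "".join(
        "<revision><id>%d</id></revision>" % i for i in range(n_revisions)
    )
    with open(path, "w", encoding="utf-8") as f:
        f.write("<mediawiki><page><title>T</title>%s</page></mediawiki>" % revisions)
    return path


with tempfile.TemporaryDirectory() as tmp:
    paths = [write_sample(tmp, "a.xml", 2), write_sample(tmp, "b.xml", 3)]
    # Each worker streams one file; results come back in input order.
    with ThreadPoolExecutor(max_workers=2) as pool:
        counts = {
            os.path.basename(p): n for p, n in pool.map(count_revisions, paths)
        }

print(counts)  # {'a.xml': 2, 'b.xml': 3}
```

Each file is handled by its own worker, and every worker reads its file as a stream, so adding files adds parallel work without making any single parse hold more in memory.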

## Author

* Aaron Halfaker – https://github.com/halfak

## See also

* http://dumps.wikimedia.org/
* http://community.wikia.com/wiki/Help:Database_download
