A set of utilities for processing MediaWiki XML dump data.

# MediaWiki XML

This library contains a collection of utilities for efficiently processing MediaWiki’s XML database dumps. It addresses two concerns: the complexity and the performance of streaming XML parsing. It enables memory-efficient stream processing of XML dumps with a simple [iterator](https://pythonhosted.org/mwxml/iteration.html) strategy, and it implements a distributed processing strategy (see [map()](https://pythonhosted.org/mwxml/map.html)) for processing many XML dump files in parallel.
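
To make the streaming idea concrete, here is a minimal sketch that uses only Python's standard library rather than mwxml itself. It is not how mwxml is implemented. The sample XML, the helper name `iter_revision_ids`, and the simplified element layout (no XML namespace, very few fields) are all assumptions made for illustration.

```python
import io
import xml.etree.ElementTree as ET

# Tiny stand-in for a dump. Real MediaWiki exports declare an XML namespace
# and carry many more fields; both are omitted here (assumption).
SAMPLE = b"""<mediawiki>
  <page>
    <title>Foo</title>
    <revision><id>1</id></revision>
    <revision><id>2</id></revision>
  </page>
  <page>
    <title>Bar</title>
    <revision><id>3</id></revision>
  </page>
</mediawiki>"""


def iter_revision_ids(stream):
    """Yield (page title, revision id) pairs, one finished page at a time."""
    # iterparse reads the stream incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title")
            for rev in elem.iter("revision"):
                yield title, int(rev.findtext("id"))
            # Drop the finished page's children so they can be garbage collected.
            elem.clear()


pairs = list(iter_revision_ids(io.BytesIO(SAMPLE)))
print(pairs)
```

Because the function is a generator, a caller can loop over it directly and never hold more than one page's revisions at a time; collecting into a list here is only for display.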

## Example

>>> import mwxml
>>>
>>> dump = mwxml.Dump.from_file(open("dump.xml"))
>>> print(dump.site_info.name, dump.site_info.dbname)
Wikipedia enwiki
>>>
>>> for page in dump:
...     for revision in page:
...         print(revision.id)
...
1
2
3
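
The loop above handles a single dump. As a rough illustration of applying one per-file function to several dump files at once, the sketch below uses the standard library's `concurrent.futures` instead of mwxml's own `map()`, whose interface is described in the documentation linked earlier. The file names, the helpers `write_sample` and `count_revisions`, and the simplified XML are hypothetical. A thread pool is used to keep the example portable; for CPU-bound parsing, a process pool would likely be the better fit.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor


def count_revisions(path):
    """Stream one (hypothetical) dump file and count its <revision> elements."""
    count = 0
    for _event, elem in ET.iterparse(path, events=("end",)):
        if elem.tag == "revision":
            count += 1
        elif elem.tag == "page":
            # Release the finished page's children before moving on.
            elem.clear()
    return path, count


def write_sample(directory, name, n_revisions):
    """Write a tiny stand-in dump with one page and n_revisions revisions."""
    revisions = "".join(
        "<revision><id>%d</id></revision>" % i for i in range(1, n_revisions + 1)
    )
    path = os.path.join(directory, name)
    with open(path, "w", encoding="utf-8") as f:
        f.write("<mediawiki><page><title>T</title>%s</page></mediawiki>" % revisions)
    return path


with tempfile.TemporaryDirectory() as tmp:
    paths = [write_sample(tmp, "a.xml", 2), write_sample(tmp, "b.xml", 3)]
    # Each file is processed by its own worker; results arrive in input order.
    with ThreadPoolExecutor(max_workers=2) as pool:
        counts = {
            os.path.basename(p): n for p, n in pool.map(count_revisions, paths)
        }

print(counts)
```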

## Author

* Aaron Halfaker – https://github.com/halfak

## See also

* http://dumps.wikimedia.org/
* http://community.wikia.com/wiki/Help:Database_download

Download files

Source Distribution

mwxml-0.3.7.tar.gz (18.7 kB)

Built Distribution

mwxml-0.3.7-py2.py3-none-any.whl (35.1 kB, Python 2 and Python 3)

File details

Details for the file mwxml-0.3.7.tar.gz.

File metadata

  • Download URL: mwxml-0.3.7.tar.gz
  • Size: 18.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.15.0 CPython/3.7.7

File hashes

Hashes for mwxml-0.3.7.tar.gz

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 968ac549f1b9e973d26c5d12443fb64a5bb0a76c7f9a58feaf0ebe00e8ab249b |
| MD5 | 81fdc0222abd9b03f03f334570070d4c |
| BLAKE2b-256 | 01f95f22f701fde580ccff38146ca8c5d6d19c922f6b176148f1feec2eb8d777 |

File details

Details for the file mwxml-0.3.7-py2.py3-none-any.whl.

File metadata

  • Download URL: mwxml-0.3.7-py2.py3-none-any.whl
  • Size: 35.1 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.24.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.15.0 CPython/3.7.7

File hashes

Hashes for mwxml-0.3.7-py2.py3-none-any.whl

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 1d2b931454c46a6cf3af5af40c905fd69c370ba8ae314019c2a4327ff9c4790c |
| MD5 | 62af838eda107ffa25f21512316e9e36 |
| BLAKE2b-256 | 94d4c035751d4562136f4f8f9ba1dc50e3726cda68dfc5db56b91580f4c1dcc7 |
