Tools to manipulate and extract data from Wikipedia dumps
This module contains code for manipulating the Wikipedia dumps available from http://download.wikimedia.org/backup-index.html
This module is published on PyPI and can be installed with easy_install.
Alternatively, you can use pip:
pip install wikidump
I highly recommend using virtualenv to isolate the install environment.
For those on Ubuntu systems, a built package is available in a PPA. Please go to the PPA for details on how to install from it.
The first time the module is imported, a file ‘wikidump.cfg’ is created. Modify the paths in this file to point to your data:
- scratch : where indices are stored (must be writable)
- xml_dumps : where the XML dumps are located (can be read-only)
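As a quick sanity check after editing the paths, something like the following sketch can confirm that the scratch directory is writable and the dump directory is readable. It is not part of wikidump: the helper name check_wikidump_paths is hypothetical, and it assumes that wikidump.cfg is an INI-style file with both options under a section called "paths". Check your generated file and pass the real section name if it differs.

```python
import configparser
import os


def check_wikidump_paths(cfg_path, section="paths"):
    """Return {option: (path, ok)} for the two path options.

    Hypothetical helper, not part of wikidump. The section name
    "paths" is an assumption about the layout of wikidump.cfg.
    """
    # Disable interpolation so a literal '%' in a path is not misread.
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(cfg_path):
        raise FileNotFoundError(cfg_path)

    # scratch holds the indices, so it must be an existing, writable directory.
    scratch = parser.get(section, "scratch")
    scratch_ok = os.path.isdir(scratch) and os.access(scratch, os.W_OK)

    # xml_dumps only needs to be readable.
    xml_dumps = parser.get(section, "xml_dumps")
    dumps_ok = os.path.isdir(xml_dumps) and os.access(xml_dumps, os.R_OK)

    return {
        "scratch": (scratch, scratch_ok),
        "xml_dumps": (xml_dumps, dumps_ok),
    }

# Example call (the location of wikidump.cfg depends on your setup):
# print(check_wikidump_paths("wikidump.cfg"))
```

Each entry in the returned dictionary pairs the configured path with True or False, so a False value points directly at the setting that needs fixing.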
In addition to the Python modules, wikidump comes with a command-line tool for quick access to wikidump functionality. Run wikidump help for a list of options.
Release date: 04-Aug-2010
- Initial release of wikidump module