Tools to manipulate and extract data from Wikipedia dumps
wikidump
Introduction
This module contains code for manipulating Wikipedia dumps, which are available from http://download.wikimedia.org/backup-index.html
Installation
This module is published on PyPI and can be installed with easy_install.
For example:
easy_install wikidump
Alternatively, you can use pip:
pip install wikidump
I highly recommend using virtualenv to isolate the install environment.
For those on Ubuntu systems, a built package is available in a PPA. See the PPA for details on how to install from it.
Configuration
Upon first importing the module, a file ‘wikidump.cfg’ will be created. Modify the paths in this file to point to your data.
scratch : where indices are stored (must be writeable)
xml_dumps : where the xml dumps are located (can be read-only)
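Before building indices, it can help to confirm that both paths are usable. The following is a minimal sketch using only the Python standard library, not part of wikidump itself. It assumes the file is INI-style and that both keys sit in a section named `paths`; the section name and the helper `check_paths` are hypothetical, so compare against the wikidump.cfg that was actually generated and adjust as needed.

```python
import configparser
import os
import tempfile


def check_paths(cfg_text, section="paths"):
    """Return a dict of problems found, keyed by config option name.

    ASSUMPTION: the section name "paths" is hypothetical; the real
    wikidump.cfg may use a different layout.
    """
    # Disable interpolation so a literal '%' in a path is not misread.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)

    problems = {}
    scratch = parser.get(section, "scratch")
    xml_dumps = parser.get(section, "xml_dumps")

    # scratch holds the indices, so it must exist and be writeable.
    if not (os.path.isdir(scratch) and os.access(scratch, os.W_OK)):
        problems["scratch"] = "not a writeable directory"

    # xml_dumps only needs to be readable.
    if not (os.path.isdir(xml_dumps) and os.access(xml_dumps, os.R_OK)):
        problems["xml_dumps"] = "not a readable directory"

    return problems


if __name__ == "__main__":
    # Demonstrate with a temporary directory standing in for real data.
    with tempfile.TemporaryDirectory() as d:
        text = "[paths]\nscratch = {0}\nxml_dumps = {0}\n".format(d)
        print(check_paths(text))
```

To check a real file, read its contents with `open(path).read()` and pass the resulting text to `check_paths`. An empty result means both directories passed the checks.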
Usage
In addition to its Python modules, wikidump comes with a command-line tool for quick access to wikidump functionality. Run wikidump help for a list of options.
Credits
News
0.1
Release date: 04-Aug-2010
Initial release of wikidump module