Tools to manipulate and extract data from Wikipedia dumps
Project description
wikidump
Introduction
This module contains code for manipulating the Wikipedia dumps available from http://download.wikimedia.org/backup-index.html
Installation
This module is published on PyPI and can be installed with easy_install:
easy_install wikidump
Alternatively, you can use pip:
pip install wikidump
I highly recommend using virtualenv to isolate the install environment.
For those on Ubuntu systems, a built package is available in a PPA. Please go to the PPA for details on how to install from it.
Configuration
Upon first importing the module, a file ‘wikidump.cfg’ will be created. Modify the paths in this file to point to your data.
scratch : where indices are stored (must be writable)
xml_dumps : where the XML dumps are located (can be read-only)
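As an illustration, the two paths can be set and sanity-checked from Python before running anything against the dumps. This is a minimal Python 3 sketch, not part of wikidump itself: it assumes wikidump.cfg is an INI-style file readable by the standard configparser module, and the section name "paths" is hypothetical (check the generated file for the real one). Only the key names scratch and xml_dumps come from the description above.

```python
import configparser
import os

# Hypothetical section name: look in the generated wikidump.cfg for the real one.
SECTION = "paths"


def write_paths(cfg_path, scratch, xml_dumps):
    """Set the two documented keys in cfg_path, keeping any other keys."""
    # interpolation=None so that a literal '%' in a path is stored as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)  # a missing file is silently skipped
    if not parser.has_section(SECTION):
        parser.add_section(SECTION)
    parser.set(SECTION, "scratch", scratch)
    parser.set(SECTION, "xml_dumps", xml_dumps)
    with open(cfg_path, "w") as fh:
        parser.write(fh)


def check_paths(cfg_path):
    """Return a list of problems with the configured paths (empty if none)."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read(cfg_path)
    scratch = parser.get(SECTION, "scratch")
    xml_dumps = parser.get(SECTION, "xml_dumps")
    problems = []
    # scratch holds the indices, so it has to be writable.
    if not (os.path.isdir(scratch) and os.access(scratch, os.W_OK)):
        problems.append("scratch must be an existing, writable directory")
    # xml_dumps only needs to be readable.
    if not (os.path.isdir(xml_dumps) and os.access(xml_dumps, os.R_OK)):
        problems.append("xml_dumps must be an existing, readable directory")
    return problems
```

Calling check_paths("wikidump.cfg") after editing the file gives a quick way to catch a mistyped directory before any index is built.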
Usage
In addition to its Python modules, wikidump comes with a command-line tool for quick access to its functionality. Run wikidump help for a list of options.
Credits
News
0.1
Release date: 04-Aug-2010
Initial release of wikidump module
0.1.3
Release date: 10-Apr-2013
Rewrote CLI
Project details
Download files
Source Distribution
File details
Details for the file wikidump-0.1.3.tar.gz.
File metadata
- Download URL: wikidump-0.1.3.tar.gz
- Upload date:
- Size: 17.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5d9198cd64065f6fd31c618bde6b3f76815c75e338942eed8addba751e2bab9f
MD5 | 0f57675990d6101d74bfb4cb9a59ce43
BLAKE2b-256 | 20f0ab98925a8f15ade5088de2b5f1b7351f6828e7766ab6cca0c11dce22d449
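The digests above can be used to confirm that a downloaded archive is intact. The following is a minimal sketch using only Python's standard hashlib module; the function names are illustrative and the local file path is whatever location the archive was saved to. Only the SHA256 value is taken from the listing above.

```python
import hashlib

# SHA256 digest published above for wikidump-0.1.3.tar.gz.
EXPECTED_SHA256 = "5d9198cd64065f6fd31c618bde6b3f76815c75e338942eed8addba751e2bab9f"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

For example, matches_published_digest("wikidump-0.1.3.tar.gz") should return True for an unmodified copy of the source distribution saved in the current directory.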