Tools to manipulate and extract data from wikipedia dumps

Project Description

wikidump

Introduction

This module contains code for manipulating Wikipedia dumps, which are available from http://download.wikimedia.org/backup-index.html

Installation

This module is published on PyPI and can be installed with easy_install. For example:

easy_install wikidump

Alternatively, you can use pip:

pip install wikidump

I highly recommend using virtualenv to isolate the install environment.

For those on Ubuntu systems, a built package is available in a PPA. See the PPA for details on how to install from it.

Configuration

The first time the module is imported, a file ‘wikidump.cfg’ will be created. Modify the paths in this file to point to your data.

  • scratch : where indices are stored (must be writeable)
  • xml_dumps : where the xml dumps are located (can be read-only)
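
As a rough illustration of the two settings, the sketch below parses a config with Python's standard configparser module. It assumes that wikidump.cfg uses an INI-style layout; the section name "paths" and the example directories are hypothetical, and only the option names scratch and xml_dumps come from the list above. Check the generated file for its actual layout before relying on this.

```python
import configparser

# Hypothetical contents of wikidump.cfg. The INI layout, the section name
# "paths", and both directories are assumptions made for illustration;
# only the option names scratch and xml_dumps come from the documentation.
SAMPLE_CFG = """
[paths]
scratch = /tmp/wikidump/scratch
xml_dumps = /data/wikipedia/dumps
"""


def read_paths(cfg_text):
    """Return (scratch, xml_dumps) from INI-style config text."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    section = parser["paths"]  # assumed section name
    return section["scratch"], section["xml_dumps"]


scratch, xml_dumps = read_paths(SAMPLE_CFG)
print("indices are written to:", scratch)
print("XML dumps are read from:", xml_dumps)
```

Keeping the scratch directory separate from the dump directory means the dumps can live on read-only storage while indices are built elsewhere.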

Usage

In addition to Python modules, wikidump comes with a command-line tool for quick access to wikidump functionality. Run wikidump help for a list of options.

News

0.1

Release date: 04-Aug-2010

  • Initial release of wikidump module

0.1.3

Release date: 10-Apr-2013

  • Rewrote CLI

Release History

  • 0.1.3 (this version)
  • 0.1.2
  • 0.1.1
  • 0.1

Download Files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

  • Filename : wikidump-0.1.3.tar.gz
  • Size : 17.3 kB
  • File type : Source
  • Python version : None
  • Upload date : Apr 10, 2013
