
Tools to manipulate and extract data from Wikipedia dumps


wikidump

Introduction

This module contains code for manipulating Wikipedia dumps, which are available from http://download.wikimedia.org/backup-index.html

Installation

This module is published on PyPI and can be installed with easy_install, for example:

easy_install wikidump

Alternatively, you can use pip:

pip install wikidump

I highly recommend using virtualenv to isolate the install environment.

For those on Ubuntu systems, a built package is available in a PPA. See the PPA for details on how to install from it.

Configuration

Upon first importing the module, a file ‘wikidump.cfg’ will be created. Modify the paths in this file to point to your data.

  • scratch : where indices are stored (must be writable)

  • xml_dumps : where the XML dumps are located (can be read-only)
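
Before pointing wikidump at your data, it can help to confirm that both paths meet the requirements listed above. The following is a minimal sketch, not part of wikidump itself: it assumes the file is INI-style and that both keys live in a section named "paths", which is a guess, so adjust the section name to match whatever your generated wikidump.cfg actually contains. The helper check_paths is hypothetical.

```python
import configparser
import os

# ASSUMPTION: an INI-style layout with a "paths" section. The real
# wikidump.cfg may use a different section name; check your own file.
SAMPLE_CFG = """\
[paths]
scratch = {scratch}
xml_dumps = {xml_dumps}
"""


def check_paths(cfg_text, section="paths"):
    """Return a list of problems found with the configured directories.

    Hypothetical helper: an empty list means scratch is a writable
    directory and xml_dumps is a readable directory.
    """
    # Disable interpolation so a literal "%" in a path is not misread.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)

    scratch = parser.get(section, "scratch")
    xml_dumps = parser.get(section, "xml_dumps")

    problems = []
    # scratch holds the indices, so it must exist and be writable.
    if not os.path.isdir(scratch):
        problems.append("scratch is not a directory: " + scratch)
    elif not os.access(scratch, os.W_OK):
        problems.append("scratch is not writable: " + scratch)

    # xml_dumps only needs to be readable.
    if not os.path.isdir(xml_dumps):
        problems.append("xml_dumps is not a directory: " + xml_dumps)
    elif not os.access(xml_dumps, os.R_OK):
        problems.append("xml_dumps is not readable: " + xml_dumps)

    return problems
```

To check a real file, read its contents and pass the text to check_paths, supplying the section name your file uses.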

Usage

In addition to its Python modules, wikidump comes with a command-line tool for quick access to its functionality. Run wikidump help for a list of options.

Credits

News

0.1

Release date: 04-Aug-2010

  • Initial release of wikidump module

0.1.3

Release date: 10-Apr-2013

  • Rewrote CLI
