Tools to manipulate and extract data from Wikipedia dumps

wikidump

Introduction

This module contains code for manipulating Wikipedia dumps, available from http://download.wikimedia.org/backup-index.html

Installation

This module is published on PyPI and can be installed with easy_install, for example:

easy_install wikidump

Alternatively, you can use pip:

pip install wikidump

I highly recommend using virtualenv to isolate the install environment.
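
One way to follow that recommendation from Python itself is sketched below. It swaps in the standard-library venv module for virtualenv, and the helper names env_python and install_in_env are illustrative, not part of wikidump. Whether this release installs on a given Python version is not stated here, so treat the install step as an untested assumption.

```python
import os
import subprocess
import venv


def env_python(env_dir):
    """Return the path to the interpreter inside a virtual environment."""
    if os.name == "nt":
        # Windows layouts place executables under Scripts\
        return os.path.join(env_dir, "Scripts", "python.exe")
    return os.path.join(env_dir, "bin", "python")


def install_in_env(env_dir, package="wikidump"):
    """Create an isolated environment and pip-install the package into it.

    Needs network access; raises CalledProcessError if pip fails.
    """
    venv.EnvBuilder(with_pip=True).create(env_dir)
    subprocess.run(
        [env_python(env_dir), "-m", "pip", "install", package],
        check=True,
    )
```

Calling install_in_env("wikidump-env") would create the environment in ./wikidump-env and run pip inside it, leaving the system-wide Python packages untouched.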

For those on Ubuntu systems, a built package is available in a PPA. See the PPA for details on how to install from it.

Configuration

The first time the module is imported, it creates a file named ‘wikidump.cfg’. Modify the paths in this file to point to your data:

  • scratch : where indices are stored (must be writable)

  • xml_dumps : where the XML dumps are located (can be read-only)
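
Before building any indices it can help to check that both directories are usable. The sketch below assumes wikidump.cfg is INI-style and readable by configparser, which this page does not confirm. It searches every section for the scratch and xml_dumps keys rather than guessing a section name. check_wikidump_paths is a hypothetical helper, not part of wikidump.

```python
import configparser
import os


def check_wikidump_paths(cfg_path):
    """Report whether the two configured directories look usable.

    Assumption: the config file is INI-style. The section name is not
    assumed; every section (including DEFAULT) is searched for the keys.
    """
    parser = configparser.ConfigParser(interpolation=None)
    if not parser.read(cfg_path):
        raise FileNotFoundError(cfg_path)

    paths = {}
    for section in parser:  # iteration includes the DEFAULT section
        for key in ("scratch", "xml_dumps"):
            if key in parser[section]:
                paths[key] = os.path.expanduser(parser[section][key])

    scratch = paths.get("scratch")
    dumps = paths.get("xml_dumps")
    return {
        # scratch holds indices, so it must exist and be writable
        "scratch_writable": bool(scratch)
        and os.path.isdir(scratch)
        and os.access(scratch, os.W_OK),
        # the dumps directory only needs to be readable
        "xml_dumps_readable": bool(dumps)
        and os.path.isdir(dumps)
        and os.access(dumps, os.R_OK),
    }
```

A False value for either entry means the corresponding path is missing from the file, does not exist, or lacks the required permission.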

Usage

In addition to its Python modules, wikidump comes with a command-line tool for quick access to wikidump functionality. Run wikidump help for a list of options.

Credits

News

0.1

Release date: 04-Aug-2010

  • Initial release of wikidump module

Download files

Source Distribution

wikidump-0.1.2.tar.gz (15.4 kB)

File hashes

Hashes for wikidump-0.1.2.tar.gz

  • SHA256 : 744085a2b7ca03c376535b165db2c9196f2349a72b1f468e43e58ca41f2e80a1

  • MD5 : cfc6ac24edb995a9bbc1076ddf72da82

  • BLAKE2b-256 : def0fd0b17b9cdffca9e4d1e345f33e7c8cf02d4bb455e9f3f757f9e3d4f2723
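
A downloaded archive can be checked against the SHA256 digest listed above with the standard-library hashlib module. This is a minimal sketch: sha256_of and matches_published_hash are illustrative names, and the file path is wherever you saved the archive.

```python
import hashlib

# SHA256 digest listed on this page for wikidump-0.1.2.tar.gz
EXPECTED_SHA256 = "744085a2b7ca03c376535b165db2c9196f2349a72b1f468e43e58ca41f2e80a1"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest equals the expected one."""
    return sha256_of(path) == expected.lower()
```

For example, matches_published_hash("wikidump-0.1.2.tar.gz") should return True for an intact copy of the archive and False for a corrupted or altered one.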
