A collection of scripts and utilities to support the stream-processing of MediaWiki data.

Project Description

A set of utilities for stream-processing MediaWiki data.

Usage

mwstream (-h | --help)

mwstream <utility> [-h|--help]

Data processing utilities

diffs2persistence
Generates token persistence statistics using revision JSON blobs with diff information.
dump2json
Converts an XML dump to a stream of revision JSON blobs.
dump2diffs
Computes diffs directly from an XML dump.
json2diffs
Computes and adds a “diff” field to a stream of revision JSON blobs.
mend_diffs
Mends diffs that were computed in chunks and out of order.
persistence2stats
Aggregates token persistence statistics into revision statistics.
wikihadoop2json
Converts a Wikihadoop-processed stream of XML pages to JSON blobs.
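
To make the idea of adding a “diff” field to a stream of revision blobs concrete, here is a minimal Python sketch. It is not the code that json2diffs runs. The field names "page_id" and "text", the whitespace tokenization, the shape of the diff entries, and the use of the standard-library difflib module are all assumptions made for illustration.

```python
import difflib
import json


def add_diffs(lines):
    """Yield revision dicts with an added "diff" field.

    Assumes each input line is a JSON object with "page_id" and "text" keys
    (hypothetical field names) and that a page's revisions arrive in order.
    """
    last_tokens = {}  # page_id -> tokens of that page's previous revision
    for line in lines:
        rev = json.loads(line)
        tokens = rev["text"].split()
        previous = last_tokens.get(rev["page_id"], [])
        matcher = difflib.SequenceMatcher(a=previous, b=tokens, autojunk=False)
        # Keep only the operations that change something.
        rev["diff"] = [
            {"op": op, "removed": previous[a1:a2], "added": tokens[b1:b2]}
            for op, a1, a2, b1, b2 in matcher.get_opcodes()
            if op != "equal"
        ]
        last_tokens[rev["page_id"]] = tokens
        yield rev
```

Because the function is a generator over lines, it can consume a file object or standard input one blob at a time, which is the streaming style the utilities above describe.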

General utilities

json2tsv
Converts a stream of JSON blobs to tab-separated values based on a set of field names.
normalize
Normalizes old versions of RevisionDocument JSON schemas to correspond to the most recent schema version.
validate
Validates JSON against a provided schema.
truncate_text
Truncates the ‘text’ field of JSON blobs to a limited length in Unicode characters and adds a boolean ‘truncated’ field (addresses content dump vandalism issues).
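
The behaviour described for truncate_text and json2tsv can be sketched in a few lines of Python. This is an illustrative approximation, not the package's implementation. The function names, the max_chars parameter, and the escaping of tabs and newlines are assumptions; only the ‘text’ and ‘truncated’ field names come from the descriptions above.

```python
def truncate_text(doc, max_chars):
    """Return a copy of doc with "text" cut to max_chars and a "truncated" flag.

    Python strings are sequences of Unicode code points, so slicing
    counts characters rather than bytes.
    """
    text = doc.get("text") or ""
    out = dict(doc)
    out["truncated"] = len(text) > max_chars
    if out["truncated"]:
        out["text"] = text[:max_chars]
    return out


def to_tsv_row(doc, fieldnames):
    """Render the chosen fields of one JSON blob as a tab-separated line.

    Missing or null fields become empty cells. Backslashes, tabs, and
    newlines inside values are escaped so each blob stays on one row.
    """
    cells = []
    for name in fieldnames:
        value = doc.get(name)
        cell = "" if value is None else str(value)
        cell = cell.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")
        cells.append(cell)
    return "\t".join(cells)
```

Applied per line to a stream of parsed blobs, the two functions compose: truncate first, then emit only the fields of interest as TSV.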

Installation

pip install mwstreaming

Release History

0.5.5 (this version)
0.5.4
0.5.3
0.5.2
0.5.1
0.5.0
0.4.0
0.3.0
0.2.5
0.2.4
0.2.3
0.2.2
0.2.1
0.2.0
0.1.0

Download Files

Download the file for your platform.

File Name                           File Type  Upload Date
mwstreaming-0.5.5.tar.gz (12.5 kB)  Source     Apr 15, 2015
mwstreaming-0.5.5.zip (23.3 kB)     Source     Apr 15, 2015
