
A collection of scripts and utilities to support the stream-processing of MediaWiki data.

Project description


  • dump2json – Converts a MediaWiki XML dump to a stream of revision JSON blobs.

  • wikihadoop2json – Converts a Wikihadoop-processed stream of XML pages to JSON blobs.

  • json2tsv – Converts a stream of JSON blobs to tab-separated values.

  • json2diffs – Computes a "diff" field and adds it to each blob in a stream of revision JSON blobs.

  • diffs2persistence – Computes token persistence from a stream of revision diff JSON blobs and adds a "persistence" field.

  • persistence2revstats – Aggregates a stream of token persistence data into per-revision statistics.
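The json2tsv step can be pictured with a minimal sketch. It assumes one JSON object per line and that the caller names the top-level fields to emit; the function name json2tsv and this signature are hypothetical and are not the package's actual API.

```python
import json
from typing import Iterable, Iterator, List


def json2tsv(lines: Iterable[str], fields: List[str]) -> Iterator[str]:
    """Yield one tab-separated row per JSON blob (hypothetical sketch).

    Assumes one JSON object per line; missing or null fields become empty cells.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines in the stream
        doc = json.loads(line)
        cells = []
        for field in fields:
            value = doc.get(field)
            text = "" if value is None else str(value)
            # Escape characters that would otherwise break the TSV layout.
            text = text.replace("\\", "\\\\").replace("\t", "\\t").replace("\n", "\\n")
            cells.append(text)
        yield "\t".join(cells)
```

Feeding it sys.stdin and printing each row turns it into a shell-pipeline filter. Escaping tabs and newlines keeps exactly one record per output line, which matters when the text field holds full revision wikitext.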
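The json2diffs step can be illustrated with a rough sketch that diffs each revision's tokens against the previous revision of the same page. The input field names "page_id" and "text", the regex tokenizer, and the operation format are all assumptions made for illustration. Python's difflib.SequenceMatcher is used here as a stand-in diff algorithm and may differ from the one the package actually uses.

```python
import difflib
import re
from typing import Dict, Iterable, Iterator, List

# Hypothetical tokenizer: words, runs of whitespace, and single punctuation marks.
TOKEN_RE = re.compile(r"\w+|\s+|[^\w\s]")


def tokenize(text: str) -> List[str]:
    return TOKEN_RE.findall(text)


def add_diffs(revisions: Iterable[Dict]) -> Iterator[Dict]:
    """Attach a "diff" field comparing each revision with the page's previous one.

    Assumes revisions arrive in chronological order per page.
    """
    last_tokens: Dict[object, List[str]] = {}
    for rev in revisions:
        page = rev.get("page_id")
        a = last_tokens.get(page, [])  # empty for a page's first revision
        b = tokenize(rev.get("text") or "")
        matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
        ops = [
            {"op": tag, "a1": a1, "a2": a2, "b1": b1, "b2": b2}
            for tag, a1, a2, b1, b2 in matcher.get_opcodes()
            if tag != "equal"  # keep only the changes
        ]
        last_tokens[page] = b
        yield dict(rev, diff=ops)  # copy the blob and add the new field
```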
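Token persistence, the quantity behind diffs2persistence and persistence2revstats, can be approximated as follows: tag every token with the revision that added it, carry unchanged tokens forward through each diff, and credit the originating revision for every later revision the token survives. The sketch below is a simplified, hypothetical version for a single page's ordered token lists. It uses difflib in place of whatever diff the package uses, and the statistic names "tokens_added" and "persisted_revisions" are illustrative only.

```python
import difflib
from typing import Dict, List, Tuple


def token_persistence(versions: List[List[str]]) -> Dict[int, Dict[str, int]]:
    """Return per-revision counts of tokens added and of token-revisions survived.

    versions[i] is the token list of revision i of a single page, in order.
    """
    stats = {i: {"tokens_added": 0, "persisted_revisions": 0} for i in range(len(versions))}
    current: List[Tuple[str, int]] = []  # (token, index of the revision that added it)
    for rev_index, tokens in enumerate(versions):
        old_tokens = [tok for tok, _ in current]
        matcher = difflib.SequenceMatcher(a=old_tokens, b=tokens, autojunk=False)
        updated: List[Tuple[str, int]] = []
        for tag, a1, a2, b1, b2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged tokens keep the revision that originally added them.
                updated.extend(current[a1:a2])
            else:
                # "insert"/"replace" contribute new tokens; "delete" has b1 == b2.
                updated.extend((tok, rev_index) for tok in tokens[b1:b2])
                stats[rev_index]["tokens_added"] += b2 - b1
        current = updated
        # Credit each older token that is still present after this revision.
        for _, origin in current:
            if origin != rev_index:
                stats[origin]["persisted_revisions"] += 1
    return stats
```

For example, for versions [["a", "b"], ["a", "b", "c"], ["a", "c"]], revision 0 adds two tokens that survive three token-revisions in total ("a" twice, "b" once). Dividing persisted_revisions by tokens_added gives one possible per-revision average.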


Download files


Source Distributions

mwstreaming-0.2.0.zip (14.1 kB)

mwstreaming-0.2.0.tar.gz (7.8 kB)
