A collection of scripts and utilities to support the stream-processing of MediaWiki data.
- dump2json – Converts an XML dump to a stream of revision JSON blobs
- wikihadoop2json – Converts a Wikihadoop-processed stream of XML pages to JSON blobs
- json2tsv – Converts a stream of JSON blobs to tab-separated values
- json2diffs – Computes and adds a “diff” field to a stream of revision JSON blobs
- diffs2persistence – Computes token persistence from a stream of JSON revision diff blobs and adds a “persistence” field
- persistence2revstats – Aggregates a stream of token persistence to revision statistics
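
To illustrate the kind of transformation json2tsv performs, here is a minimal sketch. It is not the mwstreaming implementation: it assumes one JSON blob per input line, and the function name `json_lines_to_tsv` and the field names in the usage note are hypothetical.

```python
import json


def json_lines_to_tsv(lines, fields):
    """Yield one tab-separated row per JSON blob in `lines`.

    Assumption: each non-blank line holds exactly one JSON object.
    `fields` lists the top-level keys to emit, in column order.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines between blobs
        blob = json.loads(line)
        row = []
        for field in fields:
            value = blob.get(field)
            if value is None:
                # Missing or null fields become empty cells.
                row.append("")
            elif isinstance(value, str):
                # Escape backslashes, tabs and newlines so each blob
                # stays on a single row.
                escaped = (
                    value.replace("\\", "\\\\")
                    .replace("\t", "\\t")
                    .replace("\n", "\\n")
                )
                row.append(escaped)
            else:
                # Numbers, booleans, lists and dicts are written as JSON.
                row.append(json.dumps(value))
        yield "\t".join(row)
```

Calling `json_lines_to_tsv(sys.stdin, ["id", "timestamp"])` and printing each yielded row would turn a stream of revision blobs into a two-column table. Working line by line keeps memory use constant, which matters when the input is a full dump.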
Download files
Source Distributions
- mwstreaming-0.2.1.zip (14.0 kB)
- mwstreaming-0.2.1.tar.gz (7.8 kB)