A set of utilities for generating quality scores for MediaWiki revisions

Revision Scoring

A generic, machine learning-based revision scoring system designed to help automate critical wiki-work — for example, vandalism detection and removal. This library powers ORES.


Using a scorer_model to score a revision::

  import mwapi
  from revscoring import Model
  from revscoring.extractors.api.extractor import Extractor

  with open("models/enwiki.damaging.linear_svc.model") as f:
      scorer_model = Model.load(f)

  extractor = Extractor(mwapi.Session(host="",
                                      user_agent="revscoring demo"))

  feature_values = list(extractor.extract(123456789, scorer_model.features))

  print(scorer_model.score(feature_values))
  # {'prediction': True, 'probability': {False: 0.4694409344514984, True: 0.5305590655485017}}
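
The score is a plain dictionary, so downstream code can apply its own decision threshold instead of relying only on the `prediction` field. The following is a minimal sketch, assuming the output shape shown above; the `flag_for_review` helper and the 0.9 threshold are hypothetical and not part of revscoring.

```python
def flag_for_review(score, threshold=0.5):
    """Return True when the probability of the True class meets the threshold.

    `score` is assumed to have the shape of the example output above.
    This helper is illustrative only and is not part of revscoring.
    """
    return score["probability"][True] >= threshold


# The example output from above, copied verbatim.
score = {
    "prediction": True,
    "probability": {False: 0.4694409344514984, True: 0.5305590655485017},
}

print(flag_for_review(score))                 # default threshold of 0.5
print(flag_for_review(score, threshold=0.9))  # stricter, hypothetical threshold
```

With the example score, the default threshold flags the revision and the stricter one does not. A higher threshold trades recall for fewer false alarms.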


The easiest way to install is via the Python package installer (pip).

pip install revscoring

You may find that some of the dependencies fail to compile (namely scipy, numpy and sklearn). In that case, you'll need to install the required system packages first.

Ubuntu & Debian:

  • Run sudo apt-get install python3-dev g++ gfortran liblapack-dev libopenblas-dev
  • Run sudo apt-get install aspell-ar aspell-bn aspell-el aspell-id aspell-is aspell-pl aspell-ro aspell-sv aspell-ta aspell-uk myspell-cs myspell-de-at myspell-de-ch myspell-de-de myspell-es myspell-et myspell-fa myspell-fr myspell-he myspell-hr myspell-hu myspell-lv myspell-nb myspell-nl myspell-pt-pt myspell-pt-br myspell-ru hunspell-bs hunspell-ca hunspell-en-au hunspell-en-us hunspell-en-gb hunspell-eu hunspell-gl hunspell-it hunspell-hi hunspell-sr hunspell-vi voikko-fi


MacOS:

Using Homebrew and pip, installing revscoring and enchant can be accomplished as follows::

  brew install aspell --with-all-languages
  brew install enchant
  pip install --no-binary pyenchant revscoring

Adding languages in aspell (MacOS only), assuming the dictionary archive has already been downloaded to /tmp:

cd /tmp
bzip2 -dc aspell-pt-0.50-2.tar.bz2 | tar xvf -
cd aspell-pt-0.50-2
./configure
make
sudo make install

The differences between the aspell and myspell dictionaries can cause some of the tests to fail.

Finally, in order to make use of language features, you'll need to download some NLTK data. The following command will get the necessary corpora.

python -m nltk.downloader omw sentiwordnet stopwords wordnet
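
When the environment is set up from a script, the same download can be launched from Python. This is a small sketch that only wraps the command shown above; the helper names `nltk_download_command` and `download_corpora` are assumptions made for illustration.

```python
import subprocess
import sys

# Corpora named in the command above.
NLTK_CORPORA = ["omw", "sentiwordnet", "stopwords", "wordnet"]


def nltk_download_command(corpora=NLTK_CORPORA, python=sys.executable):
    """Build the argv list for `python -m nltk.downloader <corpora...>`."""
    return [python, "-m", "nltk.downloader", *corpora]


def download_corpora(corpora=NLTK_CORPORA):
    """Run the downloader with the current interpreter.

    check=True raises CalledProcessError if the downloader exits non-zero.
    """
    subprocess.run(nltk_download_command(corpora), check=True)
```

Calling `download_corpora()` once during setup should be equivalent to running the command by hand, and using `sys.executable` keeps the download tied to the interpreter that has nltk installed.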

You'll also need to install enchant-compatible dictionaries of the languages you'd like to use. We recommend the following:

  • languages.arabic: aspell-ar
  • languages.basque: hunspell-eu
  • languages.bengali: aspell-bn
  • languages.bosnian: hunspell-bs
  • languages.catalan: myspell-ca
  • languages.czech: myspell-cs
  • languages.croatian: myspell-hr
  • languages.dutch: myspell-nl
  • languages.english: myspell-en-us myspell-en-gb myspell-en-au
  • languages.estonian: myspell-et
  • languages.finnish: voikko-fi
  • languages.french: myspell-fr
  • languages.galician: hunspell-gl
  • languages.german: myspell-de-at myspell-de-ch myspell-de-de
  • languages.greek: aspell-el
  • languages.hebrew: myspell-he
  • languages.hindi: aspell-hi
  • languages.hungarian: myspell-hu
  • languages.icelandic: aspell-is
  • languages.indonesian: aspell-id
  • languages.italian: myspell-it
  • languages.latvian: myspell-lv
  • languages.norwegian: myspell-nb
  • languages.persian: myspell-fa
  • languages.polish: aspell-pl
  • languages.portuguese: myspell-pt-pt myspell-pt-br
  • languages.serbian: hunspell-sr
  • languages.spanish: myspell-es
  • languages.swedish: aspell-sv
  • languages.tamil: aspell-ta
  • languages.russian: myspell-ru
  • languages.ukrainian: aspell-uk
  • languages.vietnamese: hunspell-vi
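
After installing dictionaries, it can help to confirm that enchant actually sees them. The sketch below compares a list of wanted language tags against what pyenchant reports through `enchant.list_languages()`. The `missing_dictionaries` helper and the example tags are hypothetical, and the mapping from a package name to an enchant tag (for example `hunspell-vi` to `vi`) is an assumption that may differ on your system.

```python
def missing_dictionaries(required, available):
    """Return the required language tags that have no installed dictionary.

    A bare tag such as "pt" counts as covered when a regional variant
    such as "pt_BR" is available.
    """
    bases = {tag.split("_")[0] for tag in available}
    return [tag for tag in required
            if tag not in available and tag not in bases]


def installed_dictionaries():
    """List the dictionary tags enchant can see (requires pyenchant)."""
    import enchant  # imported lazily so the helper above works without it
    return enchant.list_languages()


# Hypothetical example with a hand-written list of available tags.
print(missing_dictionaries(["en_US", "pt", "vi"], ["en_US", "en_GB", "pt_BR"]))
```

On a real system, pass `installed_dictionaries()` as the second argument in place of the hand-written list.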


To contribute, first install the dependencies:

$ pip install -r requirements.txt

Install necessary NLTK data:

python -m nltk.downloader omw sentiwordnet stopwords wordnet

Running tests

Make sure you install test dependencies:

$ pip install -r test-requirements.txt

Then run:

$ pytest . -vv

Reporting bugs

To report a bug, please use Phabricator.


Download files

Files for revscoring, version 2.9.3

  • revscoring-2.9.3-py2.py3-none-any.whl (380.1 kB): Wheel, Python version py2.py3
  • revscoring-2.9.3.tar.gz (266.8 kB): Source
