A library for performing automatic detection of assessment classes of Wikipedia articles.

Wikipedia article quality classification

This library provides a set of utilities for performing automatic detection of assessment classes of Wikipedia articles. For more information, see the full documentation at https://articlequality.readthedocs.io .

Compatible with Python 3.x only. Sorry.

Basic usage

>>> import articlequality
>>> from revscoring import Model
>>>
>>> scorer_model = Model.load(open("models/enwiki.nettrom_wp10.gradient_boosting.model", "rb"))
>>>
>>> text = "I am the text of a page.  I have a <ref>word</ref>"
>>> articlequality.score(scorer_model, text)
{'prediction': 'stub',
 'probability': {'stub': 0.27156163795807853,
                 'b': 0.14707452309674252,
                 'fa': 0.16844898943510833,
                 'c': 0.057668704007171959,
                 'ga': 0.21617801281707663,
                 'start': 0.13906813268582238}}
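
The returned dictionary can be post-processed with plain Python. The following is a minimal sketch, not part of the library: it copies the score shown above and ranks the assessment classes by probability. The helper name rank_classes is hypothetical and introduced only for illustration.

```python
# The score dictionary printed in the example above, copied verbatim.
score = {
    "prediction": "stub",
    "probability": {
        "stub": 0.27156163795807853,
        "b": 0.14707452309674252,
        "fa": 0.16844898943510833,
        "c": 0.057668704007171959,
        "ga": 0.21617801281707663,
        "start": 0.13906813268582238,
    },
}


def rank_classes(score):
    """Return (class, probability) pairs sorted from most to least likely.

    Hypothetical helper: it only reads the 'probability' mapping of a
    score dictionary shaped like the one above.
    """
    probabilities = score["probability"]
    return sorted(probabilities.items(), key=lambda item: item[1], reverse=True)


ranked = rank_classes(score)
for label, probability in ranked:
    print(f"{label:<6}{probability:.3f}")

# In this example the top-ranked class is the reported prediction.
print(ranked[0][0] == score["prediction"])
```

For this particular score the ranking runs stub, ga, fa, b, start, c, which shows how close the runner-up classes are to the predicted one.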

Install

Requirements

  • Python 3.5, 3.6 or 3.7
  • All the system requirements of revscoring

Installation steps

  1. Clone this repository.
  2. Install the package and its dependencies: python setup.py install
  3. Verify that the installation worked by running make enwiki_models, which builds the English Wikipedia article quality model, or make wikidatawiki_models, which builds the item quality model for Wikidata.
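
Before building any models, a quick pre-flight check can confirm that the interpreter and packages are in place. The sketch below is an assumption-laden illustration, not a script shipped with this repository: the helper names python_supported and is_installed are hypothetical, and the version set is taken from the Requirements list above.

```python
import importlib.util
import sys

# Python versions listed under Requirements above.
SUPPORTED_VERSIONS = {(3, 5), (3, 6), (3, 7)}


def python_supported(version_info=sys.version_info):
    """Return True if the (major, minor) version is in the listed set."""
    return tuple(version_info[:2]) in SUPPORTED_VERSIONS


def is_installed(package):
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(package) is not None


print("Python version listed as supported:", python_supported())
for name in ("revscoring", "articlequality"):
    print(name, "installed:", is_installed(name))
```

This only checks that the packages can be found; building a model with make remains the actual verification step.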

Retraining the models

To retrain a model, run make -B MODEL, e.g. make -B wikidatawiki_models. This re-downloads the labels, re-extracts the features from the revisions, and then retrains and rescores the model.

To skip re-downloading the training labels and re-extracting the features, it is enough to touch the files in the datasets/ directory and run the make command without the -B flag.
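
The touch-then-make workflow can also be scripted. This is a minimal sketch, assuming it runs from the repository root and that datasets/ contains only the files to refresh. The functions touch_datasets and rebuild are hypothetical helpers, roughly equivalent to running touch datasets/* followed by make.

```python
import subprocess
from pathlib import Path


def touch_datasets(directory="datasets"):
    """Update the modification time of every regular file in `directory`.

    Returns the names of the touched files in sorted order.
    """
    touched = []
    for path in sorted(Path(directory).iterdir()):
        if path.is_file():
            path.touch()
            touched.append(path.name)
    return touched


def rebuild(target, force=False):
    """Run `make target`, adding -B when force is True.

    Raises subprocess.CalledProcessError if make exits with an error.
    """
    command = ["make"] + (["-B"] if force else []) + [target]
    subprocess.run(command, check=True)


# Example usage (left commented out so that importing this file has no side effects):
# touch_datasets()
# rebuild("wikidatawiki_models")              # reuse labels and features
# rebuild("wikidatawiki_models", force=True)  # full retrain, like make -B
```

Passing force=True corresponds to the full make -B rebuild described above, while the default relies on the refreshed timestamps.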

Running tests

Example:

pytest -vv tests/feature_lists/test_wikidatawiki.py

Authors

Download files

Source Distribution

articlequality-0.4.4.tar.gz (37.0 kB)

Uploaded Source

Built Distribution

articlequality-0.4.4-py2.py3-none-any.whl (56.1 kB)

Uploaded Python 2 Python 3
