
Wikistats-to-CSV downloads Wikipedia Statistics in CSV format for a given Wikipedia.

Project description

Wikistats-to-CSV


Wikistats-to-CSV (wikistats2csv) is a Python package (installable with pip) and command-line interface (CLI) that downloads statistics for a given Wikipedia from the Wikimedia Statistics project and saves them as CSV files.

Install:

Wikistats-to-CSV (wikistats2csv) requires Python 3 and a few Python packages: lxml==4.9.1, rich==12.5.1, numpy==1.23.2, pandas==1.4.3, selenium==3.141.0, and geckodriver-autoinstaller==0.1.0. For convenience, these packages are installed automatically as part of the Wikistats-to-CSV (wikistats2csv) setup process. If you encounter installation errors, you may need to install them manually with pip:

python3 -m pip install -r requirements.txt
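If installation succeeds but something misbehaves later, it can help to confirm which versions of the pinned dependencies are actually installed. The sketch below compares installed versions against the pins listed above using the standard-library `importlib.metadata` module (Python 3.8+). `PINNED` and `find_mismatches` are illustrative names, not part of wikistats2csv.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Pins copied from the list above (assumption: they match the package's setup).
PINNED = {
    "lxml": "4.9.1",
    "rich": "12.5.1",
    "numpy": "1.23.2",
    "pandas": "1.4.3",
    "selenium": "3.141.0",
    "geckodriver-autoinstaller": "0.1.0",
}


def find_mismatches(
    pinned: Dict[str, str],
    lookup: Callable[[str], str] = version,
) -> Dict[str, Optional[str]]:
    """Return {package: installed version} for every package that differs
    from its pin; None means the package is not installed at all."""
    mismatches: Dict[str, Optional[str]] = {}
    for name, wanted in pinned.items():
        try:
            installed: Optional[str] = lookup(name)
        except PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    for name, installed in find_mismatches(PINNED).items():
        print(f"{name}: wanted {PINNED[name]}, found {installed or 'nothing'}")
```

The `lookup` parameter defaults to `importlib.metadata.version`; it is only exposed so the check can be exercised without touching the real environment.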

Before installing Wikistats-to-CSV (wikistats2csv) with pip, we highly recommend upgrading pip to the latest version:

python3 -m pip install --upgrade pip
python3 -m pip install wikistats2csv

If pip prints the warning "WARNING: the script is installed in '/Users/.../.../bin' which is not on path", add the displayed path "/Users/.../.../bin" to your $PATH variable with this command:

export PATH="/Users/.../.../bin:$PATH"
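To check from Python whether the directory named in the warning is already on `PATH`, and whether the `wikistats2csv` script can be found, a small standard-library sketch such as the following may help. `dir_on_path` is a hypothetical helper, not part of the package; pass it the directory that pip actually printed.

```python
import os
import shutil
from typing import Optional


def dir_on_path(directory: str, path_value: Optional[str] = None) -> bool:
    """Return True if `directory` is one of the entries of a PATH-style string.

    Defaults to the current process's PATH. Entries are normalised so that a
    trailing separator or a leading "~" does not cause a false negative.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normpath(os.path.expanduser(directory))
    entries = [
        os.path.normpath(os.path.expanduser(entry))
        for entry in path_value.split(os.pathsep)
        if entry  # skip empty entries such as those produced by "::"
    ]
    return target in entries


if __name__ == "__main__":
    # shutil.which returns the executable's full path, or None if it is not on PATH.
    print("wikistats2csv found at:", shutil.which("wikistats2csv"))
```

Remember that `export PATH=...` only affects the current shell session; to make it permanent, add the same line to your shell's startup file.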

Usage:

* As CLI:

>> Long Flags:

$ wikistats2csv --wiki en --metric content --query pages-to-date --period all-years --filter page-type-all --interval monthly

              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║WIKISTATS-TO-CSV║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌

## Downloaded `english--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)

** Quick glance at `english--pages-to-date--page-type-all--all-years--monthly.csv` file:
                        month  total.non-content  total.content           timeRange.start             timeRange.end
0    2001-01-01T00:00:00.000Z                 28             37  2001-01-01T00:00:00.000Z  2001-02-01T00:00:00.000Z
1    2001-02-01T00:00:00.000Z                 51            175  2001-02-01T00:00:00.000Z  2001-03-01T00:00:00.000Z
..                        ...                ...            ...                       ...                       ...
257  2022-06-01T00:00:00.000Z           36945305        6518484  2022-06-01T00:00:00.000Z  2022-07-01T00:00:00.000Z
258  2022-07-01T00:00:00.000Z           37088260        6534151  2022-07-01T00:00:00.000Z  2022-08-01T00:00:00.000Z
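Judging from the examples in this README, the output file name appears to follow the pattern `<language>--<query>--<filter>--<period>--<interval>.csv`, where `<language>` is the lowercase English name of the language rather than the wiki code. The helpers below rebuild and parse such names. The pattern is inferred from the three examples only, not a documented guarantee, and both function names are hypothetical.

```python
from typing import Dict

# Field order inferred from the example file names in this README.
FIELDS = ("language", "query", "filter", "period", "interval")


def expected_csv_name(language: str, query: str, filter_: str, period: str, interval: str) -> str:
    """Rebuild an output file name from the download options (assumed pattern)."""
    return "--".join([language.lower(), query, filter_, period, interval]) + ".csv"


def parse_csv_name(file_name: str) -> Dict[str, str]:
    """Split an output file name back into its five options.

    Splitting on "--" is safe because none of the documented option values
    contain a double hyphen.
    """
    suffix = ".csv"
    if not file_name.endswith(suffix):
        raise ValueError(f"not a .csv file name: {file_name!r}")
    parts = file_name[: -len(suffix)].split("--")
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} '--'-separated fields in {file_name!r}")
    return dict(zip(FIELDS, parts))
```

This is handy when a script downloads several files and needs to find each one afterwards.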

>> Short Flags:

$ wikistats2csv -w ar -m content -q pages-to-date -p all-years -f page-type-all -i monthly

              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║WIKISTATS-TO-CSV║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌

## Downloaded `arabic--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)

** Quick glance at `arabic--pages-to-date--page-type-all--all-years--monthly.csv` file:
                        month  total.non-content  total.content           timeRange.start             timeRange.end
0    2001-01-01T00:00:00.000Z                  0            591  2001-01-01T00:00:00.000Z  2001-02-01T00:00:00.000Z
1    2001-02-01T00:00:00.000Z                  0            591  2001-02-01T00:00:00.000Z  2001-03-01T00:00:00.000Z
..                        ...                ...            ...                       ...                       ...
257  2022-06-01T00:00:00.000Z            5508072        1173410  2022-06-01T00:00:00.000Z  2022-07-01T00:00:00.000Z
258  2022-07-01T00:00:00.000Z            5538121        1180401  2022-07-01T00:00:00.000Z  2022-08-01T00:00:00.000Z 
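Both flag styles carry the same six options. When calling the CLI from a script, building the argument list programmatically avoids typos and shell-quoting problems. This is a sketch: `FLAGS` and `build_command` are illustrative names, and the flag pairs are taken only from the two examples above.

```python
from typing import Dict, List, Tuple

# (long flag, short flag) pairs, as shown in the two CLI examples above.
FLAGS: Dict[str, Tuple[str, str]] = {
    "wiki": ("--wiki", "-w"),
    "metric": ("--metric", "-m"),
    "query": ("--query", "-q"),
    "period": ("--period", "-p"),
    "filter": ("--filter", "-f"),
    "interval": ("--interval", "-i"),
}


def build_command(short: bool = False, **options: str) -> List[str]:
    """Build an argv list for wikistats2csv, keeping the keyword order."""
    argv = ["wikistats2csv"]
    for key, value in options.items():
        if key not in FLAGS:
            raise KeyError(f"unknown option: {key}")
        long_flag, short_flag = FLAGS[key]
        argv += [short_flag if short else long_flag, value]
    return argv
```

The resulting list can be passed directly to `subprocess.run(cmd, check=True)`, which raises `CalledProcessError` if the CLI exits with a non-zero status.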

* As Python Package:

>>> from wikistats2csv import Content
>>> Content.pages_to_date(wiki='es', period='all-years', filter='page-type-all', interval='monthly')

## Downloaded `spanish--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)

** Quick glance at `spanish--pages-to-date--page-type-all--all-years--monthly.csv` file:
                        month  total.non-content  total.content           timeRange.start             timeRange.end
0    2001-01-01T00:00:00.000Z                  0              0  2001-01-01T00:00:00.000Z  2001-02-01T00:00:00.000Z
1    2001-02-01T00:00:00.000Z                  0              0  2001-02-01T00:00:00.000Z  2001-03-01T00:00:00.000Z
..                        ...                ...            ...                       ...                       ...
257  2022-06-01T00:00:00.000Z            3896209        1786321  2022-06-01T00:00:00.000Z  2022-07-01T00:00:00.000Z
258  2022-07-01T00:00:00.000Z            3903963        1792329  2022-07-01T00:00:00.000Z  2022-08-01T00:00:00.000Z 
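Once a file has been downloaded, it can be inspected without pandas. The sketch below reads the CSV with the standard-library `csv` module and, for each month, adds the two page counts and computes the share of content pages. It assumes the file's header contains the columns printed in the quick-glance output (`month`, `total.non-content`, `total.content`). The sample rows are copied from the English example above, and `summarize` is a hypothetical helper.

```python
import csv
import io
from typing import Dict, Iterable, List

# Four rows from the English "quick glance" output above (the real file has one row per month).
SAMPLE = """month,total.non-content,total.content,timeRange.start,timeRange.end
2001-01-01T00:00:00.000Z,28,37,2001-01-01T00:00:00.000Z,2001-02-01T00:00:00.000Z
2001-02-01T00:00:00.000Z,51,175,2001-02-01T00:00:00.000Z,2001-03-01T00:00:00.000Z
2022-06-01T00:00:00.000Z,36945305,6518484,2022-06-01T00:00:00.000Z,2022-07-01T00:00:00.000Z
2022-07-01T00:00:00.000Z,37088260,6534151,2022-07-01T00:00:00.000Z,2022-08-01T00:00:00.000Z
"""


def summarize(rows: Iterable[Dict[str, str]]) -> List[Dict[str, object]]:
    """Return month, total pages, and content share for each CSV row."""
    summary: List[Dict[str, object]] = []
    for row in rows:
        content = int(row["total.content"])
        non_content = int(row["total.non-content"])
        total = content + non_content
        summary.append({
            "month": row["month"][:7],  # "2022-07-01T00:00:00.000Z" -> "2022-07"
            "total": total,
            "content_share": content / total if total else 0.0,
        })
    return summary


if __name__ == "__main__":
    for item in summarize(csv.DictReader(io.StringIO(SAMPLE))):
        print(item)
```

To run it on a real file, open it with `open(path, newline="")` and pass `csv.DictReader(fh)` to `summarize`. If the file was written with a leading index column, `DictReader` keys that column by an empty string, so the lookups above are unaffected.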

Supported Features:

Content Metrics/Class:

absolute-bytes-difference* / absolute_bytes_difference**
    Periods: all-years, one-year, two-years, three-months, one-month
    Filters***: no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all
    Intervals: daily, monthly

edited-pages* / edited_pages**
    Periods: all-years, one-year, two-years, three-months, one-month
    Filters***: no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all, activity-level-1-to-4-edits, activity-level-5-to-24-edits, activity-level-25-to-99-edits, activity-level-100-or-more-edits, activity-level-all
    Intervals: daily, monthly

net-bytes-difference* / net_bytes_difference**
    Periods: all-years, one-year, two-years, three-months, one-month
    Filters***: no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all
    Intervals: daily, monthly

pages-to-date* / pages_to_date**
    Periods: all-years, one-year, two-years, three-months, one-month
    Filters***: no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all
    Intervals: daily, monthly

total-media-requests* / total_media_requests**
    Periods: all-years, one-year, two-years, three-months, one-month
    Filters***: no-filter, media-type-image, media-type-video, media-type-audio, media-type-document, media-type-other, media-type-all, agent-type-user, agent-type-spider, agent-type-all
    Intervals: daily, monthly

top-media-requests* / top_media_requests**
    Periods: last-month
    Filters***: no-filter, media-type-image, media-type-video, media-type-audio, media-type-document, media-type-other, media-type-all
    Intervals: daily, monthly

* CLI query.        ** Python function.        *** More complex filters are planned for future versions.
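The Content table fixes which periods and filters each query accepts, so a call can be checked before any download starts. The following sketch encodes that table as plain Python sets. `CONTENT_QUERIES` and `problems` are hypothetical names, and this README does not say whether the package performs a similar check itself.

```python
from typing import Dict, FrozenSet, List, NamedTuple

# Option groups transcribed from the Content table above.
ALL_PERIODS = frozenset({"all-years", "one-year", "two-years", "three-months", "one-month"})
LAST_MONTH = frozenset({"last-month"})
INTERVALS = frozenset({"daily", "monthly"})

NO_FILTER = frozenset({"no-filter"})
PAGE_TYPES = frozenset({"page-type-content", "page-type-non-content", "page-type-all"})
EDITOR_TYPES = frozenset({
    "editor-type-user", "editor-type-name-bot", "editor-type-anonymous",
    "editor-type-group-bot", "editor-type-all",
})
ACTIVITY_LEVELS = frozenset({
    "activity-level-1-to-4-edits", "activity-level-5-to-24-edits",
    "activity-level-25-to-99-edits", "activity-level-100-or-more-edits",
    "activity-level-all",
})
MEDIA_TYPES = frozenset({
    "media-type-image", "media-type-video", "media-type-audio",
    "media-type-document", "media-type-other", "media-type-all",
})
MEDIA_AGENTS = frozenset({"agent-type-user", "agent-type-spider", "agent-type-all"})
PAGE_EDITOR = NO_FILTER | PAGE_TYPES | EDITOR_TYPES


class QueryOptions(NamedTuple):
    periods: FrozenSet[str]
    filters: FrozenSet[str]


CONTENT_QUERIES: Dict[str, QueryOptions] = {
    "absolute-bytes-difference": QueryOptions(ALL_PERIODS, PAGE_EDITOR),
    "edited-pages": QueryOptions(ALL_PERIODS, PAGE_EDITOR | ACTIVITY_LEVELS),
    "net-bytes-difference": QueryOptions(ALL_PERIODS, PAGE_EDITOR),
    "pages-to-date": QueryOptions(ALL_PERIODS, PAGE_EDITOR),
    "total-media-requests": QueryOptions(ALL_PERIODS, NO_FILTER | MEDIA_TYPES | MEDIA_AGENTS),
    "top-media-requests": QueryOptions(LAST_MONTH, NO_FILTER | MEDIA_TYPES),
}


def problems(query: str, period: str, filter_: str, interval: str) -> List[str]:
    """Return a list of human-readable problems; an empty list means the combination is supported."""
    options = CONTENT_QUERIES.get(query)
    if options is None:
        return [f"unknown Content query: {query!r}"]
    errors: List[str] = []
    if period not in options.periods:
        errors.append(f"period {period!r} is not supported by {query}")
    if filter_ not in options.filters:
        errors.append(f"filter {filter_!r} is not supported by {query}")
    if interval not in INTERVALS:
        errors.append(f"interval {interval!r} is not supported by {query}")
    return errors
```

The same structure extends to the Contributing and Reading tables by adding their rows to the dictionary.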

Contributing Metrics/Class:

editors* / editors**
    Periods: all-years, one-year, two-years, three-months, one-month
    Filters***: no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all, activity-level-1-to-4-edits, activity-level-5-to-24-edits, activity-level-25-to-99-edits, activity-level-100-or-more-edits, activity-level-all
    Intervals: daily, monthly

active-editors* / active_editors**
    Periods: all-years, one-year, two-years, three-months, one-month
    Filters***: no-filter, page-type-content, page-type-non-content, page-type-all
    Intervals: daily, monthly

edits* / edits**
    Periods: all-years, one-year, two-years, three-months, one-month
    Filters***: no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all
    Intervals: daily, monthly

user-edits* / user_edits**
    Periods: all-years, one-year, two-years, three-months, one-month
    Filters***: no-filter, page-type-content, page-type-non-content, page-type-all
    Intervals: daily, monthly

new-pages* / new_pages**
    Periods: all-years, one-year, two-years, three-months, one-month
    Filters***: no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all
    Intervals: daily, monthly

new-registered-users* / new_registered_users**
    Periods: all-years, one-year, two-years, three-months, one-month
    Filters***: no-filter
    Intervals: daily, monthly

top-editors* / top_editors**
    Periods: last-month
    Filters***: no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all
    Intervals: daily, monthly

top-edited-pages* / top_edited_pages**
    Periods: last-month
    Filters***: no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all
    Intervals: daily, monthly

active-editors-by-country* / active_editors_by_country**
    Periods: last-month
    Filters***: activity-level-5-to-99-edits, activity-level-100-or-more-edits
    Intervals: daily, monthly

* CLI query.        ** Python function.        *** More complex filters are planned for future versions.
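Every row in these tables pairs a CLI query with a Python function whose name is the same string with hyphens replaced by underscores (for example, `active-editors-by-country` and `active_editors_by_country`). That regularity allows a query name to be resolved to a function with `getattr`. In the sketch below, `FakeContributing` is a hypothetical stand-in so the code runs offline; this README names the class for these metrics only in the section heading, so the real import name is an assumption.

```python
from typing import Any, Callable


def to_function_name(cli_query: str) -> str:
    """Map a CLI query name to the matching Python function name."""
    return cli_query.replace("-", "_")


def resolve(metric_class: Any, cli_query: str) -> Callable[..., Any]:
    """Look up the function for `cli_query` on a metric class, failing loudly."""
    name = to_function_name(cli_query)
    func = getattr(metric_class, name, None)
    if not callable(func):
        raise AttributeError(f"{metric_class.__name__} has no function {name!r}")
    return func


class FakeContributing:
    """Hypothetical stand-in for a wikistats2csv metric class (no network access)."""

    @staticmethod
    def active_editors(wiki: str, period: str, filter: str, interval: str) -> str:
        # The real function would download a CSV; this one just echoes its arguments.
        return f"{wiki}:{period}:{filter}:{interval}"
```

With the real package, `resolve(Content, "pages-to-date")` should return `Content.pages_to_date`, the function used in the Python example earlier.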

Reading Metrics/Class:

total-page-views* / total_page_views**
    Periods: all-years, one-year, two-years, three-months, one-month
    Filters***: no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all, agent-type-user, agent-type-spider, agent-type-automated, agent-type-all
    Intervals: daily, monthly

legacy-page-views* / legacy_page_views**
    Periods: all-years, one-year, two-years, three-months, one-month
    Filters***: no-filter, access-site-mobile-site, access-site-desktop-site, access-site-all
    Intervals: daily, monthly

page-views-by-country* / page_views_by_country**
    Periods: last-month
    Filters***: no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all
    Intervals: daily, monthly

unique-devices* / unique_devices**
    Periods: all-years, one-year, two-years, three-months, one-month
    Filters***: no-filter, access-site-mobile-site, access-site-desktop-site, access-site-all
    Intervals: daily, monthly

top-viewed-articles* / top_viewed_articles**
    Periods: last-month
    Filters***: no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all
    Intervals: daily, monthly

* CLI query.        ** Python function.        *** More complex filters are planned for future versions.
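Across the three tables, every filter except `no-filter` has the form `<dimension>-<value>`, where the dimension is one of `page-type`, `editor-type`, `activity-level`, `media-type`, `agent-type`, `access-method`, or `access-site`. Splitting on these known prefixes, rather than on the first hyphen (values such as `non-content` and `name-bot` contain hyphens themselves), makes it easy to group filters by family. `split_filter` is a hypothetical helper, not part of the package.

```python
from typing import Optional, Tuple

# Filter families that appear in the Content, Contributing and Reading tables.
DIMENSIONS = (
    "page-type", "editor-type", "activity-level",
    "media-type", "agent-type", "access-method", "access-site",
)


def split_filter(filter_name: str) -> Tuple[Optional[str], str]:
    """Split a filter into (dimension, value); 'no-filter' has no dimension."""
    for dimension in DIMENSIONS:
        prefix = dimension + "-"
        if filter_name.startswith(prefix):
            return dimension, filter_name[len(prefix):]
    if filter_name == "no-filter":
        return None, filter_name
    raise ValueError(f"unrecognised filter: {filter_name!r}")
```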

Extra Features:

List All Wikipedia Languages with Their Codes:

* As CLI:

To list all supported Wikipedia languages with their codes, run either of these commands:

$ wikistats2csv -lw
# OR
$ wikistats2csv --list-wikis

* As Python Package:

from wikistats2csv import Helper
Helper.get_Wikis_Codes()



Download files


Source Distribution

wikistats2csv-0.1.6.tar.gz (39.0 kB)

File details

Details for the file wikistats2csv-0.1.6.tar.gz.

File metadata

  • Download URL: wikistats2csv-0.1.6.tar.gz
  • Upload date:
  • Size: 39.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.10.7

File hashes

Hashes for wikistats2csv-0.1.6.tar.gz
Algorithm     Hash digest
SHA256        64699163039069a582221673a166a760f979a7370c745b86445a461b1c72415c
MD5           83e5595a7b7decef2a2d3b5714a8693e
BLAKE2b-256   c6c4eec2ba91c41453996ba99f5673b580b5d6a50bf2a8d9d1ac824a972b3ff2

