Wikistats-to-CSV downloads Wikipedia Statistics in CSV format for a given Wikipedia.

Wikistats-to-CSV

Wikistats-to-CSV (wikistats2csv) is a Python package (installable with pip) and command-line interface (CLI) that downloads the statistics of a given Wikipedia in CSV format from the Wikimedia Statistics project.

Install:

Wikistats-to-CSV (wikistats2csv) requires Python >=3 and a few Python packages such as lxml==4.9.1, rich==12.5.1, numpy==1.23.2, pandas==1.4.3, selenium==3.141.0, and geckodriver-autoinstaller==0.1.0. For convenience, these packages are installed as part of the Wikistats-to-CSV (wikistats2csv) setup process. If you encounter installation errors, you may need to install them manually with pip:

python3 -m pip install -r requirements.txt

Before installing Wikistats-to-CSV (wikistats2csv) with pip, we highly recommend upgrading pip to the latest version:

python3 -m pip install --upgrade pip
python3 -m pip install wikistats2csv

If you see the warning "WARNING: the script is installed in '/Users/.../.../bin' which is not on path", add the displayed path "/Users/.../.../bin" to the $PATH variable with this command:

export PATH="/Users/.../.../bin:$PATH"
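To confirm that the PATH change took effect, a short check from Python can help. The sketch below uses only the standard library; the helper name `find_cli` is hypothetical and not part of wikistats2csv.

```python
import shutil


def find_cli(name="wikistats2csv"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(name)


path = find_cli()
if path is None:
    # The script directory from the pip warning is probably missing from PATH.
    print("wikistats2csv was not found on PATH")
else:
    print(f"wikistats2csv found at {path}")
```

If the command is reported as missing, open a new shell after running the export command, or add the export line to your shell profile.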

Usage:

* As CLI:

>> Long Flags:

$ wikistats2csv --wiki en --metric content --query pages-to-date --period all-years --filter page-type-all --interval monthly

              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║WIKISTATS-TO-CSV║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌

## Downloaded `english--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)

** Quick glance at `english--pages-to-date--page-type-all--all-years--monthly.csv` file:
                        month  total.non-content  total.content           timeRange.start             timeRange.end
0    2001-01-01T00:00:00.000Z                 28             37  2001-01-01T00:00:00.000Z  2001-02-01T00:00:00.000Z
1    2001-02-01T00:00:00.000Z                 51            175  2001-02-01T00:00:00.000Z  2001-03-01T00:00:00.000Z
..                        ...                ...            ...                       ...                       ...
257  2022-06-01T00:00:00.000Z           36945305        6518484  2022-06-01T00:00:00.000Z  2022-07-01T00:00:00.000Z
258  2022-07-01T00:00:00.000Z           37088260        6534151  2022-07-01T00:00:00.000Z  2022-08-01T00:00:00.000Z

>> Short Flags:

$ wikistats2csv -w ar -m content -q pages-to-date -p all-years -f page-type-all -i monthly

              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║WIKISTATS-TO-CSV║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
              ▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌

## Downloaded `arabic--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)

** Quick glance at `arabic--pages-to-date--page-type-all--all-years--monthly.csv` file:
                        month  total.non-content  total.content           timeRange.start             timeRange.end
0    2001-01-01T00:00:00.000Z                  0            591  2001-01-01T00:00:00.000Z  2001-02-01T00:00:00.000Z
1    2001-02-01T00:00:00.000Z                  0            591  2001-02-01T00:00:00.000Z  2001-03-01T00:00:00.000Z
..                        ...                ...            ...                       ...                       ...
257  2022-06-01T00:00:00.000Z            5508072        1173410  2022-06-01T00:00:00.000Z  2022-07-01T00:00:00.000Z
258  2022-07-01T00:00:00.000Z            5538121        1180401  2022-07-01T00:00:00.000Z  2022-08-01T00:00:00.000Z 
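If you want to drive the CLI from a script, the two invocations above can be assembled programmatically. This is a minimal sketch, assuming the six flags shown in the examples are the only ones needed; `build_command` and `run_download` are hypothetical helper names, not part of wikistats2csv.

```python
import subprocess

# Long flag -> short flag, as shown in the two usage examples above.
SHORT_FLAGS = {
    "--wiki": "-w",
    "--metric": "-m",
    "--query": "-q",
    "--period": "-p",
    "--filter": "-f",
    "--interval": "-i",
}


def build_command(wiki, metric, query, period, filter_, interval, short=False):
    """Build the argument list for one wikistats2csv invocation."""
    options = [
        ("--wiki", wiki),
        ("--metric", metric),
        ("--query", query),
        ("--period", period),
        ("--filter", filter_),
        ("--interval", interval),
    ]
    cmd = ["wikistats2csv"]
    for flag, value in options:
        cmd += [SHORT_FLAGS[flag] if short else flag, value]
    return cmd


def run_download(*args, **kwargs):
    """Run the CLI; check=True raises CalledProcessError on a non-zero exit status."""
    return subprocess.run(build_command(*args, **kwargs), check=True)


# Print the long-flag command line without executing it.
print(" ".join(build_command("en", "content", "pages-to-date",
                             "all-years", "page-type-all", "monthly")))
```

Calling `run_download("ar", "content", "pages-to-date", "all-years", "page-type-all", "monthly", short=True)` would execute the short-flag variant, provided the CLI is installed and on PATH.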

* As Python Package:

>>> from wikistats2csv import Content
>>> Content.pages_to_date(wiki='es', period='all-years', filter='page-type-all', interval='monthly')

## Downloaded `spanish--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)

** Quick glance at `spanish--pages-to-date--page-type-all--all-years--monthly.csv` file:
                        month  total.non-content  total.content           timeRange.start             timeRange.end
0    2001-01-01T00:00:00.000Z                  0              0  2001-01-01T00:00:00.000Z  2001-02-01T00:00:00.000Z
1    2001-02-01T00:00:00.000Z                  0              0  2001-02-01T00:00:00.000Z  2001-03-01T00:00:00.000Z
..                        ...                ...            ...                       ...                       ...
257  2022-06-01T00:00:00.000Z            3896209        1786321  2022-06-01T00:00:00.000Z  2022-07-01T00:00:00.000Z
258  2022-07-01T00:00:00.000Z            3903963        1792329  2022-07-01T00:00:00.000Z  2022-08-01T00:00:00.000Z 
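Once a file is downloaded, it can be processed with any CSV reader. The sketch below uses only the standard library and assumes the CSV header matches the column names in the quick glance above (`month`, `total.non-content`, `total.content`); the exact on-disk layout, such as a possible extra index column, is not confirmed by this page. The sample text reuses the last two rows of the Spanish output so the example runs without a download; `load_rows` is a hypothetical helper name.

```python
import csv
import io
from datetime import datetime

# Last two rows of the Spanish quick glance above, written as CSV text.
# In practice, open the downloaded .csv file and pass the file object instead.
SAMPLE = """month,total.non-content,total.content,timeRange.start,timeRange.end
2022-06-01T00:00:00.000Z,3896209,1786321,2022-06-01T00:00:00.000Z,2022-07-01T00:00:00.000Z
2022-07-01T00:00:00.000Z,3903963,1792329,2022-07-01T00:00:00.000Z,2022-08-01T00:00:00.000Z
"""


def load_rows(source):
    """Parse pages-to-date rows into dicts with a datetime and integer counts."""
    rows = []
    for record in csv.DictReader(source):
        rows.append({
            "month": datetime.strptime(record["month"], "%Y-%m-%dT%H:%M:%S.%fZ"),
            "non_content": int(record["total.non-content"]),
            "content": int(record["total.content"]),
        })
    return rows


rows = load_rows(io.StringIO(SAMPLE))
for row in rows:
    # Total pages = content pages + non-content pages.
    print(row["month"].strftime("%Y-%m"), row["non_content"] + row["content"])

# Month-over-month growth in content pages between the two sample rows.
print(rows[1]["content"] - rows[0]["content"])
```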

Supported Features:

Content Metrics/Class:

| Queries* / Functions** | Periods | Filters*** | Intervals |
| --- | --- | --- | --- |
| absolute-bytes-difference* / absolute_bytes_difference** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| edited-pages* / edited_pages** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all, activity-level-1-to-4-edits, activity-level-5-to-24-edits, activity-level-25-to-99-edits, activity-level-100-or-more-edits, activity-level-all | daily, monthly |
| net-bytes-difference* / net_bytes_difference** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| pages-to-date* / pages_to_date** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| total-media-requests* / total_media_requests** | all-years, one-year, two-years, three-months, one-month | no-filter, media-type-image, media-type-video, media-type-audio, media-type-document, media-type-other, media-type-all, agent-type-user, agent-type-spider, agent-type-all | daily, monthly |
| top-media-requests* / top_media_requests** | last-month | no-filter, media-type-image, media-type-video, media-type-audio, media-type-document, media-type-other, media-type-all | daily, monthly |

* CLI queries.        ** Python functions.        *** More complex filters are planned for future versions.
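Because each query accepts only certain periods and filters, it can be useful to check a combination before starting a download. The sketch below encodes two rows of the Content table above as plain sets; `CONTENT_OPTIONS` and `check_options` are hypothetical names, and wikistats2csv may perform its own validation differently.

```python
# Option sets copied from the Content table above (two queries shown).
COMMON_PERIODS = {"all-years", "one-year", "two-years", "three-months", "one-month"}
PAGE_AND_EDITOR_FILTERS = {
    "no-filter", "page-type-content", "page-type-non-content", "page-type-all",
    "editor-type-user", "editor-type-name-bot", "editor-type-anonymous",
    "editor-type-group-bot", "editor-type-all",
}
MEDIA_FILTERS = {
    "no-filter", "media-type-image", "media-type-video", "media-type-audio",
    "media-type-document", "media-type-other", "media-type-all",
}
INTERVALS = {"daily", "monthly"}

CONTENT_OPTIONS = {
    "pages-to-date": {"periods": COMMON_PERIODS, "filters": PAGE_AND_EDITOR_FILTERS},
    "top-media-requests": {"periods": {"last-month"}, "filters": MEDIA_FILTERS},
}


def check_options(query, period, filter_, interval):
    """Return a list of problems; an empty list means the combination is listed."""
    if query not in CONTENT_OPTIONS:
        return [f"unknown query: {query}"]
    spec = CONTENT_OPTIONS[query]
    problems = []
    if period not in spec["periods"]:
        problems.append(f"period {period!r} is not listed for {query}")
    if filter_ not in spec["filters"]:
        problems.append(f"filter {filter_!r} is not listed for {query}")
    if interval not in INTERVALS:
        problems.append(f"interval {interval!r} is not listed for {query}")
    return problems


# A listed combination yields no problems; a wrong period yields one.
print(check_options("pages-to-date", "all-years", "page-type-all", "monthly"))
print(check_options("top-media-requests", "all-years", "no-filter", "monthly"))
```

The remaining rows of the table can be added to `CONTENT_OPTIONS` in the same way.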

Contributing Metrics/Class:

| Queries* / Functions** | Periods | Filters*** | Intervals |
| --- | --- | --- | --- |
| editors* ** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all, activity-level-1-to-4-edits, activity-level-5-to-24-edits, activity-level-25-to-99-edits, activity-level-100-or-more-edits, activity-level-all | daily, monthly |
| active-editors* / active_editors** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all | daily, monthly |
| edits* ** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| user-edits* / user_edits** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all | daily, monthly |
| new-pages* / new_pages** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| new-registered-users* / new_registered_users** | all-years, one-year, two-years, three-months, one-month | no-filter | daily, monthly |
| top-editors* / top_editors** | last-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| top-edited-pages* / top_edited_pages** | last-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
| active-editors-by-country* / active_editors_by_country** | last-month | activity-level-5-to-99-edits, activity-level-100-or-more-edits | daily, monthly |

* CLI queries.        ** Python functions.        *** More complex filters are planned for future versions.

Reading Metrics/Class:

| Queries* / Functions** | Periods | Filters*** | Intervals |
| --- | --- | --- | --- |
| total-page-views* / total_page_views** | all-years, one-year, two-years, three-months, one-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all, agent-type-user, agent-type-spider, agent-type-automated, agent-type-all | daily, monthly |
| legacy-page-views* / legacy_page_views** | all-years, one-year, two-years, three-months, one-month | no-filter, access-site-mobile-site, access-site-desktop-site, access-site-all | daily, monthly |
| page-views-by-country* / page_views_by_country** | last-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all | daily, monthly |
| unique-devices* / unique_devices** | all-years, one-year, two-years, three-months, one-month | no-filter, access-site-mobile-site, access-site-desktop-site, access-site-all | daily, monthly |
| top-viewed-articles* / top_viewed_articles** | last-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all | daily, monthly |

* CLI queries.        ** Python functions.        *** More complex filters are planned for future versions.
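In all three tables, the Python function name is the CLI query name with hyphens replaced by underscores. The sketch below uses that pattern to look up a function by its CLI query name. `query_to_function_name` and `call_query` are hypothetical helpers, and `FakeContent` is a stand-in so the example runs without the package or a network connection; with wikistats2csv installed, the `Content` class from the usage section would presumably take its place.

```python
def query_to_function_name(query):
    """Convert a CLI query name to the matching Python function name."""
    return query.replace("-", "_")


def call_query(metric_class, query, **options):
    """Look up the function for a CLI query name on a metric class and call it."""
    func = getattr(metric_class, query_to_function_name(query))
    return func(**options)


class FakeContent:
    """Stand-in for a metric class; it only echoes its arguments."""

    @staticmethod
    def pages_to_date(wiki, period, filter, interval):
        return (wiki, period, filter, interval)


print(query_to_function_name("top-viewed-articles"))
print(call_query(FakeContent, "pages-to-date", wiki="es", period="all-years",
                 filter="page-type-all", interval="monthly"))
```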

Extra Features:

List All Wikipedia Languages with Their Codes:

* As CLI:

To list all supported Wikipedia languages with their codes, use one of these commands:

$ wikistats2csv -lw
# OR
$ wikistats2csv --list-wikis

* As Python Package:

from wikistats2csv import Helper
Helper.get_Wikis_Codes()

Source Distribution

wikistats2csv-0.1.6.tar.gz (39.0 kB)