Wikistats-to-CSV downloads Wikipedia Statistics in CSV format for a given Wikipedia.
Wikistats-to-CSV
Wikistats-to-CSV (wikistats2csv) is a Python package (installable with pip) and command-line interface (CLI) that downloads statistics for a given Wikipedia, in CSV format, from the Wikimedia Statistics project.
Install:
Wikistats-to-CSV (wikistats2csv) requires Python >=3 and a few Python packages such as lxml==4.9.1, rich==12.5.1, numpy==1.23.2, pandas==1.4.3, selenium==3.141.0, and geckodriver-autoinstaller==0.1.0. For convenience, these packages are installed as part of the setup process of Wikistats-to-CSV (wikistats2csv). If you encounter installation errors, you might need to install these packages manually using pip:
python3 -m pip install -r requirements.txt
To install Wikistats-to-CSV (wikistats2csv) using pip, we highly recommend first upgrading pip to the latest version:
python3 -m pip install --upgrade pip
python3 -m pip install wikistats2csv
If you see the warning "WARNING: the script is installed in '/Users/.../.../bin' which is not on path", add the displayed path "/Users/.../.../bin" to the $PATH variable using this command:
export PATH="/Users/.../.../bin:$PATH"
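To confirm that the change took effect, you can check whether a directory appears in $PATH. The snippet below is a minimal sketch using only the Python standard library; the helper name `is_on_path` and the example directory are illustrative and not part of wikistats2csv.

```python
import os


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries of a PATH-style string."""
    if path_value is None:
        # Fall back to the PATH of the current process.
        path_value = os.environ.get("PATH", "")
    target = os.path.normpath(directory)
    entries = [os.path.normpath(p) for p in path_value.split(os.pathsep) if p]
    return target in entries


# Illustrative directories only; substitute the path printed in the pip warning.
example_path = os.pathsep.join(["/usr/bin", "/opt/tools/bin"])
print(is_on_path("/opt/tools/bin", example_path))
print(is_on_path("/somewhere/else/bin", example_path))
```

Calling `is_on_path(...)` with a single argument checks the $PATH of the running process instead of an example string.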
Usage:
* As CLI:
>> Long Flags:
$ wikistats2csv --wiki en --metric content --query pages-to-date --period all-years --filter page-type-all --interval monthly
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║WIKISTATS-TO-CSV║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
## Downloaded `english--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)
** Quick glance at `english--pages-to-date--page-type-all--all-years--monthly.csv` file:
month total.non-content total.content timeRange.start timeRange.end
0 2001-01-01T00:00:00.000Z 28 37 2001-01-01T00:00:00.000Z 2001-02-01T00:00:00.000Z
1 2001-02-01T00:00:00.000Z 51 175 2001-02-01T00:00:00.000Z 2001-03-01T00:00:00.000Z
.. ... ... ... ... ...
257 2022-06-01T00:00:00.000Z 36945305 6518484 2022-06-01T00:00:00.000Z 2022-07-01T00:00:00.000Z
258 2022-07-01T00:00:00.000Z 37088260 6534151 2022-07-01T00:00:00.000Z 2022-08-01T00:00:00.000Z
>> Short Flags:
$ wikistats2csv -w ar -m content -q pages-to-date -p all-years -f page-type-all -i monthly
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║WIKISTATS-TO-CSV║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
## Downloaded `arabic--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)
** Quick glance at `arabic--pages-to-date--page-type-all--all-years--monthly.csv` file:
month total.non-content total.content timeRange.start timeRange.end
0 2001-01-01T00:00:00.000Z 0 591 2001-01-01T00:00:00.000Z 2001-02-01T00:00:00.000Z
1 2001-02-01T00:00:00.000Z 0 591 2001-02-01T00:00:00.000Z 2001-03-01T00:00:00.000Z
.. ... ... ... ... ...
257 2022-06-01T00:00:00.000Z 5508072 1173410 2022-06-01T00:00:00.000Z 2022-07-01T00:00:00.000Z
258 2022-07-01T00:00:00.000Z 5538121 1180401 2022-07-01T00:00:00.000Z 2022-08-01T00:00:00.000Z
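If you want to call the CLI from a script, for example to loop over several wikis, the command can be assembled from the long flags shown above. This is a sketch, not part of the package: `build_command` is a hypothetical helper, and it only builds the argument list without running anything.

```python
import shlex


def build_command(wiki, metric, query, period, filter_name, interval):
    """Build the wikistats2csv argument list using the long flags shown above."""
    return [
        "wikistats2csv",
        "--wiki", wiki,
        "--metric", metric,
        "--query", query,
        "--period", period,
        "--filter", filter_name,
        "--interval", interval,
    ]


cmd = build_command("en", "content", "pages-to-date",
                    "all-years", "page-type-all", "monthly")
# Print the command as it would be typed in a shell.
print(shlex.join(cmd))
```

To actually run the command, pass the list to `subprocess.run(cmd, check=True)`; this requires wikistats2csv to be installed and on $PATH.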
* As Python Package:
>>> from wikistats2csv import Content
>>> Content.pages_to_date(wiki='es', period='all-years', filter='page-type-all', interval='monthly')
## Downloaded `spanish--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)
** Quick glance at `spanish--pages-to-date--page-type-all--all-years--monthly.csv` file:
month total.non-content total.content timeRange.start timeRange.end
0 2001-01-01T00:00:00.000Z 0 0 2001-01-01T00:00:00.000Z 2001-02-01T00:00:00.000Z
1 2001-02-01T00:00:00.000Z 0 0 2001-02-01T00:00:00.000Z 2001-03-01T00:00:00.000Z
.. ... ... ... ... ...
257 2022-06-01T00:00:00.000Z 3896209 1786321 2022-06-01T00:00:00.000Z 2022-07-01T00:00:00.000Z
258 2022-07-01T00:00:00.000Z 3903963 1792329 2022-07-01T00:00:00.000Z 2022-08-01T00:00:00.000Z
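Once a file is downloaded, it can be processed with any CSV reader. The sketch below uses only the standard library and assumes the file is comma-separated with a header row matching the column names in the quick glance above (`month`, `total.content`, and so on); the exact on-disk layout is an assumption. The inline sample reuses the first two rows shown for the English Wikipedia.

```python
import csv
import io

# Assumed layout: comma-separated, header names as in the quick glance output.
SAMPLE = """month,total.non-content,total.content,timeRange.start,timeRange.end
2001-01-01T00:00:00.000Z,28,37,2001-01-01T00:00:00.000Z,2001-02-01T00:00:00.000Z
2001-02-01T00:00:00.000Z,51,175,2001-02-01T00:00:00.000Z,2001-03-01T00:00:00.000Z
"""


def monthly_content_growth(csv_file):
    """Yield (month, content_pages, change_from_previous_month) tuples."""
    previous = None
    for row in csv.DictReader(csv_file):
        current = int(row["total.content"])
        change = None if previous is None else current - previous
        # Keep only the YYYY-MM part of the timestamp.
        yield row["month"][:7], current, change
        previous = current


rows = list(monthly_content_growth(io.StringIO(SAMPLE)))
for month, pages, change in rows:
    print(month, pages, change)
```

For a real download, replace `io.StringIO(SAMPLE)` with a file opened via `open(filename, newline="")`.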
Supported Features:
Content Metrics/Class:
Queries*/Functions** | Periods | Filters*** | Intervals |
---|---|---|---|
absolute-bytes-difference* absolute_bytes_difference** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
edited-pages* edited_pages** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all, activity-level-1-to-4-edits, activity-level-5-to-24-edits, activity-level-25-to-99-edits, activity-level-100-or-more-edits, activity-level-all | daily, monthly |
net-bytes-difference* net_bytes_difference** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
pages-to-date* pages_to_date** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
total-media-requests* total_media_requests** | all-years, one-year, two-years, three-months, one-month | no-filter, media-type-image, media-type-video, media-type-audio, media-type-document, media-type-other, media-type-all, agent-type-user, agent-type-spider, agent-type-all | daily, monthly |
top-media-requests* top_media_requests** | last-month | no-filter, media-type-image, media-type-video, media-type-audio, media-type-document, media-type-other, media-type-all | daily, monthly |
* CLI queries. ** Python functions. *** More complex filters are planned for future versions.
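In the table above, each Python function name is the CLI query name with hyphens replaced by underscores. The sketch below uses that pattern to dispatch a query name to a function. `StandInContent` is a hypothetical stand-in so the example runs without the package installed; with wikistats2csv installed you would pass the real `Content` class instead.

```python
def query_to_function_name(query):
    """Convert a CLI query name (e.g. 'pages-to-date') to a function name."""
    return query.replace("-", "_")


def call_query(metric_class, query, **params):
    """Look up the function for `query` on `metric_class` and call it."""
    function = getattr(metric_class, query_to_function_name(query))
    return function(**params)


class StandInContent:
    """Hypothetical stand-in for wikistats2csv.Content (no download happens)."""

    @staticmethod
    def pages_to_date(wiki, period, filter, interval):
        # Echo the arguments so the dispatch can be checked offline.
        return (wiki, period, filter, interval)


result = call_query(StandInContent, "pages-to-date", wiki="es",
                    period="all-years", filter="page-type-all",
                    interval="monthly")
print(query_to_function_name("total-media-requests"))
print(result)
```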
Contributing Metrics/Class:
Queries*/Functions** | Periods | Filters*** | Intervals |
---|---|---|---|
editors* ** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all, activity-level-1-to-4-edits, activity-level-5-to-24-edits, activity-level-25-to-99-edits, activity-level-100-or-more-edits, activity-level-all | daily, monthly |
active-editors* active_editors** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all | daily, monthly |
edits* ** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
user-edits* user_edits** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all | daily, monthly |
new-pages* new_pages** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
new-registered-users* new_registered_users** | all-years, one-year, two-years, three-months, one-month | no-filter | daily, monthly |
top-editors* top_editors** | last-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
top-edited-pages* top_edited_pages** | last-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
active-editors-by-country* active_editors_by_country** | last-month | activity-level-5-to-99-edits, activity-level-100-or-more-edits | daily, monthly |
* CLI queries. ** Python functions. *** More complex filters are planned for future versions.
Reading Metrics/Class:
Queries*/Functions** | Periods | Filters*** | Intervals |
---|---|---|---|
total-page-views* total_page_views** | all-years, one-year, two-years, three-months, one-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all, agent-type-user, agent-type-spider, agent-type-automated, agent-type-all | daily, monthly |
legacy-page-views* legacy_page_views** | all-years, one-year, two-years, three-months, one-month | no-filter, access-site-mobile-site, access-site-desktop-site, access-site-all | daily, monthly |
page-views-by-country* page_views_by_country** | last-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all | daily, monthly |
unique-devices* unique_devices** | all-years, one-year, two-years, three-months, one-month | no-filter, access-site-mobile-site, access-site-desktop-site, access-site-all | daily, monthly |
top-viewed-articles* top_viewed_articles** | last-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all | daily, monthly |
* CLI queries. ** Python functions. *** More complex filters are planned for future versions.
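Because each query accepts only certain periods, filters, and intervals, it can help to check a combination before starting a download. The sketch below copies three rows from the tables above into a dictionary; `SUPPORTED` and `find_problems` are illustrative names, not part of wikistats2csv, and the dictionary would need to be extended to cover the remaining queries.

```python
STANDARD_PERIODS = ("all-years", "one-year", "two-years",
                    "three-months", "one-month")
INTERVALS = ("daily", "monthly")

# A small subset of the tables above, keyed by CLI query name.
SUPPORTED = {
    "new-registered-users": {
        "period": STANDARD_PERIODS,
        "filter": ("no-filter",),
        "interval": INTERVALS,
    },
    "unique-devices": {
        "period": STANDARD_PERIODS,
        "filter": ("no-filter", "access-site-mobile-site",
                   "access-site-desktop-site", "access-site-all"),
        "interval": INTERVALS,
    },
    "active-editors-by-country": {
        "period": ("last-month",),
        "filter": ("activity-level-5-to-99-edits",
                   "activity-level-100-or-more-edits"),
        "interval": INTERVALS,
    },
}


def find_problems(query, period, filter_name, interval):
    """Return a list of messages for options not listed for `query`."""
    if query not in SUPPORTED:
        return [f"unknown query: {query}"]
    options = SUPPORTED[query]
    chosen = {"period": period, "filter": filter_name, "interval": interval}
    return [
        f"{name} '{value}' is not listed for {query}"
        for name, value in chosen.items()
        if value not in options[name]
    ]


print(find_problems("unique-devices", "all-years", "access-site-all", "monthly"))
print(find_problems("active-editors-by-country", "all-years", "no-filter", "monthly"))
```

An empty list means the combination appears in the table; otherwise each message names the option that does not.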
Extra Features:
List All Wikipedia Languages with Their Codes:
* As CLI:
To list all supported Wikipedia languages with their codes, run one of these commands:
$ wikistats2csv -lw
# OR
$ wikistats2csv --list-wikis
* As Python Package:
from wikistats2csv import Helper
Helper.get_Wikis_Codes()
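The language names also appear in the names of the downloaded files. Judging from the three examples in the Usage section, the file name seems to join the lowercase language name, query, filter, period, and interval with `--`. The sketch below reproduces that pattern; it is inferred from those examples only, `expected_filename` is a hypothetical helper, and the code-to-name mapping is limited to the three wikis shown above.

```python
# Only the three code/name pairs that appear in the usage examples.
LANGUAGE_NAMES = {"en": "english", "ar": "arabic", "es": "spanish"}


def expected_filename(wiki, query, filter_name, period, interval):
    """Guess the CSV file name from the pattern seen in the usage examples."""
    language = LANGUAGE_NAMES[wiki]
    return "--".join([language, query, filter_name, period, interval]) + ".csv"


name = expected_filename("en", "pages-to-date", "page-type-all",
                         "all-years", "monthly")
print(name)
```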