Wikistats-to-CSV downloads Wikipedia Statistics in CSV format for a given Wikipedia.
Wikistats-to-CSV
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│█║│▌║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║WIKISTATS-TO-CSV║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│█║│▌║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
Wikistats-to-CSV is a Python package/wrapper and command line interface (CLI) that downloads the statistics of a given Wikipedia in CSV format from the Wikimedia Statistics project.
Install:
Wikistats-to-CSV (wikistats2csv) requires Python >=3 and a few Python packages such as lxml==4.9.1, rich==12.5.1, pandas==1.4.3, selenium==3.141.0, and geckodriver-autoinstaller==0.1.0. For convenience, these packages are installed as part of the Wikistats-to-CSV (wikistats2csv) setup process. If you encounter installation errors, you might need to install these packages manually using pip. To install Wikistats-to-CSV (wikistats2csv), use this command:
pip install wikistats2csv
Usage:
* As CLI:
>> Long Flags:
$ wikistats2csv --wiki en --metric content --query pages-to-date --period all-years --filter page-type-all --interval monthly
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│█║│▌║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║WIKISTATS-TO-CSV║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│█║│▌║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
## Downloaded `english--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)
** Quick glance at `english--pages-to-date--page-type-all--all-years--monthly.csv` file:
month total.non-content total.content timeRange.start timeRange.end
0 2001-01-01T00:00:00.000Z 28 37 2001-01-01T00:00:00.000Z 2001-02-01T00:00:00.000Z
1 2001-02-01T00:00:00.000Z 51 175 2001-02-01T00:00:00.000Z 2001-03-01T00:00:00.000Z
.. ... ... ... ... ...
257 2022-06-01T00:00:00.000Z 36945305 6518484 2022-06-01T00:00:00.000Z 2022-07-01T00:00:00.000Z
258 2022-07-01T00:00:00.000Z 37088260 6534151 2022-07-01T00:00:00.000Z 2022-08-01T00:00:00.000Z
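The name of the downloaded file appears to combine the language name with the query, filter, period, and interval, joined by `--`. The sketch below reproduces that pattern. It is inferred only from the sample output in this README, and `expected_filename` and `LANGUAGE_NAMES` are hypothetical names that are not part of wikistats2csv.

```python
# Hypothetical helper: the naming pattern is inferred from the sample
# output in this README, not taken from the wikistats2csv source code.

# Only the three wikis shown in the README examples are listed here.
LANGUAGE_NAMES = {"en": "english", "ar": "arabic", "es": "spanish"}


def expected_filename(wiki, query, filter_, period, interval):
    """Return the CSV file name the tool appears to use for a download."""
    language = LANGUAGE_NAMES[wiki]
    return "--".join([language, query, filter_, period, interval]) + ".csv"


print(expected_filename("en", "pages-to-date", "page-type-all", "all-years", "monthly"))
# english--pages-to-date--page-type-all--all-years--monthly.csv
```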
>> Short Flags:
$ wikistats2csv -w ar -m content -q pages-to-date -p all-years -f page-type-all -i monthly
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│█║│▌║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║WIKISTATS-TO-CSV║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│█║│▌║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
## Downloaded `arabic--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)
** Quick glance at `arabic--pages-to-date--page-type-all--all-years--monthly.csv` file:
month total.non-content total.content timeRange.start timeRange.end
0 2001-01-01T00:00:00.000Z 0 591 2001-01-01T00:00:00.000Z 2001-02-01T00:00:00.000Z
1 2001-02-01T00:00:00.000Z 0 591 2001-02-01T00:00:00.000Z 2001-03-01T00:00:00.000Z
.. ... ... ... ... ...
257 2022-06-01T00:00:00.000Z 5508072 1173410 2022-06-01T00:00:00.000Z 2022-07-01T00:00:00.000Z
258 2022-07-01T00:00:00.000Z 5538121 1180401 2022-07-01T00:00:00.000Z 2022-08-01T00:00:00.000Z
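If you want to call the CLI from a script, one option is to build the argument list in Python and hand it to `subprocess`. The following is a minimal sketch that uses only the flags shown above. `build_command` and `LONG_TO_SHORT` are hypothetical names introduced for illustration, and the final `subprocess.run` call is left commented out because it requires wikistats2csv to be installed.

```python
import shlex

# Long and short flags exactly as they appear in the CLI examples above.
LONG_TO_SHORT = {
    "--wiki": "-w",
    "--metric": "-m",
    "--query": "-q",
    "--period": "-p",
    "--filter": "-f",
    "--interval": "-i",
}


def build_command(wiki, metric, query, period, filter_, interval, short=False):
    """Build the wikistats2csv argument list (hypothetical helper)."""
    pairs = [
        ("--wiki", wiki),
        ("--metric", metric),
        ("--query", query),
        ("--period", period),
        ("--filter", filter_),
        ("--interval", interval),
    ]
    command = ["wikistats2csv"]
    for flag, value in pairs:
        command += [LONG_TO_SHORT[flag] if short else flag, value]
    return command


command = build_command(
    "ar", "content", "pages-to-date", "all-years", "page-type-all", "monthly", short=True
)
print(shlex.join(command))
# wikistats2csv -w ar -m content -q pages-to-date -p all-years -f page-type-all -i monthly

# To actually run it (requires wikistats2csv to be installed):
# import subprocess
# subprocess.run(command, check=True)
```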
* As Python Package:
>>> from wikistats2csv import Content
>>> Content.pages_to_date(wiki='es', period='all-years', filter='page-type-all', interval='monthly')
## Downloaded `spanish--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)
** Quick glance at `spanish--pages-to-date--page-type-all--all-years--monthly.csv` file:
month total.non-content total.content timeRange.start timeRange.end
0 2001-01-01T00:00:00.000Z 0 0 2001-01-01T00:00:00.000Z 2001-02-01T00:00:00.000Z
1 2001-02-01T00:00:00.000Z 0 0 2001-02-01T00:00:00.000Z 2001-03-01T00:00:00.000Z
.. ... ... ... ... ...
257 2022-06-01T00:00:00.000Z 3896209 1786321 2022-06-01T00:00:00.000Z 2022-07-01T00:00:00.000Z
258 2022-07-01T00:00:00.000Z 3903963 1792329 2022-07-01T00:00:00.000Z 2022-08-01T00:00:00.000Z
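Once a file is downloaded, it can be post-processed like any other CSV. The sketch below parses the timestamps and computes the month-over-month change in `total.content`. It assumes the file is comma-separated with a header row matching the column names in the quick glance above; that layout is an assumption, so check your downloaded file first. The two inline sample rows are copied from the English example, and `load_rows` and `monthly_growth` are hypothetical helper names.

```python
import csv
import io
from datetime import datetime

# Two rows copied from the English sample output; the comma-separated
# layout and header row are assumptions about the downloaded file.
SAMPLE = """month,total.non-content,total.content,timeRange.start,timeRange.end
2001-01-01T00:00:00.000Z,28,37,2001-01-01T00:00:00.000Z,2001-02-01T00:00:00.000Z
2001-02-01T00:00:00.000Z,51,175,2001-02-01T00:00:00.000Z,2001-03-01T00:00:00.000Z
"""


def load_rows(handle):
    """Parse CSV rows into dicts with typed values."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append({
            "month": datetime.strptime(record["month"], "%Y-%m-%dT%H:%M:%S.%fZ"),
            "content": int(record["total.content"]),
            "non_content": int(record["total.non-content"]),
        })
    return rows


def monthly_growth(rows):
    """Return (month, change in content pages) for each consecutive pair."""
    return [
        (later["month"], later["content"] - earlier["content"])
        for earlier, later in zip(rows, rows[1:])
    ]


rows = load_rows(io.StringIO(SAMPLE))
for month, delta in monthly_growth(rows):
    print(month.date(), delta)
# 2001-02-01 138
```

To read a real download instead of the inline sample, pass an open file to `load_rows`, for example `open("english--pages-to-date--page-type-all--all-years--monthly.csv", newline="")`.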
Supported Features:
Content Class/Metrics:
Queries* Functions** | Periods | Filters*** | Intervals |
---|---|---|---|
absolute-bytes-difference* absolute_bytes_difference** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
edited-pages* edited_pages** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all, activity-level-1-to-4-edits, activity-level-5-to-24-edits, activity-level-25-to-99-edits, activity-level-100-or-more-edits, activity-level-all | daily, monthly |
net-bytes-difference* net_bytes_difference** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
pages-to-date* pages_to_date** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
total-media-requests* total_media_requests** | all-years, one-year, two-years, three-months, one-month | no-filter, media-type-image, media-type-video, media-type-audio, media-type-document, media-type-other, media-type-all, agent-type-user, agent-type-spider, agent-type-all | daily, monthly |
top-media-requests* top_media_requests** | last-month | no-filter, media-type-image, media-type-video, media-type-audio, media-type-document, media-type-other, media-type-all | daily, monthly |
* CLI queries. ** Python functions. *** More complex filters are coming in future versions.
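The table pairs each CLI query with a Python function whose name replaces hyphens with underscores, and each query accepts only certain periods, filters, and intervals. A minimal sketch of checking arguments against two rows of the Content table before starting a download is shown below. `SUPPORTED_CONTENT`, `to_function_name`, and `find_problems` are hypothetical names introduced here; wikistats2csv may perform its own validation differently.

```python
# Values copied from two rows of the Content table above. The helper names
# are hypothetical and not part of the wikistats2csv API.
COMMON_PERIODS = {"all-years", "one-year", "two-years", "three-months", "one-month"}
INTERVALS = {"daily", "monthly"}
PAGE_AND_EDITOR_FILTERS = {
    "no-filter", "page-type-content", "page-type-non-content", "page-type-all",
    "editor-type-user", "editor-type-name-bot", "editor-type-anonymous",
    "editor-type-group-bot", "editor-type-all",
}
MEDIA_TYPE_FILTERS = {
    "no-filter", "media-type-image", "media-type-video", "media-type-audio",
    "media-type-document", "media-type-other", "media-type-all",
}

# query -> (periods, filters, intervals)
SUPPORTED_CONTENT = {
    "pages-to-date": (COMMON_PERIODS, PAGE_AND_EDITOR_FILTERS, INTERVALS),
    "top-media-requests": ({"last-month"}, MEDIA_TYPE_FILTERS, INTERVALS),
}


def to_function_name(query):
    """Map a CLI query name to the matching Python function name."""
    return query.replace("-", "_")


def find_problems(query, period, filter_, interval):
    """Return a list of human-readable problems (empty if all values fit)."""
    if query not in SUPPORTED_CONTENT:
        return [f"unknown query: {query}"]
    periods, filters, intervals = SUPPORTED_CONTENT[query]
    problems = []
    if period not in periods:
        problems.append(f"unsupported period: {period}")
    if filter_ not in filters:
        problems.append(f"unsupported filter: {filter_}")
    if interval not in intervals:
        problems.append(f"unsupported interval: {interval}")
    return problems


print(to_function_name("pages-to-date"))
# pages_to_date
print(find_problems("pages-to-date", "all-years", "page-type-all", "monthly"))
# []
print(find_problems("top-media-requests", "all-years", "media-type-image", "daily"))
# ['unsupported period: all-years']
```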
Contributing Metrics/Class:
Queries* Functions** | Periods | Filters*** | Intervals |
---|---|---|---|
editors* ** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all, activity-level-1-to-4-edits, activity-level-5-to-24-edits, activity-level-25-to-99-edits, activity-level-100-or-more-edits, activity-level-all | daily, monthly |
active-editors* active_editors** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all | daily, monthly |
edits* ** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
user-edits* user_edits** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all | daily, monthly |
new-pages* new_pages** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
new-registered-users* new_registered_users** | all-years, one-year, two-years, three-months, one-month | no-filter | daily, monthly |
top-editors* top_editors** | last-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
top-edited-pages* top_edited_pages** | last-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
active-editors-by-country* active_editors_by_country** | last-month | activity-level-5-to-99-edits, activity-level-100-or-more-edits | daily, monthly |
* CLI queries. ** Python functions. *** More complex filters are coming in future versions.
Reading Metrics/Class:
Queries* Functions** | Periods | Filters*** | Intervals |
---|---|---|---|
total-page-views* total_page_views** | all-years, one-year, two-years, three-months, one-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all, agent-type-user, agent-type-spider, agent-type-automated, agent-type-all | daily, monthly |
legacy-page-views* legacy_page_views** | all-years, one-year, two-years, three-months, one-month | no-filter, access-site-mobile-site, access-site-desktop-site, access-site-all | daily, monthly |
page-views-by-country* page_views_by_country** | last-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all | daily, monthly |
unique-devices* unique_devices** | all-years, one-year, two-years, three-months, one-month | no-filter, access-site-mobile-site, access-site-desktop-site, access-site-all | daily, monthly |
top-viewed-articles* top_viewed_articles** | last-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all | daily, monthly |
* CLI queries. ** Python functions. *** More complex filters are coming in future versions.