Wikistats-to-CSV downloads Wikipedia Statistics in CSV format for a given Wikipedia.
Wikistats-to-CSV
Wikistats-to-CSV (wikistats2csv) is a Python package (installable with pip) and command-line interface (CLI) that downloads the statistics for a given Wikipedia in CSV format from the Wikimedia Statistics project.
Install:
Wikistats-to-CSV (wikistats2csv) requires Python >=3 and a few Python packages: lxml==4.9.1, rich==12.5.1, numpy==1.23.2, pandas==1.4.3, selenium==3.141.0, and geckodriver-autoinstaller==0.1.0. For convenience, these packages are installed as part of the Wikistats-to-CSV (wikistats2csv) setup process. If you encounter installation errors, you might need to install them manually using pip:
python3 -m pip install -r requirements.txt
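If installation errors persist, it can help to see which of the pinned packages are missing or at a different version. The following is a minimal sketch using only the standard library; the helper name `find_mismatches` is hypothetical and not part of wikistats2csv, and the pins are copied from the list above.

```python
from importlib import metadata

# Pins copied from the requirements listed above.
PINNED = {
    "lxml": "4.9.1",
    "rich": "12.5.1",
    "numpy": "1.23.2",
    "pandas": "1.4.3",
    "selenium": "3.141.0",
    "geckodriver-autoinstaller": "0.1.0",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (wanted, installed_or_None)} for packages that differ from their pin."""
    mismatches = {}
    for name, wanted in pinned.items():
        try:
            installed = get_version(name)
        except metadata.PackageNotFoundError:
            installed = None  # the package is not installed at all
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches


if __name__ == "__main__":
    for name, (wanted, installed) in find_mismatches(PINNED).items():
        print(f"{name}: wanted {wanted}, found {installed or 'nothing'}")
```

An empty result means every pinned package is installed at the listed version.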
To install Wikistats-to-CSV (wikistats2csv) with pip, we highly recommend that you first upgrade pip to the latest version:
python3 -m pip install --upgrade pip
python3 -m pip install wikistats2csv
If you see the warning "WARNING: the script is installed in '/Users/.../.../bin' which is not on path", add the displayed path "/Users/.../.../bin" to the $PATH variable using this command:
export PATH="/Users/.../.../bin:$PATH"
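To confirm that the shell can now find the command, a short check along these lines may be useful. It is a sketch that relies only on the standard library; the helper names `locate_cli` and `path_entries` are hypothetical.

```python
import os
import shutil


def locate_cli(command="wikistats2csv"):
    """Return the full path of `command` if it is reachable through PATH, else None."""
    return shutil.which(command)


def path_entries():
    """Return the non-empty directories currently listed in PATH."""
    return [entry for entry in os.environ.get("PATH", "").split(os.pathsep) if entry]


if __name__ == "__main__":
    found = locate_cli()
    if found:
        print(f"wikistats2csv found at {found}")
    else:
        print("wikistats2csv is not on PATH; directories searched:")
        for entry in path_entries():
            print(f"  {entry}")
```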
Usage:
* As CLI:
>> Long Flags:
$ wikistats2csv --wiki en --metric content --query pages-to-date --period all-years --filter page-type-all --interval monthly
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║WIKISTATS-TO-CSV║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
## Downloaded `english--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)
** Quick glance at `english--pages-to-date--page-type-all--all-years--monthly.csv` file:
month total.non-content total.content timeRange.start timeRange.end
0 2001-01-01T00:00:00.000Z 28 37 2001-01-01T00:00:00.000Z 2001-02-01T00:00:00.000Z
1 2001-02-01T00:00:00.000Z 51 175 2001-02-01T00:00:00.000Z 2001-03-01T00:00:00.000Z
.. ... ... ... ... ...
257 2022-06-01T00:00:00.000Z 36945305 6518484 2022-06-01T00:00:00.000Z 2022-07-01T00:00:00.000Z
258 2022-07-01T00:00:00.000Z 37088260 6534151 2022-07-01T00:00:00.000Z 2022-08-01T00:00:00.000Z
>> Short Flags:
$ wikistats2csv -w ar -m content -q pages-to-date -p all-years -f page-type-all -i monthly
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║WIKISTATS-TO-CSV║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
▌│║▌║▌█║▌║║▌│║▌│║▌║▌█║▌█║▌║│▌║│▌║│▌║│█║▌│█║▌│▌║│▌║║▌║▌█║▌║│▌
## Downloaded `arabic--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)
** Quick glance at `arabic--pages-to-date--page-type-all--all-years--monthly.csv` file:
month total.non-content total.content timeRange.start timeRange.end
0 2001-01-01T00:00:00.000Z 0 591 2001-01-01T00:00:00.000Z 2001-02-01T00:00:00.000Z
1 2001-02-01T00:00:00.000Z 0 591 2001-02-01T00:00:00.000Z 2001-03-01T00:00:00.000Z
.. ... ... ... ... ...
257 2022-06-01T00:00:00.000Z 5508072 1173410 2022-06-01T00:00:00.000Z 2022-07-01T00:00:00.000Z
258 2022-07-01T00:00:00.000Z 5538121 1180401 2022-07-01T00:00:00.000Z 2022-08-01T00:00:00.000Z
* As Python Package:
>>> from wikistats2csv import Content
>>> Content.pages_to_date(wiki='es', period='all-years', filter='page-type-all', interval='monthly')
## Downloaded `spanish--pages-to-date--page-type-all--all-years--monthly.csv` successfully :-)
** Quick glance at `spanish--pages-to-date--page-type-all--all-years--monthly.csv` file:
month total.non-content total.content timeRange.start timeRange.end
0 2001-01-01T00:00:00.000Z 0 0 2001-01-01T00:00:00.000Z 2001-02-01T00:00:00.000Z
1 2001-02-01T00:00:00.000Z 0 0 2001-02-01T00:00:00.000Z 2001-03-01T00:00:00.000Z
.. ... ... ... ... ...
257 2022-06-01T00:00:00.000Z 3896209 1786321 2022-06-01T00:00:00.000Z 2022-07-01T00:00:00.000Z
258 2022-07-01T00:00:00.000Z 3903963 1792329 2022-07-01T00:00:00.000Z 2022-08-01T00:00:00.000Z
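Once a file has been downloaded, it can be processed with the standard `csv` module. The sketch below assumes the file is comma-separated and that its header row uses the column names shown in the previews above; neither is confirmed here. The two sample rows are copied from the English output, and the function name `total_pages_by_month` is hypothetical.

```python
import csv
import io

# Two rows copied from the English preview above. A real run would instead open the
# downloaded file, e.g.
# open("english--pages-to-date--page-type-all--all-years--monthly.csv", newline="").
SAMPLE = """month,total.non-content,total.content,timeRange.start,timeRange.end
2001-01-01T00:00:00.000Z,28,37,2001-01-01T00:00:00.000Z,2001-02-01T00:00:00.000Z
2001-02-01T00:00:00.000Z,51,175,2001-02-01T00:00:00.000Z,2001-03-01T00:00:00.000Z
"""


def total_pages_by_month(handle):
    """Map each month (YYYY-MM) to the sum of content and non-content pages."""
    totals = {}
    for row in csv.DictReader(handle):
        month = row["month"][:7]  # keep only the YYYY-MM part of the timestamp
        totals[month] = int(row["total.non-content"]) + int(row["total.content"])
    return totals


totals = total_pages_by_month(io.StringIO(SAMPLE))
print(totals)  # {'2001-01': 65, '2001-02': 226}
```

Because `csv.DictReader` looks columns up by name, an extra index column in the real file would not affect the result.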
Supported Features:
Content Metrics/Class:
Queries*/Functions** | Periods | Filters*** | Intervals |
---|---|---|---|
absolute-bytes-difference* absolute_bytes_difference** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
edited-pages* edited_pages** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all, activity-level-1-to-4-edits, activity-level-5-to-24-edits, activity-level-25-to-99-edits, activity-level-100-or-more-edits, activity-level-all | daily, monthly |
net-bytes-difference* net_bytes_difference** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
pages-to-date* pages_to_date** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
total-media-requests* total_media_requests** | all-years, one-year, two-years, three-months, one-month | no-filter, media-type-image, media-type-video, media-type-audio, media-type-document, media-type-other, media-type-all, agent-type-user, agent-type-spider, agent-type-all | daily, monthly |
top-media-requests* top_media_requests** | last-month | no-filter, media-type-image, media-type-video, media-type-audio, media-type-document, media-type-other, media-type-all | daily, monthly |
* CLI queries. ** Python functions. *** More complex filters are planned for future versions.
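Since each query accepts only certain periods, filters, and intervals, it may be worth checking a combination before starting a download. The sketch below encodes two rows of the Content table above as plain sets; the `validate` helper and the `CONTENT_OPTIONS` structure are hypothetical and not part of wikistats2csv, and the remaining rows would be added in the same way.

```python
# Options copied from two rows of the Content table above.
COMMON_PERIODS = {"all-years", "one-year", "two-years", "three-months", "one-month"}
INTERVALS = {"daily", "monthly"}

CONTENT_OPTIONS = {
    "pages-to-date": {
        "periods": COMMON_PERIODS,
        "filters": {
            "no-filter", "page-type-content", "page-type-non-content", "page-type-all",
            "editor-type-user", "editor-type-name-bot", "editor-type-anonymous",
            "editor-type-group-bot", "editor-type-all",
        },
        "intervals": INTERVALS,
    },
    "top-media-requests": {
        "periods": {"last-month"},
        "filters": {
            "no-filter", "media-type-image", "media-type-video", "media-type-audio",
            "media-type-document", "media-type-other", "media-type-all",
        },
        "intervals": INTERVALS,
    },
}


def validate(query, period, filter_name, interval, options=CONTENT_OPTIONS):
    """Return a list of problems; an empty list means the combination is listed."""
    if query not in options:
        return [f"unknown query: {query}"]
    allowed = options[query]
    problems = []
    checks = (
        ("period", period, "periods"),
        ("filter", filter_name, "filters"),
        ("interval", interval, "intervals"),
    )
    for label, value, key in checks:
        if value not in allowed[key]:
            problems.append(f"{label} {value!r} is not listed for {query}")
    return problems


print(validate("pages-to-date", "all-years", "page-type-all", "monthly"))
print(validate("top-media-requests", "all-years", "no-filter", "monthly"))
```

The first call prints an empty list; the second reports that `all-years` is not a listed period for `top-media-requests`, which only supports `last-month`.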
Contributing Metrics/Class:
Queries*/Functions** | Periods | Filters*** | Intervals |
---|---|---|---|
editors* ** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all, activity-level-1-to-4-edits, activity-level-5-to-24-edits, activity-level-25-to-99-edits, activity-level-100-or-more-edits, activity-level-all | daily, monthly |
active-editors* active_editors** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all | daily, monthly |
edits* ** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
user-edits* user_edits** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all | daily, monthly |
new-pages* new_pages** | all-years, one-year, two-years, three-months, one-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
new-registered-users* new_registered_users** | all-years, one-year, two-years, three-months, one-month | no-filter | daily, monthly |
top-editors* top_editors** | last-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
top-edited-pages* top_edited_pages** | last-month | no-filter, page-type-content, page-type-non-content, page-type-all, editor-type-user, editor-type-name-bot, editor-type-anonymous, editor-type-group-bot, editor-type-all | daily, monthly |
active-editors-by-country* active_editors_by_country** | last-month | activity-level-5-to-99-edits, activity-level-100-or-more-edits | daily, monthly |
* CLI queries. ** Python functions. *** More complex filters are planned for future versions.
Reading Metrics/Class:
Queries*/Functions** | Periods | Filters*** | Intervals |
---|---|---|---|
total-page-views* total_page_views** | all-years, one-year, two-years, three-months, one-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all, agent-type-user, agent-type-spider, agent-type-automated, agent-type-all | daily, monthly |
legacy-page-views* legacy_page_views** | all-years, one-year, two-years, three-months, one-month | no-filter, access-site-mobile-site, access-site-desktop-site, access-site-all | daily, monthly |
page-views-by-country* page_views_by_country** | last-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all | daily, monthly |
unique-devices* unique_devices** | all-years, one-year, two-years, three-months, one-month | no-filter, access-site-mobile-site, access-site-desktop-site, access-site-all | daily, monthly |
top-viewed-articles* top_viewed_articles** | last-month | no-filter, access-method-desktop, access-method-mobile-app, access-method-mobile-web, access-method-all | daily, monthly |
* CLI queries. ** Python functions. *** More complex filters are planned for future versions.
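In every row of the tables above, the Python function name is the CLI query name with hyphens replaced by underscores. The sketch below captures that rule and also builds the long-flag argument list shown in the Usage section. Both helper names are hypothetical; only the `content` metric value appears in the examples in this document, so values for the other metrics would be assumptions.

```python
def function_name(query):
    """Convert a CLI query name to the matching Python function name."""
    return query.replace("-", "_")


def cli_args(wiki, metric, query, period, filter_name, interval):
    """Build the long-flag argument list for the wikistats2csv command."""
    return [
        "wikistats2csv",
        "--wiki", wiki,
        "--metric", metric,
        "--query", query,
        "--period", period,
        "--filter", filter_name,
        "--interval", interval,
    ]


args = cli_args("en", "content", "pages-to-date", "all-years", "page-type-all", "monthly")
print(function_name("pages-to-date"))  # pages_to_date
print(" ".join(args))
# To actually run the download from Python (not executed here):
#   import subprocess; subprocess.run(args, check=True)
```

Passing the arguments as a list avoids shell quoting issues when the command is run through `subprocess`.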
Extra Features:
List All Wikipedia Languages with Their Codes:
* As CLI:
To list all supported Wikipedia languages with their codes, run one of these commands:
$ wikistats2csv -lw
# OR
$ wikistats2csv --list-wikis
* As Python Package:
from wikistats2csv import Helper
Helper.get_Wikis_Codes()
Source Distribution
File details
Details for the file wikistats2csv-0.1.6.tar.gz.
File metadata
- Download URL: wikistats2csv-0.1.6.tar.gz
- Upload date:
- Size: 39.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.10.7
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 64699163039069a582221673a166a760f979a7370c745b86445a461b1c72415c |
MD5 | 83e5595a7b7decef2a2d3b5714a8693e |
BLAKE2b-256 | c6c4eec2ba91c41453996ba99f5673b580b5d6a50bf2a8d9d1ac824a972b3ff2 |