A simple package designed to collect the edit histories of Wikipedia pages
Wikipedia Histories
A simple tool to pull the complete edit history of a Wikipedia page in one of several formats: JSON, a DataFrame, or a list of revision objects that can be used directly.
>>> import wikipedia_histories
# Generate a list of revisions for a specified page
>>> golden_swallow = wikipedia_histories.get_history('Golden swallow')
# Show the revision IDs for every edit
>>> golden_swallow
# [130805848, 162259515, 167233740, 195388442, ...
# Show the user who made a specific edit
>>> golden_swallow[16].user
# 'Snowmanradio'
# Show the text of the page at the time of a specific edit
>>> golden_swallow[16].content
# 'The Golden Swallow (Tachycineta euchrysea) is a swallow. The Golden Swallow formerly'...
>>> golden_swallow[200].content
# 'The golden swallow (Tachycineta euchrysea) is a passerine in the swallow family'...
# Get the article rating at the time of the edit
>>> ratings = [revision.rating for revision in golden_swallow]
>>> ratings
# ['NA', 'NA', 'NA', 'NA', 'stub', 'stub', ...
# Get the time of each edit as a datetime object
>>> times = [revision.time for revision in golden_swallow]
>>> times
# [datetime.datetime(2007, 5, 14, 16, 15, 31), datetime.datetime(2007, 10, 4, 15, 36, 29), ...
# Generate a DataFrame with text and metadata from the list of revisions
>>> df = wikipedia_histories.build_df(golden_swallow)
# Generate JSON with text and metadata from the list of revisions
>>> jsonified = wikipedia_histories.build_json(golden_swallow)
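Because each revision exposes `user`, `time`, `rating`, and `content`, the list can also be summarized with plain Python. The following is a minimal sketch, not part of the package: `SampleRevision`, `edits_per_user`, and `rating_changes` are hypothetical names, and the sample data is invented so the example runs without network access. It assumes only the four attributes shown above.

```python
from collections import Counter, namedtuple
from datetime import datetime

# Hypothetical stand-in for a revision object; it mirrors only the
# attributes shown in the examples above (user, time, rating, content).
SampleRevision = namedtuple("SampleRevision", ["user", "time", "rating", "content"])


def edits_per_user(revisions):
    """Count how many revisions each user made."""
    return Counter(revision.user for revision in revisions)


def rating_changes(revisions):
    """Return (time, rating) pairs for each point where the rating changes."""
    changes = []
    previous = None
    for revision in revisions:
        if revision.rating != previous:
            changes.append((revision.time, revision.rating))
            previous = revision.rating
    return changes


# Invented sample data, ordered oldest to newest.
sample = [
    SampleRevision("Alice", datetime(2007, 5, 14, 16, 15, 31), "NA", "first draft"),
    SampleRevision("Bob", datetime(2007, 10, 4, 15, 36, 29), "NA", "second draft"),
    SampleRevision("Alice", datetime(2008, 1, 2, 9, 0, 0), "stub", "third draft"),
]

print(edits_per_user(sample).most_common(1))
print(rating_changes(sample))
```

In practice, the list returned by `wikipedia_histories.get_history` would presumably take the place of `sample`, provided its revisions are ordered oldest to newest as in the output shown earlier.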
Installation
To install Wikipedia Histories, simply run:
$ pip install wikipedia-histories
Wikipedia Histories is compatible with Python 3.6+.