Simple scraper that extracts data from a Wikipedia article identified by its URL slug
wikiscraper
Simple scraper that extracts data from a Wikipedia article identified by its URL slug: title, images, summary, section paragraphs, and sidebar info
Developed by Alexandre MEYER
This work is licensed under a Creative Commons Attribution 4.0 International License.
Installation
$ pip install wikiscraper
Initialization
Import
import wikiscraper as ws
Main request
# Set the Wikipedia language edition to query
# (ISO 639-1 code; defaults to "en" for English)
ws.lang("fr")
# Search and get content by the URL slug of the article
# (Example: https://fr.wikipedia.org/wiki/Paris)
result = ws.searchBySlug("Paris")
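When starting from a full article URL rather than a bare slug, the language code and the slug can be separated with the standard library before calling the scraper. The sketch below is not part of wikiscraper: `split_wikipedia_url` is a hypothetical helper, and it assumes URLs of the form `https://<lang>.wikipedia.org/wiki/<slug>`.

```python
from urllib.parse import urlparse, unquote


def split_wikipedia_url(url):
    """Return (lang, slug) for a URL like https://fr.wikipedia.org/wiki/Paris.

    Hypothetical helper, not part of wikiscraper.
    """
    parsed = urlparse(url)
    host = parsed.hostname or ""
    prefix = "/wiki/"
    if not host.endswith(".wikipedia.org") or not parsed.path.startswith(prefix):
        raise ValueError(f"Not a Wikipedia article URL: {url!r}")
    # Assumption: the first host label is the language code ("fr", "en", ...)
    lang = host.split(".")[0]
    # The fragment (#section) is already excluded from parsed.path
    slug = unquote(parsed.path[len(prefix):])
    if not slug:
        raise ValueError(f"No article slug in URL: {url!r}")
    return lang, slug


# Possible usage with the calls shown above (requires wikiscraper):
# lang, slug = split_wikipedia_url("https://fr.wikipedia.org/wiki/Paris")
# ws.lang(lang)
# result = ws.searchBySlug(slug)
```

The two returned values map directly onto the arguments of `ws.lang` and `ws.searchBySlug`.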
Examples
Title H1
# Get article's title
result.getTitle()
Sidebar
# Get the value associated with a label in the sidebar
result.getSideInfo("Gentilé")
Summary
# Get all paragraphs of the summary
print(result.getSummary())
# Get the second paragraph of the summary
print(result.getSummary()[1])
# Optional: get only the first x paragraphs
print(result.getSummary(2))
Images
# Get all illustration images
img = result.getImage()
# Get a specific image by its position in the page
print(img[0]) # Main image
Sections
# Get the paragraphs of a specific section, identified by its header title and the titles of its parent headers
# All args are optional: .getSection(h2Title, h3Title, h4Title)
# Example: https://fr.wikipedia.org/wiki/Paris#Politique_et_administration
print(result.getSection('Politique et administration', 'Statut et organisation administrative', 'Historique')[0])
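To check in a browser that a header title passed to `getSection` matches the page, it can help to rebuild the section link. The following is a minimal sketch, not part of wikiscraper: `section_url` is a hypothetical helper, and it assumes the anchor is the header title with spaces replaced by underscores, as in the example URL above. Titles containing other special characters may be encoded differently.

```python
def section_url(lang, slug, header_title):
    """Build a link to a section of a Wikipedia article.

    Hypothetical helper, not part of wikiscraper.
    Assumption: the anchor is the header title with spaces replaced
    by underscores, as in .../wiki/Paris#Politique_et_administration
    """
    anchor = header_title.strip().replace(" ", "_")
    return f"https://{lang}.wikipedia.org/wiki/{slug}#{anchor}"


print(section_url("fr", "Paris", "Politique et administration"))
```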
Errors
"Unable to find the requested query: please check the spelling of the slug"
- Check if the spelling of the slug is correct
- Check if the article exists
- Check if the language set for the query matches the slug (by default, the search targets English articles)
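A script that looks up many slugs may want to keep going when one of them fails. The sketch below assumes that a failed lookup surfaces as a raised exception, which this page does not confirm, and the exact exception type is unknown, so it catches `Exception` broadly. `try_search` is a hypothetical helper that accepts any callable taking a slug, such as `ws.searchBySlug`.

```python
def try_search(search, slug):
    """Call search(slug) and report failure instead of raising.

    Hypothetical helper, not part of wikiscraper.
    Returns (result, None) on success or (None, error_text) on failure.
    Assumption: a failed lookup raises an exception of some kind.
    """
    try:
        return search(slug), None
    except Exception as exc:  # exact exception type is not documented here
        return None, str(exc)


# Possible usage (requires wikiscraper):
# result, error = try_search(ws.searchBySlug, "Paris")
# if error:
#     print("Lookup failed:", error)
```

Passing the search function in as an argument keeps the helper independent of the library, so it can be exercised with a stand-in function.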
Versions
- 1.1.0 = Error Handling
- 1.0.0 = init
Download files
Source Distribution
wikiscraper-1.1.2.tar.gz (9.7 kB)
Built Distribution
Hashes for wikiscraper-1.1.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 66453a74bb5b295a7c0a44bf1be2cb37945220461bab71f8a34c2b913c2aa738
MD5 | e5c5cbee498a680a194cb790925fd838
BLAKE2b-256 | 8c4a4857c313b8092360b82a8cdb0aae33fd71fce4b89c743928439fd26b0cb6