wikiscraper
A simple scraper that extracts data from a Wikipedia article, identified by its URL slug: title, images, summary, section paragraphs, and sidebar info
Developed by Alexandre MEYER
This work is licensed under a Creative Commons Attribution 4.0 International License.
Installation
$ pip install wikiscraper
Initialization
Import
import wikiscraper as ws
Main request
# Set the Wikipedia language edition for the query
# (ISO 639-1 code; defaults to "en" for English)
ws.lang("fr")
# Fetch the article content by its URL slug
# (Example: https://fr.wikipedia.org/wiki/Paris)
result = ws.searchBySlug("Paris")
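If you start from a full article URL rather than a bare slug, the language code and slug can be read off the URL itself. The following is a minimal sketch using only the standard library; `slug_from_url` is a hypothetical helper and not part of wikiscraper, and it assumes the usual `https://<lang>.wikipedia.org/wiki/<slug>` layout.

```python
from urllib.parse import urlsplit, unquote


def slug_from_url(url):
    """Return (language, slug) for a Wikipedia article URL.

    Hypothetical helper, not part of wikiscraper. Assumes URLs shaped like
    https://<lang>.wikipedia.org/wiki/<slug>.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if not host.endswith(".wikipedia.org"):
        raise ValueError(f"not a Wikipedia URL: {url!r}")
    # The language code is the first label of the host name ("fr" in fr.wikipedia.org)
    lang = host.split(".")[0]

    prefix = "/wiki/"
    if not parts.path.startswith(prefix):
        raise ValueError(f"no /wiki/ path in URL: {url!r}")
    # Decode percent-escapes so accented slugs come back as readable text
    slug = unquote(parts.path[len(prefix):])
    return lang, slug


lang, slug = slug_from_url("https://fr.wikipedia.org/wiki/Paris")
print(lang, slug)
```

The two values can then be passed to `ws.lang(lang)` and `ws.searchBySlug(slug)` as shown above.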
Examples
Title H1 & URL
# Get article's title
result.getTitle()
# Get article's URL
result.getURL()
Sidebar
# Get the value shown for a sidebar label
# (here "Gentilé", the demonym label on the French page)
result.getSideInfo("Gentilé")
Abstract
# Get all paragraphs of the abstract
print(result.getAbstract())
# Get the second paragraph of the abstract
print(result.getAbstract()[1])
# Optional: get only the first x paragraphs
print(result.getAbstract(2))
Images
# Get all illustration images
img = result.getImage()
# Get a specific image by its position in the page
print(img[0]) # Main image
Sections
# Get the table of contents
# First-level headlines only
print(result.getContentsTable())
# All headlines (first and second levels)
print(result.getContentsTable(subcontents=True))
# Get the paragraphs of a specific section by its header title and those of its parents
# All args are optional: .getSection(h2Title, h3Title, h4Title)
# Example: https://fr.wikipedia.org/wiki/Paris#Politique_et_administration
print(result.getSection('Politique et administration', 'Statut et organisation administrative', 'Historique')[0])
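In the example above, the section title passed to `getSection` is the URL anchor with underscores replaced by spaces. A small sketch of that conversion follows; `section_title_from_url` is a hypothetical helper, and the underscore rule is an assumption drawn from this one example, so it may not hold for every heading (for instance, repeated headings).

```python
from urllib.parse import urlsplit, unquote


def section_title_from_url(url):
    """Turn the #anchor of a Wikipedia URL into a human-readable section title.

    Hypothetical helper, not part of wikiscraper. Assumes the anchor is the
    heading text with spaces written as underscores.
    """
    fragment = urlsplit(url).fragment
    # Decode percent-escapes first, then restore the spaces
    return unquote(fragment).replace("_", " ")


title = section_title_from_url(
    "https://fr.wikipedia.org/wiki/Paris#Politique_et_administration"
)
print(title)  # Politique et administration
```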
Errors
"Unable to find the requested query: please check the spelling of the slug"
- Check if the spelling of the slug is correct
- Check if the article exists
- Check if the language set for the query matches the slug (by default, English articles are searched)
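One way to run through the checks above is to rebuild the article URL from the language and slug and open it in a browser. This is a minimal sketch based on the URL pattern shown earlier; `article_url` is a hypothetical helper and not part of wikiscraper.

```python
from urllib.parse import quote


def article_url(slug, lang="en"):
    """Build the Wikipedia URL that a (slug, language) pair should point to.

    Hypothetical helper, not part of wikiscraper. Spaces are written as
    underscores and non-ASCII characters are percent-encoded.
    """
    path = quote(slug.strip().replace(" ", "_"))
    return f"https://{lang}.wikipedia.org/wiki/{path}"


print(article_url("Paris", lang="fr"))  # https://fr.wikipedia.org/wiki/Paris
```

If the printed URL does not lead to the expected article, the slug or the language is the likely cause of the error.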
Versions
- 1.1.0 = Error Handling
- 1.0.0 = init