Get a list of Wikipedia articles that link to a website.
Project description
wikilinks
wikilinks
is a Python command line utility and module that allow you to easily
extract external links from Wikipedia that point at a particular website. It
can be useful and interesting to see how Wikipedians have cited the content made
available on another website. These links say something about the websites
placement in Wikipedia, as well as the topics of interest in the website.
The wikilinks
code was originally part of a much larger project called
Linkypedia that was used to visualize the use of cultural heritage materials
on Wikipedia. Linkypedia has since been shut down but perhaps this little piece
of functionality could still be of use to you.
While Wikipedia's API allows you list external links from a given page, it
doesn't (to my knowledge) have an API call to retrieve pages that link to a
particular website. However they do provide the External links search page
that lets you perform this lookup in your browser. wikilinks
simply scrapes
those results, for all language Wikipedias.
Install
pip install wikilinks
Use
Find links in English Wikipedia that point at the mith.umd.edu website and print them out as tab separated URLs:
% wikilinks --lang en http://mith.umd.edu
https://en.wikipedia.org/wiki/User:Mastersplinter/Making_the_History_of_1989 http://mith.umd.edu/
https://en.wikipedia.org/wiki/University_of_Maryland_Libraries http://mith.umd.edu/ https://en.wikipedia.org/wiki/User:Walker222
http://mith.umd.edu https://en.wikipedia.org/wiki/Maryland_Institute_for_Technology_in_the_Humanities
http://mith.umd.edu/ https://en.wikipedia.org/wiki/User:Edsu http://mith.umd.edu
https://en.wikipedia.org/wiki/University_of_Maryland_College_of_Information_Studies http://mith.umd.edu/ https://en.wikipedia.org/wiki/Antonia%27s_Line
http://mith.umd.edu//WomensStudies/FilmReviews/antonias-line-mcalister https://en.wikipedia.org/wiki/Rio_Nutrias
http://mith.umd.edu//eada/gateway/diario/diary.html https://en.wikipedia.org/wiki/Rio_Nutria_(Zuni_River_tributary)
http://mith.umd.edu//eada/gateway/diario/diary.html https://en.wikipedia.org/wiki/Nutrioso,_Arizona
http://mith.umd.edu//eada/gateway/diario/diary.html https://en.wikipedia.org/wiki/Marilee_Lindemann http://mith.umd.edu/2008/01/
https://en.wikipedia.org/wiki/User:Keilana/Women_scientist_resources http://mith.umd.edu/WomensStudies/Bibliographies/ScienceBiblio/science-pt2
https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Maria_Ford http://mith.umd.edu/WomensStudies/FilmReviews/S/some-nudity-burdette.html
https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletion/Log/2015_January_14 http://mith.umd.edu/WomensStudies/FilmReviews/S/some-nudity-burdette.html
https://en.wikipedia.org/wiki/Talk:Colors_of_the_Wind http://mith.umd.edu/WomensStudies/FilmReviews/pocahontas-strong ...
Use as a Library
This example will fetch each Wikipedia article link to the mith.umd.edu
website as a (source, target)
tuple:
from wikilinks import wikilinks
for link in wikilinks("http://mith.umd.edu"):
print(link)
Which would output something like:
('https://ca.wikipedia.org/wiki/Amerigo_Vespucci', 'http://mith.umd.edu//eada/html/display.php?docs=vespucci_letters.xml')
('https://de.wikipedia.org/wiki/Lisa_Monaco', 'http://mith.umd.edu/WomensStudies/GenderIssues/Violence+Women/ResponsetoRape/introduction')
('https://de.wikipedia.org/wiki/Buenaventura_River', 'http://mith.umd.edu/eada/gateway/diario/')
('https://de.wikipedia.org/wiki/Sarah_Kemble_Knight', 'http://mith.umd.edu/eada/html/display.php?docs=knight_journal.xml')
('https://de.wikipedia.org/wiki/Klapperschlangen', 'http://mith.umd.edu/eada/html/display.php?docs=smith_map.xml')
('https://de.wikipedia.org/wiki/Theater_(Bauwerk)', 'http://mith.umd.edu/theatrefinder/')
('https://en.wikipedia.org/wiki/User:Mastersplinter/Making_the_History_of_1989', 'http://mith.umd.edu/')
...
By default wikilinks
will search all language Wikipedias. If you are only
interested in links from particular language Wikipedias you can use the langs
parameter:
for link in wikilinks("http://mith.umd.edu", langs=["de", "fr"]):
print(link)
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
File details
Details for the file wikilinks-0.0.4.tar.gz
.
File metadata
- Download URL: wikilinks-0.0.4.tar.gz
- Upload date:
- Size: 4.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.12.1 pkginfo/1.4.2 requests/2.19.1 setuptools/42.0.2 requests-toolbelt/0.8.0 tqdm/4.28.0 CPython/3.7.6
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | a1cdd9f63a28760100f34b3b251a143f7602ea694a14d24730c0bdba2fdc82d5 |
|
MD5 | 588234fe4931d05c95b65ba6615559e2 |
|
BLAKE2b-256 | 3ad6ac09073fcb51e276b85aa5a76efdb521ee5744c8b4e350149aca945332a0 |