🍰 Making Wikipedia and Wikidata Processing Easy, Like Eating a Piece of Cake
Project description
wake
installation
pip3 install wake
or pipenv install wake
methods
get_wikidata_entities
Stream Wikidata Entities
from wake import get_wikidata_entities

for entity in get_wikidata_entities():
    print(entity)
You can also filter entities by their type. For example, to get all entities that are humans (Q5), run:
from wake import get_wikidata_entities

for human in get_wikidata_entities(instance_of="Q5"):
    print(human)
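To illustrate what an instance-of filter means in terms of the data, here is a minimal sketch that checks a single entity for a P31 ("instance of") statement. It assumes each entity is a dict in the standard Wikidata JSON layout; whether wake yields entities in exactly this shape, and how it implements the filter internally, is not stated above. The helper name is_instance_of and the trimmed sample entity are hypothetical.

```python
def is_instance_of(entity, class_id):
    """Return True if the entity has a P31 (instance of) statement pointing at class_id.

    Assumes the standard Wikidata JSON layout:
    entity["claims"]["P31"][i]["mainsnak"]["datavalue"]["value"]["id"]
    """
    for statement in entity.get("claims", {}).get("P31", []):
        mainsnak = statement.get("mainsnak", {})
        # "somevalue" and "novalue" snaks carry no datavalue, so skip them
        if mainsnak.get("snaktype") != "value":
            continue
        value = mainsnak.get("datavalue", {}).get("value", {})
        if isinstance(value, dict) and value.get("id") == class_id:
            return True
    return False


# Hand-written, heavily trimmed sample in the shape of a Wikidata entity
douglas_adams = {
    "id": "Q42",
    "claims": {
        "P31": [
            {
                "mainsnak": {
                    "snaktype": "value",
                    "property": "P31",
                    "datavalue": {
                        "value": {"entity-type": "item", "id": "Q5"},
                        "type": "wikibase-entityid",
                    },
                }
            }
        ]
    },
}

print(is_instance_of(douglas_adams, "Q5"))  # True
```

Passing instance_of="Q5" to get_wikidata_entities saves you from writing this check yourself; the sketch only shows which part of an entity the filter concerns.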
clean_title
takes the title of a Wikipedia page as a string, then escapes it and strips unusual characters so it can be stored in an ordinary database
download_if_necessary
downloads a URL to the system's temp directory if a file with the same name isn't already there
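The download-only-if-missing idea can be sketched with the standard library alone. This is not wake's implementation: the function name download_to_temp_if_missing, its single-argument signature, and the choice to return the local path are all assumptions made for illustration.

```python
import os
import tempfile
import urllib.request
from urllib.parse import urlparse


def download_to_temp_if_missing(url):
    """Hypothetical helper: fetch url into the temp directory unless
    a file with the same name is already there. Returns the local path."""
    # Use the last path segment of the URL as the local file name
    filename = os.path.basename(urlparse(url).path)
    if not filename:
        raise ValueError("URL has no file name component: " + url)

    destination = os.path.join(tempfile.gettempdir(), filename)

    # Only hit the network when there is no cached copy
    if not os.path.isfile(destination):
        urllib.request.urlretrieve(url, destination)

    return destination
```

Note that a check by file name alone cannot tell a complete download from a partial or outdated one, so a stale file in the temp directory would be reused as-is.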
get_most_recent_available_dump
figures out the most recent Wikipedia dump that has certain subdumps complete
tokenize
pass in the page text from a dump and get a list of tokens in return
get_links
gets the links in an article (i.e. what's between '[[' and ']]')
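As a rough illustration of pulling out what sits between '[[' and ']]', here is a small regex-based sketch. It is not wake's code: the name extract_link_targets is hypothetical, and dropping the display label after a '|' is an assumption about what a caller might want, not documented behaviour of get_links.

```python
import re

# Matches the shortest run of text between "[[" and "]]"
WIKILINK_PATTERN = re.compile(r"\[\[(.+?)\]\]")


def extract_link_targets(page_text):
    """Return the link targets found in a piece of wikitext."""
    targets = []
    for inner in WIKILINK_PATTERN.findall(page_text):
        # "[[Target|label]]" displays "label" but points at "Target"
        target = inner.split("|", 1)[0].strip()
        if target:
            targets.append(target)
    return targets


sample = "The [[Eiffel Tower]] is in [[Paris|the French capital]]."
print(extract_link_targets(sample))  # ['Eiffel Tower', 'Paris']
```

A simple pattern like this does not handle nested brackets, such as links inside image captions, so real dump text may need a more careful parser.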
test
python3 -m unittest wake.test
license
CC0-1.0 / Public Domain
contact
Post an issue! Thank you!
Project details
Download files
Source Distribution
File details
Details for the file wake-0.11.0.tar.gz.
File metadata
- Download URL: wake-0.11.0.tar.gz
- Upload date:
- Size: 6.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/44.1.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/2.7.17
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5794764bda56dcedab3b56680855462462bbc1c7649bc018a013fb0bf1edc371
MD5 | 2a905fc0a262dba913146697f281d8da
BLAKE2b-256 | 549a461d1514f2ffe67acaa5c5b51db312e8ccdf933a43dea50879f3783e9481