
🍰 Making Wikipedia and Wikidata Processing Easy, Like Eating a Piece of Cake


wake


installation

pip install wake

methods

get_wikidata_entities

Stream Wikidata Entities

from wake import get_wikidata_entities

for entity in get_wikidata_entities():
    print(entity)
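wake's own implementation isn't shown here. The following is a rough sketch of how streaming can work. It assumes the standard Wikidata JSON dump layout: one large JSON array with each entity on its own line, and every entity line except the last ending in a comma. Under that assumption, each line can be parsed independently so memory use stays flat. The function names and the local bz2 file path are hypothetical.

```python
import bz2
import json
from typing import Iterable, Iterator


def iter_wikidata_entities(lines: Iterable[str]) -> Iterator[dict]:
    """Hypothetical: yield one entity dict per line of a Wikidata JSON dump."""
    for line in lines:
        line = line.strip()
        # Skip the opening "[" and closing "]" of the array, and blank lines.
        if line in ("[", "]", ""):
            continue
        # Every entity line except the last ends with a separating comma.
        yield json.loads(line.rstrip(","))


def stream_dump(path: str) -> Iterator[dict]:
    """Stream entities from a local bz2-compressed dump (assumed file format)."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        yield from iter_wikidata_entities(f)
```

Because iter_wikidata_entities accepts any iterable of lines, it can be tried on a small in-memory list before being pointed at a multi-gigabyte dump.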

clean_title

Takes the title of a Wikipedia page as a string, escapes it, and strips unusual characters so it can be stored in a standard database.
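The exact cleaning rules wake applies aren't documented. Here is a hypothetical sketch of this kind of cleanup. It assumes four steps: NFC normalization, removal of invisible control and format characters, conversion of underscores (as used in dump titles) to spaces, and backslash escaping for a MySQL string literal.

```python
import re
import unicodedata


def sketch_clean_title(title: str) -> str:
    """Hypothetical title cleaner; not wake's actual clean_title."""
    # Normalize to NFC so visually identical titles compare equal.
    title = unicodedata.normalize("NFC", title)
    # Drop invisible control (Cc) and format (Cf) characters, keeping whitespace for now.
    title = "".join(
        ch for ch in title
        if ch.isspace() or unicodedata.category(ch) not in ("Cc", "Cf")
    )
    # Dump titles use underscores for spaces; collapse runs of whitespace/underscores.
    title = re.sub(r"[\s_]+", " ", title).strip()
    # Escape backslashes and single quotes for a MySQL string literal
    # (assumes the server's default backslash-escape behaviour).
    return title.replace("\\", "\\\\").replace("'", "\\'")
```

Where the database driver supports it, passing titles as query parameters is generally safer than escaping by hand.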

download_if_necessary

Downloads a URL to the system's temp directory if a file with that name isn't already there.
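Below is a minimal standard-library sketch of the download-once behaviour described above. The function name is hypothetical. Naming the local file after the URL's last path segment is also an assumption about how wake chooses file names.

```python
import os
import shutil
import tempfile
import urllib.request
from urllib.parse import urlparse


def sketch_download_if_necessary(url: str) -> str:
    """Hypothetical: download url into the temp dir unless the file already exists."""
    # Name the local file after the last path segment of the URL.
    filename = os.path.basename(urlparse(url).path)
    path = os.path.join(tempfile.gettempdir(), filename)
    if not os.path.exists(path):
        # Download under a temporary name first so an interrupted transfer
        # does not leave a partial file that later looks complete.
        partial = path + ".part"
        with urllib.request.urlopen(url) as response, open(partial, "wb") as out:
            shutil.copyfileobj(response, out)
        os.replace(partial, path)
    return path
```

A second call with the same URL returns the cached path without touching the network.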

get_most_recent_available_dump

Finds the most recent Wikipedia dump in which the requested subdumps have finished.
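This is one way to decide whether a dump run is usable, and it is a hypothetical sketch rather than wake's code. It assumes each run publishes a dumpstatus.json file with a "jobs" object whose entries carry a "status" field that reads "done" when finished. The URL layout in fetch_status and any job names you pass in should be checked against dumps.wikimedia.org.

```python
import json
import urllib.request
from typing import Dict, Iterable, Optional


def all_jobs_done(status: dict, required: Iterable[str]) -> bool:
    """True if every required job in a dumpstatus.json-style dict is 'done'."""
    jobs = status.get("jobs", {})
    return all(jobs.get(name, {}).get("status") == "done" for name in required)


def most_recent_complete(statuses: Dict[str, dict], required: Iterable[str]) -> Optional[str]:
    """Return the newest dump date (YYYYMMDD) whose required jobs are all done."""
    required = list(required)  # allow generators to be reused for every date
    # YYYYMMDD strings sort chronologically, so check the newest first.
    for date in sorted(statuses, reverse=True):
        if all_jobs_done(statuses[date], required):
            return date
    return None


def fetch_status(wiki: str, date: str) -> dict:
    """Fetch dumpstatus.json for one run (assumed URL layout)."""
    url = f"https://dumps.wikimedia.org/{wiki}/{date}/dumpstatus.json"
    with urllib.request.urlopen(url) as response:
        return json.load(response)
```

Keeping the selection logic separate from fetch_status makes it testable without network access.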

tokenize

Takes the page text from a dump and returns a list of tokens.
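wake's tokenization rules aren't specified here. The following hypothetical sketch strips a few common kinds of wikitext markup: non-nested templates, link brackets, and bold/italic quotes. It then splits the remaining text into word tokens.

```python
import re
from typing import List

# Word characters, allowing internal apostrophes or hyphens ("don't", "well-known").
TOKEN_RE = re.compile(r"\w+(?:['-]\w+)*")


def sketch_tokenize(page_text: str) -> List[str]:
    """Hypothetical tokenizer; not wake's tokenize."""
    # Drop simple templates such as {{cite web|...}} (non-nested only).
    text = re.sub(r"\{\{[^{}]*\}\}", " ", page_text)
    # Replace [[target|label]] and [[target]] links with their visible text.
    text = re.sub(r"\[\[(?:[^\]|]*\|)?([^\]]*)\]\]", r"\1", text)
    # Remove bold/italic markup ('' and ''').
    text = re.sub(r"'{2,}", "", text)
    return TOKEN_RE.findall(text)
```

Real articles contain nested templates, tables, and reference tags, which need a proper wikitext parser rather than regular expressions.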

get_links

Gets the links in an article (i.e., the text between '[[' and ']]').
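Below is a minimal regex sketch of the same idea. It returns the raw text between each '[[' and ']]'. The names are hypothetical, and link_target is an extra helper that reduces a link to its target page.

```python
import re
from typing import List

# Non-greedy match of everything between "[[" and the next "]]" on one line.
LINK_RE = re.compile(r"\[\[(.*?)\]\]")


def sketch_get_links(page_text: str) -> List[str]:
    """Hypothetical: return the raw text inside each [[...]] pair."""
    return LINK_RE.findall(page_text)


def link_target(link: str) -> str:
    """Reduce 'Target#Section|label' to 'Target'."""
    return link.split("|", 1)[0].split("#", 1)[0].strip()
```

Links nested inside other links (for example in image captions) are not handled by a single non-greedy pattern.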

run_sql

Runs a MySQL command through bash, with no third-party connector library required.

from wake import run_sql

run_sql("SHOW DATABASES")

run_sql("SELECT COUNT(*) FROM geo_tags", "geo_tags_db")
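Here is a hypothetical standard-library sketch of the same approach. It calls the mysql command-line client directly through subprocess with an argument list instead of a bash string, which avoids shell quoting problems. It assumes mysql is on PATH and that credentials come from an option file such as ~/.my.cnf.

```python
import subprocess
from typing import List, Optional


def build_mysql_command(sql: str, database: Optional[str] = None) -> List[str]:
    """Build an argv list for the mysql command-line client."""
    # --batch prints tab-separated rows; -e runs the statement and exits.
    cmd = ["mysql", "--batch", "-e", sql]
    if database:
        # A trailing positional argument selects the default database.
        cmd.append(database)
    return cmd


def sketch_run_sql(sql: str, database: Optional[str] = None) -> List[List[str]]:
    """Hypothetical: run sql via the mysql CLI and return output rows."""
    result = subprocess.run(
        build_mysql_command(sql, database),
        capture_output=True, text=True, check=True,
    )
    # In batch mode the first line holds column names; each line is tab-separated.
    return [line.split("\t") for line in result.stdout.splitlines()]
```

With check=True, a failing query raises subprocess.CalledProcessError, and mysql's error message is available on its stderr attribute.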

test

python3 -m unittest wake.test

license

CC0-1.0 / Public Domain

contact

Post an issue! Thank you!



Download files


Source Distribution

wake-0.10.1.tar.gz (5.9 kB)

File details

Details for the file wake-0.10.1.tar.gz.

File metadata

  • Download URL: wake-0.10.1.tar.gz
  • Size: 5.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/44.1.0 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/2.7.17

File hashes

Hashes for wake-0.10.1.tar.gz
Algorithm     Hash digest
SHA256        9ad0debee22697d5bad71fc16a46a6830705b57ed3d7f487a7fcc93e16eb12d6
MD5           76e504512be9c5125197897e88b68ab1
BLAKE2b-256   176d946c5df0136571c6bd6cd3d76fb1c64177831c9a564e3a5e98979213890f
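To check a downloaded copy of wake-0.10.1.tar.gz against the SHA256 digest listed above, you can use a short standard-library sketch. The function names are hypothetical, and the file path you pass in is up to you.

```python
import hashlib

EXPECTED_SHA256 = "9ad0debee22697d5bad71fc16a46a6830705b57ed3d7f487a7fcc93e16eb12d6"


def file_sha256(path: str, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sdist(path: str, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return file_sha256(path) == expected.lower()
```

Reading in fixed-size chunks keeps memory use constant regardless of file size.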

