
Break WikiData dumps into smaller knowledge graphs

Project description

WikiDataSets

Breaking WikiData dumps into smaller knowledge graphs (e.g. graph of human entities).

Data Sets

Data sets are available on this page.


This is a non-exhaustive list of useful functions:

  • wikidatasets.processFunction.get_subclasses : Gets a list of WikiData IDs of entities which are subclasses of the subject.
  • wikidatasets.processFunction.query_wikidata_dump : Goes through a Wikidata dump. It can either collect entities that are instances of test_entities or collect the dictionary of labels. It can also do both.
  • wikidatasets.processFunction.build_dataset : Builds datasets from the pickle files produced by query_wikidata_dump.
  • wikidatasets.utils.load_data_labels : Loads the edges and attributes files into Pandas dataframes and merges in the labels of entities and relations.
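
To make the dump-scanning step more concrete, here is a minimal standard-library sketch of the idea behind query_wikidata_dump: stream over the entities of a dump, keep those that are instances of a set of target IDs, and collect a dictionary of labels along the way. It does not use the wikidatasets API. The helper names (iter_entities, instance_of_ids, scan_dump) and the tiny in-memory sample are hypothetical and for illustration only. It assumes the usual Wikidata JSON dump layout (one entity per line inside a JSON array, "instance of" stored under property P31, and an "id" field in each item value).

```python
import json


def iter_entities(lines):
    """Yield entity dicts from the lines of a Wikidata-style JSON dump."""
    for line in lines:
        # Each entity sits on its own line, usually followed by a comma;
        # the first and last lines only hold the array brackets.
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def instance_of_ids(entity):
    """Return the set of IDs the entity is an instance of (property P31)."""
    ids = set()
    for claim in entity.get("claims", {}).get("P31", []):
        snak = claim.get("mainsnak", {})
        # Skip "novalue"/"somevalue" snaks, which carry no datavalue.
        if snak.get("snaktype") == "value":
            ids.add(snak["datavalue"]["value"]["id"])
    return ids


def scan_dump(lines, test_entities, lang="en"):
    """Collect IDs of instances of test_entities and a dict of labels."""
    test_entities = set(test_entities)
    selected, labels = [], {}
    for entity in iter_entities(lines):
        label = entity.get("labels", {}).get(lang, {}).get("value")
        if label is not None:
            labels[entity["id"]] = label
        if instance_of_ids(entity) & test_entities:
            selected.append(entity["id"])
    return selected, labels


def make_entity(entity_id, label, instance_of):
    """Build a tiny entity in the dump's JSON shape (illustration only)."""
    return {
        "id": entity_id,
        "labels": {"en": {"language": "en", "value": label}},
        "claims": {
            "P31": [
                {"mainsnak": {"snaktype": "value", "property": "P31",
                              "datavalue": {"value": {"id": target}}}}
                for target in instance_of
            ]
        },
    }


# Hypothetical three-entity dump: one human (Q5) and two non-humans.
sample_entities = [
    make_entity("Q42", "Douglas Adams", ["Q5"]),
    make_entity("Q64", "Berlin", ["Q515"]),
    make_entity("Q5", "human", []),
]
sample_dump = ["["] + [json.dumps(e) + "," for e in sample_entities] + ["]"]

selected, labels = scan_dump(sample_dump, {"Q5"})
print(selected)
print(labels)
```

For a real dump, the same scan_dump function could be fed an open file handle instead of a list, so that the dump is processed line by line without being loaded into memory.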

The example/ folder contains example scripts to create datasets. Such scripts should be placed in the main directory, and hard-coded paths should be adjusted to match your installation.


If you find this code useful in your research, please consider citing our paper:

    title={WikiDataSets : Standardized sub-graphs from WikiData},
    author={Armand Boschin},


This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.


0.1.0 (2019-07-01)

  • First release on PyPI.

Download files


Files for wikidatasets, version 0.1.5

  • wikidatasets-0.1.5-py2.py3-none-any.whl (10.1 kB), file type: Wheel, Python version: py2.py3
  • wikidatasets-0.1.5.tar.gz (14.7 kB), file type: Source, Python version: None
