Break WikiData dumps into smaller knowledge graphs
WikiDataSets
Breaking WikiData dumps into smaller knowledge graphs (e.g. graph of human entities).
Free software: BSD license
Documentation: https://wikidatasets.readthedocs.io.
Data Sets
Data sets are available on this page.
Features
A non-exhaustive list of useful functions:
wikidatasets.processFunctions.get_subclasses: Gets the list of WikiData IDs of the entities that are subclasses of the subject.
wikidatasets.processFunctions.query_wikidata_dump: Goes through a WikiData dump and collects the entities that are instances of test_entities, the dictionary of labels, or both.
wikidatasets.processFunctions.build_dataset: Builds datasets from the pickle files produced by query_wikidata_dump.
wikidatasets.utils.load_data_labels: Loads the edges and attributes files into Pandas dataframes and merges in the labels of entities and relations.
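To make the dump-scanning step above more concrete, here is a minimal sketch of what "collect entities that are instances of test_entities" and "collect the dictionary of labels" can look like on a WikiData JSON dump. It is not the package's implementation and does not use its API: the names iter_dump_entities, instance_of_ids and scan_dump are hypothetical, and the sketch assumes the standard JSON dump layout (one entity per line inside a JSON array) and that "instance of" is read from property P31.

```python
import json


def iter_dump_entities(lines):
    """Yield one entity dict per line of a WikiData JSON dump.

    Assumes the usual layout: a line with "[", then one JSON entity per
    line (each followed by a comma except the last), then a line with "]".
    """
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("", "[", "]"):
            continue
        yield json.loads(line)


def instance_of_ids(entity):
    """Return the set of IDs this entity is an instance of (property P31)."""
    ids = set()
    for claim in entity.get("claims", {}).get("P31", []):
        snak = claim.get("mainsnak", {})
        # Skip "novalue" / "somevalue" snaks, which carry no datavalue.
        if snak.get("snaktype") == "value":
            ids.add(snak["datavalue"]["value"]["id"])
    return ids


def scan_dump(lines, test_entities, lang="en"):
    """Hypothetical one-pass scan: select instances and collect labels.

    Returns (selected_ids, labels), where selected_ids lists the entities
    that are instances of at least one ID in test_entities, and labels
    maps every entity ID seen to its label in the given language.
    """
    test_entities = set(test_entities)
    selected, labels = [], {}
    for entity in iter_dump_entities(lines):
        label = entity.get("labels", {}).get(lang, {}).get("value")
        if label is not None:
            labels[entity["id"]] = label
        if instance_of_ids(entity) & test_entities:
            selected.append(entity["id"])
    return selected, labels
```

For a real dump, lines would typically be an open file handle (for example one returned by bz2.open in text mode), and test_entities could be a class ID such as Q5 (human) together with its subclasses.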
The example/ folder contains examples of scripts to create datasets (e.g. build_humans.py). Such scripts should be placed in the main directory (along with utils.py, processFunctions.py) and hard-coded paths should be tuned to match your installation.
Citations
If you find this code useful in your research, please consider citing our paper:
@misc{arm2019wikidatasets,
title={WikiDataSets : Standardized sub-graphs from WikiData},
author={Armand Boschin},
year={2019},
eprint={1906.04536},
archivePrefix={arXiv},
primaryClass={cs.LG}
}
Credits
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.
History
0.1.0 (2019-07-01)
First release on PyPI.