Library to process dumps of knowledge graphs (Wikipedia, DBpedia, Wikidata)

Project description

kgdata

KGData is a library for processing dumps of Wikipedia and Wikidata. It can:

  • Clean up the dumps to ensure the data is consistent (resolve redirects, remove dangling references).
  • Create embedded key-value databases to access entities from the dumps.
  • Extract Wikidata ontology.
  • Extract Wikipedia tables and convert the hyperlinks to Wikidata entities.
  • Create Pyserini indices to search Wikidata entities.
  • and more
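
To make the first item in the list concrete, here is a minimal sketch of what resolving redirects and removing dangling references means on a toy link graph. It does not use kgdata's API: the function names `resolve_redirect` and `clean_links`, the dictionary layout, and the sample IDs are all assumptions made up for illustration.

```python
from typing import Dict, List, Set


def resolve_redirect(entity_id: str, redirects: Dict[str, str]) -> str:
    """Follow a redirect chain to its final target, stopping if a cycle is hit."""
    seen: Set[str] = set()
    while entity_id in redirects and entity_id not in seen:
        seen.add(entity_id)
        entity_id = redirects[entity_id]
    return entity_id


def clean_links(
    links: Dict[str, List[str]], redirects: Dict[str, str]
) -> Dict[str, List[str]]:
    """Rewrite link targets through redirects and drop targets missing from the dump."""
    known = set(links)  # entities that actually exist in this toy dump
    cleaned: Dict[str, List[str]] = {}
    for source, targets in links.items():
        resolved = (resolve_redirect(t, redirects) for t in targets)
        cleaned[source] = [t for t in resolved if t in known]
    return cleaned


# Hypothetical data: Q3 redirects to Q2, and Q99 does not exist in the dump.
links = {"Q1": ["Q3", "Q99"], "Q2": ["Q1"]}
redirects = {"Q3": "Q2"}
print(clean_links(links, redirects))  # {'Q1': ['Q2'], 'Q2': ['Q1']}
```

After cleaning, every remaining link points at an entity that is present, which is the consistency property the first bullet describes.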

For full documentation, please see the website.

Installation

From PyPI (using pre-built binaries):

pip install kgdata[spark]   # omit [spark] to install Spark yourself, e.g. when your cluster runs a different version
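
After installing, it can help to confirm which versions ended up in the environment, especially when Spark is installed separately to match a cluster. The sketch below uses only the standard library; `installed_version` is a hypothetical helper name, and it assumes Spark is provided by the `pyspark` distribution.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Assumption: the Spark dependency is the `pyspark` distribution.
for name in ("kgdata", "pyspark"):
    print(name, installed_version(name) or "not installed")
```

Compare the reported `pyspark` version with the one running on your cluster before submitting jobs.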


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

  • kgdata-7.0.1.tar.gz (150.1 kB): source

Built Distributions

  • kgdata-7.0.1-pp310-pypy310_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.0 MB): PyPy, manylinux (glibc 2.17+), x86-64
  • kgdata-7.0.1-pp39-pypy39_pp73-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.0 MB): PyPy, manylinux (glibc 2.17+), x86-64
  • kgdata-7.0.1-cp312-none-win_amd64.whl (2.2 MB): CPython 3.12, Windows x86-64
  • kgdata-7.0.1-cp312-cp312-manylinux_2_35_x86_64.whl (3.3 MB): CPython 3.12, manylinux (glibc 2.35+), x86-64
  • kgdata-7.0.1-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.0 MB): CPython 3.12, manylinux (glibc 2.17+), x86-64
  • kgdata-7.0.1-cp312-cp312-macosx_10_14_x86_64.macosx_11_0_arm64.macosx_10_14_universal2.whl (5.5 MB): CPython 3.12, macOS 10.14+ universal2 (ARM64, x86-64), macOS 10.14+ x86-64, macOS 11.0+ ARM64
  • kgdata-7.0.1-cp311-none-win_amd64.whl (2.2 MB): CPython 3.11, Windows x86-64
  • kgdata-7.0.1-cp311-cp311-manylinux_2_35_x86_64.whl (3.3 MB): CPython 3.11, manylinux (glibc 2.35+), x86-64
  • kgdata-7.0.1-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.0 MB): CPython 3.11, manylinux (glibc 2.17+), x86-64
  • kgdata-7.0.1-cp311-cp311-macosx_10_14_x86_64.macosx_11_0_arm64.macosx_10_14_universal2.whl (5.5 MB): CPython 3.11, macOS 10.14+ universal2 (ARM64, x86-64), macOS 10.14+ x86-64, macOS 11.0+ ARM64
  • kgdata-7.0.1-cp310-none-win_amd64.whl (2.2 MB): CPython 3.10, Windows x86-64
  • kgdata-7.0.1-cp310-cp310-manylinux_2_35_x86_64.whl (3.3 MB): CPython 3.10, manylinux (glibc 2.35+), x86-64
  • kgdata-7.0.1-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.0 MB): CPython 3.10, manylinux (glibc 2.17+), x86-64
  • kgdata-7.0.1-cp310-cp310-macosx_10_14_x86_64.macosx_11_0_arm64.macosx_10_14_universal2.whl (5.5 MB): CPython 3.10, macOS 10.14+ universal2 (ARM64, x86-64), macOS 10.14+ x86-64, macOS 11.0+ ARM64
  • kgdata-7.0.1-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (4.0 MB): CPython 3.9, manylinux (glibc 2.17+), x86-64
