Python library to validate Wikidata items.

Entityshape

A Python library to compare a Wikidata item with an EntitySchema.

Based on https://github.com/Teester/entityshape by Mark Tully and https://github.com/dpriskorn/PyEntityshape by Dennis Priskorn

Features

  • compare a given Wikidata item with an EntitySchema and dig into missing properties, too many statements, etc.
  • determine whether an item is valid according to a certain schema or not

Limitations

The Shape and CompareShape classes currently only support:

  • cardinality (too many or not enough values)
  • whether the property is allowed or not
  • whether the value of a statement on a given property is correct/incorrect

It is still a bit unclear if and how the qualifier validation works.

Only Wikidata is supported currently when fetching labels for the result. If you need support for other Wikibase installations, comment here.

Installation

Get it from PyPI:

$ pip install entityshape

Usage

Jupyter Notebooks

Example notebooks with code for validating multiple items: hiking paths, campsites, shelters.

CLI

Example:

# Assumes the class is importable from the top-level package
from entityshape import EntityShape

e = EntityShape(eid="E1", lang="en", qid="Q1")
result = e.validate_and_get_result()
# Get a human-readable result
print(result)
# Valid: False
# Properties_without_enough_correct_statements: instance of (P31)
# Access the data
print(result.properties_without_enough_correct_statements)
# {'P31'}

Validation

The is_valid method on the Result object mimics all red warnings displayed by https://www.wikidata.org/wiki/User:Teester/EntityShape.js

It currently checks these five conditions that all have to be false for the item to be valid:

  1. properties with too many statements found
  2. incorrect statements found
  3. some required properties are missing
  4. properties without enough correct statements found
  5. statements with properties that are not allowed found
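
The rule above amounts to five emptiness checks joined by "and". The following is a minimal sketch, assuming a result object that keeps one set of property IDs per condition. The class name SketchResult and every field name except properties_without_enough_correct_statements are illustrative and not taken from the library.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SketchResult:
    """Hypothetical stand-in for the library's Result object."""

    # One set of property IDs per condition; names are assumptions.
    properties_with_too_many_statements: set[str] = field(default_factory=set)
    incorrect_statements: set[str] = field(default_factory=set)
    missing_required_properties: set[str] = field(default_factory=set)
    properties_without_enough_correct_statements: set[str] = field(default_factory=set)
    statements_with_not_allowed_properties: set[str] = field(default_factory=set)

    def is_valid(self) -> bool:
        """Return True only when none of the five conditions holds."""
        findings = (
            self.properties_with_too_many_statements,
            self.incorrect_statements,
            self.missing_required_properties,
            self.properties_without_enough_correct_statements,
            self.statements_with_not_allowed_properties,
        )
        # A non-empty set means that condition is true, so the item is invalid.
        return not any(findings)


print(SketchResult().is_valid())
print(SketchResult(properties_without_enough_correct_statements={"P31"}).is_valid())
```

The first call reports a valid item because every set is empty; the second reports an invalid one because a single finding is enough to fail the check.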

Known working schemas

This library currently only supports a subset of all features in the ShEx specification.

The following Entity Schemas are known to work:

Background

This library is the glue between libraries like Wikibase Integrator and entityschemas.

It makes it easy to batch check a whole subset of Wikidata items against a schema. Nice!
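
A batch check could be sketched as follows. The helper names batch_validate and invalid_items are hypothetical, and the validator is passed in as a plain callable so the sketch runs without network access. In real use that callable might wrap EntityShape(...).validate_and_get_result() and the result's validity check.

```python
from __future__ import annotations

from typing import Callable, Iterable


def batch_validate(
    qids: Iterable[str], validate_one: Callable[[str], bool]
) -> dict[str, bool]:
    """Map each item ID to the outcome returned by validate_one."""
    return {qid: validate_one(qid) for qid in qids}


def invalid_items(outcomes: dict[str, bool]) -> list[str]:
    """Return the IDs of the items that failed validation, sorted as strings."""
    return sorted(qid for qid, ok in outcomes.items() if not ok)


def fake_validator(qid: str) -> bool:
    """Stub validator for illustration only: pretends that Q2 is invalid."""
    return qid != "Q2"


outcomes = batch_validate(["Q1", "Q2", "Q3"], fake_validator)
print(invalid_items(outcomes))
```

Keeping the validator injectable also makes it straightforward to add retries or rate limiting around the real call later.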

TODO

The CompareShape and Shape classes should be rewritten using OOP and enums to avoid passing strings around because that is not nice to debug or maintain.

What do we want to know from the CompareShape class?

On the property level:

  • whether the property is mandatory and present/missing

On the statement level:

  • whether the cardinality of values is allowed (min/max)
  • whether the value(s) are correct/incorrect

Cases:

  • mandatory property is missing
  • optional property is missing (this is not invalidating)
  • a property has an incorrect value
  • a property has a correct value
  • a property has too many values
  • a property has not enough values
  • ?
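
One way the cases above could be modelled with an enum instead of strings is sketched below. This is a simplified illustration, not the planned design: PropertyStatus, INVALIDATING and classify are hypothetical names, and value correctness is reduced to a single boolean.

```python
from __future__ import annotations

from enum import Enum, auto
from typing import Optional


class PropertyStatus(Enum):
    """One member per case in the list above (names are assumptions)."""

    MANDATORY_MISSING = auto()
    OPTIONAL_MISSING = auto()
    INCORRECT_VALUE = auto()
    CORRECT_VALUE = auto()
    TOO_MANY_VALUES = auto()
    NOT_ENOUGH_VALUES = auto()


# Every status invalidates the item except these two.
INVALIDATING = frozenset(PropertyStatus) - {
    PropertyStatus.OPTIONAL_MISSING,
    PropertyStatus.CORRECT_VALUE,
}


def classify(
    count: int,
    minimum: int,
    maximum: Optional[int],
    values_correct: bool = True,
) -> PropertyStatus:
    """Classify one property; maximum=None means no upper bound."""
    if count == 0:
        # A missing property only matters when at least one value is required.
        if minimum > 0:
            return PropertyStatus.MANDATORY_MISSING
        return PropertyStatus.OPTIONAL_MISSING
    if not values_correct:
        return PropertyStatus.INCORRECT_VALUE
    if count < minimum:
        return PropertyStatus.NOT_ENOUGH_VALUES
    if maximum is not None and count > maximum:
        return PropertyStatus.TOO_MANY_VALUES
    return PropertyStatus.CORRECT_VALUE


status = classify(count=3, minimum=1, maximum=2)
print(status.name, status in INVALIDATING)
```

With enum members, a typo in a status name fails at import time rather than silently producing a string that never matches.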

ShEx Tip

When working on your Entity Schemas, the triple constraints described here are useful to know and remember: https://shex.io/shex-primer/#tripleConstraints

Thanks

Big thanks to Myst and Christian Clauss for advice and help with Ruff to make this better.

License

GPLv3+

What I learned

  • Forking other people's undocumented spaghetti code is not much fun.
  • I want to find a more reliable validator that supports somevalue and novalue.
  • Pydantic is wonderful yet again; it makes working with OOP easy peasy :)
  • Ruff is crazy fast and very nice!
