Python library to validate Wikidata items.
Entityshape
A Python library to compare a Wikidata item with an entityschema
Based on https://github.com/Teester/entityshape by Mark Tully and https://github.com/dpriskorn/PyEntityshape by Dennis Priskorn
Features
- compare a given Wikidata item with an entityschema and dig into missing properties, too many statements, etc.
- determine whether an item is valid according to a certain schema or not
Limitations
The Shape and CompareShape classes currently only support:
- cardinality (too many or not enough values)
- whether the property is allowed or not
- whether the value of a statement on a given property is correct/incorrect
It is still a bit unclear if and how the qualifier validation works.
Only Wikidata is supported currently when fetching labels for the result. If you need support for other Wikibase installations, comment here.
Installation
Get it from PyPI
$ pip install pyentityshape
Usage
Jupyter Notebooks
Example notebooks with code for validation of multiple items: hiking paths, campsites, shelters
CLI
Example:
# Import path assumed from the package name
from entityshape import EntityShape

e = EntityShape(eid="E1", lang="en", qid="Q1")
result = e.validate_and_get_result()
# Get human readable result
print(result)
# Valid: False
# Properties_without_enough_correct_statements: instance of (P31)
# Access the data
print(result.properties_without_enough_correct_statements)
# {'P31'}
Validation
The is_valid method on the Result object mimics all red warnings displayed by https://www.wikidata.org/wiki/User:Teester/EntityShape.js
It currently checks these five conditions that all have to be false for the item to be valid:
- properties with too many statements found
- incorrect statements found
- some required properties are missing
- properties without enough correct statements found
- statements with properties that are not allowed found
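The five conditions above can be sketched as a small class. This is only an illustration of the rule, not the library's actual Result implementation: the class name `ResultSketch` and every field name except `properties_without_enough_correct_statements` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ResultSketch:
    """Hypothetical stand-in for the Result object (names are assumptions)."""

    properties_with_too_many_statements: set[str] = field(default_factory=set)
    incorrect_statements: set[str] = field(default_factory=set)
    missing_required_properties: set[str] = field(default_factory=set)
    properties_without_enough_correct_statements: set[str] = field(default_factory=set)
    properties_not_allowed: set[str] = field(default_factory=set)

    def is_valid(self) -> bool:
        # The item is valid only when all five problem sets are empty.
        return not (
            self.properties_with_too_many_statements
            or self.incorrect_statements
            or self.missing_required_properties
            or self.properties_without_enough_correct_statements
            or self.properties_not_allowed
        )
```

A single non-empty set, for example a missing P31 statement, is enough to make the item invalid.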
Known working schemas
This library currently only supports a subset of all features in the ShEx specification.
The following Entity Schemas are known to work:
Background
This library is the glue between libraries like Wikibase Integrator and entityschemas.
It makes it easy to batch check a whole subset of Wikidata items against a schema. Nice!
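A batch check could look roughly like the sketch below. The helper names `batch_check` and `invalid_items` are hypothetical and not part of this library; the actual validation is passed in as a callable so the loop stays independent of how each item is checked.

```python
from __future__ import annotations

from typing import Callable, Iterable


def batch_check(qids: Iterable[str], check: Callable[[str], bool]) -> dict[str, bool]:
    """Run `check` on every QID and collect the verdicts (hypothetical helper)."""
    return {qid: check(qid) for qid in qids}


def invalid_items(verdicts: dict[str, bool]) -> list[str]:
    """Return the QIDs whose verdict is False, in sorted (string) order."""
    return sorted(qid for qid, ok in verdicts.items() if not ok)
```

With this library, `check` might be something like `lambda qid: EntityShape(eid="E1", lang="en", qid=qid).validate_and_get_result().is_valid()`, assuming `is_valid` is callable as described in the Validation section.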
TODO
The CompareShape and Shape classes should be rewritten using OOP and enums to avoid passing strings around because that is not nice to debug or maintain.
What do we want to know from the CompareShape class?
On the property level:
- whether the property is mandatory and present/missing
On the statement level:
- whether the cardinality of values is allowed (min/max)
- whether the value(s) are correct/incorrect
Cases:
- mandatory property is missing
- optional property is missing (this is not invalidating)
- a property has an incorrect value
- a property has a correct value
- a property has too many values
- a property has not enough values
- ?
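One possible shape for the enum-based rewrite is sketched below. Nothing here exists in the library yet: `PropertyStatus`, `classify` and `is_invalidating` are hypothetical names, and the order in which the checks are applied is an assumption.

```python
from __future__ import annotations

from enum import Enum, auto
from typing import Optional


class PropertyStatus(Enum):
    """The cases listed above, as enum members instead of strings."""

    MANDATORY_MISSING = auto()
    OPTIONAL_MISSING = auto()
    INCORRECT_VALUE = auto()
    CORRECT_VALUE = auto()
    TOO_MANY_VALUES = auto()
    NOT_ENOUGH_VALUES = auto()


# Statuses that do not invalidate an item.
NON_INVALIDATING = frozenset(
    {PropertyStatus.OPTIONAL_MISSING, PropertyStatus.CORRECT_VALUE}
)


def classify(
    count: int,
    minimum: int,
    maximum: Optional[int] = None,
    values_correct: bool = True,
) -> PropertyStatus:
    """Map a statement count and min/max cardinality to a status."""
    if count == 0:
        # A property is mandatory when at least one value is required.
        if minimum > 0:
            return PropertyStatus.MANDATORY_MISSING
        return PropertyStatus.OPTIONAL_MISSING
    if count < minimum:
        return PropertyStatus.NOT_ENOUGH_VALUES
    if maximum is not None and count > maximum:
        return PropertyStatus.TOO_MANY_VALUES
    if values_correct:
        return PropertyStatus.CORRECT_VALUE
    return PropertyStatus.INCORRECT_VALUE


def is_invalidating(status: PropertyStatus) -> bool:
    return status not in NON_INVALIDATING
```

Comparing enum members instead of strings lets a type checker and the IDE catch misspelled cases, which is the debugging and maintenance concern raised above.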
ShEx Tip
When working on your Entity Schemas, the triple constraints described here are nice to know/remember: https://shex.io/shex-primer/#tripleConstraints
Thanks
Big thanks to Myst and Christian Clauss for advice and help with Ruff to make this better.
License
GPLv3+
What I learned
- Forking other people's undocumented spaghetti code is not much fun.
- I want to find a more reliable validator that supports somevalue and novalue
- Pydantic is wonderful yet again; it makes working with OOP easy peasy :)
- Ruff is crazy fast and very nice!
File details
Details for the file entityshape-0.1.1.tar.gz.
File metadata
- Download URL: entityshape-0.1.1.tar.gz
- Upload date:
- Size: 50.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.11.3 Linux/6.1.38-2-lts
File hashes
Algorithm | Hash digest
---|---
SHA256 | a05cc2f71f2aa829c577445b3be57e4b3712af226e9f999d000468a027427be8
MD5 | 2f7d164ab62fe36ae8bf9164f6af991d
BLAKE2b-256 | 78a3360f408b5aa111f420981c3b1d200825b2880bc677bad3c0463d24c241aa
File details
Details for the file entityshape-0.1.1-py3-none-any.whl.
File metadata
- Download URL: entityshape-0.1.1-py3-none-any.whl
- Upload date:
- Size: 26.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.5.1 CPython/3.11.3 Linux/6.1.38-2-lts
File hashes
Algorithm | Hash digest
---|---
SHA256 | 31b8454bc667f994abf897ff08bd1e40a7b24c758b45db5eb3d4a24ad23e90c3
MD5 | 587068ba4a9538e57cd7f84162511a6b
BLAKE2b-256 | 7b54ac1d9fae7e7a7b3ad2dd2045e74d1e3f69f0ecf316f49bb186e5eb612d26