Analyse an RDF graph to find URIs without human-readable labels.
labelify
labelify is a Python module and command line utility that identifies unlabelled resources in an RDF graph. It is highly configurable and works with several kinds of RDF data source, including local RDF files, directories of RDF files and SPARQL endpoints.
Installation
labelify is on PyPI at https://pypi.org/project/labelify/ so:
pip install labelify
or
poetry add labelify
will install it.
To install the latest (unstable) version from its version control repository:
pip install git+https://github.com/Kurrawong/labelify
Command Line Usage
Find all missing labels in myOntology.ttl:
labelify myOntology.ttl
Find missing labels for all the predicates (not subjects or objects) in
myOntology.ttl:
labelify myOntology.ttl --nodetype predicates
Find all missing labels in myOntology.ttl, taking into account the
labels defined in another file called supportingVocab.ttl, but without
checking supportingVocab.ttl itself for missing labels:
labelify myOntology.ttl --context supportingVocab.ttl
Same as above but use the additional labelling predicates given in
myLabellingPredicates.txt.
By default only rdfs:label is used as a labelling predicate.
labelify myOntology.ttl --context supportingVocab.ttl --labels myLabellingPredicates.txt
Where myLabellingPredicates.txt lists the labelling predicates one per
line, as full IRIs rather than prefixed names:
http://www.w3.org/2004/02/skos/core#prefLabel
http://schema.org/name
Find all the missing labels in the named graph http://example-graph at
the SPARQL endpoint http://mytriplestore/sparql, using basic HTTP auth
to connect.
labelify will prompt for the password, or you can supply it with the
--password flag if you don't mind it being saved to your shell history.
labelify http://mytriplestore/sparql --graph http://example-graph --username admin
Label Extraction
Get all the IRIs with missing labels from a local RDF file and put them into a text file with an IRI per line:
labelify -n all my_file.ttl -r > iris-missing-labels.txt
Note the use of -r, so that only bare IRIs are printed.
Use the output file to generate an RDF file containing the labels, extracted from another RDF file, a directory of RDF files or a SPARQL endpoint:
labelify -x iris-missing-labels.txt other-rdf-file.ttl > labels.ttl
# or
labelify -x iris-missing-labels.txt dir-of-rdf-files/ > labels.ttl
# or
labelify -x iris-missing-labels.txt http://some-sparql-endpoint.com/sparql > labels.ttl
Command line output formats
By default, labelify will print helpful progress and configuration messages and attempt to group the missing labels by namespace, making it easier to quickly parse the output.
The --raw/-r option can be appended to any of the examples above to
tell labelify to print only the IRIs of resources with missing labels
(one per line) and no other messages. This is useful for command line
composition, e.g. piping the output into another process.
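As an example of composing with the raw output, the sketch below reads IRIs line by line and groups them by namespace, roughly mimicking the default grouped view. The split rule (everything up to the last # or /) is a heuristic assumed for this sketch, not labelify's own logic, and the module name group_by_ns.py is hypothetical.

```python
import sys
from collections import defaultdict


def namespace_of(iri):
    """Split an IRI after its last '#' or '/' (a heuristic, not labelify's rule)."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1] if cut >= 0 else ""


def group_by_namespace(lines):
    """Map each namespace to the IRIs in it, ignoring blank lines."""
    groups = defaultdict(list)
    for line in lines:
        iri = line.strip()
        if iri:
            groups[namespace_of(iri)].append(iri)
    return dict(groups)


def print_groups(lines, out=None):
    """Print each namespace followed by its indented, sorted IRIs."""
    out = out or sys.stdout
    for namespace, members in sorted(group_by_namespace(lines).items()):
        print(namespace, file=out)
        for iri in sorted(members):
            print("    " + iri, file=out)
```

Saved as group_by_ns.py, it could be used as `labelify myOntology.ttl -r | python -c "import sys, group_by_ns; group_by_ns.print_groups(sys.stdin)"`.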
More command line options
For more help and the complete list of command line options, run (-h also works):
labelify --help
Following Unix conventions, all the flags shown above also have short
forms, e.g. -g is the same as --graph.
Usage as a module
Print missing labels for all the objects (not subjects or predicates) in
myOntology.ttl, taking into account any labels defined in RDF files in
the supportingVocabs directory, and using skos:prefLabel and rdfs:label
(but not dcterms:title or schema:name, as per default) as the labelling
predicates:
import glob

from rdflib import Graph
from rdflib.namespace import RDFS, SKOS

from labelify import find_missing_labels

# The graph to check for missing labels
graph = Graph().parse("myOntology.ttl")

# Labels defined in these (Turtle) files count, but their own resources are not checked
context_graph = Graph()
for context_file in glob.glob("supportingVocabs/*.ttl"):
    context_graph.parse(context_file)

labelling_predicates = [SKOS.prefLabel, RDFS.label]
nodetype = "objects"

missing_labels = find_missing_labels(
    graph,
    context_graph,
    labelling_predicates,
    nodetype,
)
print(missing_labels)
To extract labels, descriptions and seeAlso details for a list of IRIs from a directory of RDF files:
from pathlib import Path
from labelify import extract_labels
iris = Path("tests/get_iris/iris.txt").read_text().splitlines()
lbls_graph = extract_labels(Path("tests/one/background/"), iris)
Development
Installing from source
labelify uses Poetry to manage its dependencies. Clone the repository and install them:
git clone git@github.com:Kurrawong/labelify.git
cd labelify
poetry install
You can then use labelify from the command line:
poetry shell
python labelify/ ...
Running tests
poetry run pytest
Several of the tests require a Fuseki triplestore. They use testcontainers to start throwaway Fuseki containers, so Docker must be running when you run them.
Formatting the codebase
poetry run black . && poetry run ruff check --fix labelify/
License
BSD-3-Clause, if anyone is asking.
Contact
KurrawongAI
info@kurrawong.ai
https://kurrawong.ai
Download files
File details
Details for the file labelify-0.3.9.tar.gz.
File metadata
- Download URL: labelify-0.3.9.tar.gz
- Upload date:
- Size: 8.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.0.0 CPython/3.13.1 Darwin/24.1.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c62e8dfebb8d05f4462ae7af62b4763e448c443afebd2570a69a9a84480f82c1 |
| MD5 | cf63be1f4b1fac5a9cb871db86d467a6 |
| BLAKE2b-256 | acfb717e19f5cedb8681c7b5f99b881fce623f87308df6c7eeb9c8b93811cc6e |
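A downloaded file can be checked against the SHA256 digest above with Python's standard hashlib module. A minimal sketch, assuming the archive sits in the current directory; the function name `sha256_of` is illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digest listed above for labelify-0.3.9.tar.gz
EXPECTED = "c62e8dfebb8d05f4462ae7af62b4763e448c443afebd2570a69a9a84480f82c1"
```

Comparing `sha256_of("labelify-0.3.9.tar.gz") == EXPECTED` should be true for an intact download; reading in chunks keeps memory use flat regardless of file size.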
File details
Details for the file labelify-0.3.9-py3-none-any.whl.
File metadata
- Download URL: labelify-0.3.9-py3-none-any.whl
- Upload date:
- Size: 10.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.0.0 CPython/3.13.1 Darwin/24.1.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 346bc2273b7ec840b88fd6adce436229616878e138d590562cc19a268aee7552 |
| MD5 | 501bcf6abd029bdb0998ccfacaf5ef74 |
| BLAKE2b-256 | 07c03fe0380d2f626ffdbee64d43196895f34a7cd3d2b9cdd1efff1f6c152180 |