Skip to main content

Analyse an RDF graph to find URI's without human readable labels.

Project description

labelify

labelify is a Python module and command line utility that identifies unlabelled resources in a graph. It is highly configurable and works on a number of different RDF data sources.

Installation

labelify is on PyPI at https://pypi.org/project/labelify/ so:

pip install labelify

or

poetry add labelify

will install it.

To install from it's version control repo, for the latest unstable release:

pip install git+https://github.com/Kurrawong/labelify

Command Line Usage

Find all missing labels in myOntology.ttl:

labelify myOntology.ttl

Find missing labels for all the predicates (not subjects or objects) in myOntology.ttl:

labelify myOntology.ttl --nodetype predicates

Find all missing labels in myOntology.ttl taking into account the labels which have been defined in another file called supportingVocab.ttl.

but don’t check for missing labels in supportingVocab.ttl

labelify myOntology.ttl --context supportingVocab.ttl

Same as above but use the additional labelling predicates given in myLabellingPredicates.txt.

By default only rdfs:label is used as a labelling predicate.

labelify myOntology.ttl --context supportingVocab.ttl --labels myLabellingPredicates.txt

Where myLabellingPredicates.txt is a list of labelling predicates (one per line and unprefixed):

http://www.w3.org/2004/02/skos/core#prefLabel
http://schema.org/name

Find all the missing labels in the subgraph http://example-graph at the sparql endpoint http://mytriplestore/sparql using basic HTTP auth to connect.

labelify will prompt for the password or it can be provided with the --password flag if you dont mind it being saved to the shell history.

labelify http://mytriplestore/sparql --graph http://example-graph --username admin

Label Extraction

Get all the IRIs with missing labels from a local RDF file and put them into a text file with an IRI per line:

labelify -n all my_file.ttl -r > iris-missing-labels.txt

note use of -r for simple IRI printing

Use the output file to generate an RDF file containing the labes, extracted from either another RDF file, a directory of RDF files or a SPARQL endpoint:

labelify -x iris-missing-labels.txt other-rdf-file.ttl > labels.ttl
# or
labelify -x iris-missing-labels.txt dir-of-rdf-files/ > labels.ttl
# or
labelify -x iris-missing-labels.txt http://some-sparql-endpoint.com/sparql > labels.ttl

Command line output formats

By default, labelify will print helpful progress and configuration messages and attempt to group the missing labels by namespace, making it easier to quickly parse the output.

The --raw/-r option can be appended to any of the examples above to tell labelify to only print the uris of objects with missing labels (one per line) and no other messages. This is useful for command line composition if you wish to pipe the output into another process.

More command line options

For more help and the complete list of command line options just run labelify --help/-h

As per unix conventions all the flags shown above can also be used with short codes. i.e. -g is the same as --graph.

Usage as a module

Print missing labels for all the objects (not subjects or predicates) in myOntology.ttl, taking into account any labels which have been defined in RDF files in the supportingVocabs directory.

Using skos:prefLabel and rdfs:label, but not dcterms:title and schema:name (as per default) as the labelling predicates.

from labelify import find_missing_labels
from rdflib import Graph
from rdflib.namespace import RDFS, SKOS
import glob

graph = Graph().parse("myOntology.ttl")
context_graph = Graph()
for context_file in glob.glob("supportingVocabs/*.ttl"):
    context_graph.parse(context_file)
labelling_predicates = [SKOS.prefLabel, RDFS.label]
nodetype = "objects"

missing_labels = find_missing_labels(
    graph,
    context_graph,
    labelling_predicates,
    nodetype
)
print(missing_labels)

and, to extract labels, descriptions & seeAlso details for given IRIs from a given directory of RDF files:

from pathlib import Path
from labelify import extract_labels

iris = Path("tests/get_iris/iris.txt").read_text().splitlines()
lbls_graph = extract_labels(Path("tests/one/background/"), iris)

Development

Installing from source

Clone the repository and install the dependencies

labelify uses Poetry to manage its dependencies.

git clone git@github.com:Kurrawong/labelify.git
cd labelify
poetry install

You can then use labelify from the command line

poetry shell
python labelify/ ...

Running tests

poetry run pytest

Several of the tests require a Fuseki triplestore instance to be available, so you need Docker running as the tests will attempt to use testcontainers to create throwaway containers for this purpose.

Formatting the codebase

poetry run black . && poetry run ruff check --fix labelify/

License

BSD-3-Clause, if anyone is asking.

Contact

KurrawongAI
info@kurrawong.ai
https://kurrawong.ai

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

labelify-0.3.9.tar.gz (8.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

labelify-0.3.9-py3-none-any.whl (10.3 kB view details)

Uploaded Python 3

File details

Details for the file labelify-0.3.9.tar.gz.

File metadata

  • Download URL: labelify-0.3.9.tar.gz
  • Upload date:
  • Size: 8.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.0 CPython/3.13.1 Darwin/24.1.0

File hashes

Hashes for labelify-0.3.9.tar.gz
Algorithm Hash digest
SHA256 c62e8dfebb8d05f4462ae7af62b4763e448c443afebd2570a69a9a84480f82c1
MD5 cf63be1f4b1fac5a9cb871db86d467a6
BLAKE2b-256 acfb717e19f5cedb8681c7b5f99b881fce623f87308df6c7eeb9c8b93811cc6e

See more details on using hashes here.

File details

Details for the file labelify-0.3.9-py3-none-any.whl.

File metadata

  • Download URL: labelify-0.3.9-py3-none-any.whl
  • Upload date:
  • Size: 10.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.0.0 CPython/3.13.1 Darwin/24.1.0

File hashes

Hashes for labelify-0.3.9-py3-none-any.whl
Algorithm Hash digest
SHA256 346bc2273b7ec840b88fd6adce436229616878e138d590562cc19a268aee7552
MD5 501bcf6abd029bdb0998ccfacaf5ef74
BLAKE2b-256 07c03fe0380d2f626ffdbee64d43196895f34a7cd3d2b9cdd1efff1f6c152180

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page