Analyse an RDF graph to find URIs without human-readable labels.

= labelify

labelify is a Python module and command line utility that identifies unlabelled resources in a graph.
It is highly configurable and works on a number of different RDF data sources.

== Installation

[source,shell]
----
pip install git+https://github.com/Kurrawong/labelify
----

== Command Line Usage

Find all missing labels in `myOntology.ttl`:

[source,shell]
----
labelify myOntology.ttl
----

Find missing labels for all the predicates (not subjects or objects) in `myOntology.ttl`:

[source,shell]
----
labelify myOntology.ttl --nodetype predicates
----

Find all missing labels in `myOntology.ttl`, taking into account the labels defined in
another file called `supportingVocab.ttl`:

_This does not check for missing labels in `supportingVocab.ttl` itself._

[source,shell]
----
labelify myOntology.ttl --context supportingVocab.ttl
----

Same as above, but also use the additional labelling predicates given in `myLabellingPredicates.txt`:

_By default only rdfs:label is used as a labelling predicate._

[source,shell]
----
labelify myOntology.ttl --context supportingVocab.ttl --labels myLabellingPredicates.txt
----

Where `myLabellingPredicates.txt` is a list of labelling predicates (one per line, written as full IRIs rather than prefixed names):

[source,txt]
----
http://www.w3.org/2004/02/skos/core#prefLabel
http://schema.org/name
----
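
If you want to sanity-check such a file before passing it to `--labels`, a minimal sketch along the following lines may help. It relies only on the format described above (one unprefixed IRI per line). The helper name `read_labelling_predicates` is hypothetical and not part of labelify, and accepting only `http`/`https` IRIs is a simplifying assumption of this sketch.

[source,python]
----
from urllib.parse import urlparse


def read_labelling_predicates(text: str) -> list[str]:
    """Return the IRIs listed in `text`, one per line, skipping blank lines.

    Hypothetical helper, not part of labelify. Raises ValueError for any
    entry that does not look like an absolute http(s) IRI.
    """
    predicates = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        iri = line.strip()
        if not iri:
            continue
        parsed = urlparse(iri)
        # A prefixed name such as "skos:prefLabel" parses with a scheme
        # but no network location, so it is rejected here.
        if not (parsed.scheme in ("http", "https") and parsed.netloc):
            raise ValueError(f"line {line_number}: not an absolute IRI: {iri!r}")
        predicates.append(iri)
    return predicates


example = """http://www.w3.org/2004/02/skos/core#prefLabel
http://schema.org/name
"""
print(read_labelling_predicates(example))
----

In practice the text would come from the file itself, for example `Path("myLabellingPredicates.txt").read_text()`.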

Find all the missing labels in the subgraph `http://example-graph`
at the SPARQL endpoint `http://mytriplestore/sparql`, using basic HTTP auth to connect.

labelify will prompt for the password, or it can be provided with the `--password` flag if you don't
mind it being saved to the shell history.

[source,shell]
----
labelify http://mytriplestore/sparql --graph http://example-graph --username admin
----

=== Label Extraction

Get all the IRIs with missing labels from a local RDF file and put them into a text file with an IRI per line:

[source,shell]
----
labelify -n all my_file.ttl -r > iris-missing-labels.txt
----

_Note the use of `-r`, which prints only the IRIs._

Use the output file to generate an RDF file containing the labels, extracted from another RDF file, a directory of RDF files or a SPARQL endpoint:

[source,shell]
----
labelify -x iris-missing-labels.txt other-rdf-file.ttl > labels.ttl
# or
labelify -x iris-missing-labels.txt dir-of-rdf-files/ > labels.ttl
# or
labelify -x iris-missing-labels.txt http://some-sparql-endpoint.com/sparql > labels.ttl
----

== Command line output formats

By default, labelify will print helpful progress and configuration messages and attempt to group the
missing labels by namespace, making it easier to quickly parse the output.

The `--raw`/`-r` option can be appended to any of the examples above to tell labelify to print only the
URIs of the resources with missing labels (one per line) and no other messages. This is useful for command
line composition if you wish to pipe the output into another process.
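
As an illustration of that kind of composition, the sketch below regroups a list of raw IRIs by namespace in a downstream script. The function names are hypothetical, and splitting at the last `#` or `/` is a heuristic assumed here, not necessarily how labelify itself groups its default output.

[source,python]
----
from collections import defaultdict


def split_namespace(iri: str) -> str:
    """Return everything up to and including the last '#' or '/' in `iri`."""
    cut = max(iri.rfind("#"), iri.rfind("/"))
    return iri[: cut + 1] if cut >= 0 else iri


def group_by_namespace(iris):
    """Map each namespace to the sorted list of IRIs that fall under it."""
    groups = defaultdict(list)
    for iri in iris:
        iri = iri.strip()
        if iri:  # ignore blank lines
            groups[split_namespace(iri)].append(iri)
    return {ns: sorted(members) for ns, members in sorted(groups.items())}


# Stand-in for the lines printed by `labelify ... -r`
raw_output = [
    "http://example.com/vocab#Thing",
    "http://example.com/vocab#Other",
    "http://schema.org/name",
]
for namespace, members in group_by_namespace(raw_output).items():
    print(namespace, len(members))
----

When piping, `raw_output` could be replaced with `sys.stdin`, since the function accepts any iterable of lines.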

== More command line options

For more help and the complete list of command line options, run `labelify --help` (or `-h`).

As per Unix conventions, all the flags shown above can also be used with short codes,
e.g. `-g` is the same as `--graph`.

== Usage as a module

Print missing labels for all the objects (not subjects or predicates) in `myOntology.ttl`, taking into account any labels which have been defined in RDF files in the `supportingVocabs` directory.

The example uses `skos:prefLabel` and `rdfs:label`, but not `dcterms:title` and `schema:name` (as per default), as the labelling predicates.

[source,python]
----
import glob

from labelify import find_missing_labels
from rdflib import Graph
from rdflib.namespace import RDFS, SKOS

# The graph to check for missing labels
graph = Graph().parse("myOntology.ttl")

# Labels defined in these files are taken into account
context_graph = Graph()
for context_file in glob.glob("supportingVocabs/*.ttl"):
    context_graph.parse(context_file)

labelling_predicates = [SKOS.prefLabel, RDFS.label]
nodetype = "objects"

missing_labels = find_missing_labels(
    graph,
    context_graph,
    labelling_predicates,
    nodetype,
)
print(missing_labels)
----
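
To make the roles of the four arguments more concrete, here is a plain-Python sketch of the underlying idea, using tuples of strings instead of an rdflib graph. It is not labelify's implementation: the function `missing_labels`, the `"subjects"` node type and the rule that any string starting with `http` is an IRI are all assumptions made for illustration.

[source,python]
----
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def missing_labels(
    triples,
    context_triples=(),
    labelling_predicates=(RDFS_LABEL,),
    nodetype="all",
):
    """Return the IRIs used in `triples` that have no label in either graph.

    Triples are (subject, predicate, object) tuples of strings. Only strings
    starting with "http" are treated as IRIs in this sketch.
    """
    positions = {
        "subjects": (0,),
        "predicates": (1,),
        "objects": (2,),
        "all": (0, 1, 2),
    }[nodetype]
    # Candidate nodes come only from the main graph, never from the context.
    candidates = {
        triple[i]
        for triple in triples
        for i in positions
        if triple[i].startswith("http")
    }
    # Labels may be defined in the main graph or in the context graph.
    labelled = {
        s
        for s, p, _ in list(triples) + list(context_triples)
        if p in labelling_predicates
    }
    return candidates - labelled


graph = [
    ("http://example.com/a", "http://example.com/relatedTo", "http://example.com/b"),
    ("http://example.com/a", RDFS_LABEL, "A"),
]
context = [("http://example.com/b", RDFS_LABEL, "B")]
print(sorted(missing_labels(graph, context, nodetype="objects")))
print(sorted(missing_labels(graph, nodetype="objects")))
----

The two calls differ only in whether the context is supplied: `http://example.com/b` is reported as unlabelled only when the context that labels it is left out.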

To extract labels, descriptions and seeAlso details for given IRIs from a directory of RDF files:

[source,python]
----
from pathlib import Path
from labelify import extract_labels

iris = Path("tests/get_iris/iris.txt").read_text().splitlines()
lbls_graph = extract_labels(Path("tests/one/background/"), iris)
----

== Development

=== Installing from source

Clone the repository and install the dependencies:

_labelify uses https://python-poetry.org/[Poetry] to manage its dependencies._

[source,shell]
----
git clone git@github.com:Kurrawong/labelify.git
cd labelify
poetry install
----

You can then use labelify from the command line:

[source,shell]
----
poetry shell
python labelify/ ...
----

=== Running tests

[source,shell]
----
poetry run pytest
----

=== Formatting the codebase

[source,shell]
----
poetry run black . && poetry run ruff check --fix labelify/
----

== License

https://opensource.org/license/bsd-3-clause/[BSD-3-Clause], if anyone is asking.

== Contact

*KurrawongAI* +
info@kurrawong.ai +
https://kurrawong.ai
