prefixmaps
A Python library for retrieving semantic prefix maps.
A semantic prefix map maps a prefix (e.g. skos) to a namespace (e.g. http://www.w3.org/2004/02/skos/core#).
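As a minimal illustration of the idea, independent of this library, a prefix map can be modeled as a plain dictionary. The helper names expand and contract below are hypothetical and are not part of the prefixmaps API; the sketch only shows what "mapping a prefix to a namespace" means in practice.

```python
from typing import Optional

# A toy prefix map: prefix -> namespace (illustrative, not the library's data).
PREFIX_MAP = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "owl": "http://www.w3.org/2002/07/owl#",
}


def expand(curie: str, prefix_map: dict) -> Optional[str]:
    """Expand a CURIE such as 'skos:exactMatch' to a full URI, or return None."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefix_map:
        return None
    return prefix_map[prefix] + local_id


def contract(uri: str, prefix_map: dict) -> Optional[str]:
    """Contract a URI to a CURIE using the longest matching namespace, or return None."""
    matches = [(ns, p) for p, ns in prefix_map.items() if uri.startswith(ns)]
    if not matches:
        return None
    namespace, prefix = max(matches, key=lambda m: len(m[0]))
    return f"{prefix}:{uri[len(namespace):]}"


print(expand("skos:exactMatch", PREFIX_MAP))
print(contract("http://www.w3.org/2002/07/owl#Class", PREFIX_MAP))
```

For real use, the curies Converter shown under Usage handles expansion and contraction.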
This repository and the corresponding library are designed to satisfy the following requirements:
- generation of prefix maps in headers of RDF documents
- use in tools that expand CURIEs and short-form identifiers to URIs that can be used as subjects of RDF triples
- coverage of prefixes from multiple different domains
- no single authoritative source of either prefixes or prefix-namespace mappings (clash-resilient)
- preferred semantic namespace is prioritized over web URLs
- authority preferred prefix is prioritized where possible
- each individual prefix map is case-insensitive bijective
- prefix map composition and custom ordering of prefixmaps
- lightweight / low footprint
- fast (TODO)
- network-independence / versioned prefix maps
- optional ability to retrieve latest from external authority on network
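Two of the requirements above, generating RDF headers and case-insensitive bijectivity, can be sketched in plain Python. This is an illustration under stated assumptions, not the library's implementation: the function names are hypothetical, and "case-insensitive bijective" is read here as "no two prefixes and no two namespaces are equal after lowercasing".

```python
def turtle_prefix_header(prefix_map: dict) -> str:
    """Render a prefix map as Turtle @prefix declarations, sorted by prefix."""
    lines = [
        f"@prefix {prefix}: <{namespace}> ."
        for prefix, namespace in sorted(prefix_map.items())
    ]
    return "\n".join(lines)


def is_case_insensitive_bijective(prefix_map: dict) -> bool:
    """Return True if prefixes and namespaces are each unique after lowercasing."""
    prefixes = [prefix.lower() for prefix in prefix_map]
    namespaces = [namespace.lower() for namespace in prefix_map.values()]
    return (
        len(set(prefixes)) == len(prefixes)
        and len(set(namespaces)) == len(namespaces)
    )


good = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "skos": "http://www.w3.org/2004/02/skos/core#",
}
# 'geo' and 'GEO' collide once case is ignored.
clashing = {
    "geo": "http://www.opengis.net/ont/geosparql#",
    "GEO": "http://purl.obolibrary.org/obo/GEO_",
}

print(turtle_prefix_header(good))
print(is_case_insensitive_bijective(good))
print(is_case_insensitive_bijective(clashing))
```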
What this is NOT intended for:
- a general source of metadata about either prefixes or namespaces
- a mechanism for resolving identifiers to web URLs for humans to find information
Installation
pip install prefixmaps
Usage
To use in combination with the curies library:
from prefixmaps import load_converter
from curies import Converter
converter: Converter = load_converter(["obo", "bioregistry.upper", "linked_data", "prefixcc"])
>>> converter.expand("CHEBI:1")
'http://purl.obolibrary.org/obo/CHEBI_1'
>>> converter.expand("GEO:1")
'http://purl.obolibrary.org/obo/GEO_1'
>>> converter.expand("owl:Class")
'http://www.w3.org/2002/07/owl#Class'
>>> converter.expand("FlyBase:FBgn123")
'http://identifiers.org/fb/FBgn123'
Alternate orderings / clash resilience
- prefix.cc uses the prefix geo for geosparql (http://www.opengis.net/ont/geosparql#)
- OBO uses the prefix GEO for the Geographical Entity Ontology, expanding to http://purl.obolibrary.org/obo/GEO_
- the Bioregistry uses the prefix geo for NCBI GEO, and "re-mints" a geogeo prefix for the OBO ontology
If we prioritize prefix.cc, the OBO prefix is ignored:
converter = load_converter(["prefixcc", "obo"])
>>> converter.expand("GEO:1")
>>> converter.expand("geo:1")
'http://www.opengis.net/ont/geosparql#1'
Even though prefix expansion is case-sensitive, we intentionally block conflicts that differ only in case.
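The effect of ordering can be sketched with a small priority merge. This is a simplified model and not the library's actual code: merge_prefix_maps is a hypothetical helper, the two toy maps hold only the entries mentioned above, and the rule "skip any entry whose lowercased prefix or namespace was already claimed by an earlier map" is an assumption about how clashes are blocked.

```python
def merge_prefix_maps(ordered_maps: list) -> dict:
    """Merge prefix maps in priority order; earlier maps win on any clash.

    A later entry is dropped if its prefix or its namespace, compared
    case-insensitively, has already been claimed.
    """
    merged = {}
    seen_prefixes = set()
    seen_namespaces = set()
    for prefix_map in ordered_maps:
        for prefix, namespace in prefix_map.items():
            if prefix.lower() in seen_prefixes or namespace.lower() in seen_namespaces:
                continue  # blocked by a higher-priority entry
            merged[prefix] = namespace
            seen_prefixes.add(prefix.lower())
            seen_namespaces.add(namespace.lower())
    return merged


# Toy stand-ins for two sources (only the entries discussed in this section).
PREFIXCC = {"geo": "http://www.opengis.net/ont/geosparql#"}
OBO = {
    "GEO": "http://purl.obolibrary.org/obo/GEO_",
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
}

prefixcc_first = merge_prefix_maps([PREFIXCC, OBO])
obo_first = merge_prefix_maps([OBO, PREFIXCC])

print(sorted(prefixcc_first))  # 'GEO' is blocked by the earlier 'geo'
print(sorted(obo_first))       # 'geo' is blocked by the earlier 'GEO'
```

Because the merged result never contains two prefixes that differ only in case, a CURIE written with the blocked casing simply fails to expand instead of silently resolving to a different namespace.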
If we put bioregistry at the start of the list, then GEOGEO can be used as the prefix for the OBO ontology:
converter = load_converter(["bioregistry", "prefixcc", "obo"])
>>> converter.expand("geo:1")
'http://identifiers.org/geo/1'
>>> converter.expand("GEO:1")
>>> converter.expand("GEOGEO:1")
'http://purl.obolibrary.org/obo/GEO_1'
Note that from the OBO perspective, GEOGEO is non-canonical.
We get similar results using the upper-normalized variant of bioregistry:
converter = load_converter(["bioregistry.upper", "prefixcc", "obo"])
>>> converter.expand("GEO:1")
'http://identifiers.org/geo/1'
>>> converter.expand("geo:1")
>>> converter.expand("GEOGEO:1")
'http://purl.obolibrary.org/obo/GEO_1'
Users of OBO ontologies will want to place OBO at the start of the list:
converter = load_converter(["obo", "bioregistry.upper", "prefixcc"])
>>> converter.expand("geo:1")
>>> converter.expand("GEO:1")
'http://purl.obolibrary.org/obo/GEO_1'
>>> converter.expand("GEOGEO:1")
Note that under this ordering there is no prefix for NCBI GEO. This is not a major limitation, as there is no canonical semantic rendering of NCBI GEO. This could be added in the future with a unique OBO prefix.
You can use the ready-made "merged" prefix set, which prioritizes OBO:
converter = load_converter("merged")
>>> converter.expand("GEOGEO:1")
>>> converter.expand("GEO:1")
'http://purl.obolibrary.org/obo/GEO_1'
>>> converter.expand("geo:1")
Network independence and requesting latest versions
By default, this will make use of metadata distributed alongside the package. This has certain advantages in terms of reproducibility, but it means if a new ontology or prefix is added to an upstream source you won't see this.
To refresh and use the latest upstream:
converter = load_converter("obo", refresh=True)
This will perform a fetch from http://obofoundry.org/registry/obo_prefixes.ttl
Context Metadata
See the description fields
Repository organization
Data files containing pre-built prefix maps using sources like OBO and Bioregistry are distributed alongside the Python package.
Location:
CSV field descriptions
- context: a unique handle for this context. This MUST be the same as the basename of the file
- prefix: corresponds to http://www.w3.org/ns/shacl#prefix
- namespace: corresponds to http://www.w3.org/ns/shacl#namespace
- canonical: true if this satisfies bijectivity
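A file with these four fields can be read with the standard csv module. The sketch below is illustrative only: the sample rows (including the geo_alias prefix) are made up for the example, and both the column order and the spelling of the boolean as True/False are assumptions, not a statement about the files shipped with the package.

```python
import csv
import io

# Hypothetical sample data using the four fields described above.
SAMPLE_CSV = """context,prefix,namespace,canonical
obo,GEO,http://purl.obolibrary.org/obo/GEO_,True
obo,CHEBI,http://purl.obolibrary.org/obo/CHEBI_,True
obo,geo_alias,http://purl.obolibrary.org/obo/GEO_,False
"""


def load_canonical_prefix_map(text: str) -> dict:
    """Build a prefix -> namespace dict from the rows marked canonical."""
    reader = csv.DictReader(io.StringIO(text))
    return {
        row["prefix"]: row["namespace"]
        for row in reader
        if row["canonical"].strip().lower() == "true"
    }


prefix_map = load_canonical_prefix_map(SAMPLE_CSV)
print(prefix_map)
```

Keeping only canonical rows yields a bijective map; non-canonical rows remain useful as extra synonyms when expanding, since several prefixes may legitimately point at one namespace.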
Refreshing the Data
The data can be refreshed in several ways:
- Locally, you can use tox with:
  pip install tox tox-poetry
  tox -e refresh
- Manually running and automatically committing via this GitHub Actions workflow.
- Running the makefile (warning: this requires some pre-configuration):
  make etl
TODO: make a github action that auto-releases new versions
Note that PRs should not be made against the individual CSV files. These are generated from upstream sources.
We temporarily house a small number of curated prefixmaps such as linked_data.yaml, with the CSV generated from the YAML.
Our goal is to ultimately cede these to upstream sources.
Requesting new prefixes
This repo is NOT a prefix registry. Its job is simply to aggregate different prefix maps. Request changes upstream.