Implementation of a subset of R2RML and RML
Tiny RML
The package tinyrml is an implementation of a subset of RML and R2RML with some helpful extended features. It is intended to be used as a Python package/library, and accepts Python iterables (of dicts) as input. It has the following limitations:
- Mappings cannot specify their sources (tables or SQL queries, or RML's logical sources). Instead, data sources are assigned externally when data is mapped.
- None of the join-related features are supported. Only a single tabular data source can be mapped at a time.
- Inverse expressions (rr:inverseExpression) are not supported, but similar functionality can be achieved via rre:expression (see below).
- Named graphs are not supported.
The package supports the following extensions to R2RML (note that a special namespace rre: is reserved for these extensions):
- A dict key whose value is a Python list is automatically expanded as multiple values/rows.
- Object maps accept the property rre:expandAsList; if true, the value (which is assumed to be a string) is split (using re.split) with commas and semicolons acting as separators, and expanded as multiple values/rows. This makes it possible to (say) have a comma-separated list as a quoted value in a cell in your CSV file, read the file using csv.DictReader, and expand the list as separate values. Splitting and expansion happen only if rr:template has a value in the object map (in cases where you would have used rr:column, you can instead introduce a template like {field}).
- Term maps accept the property rre:expression, the value of which is a string containing a Python expression. During the mapping process, this expression is evaluated with dict keys ("column names") as variables in the expression.
- Object maps also accept rr:class, allowing objects of generated triples to be typed (the R2RML specification only supports this for subject maps).
- In term maps, rr:column and rml:reference can be used interchangeably (in the original RML specification rr:column is reserved for SQL data sources whereas rml:reference is used for all other sources, but in a very similar role). See notes on template formatting below.
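The list expansion described above can be illustrated with a small standalone sketch. It does not use tinyrml itself; the helper names expand_as_list and expand_row are hypothetical, and trimming whitespace around the separators is an assumption rather than documented behavior.

```python
import csv
import io
import re

# Commas and semicolons act as separators; trimming the surrounding
# whitespace is an assumption made for this sketch.
SEPARATORS = re.compile(r"\s*[,;]\s*")

def expand_as_list(value):
    """Split a string such as 'red, green;blue' into its items."""
    return [item for item in SEPARATORS.split(value.strip()) if item]

def expand_row(row, key):
    """Return one copy of the row per value found under the given key.

    A Python list is expanded as-is; a string is split first.
    """
    value = row[key]
    values = value if isinstance(value, list) else expand_as_list(value)
    return [{**row, key: item} for item in values]

# A quoted CSV cell that holds a comma/semicolon-separated list.
csv_text = 'name,tags\nAlice,"red, green;blue"\n'
rows = list(csv.DictReader(io.StringIO(csv_text)))
expanded = expand_row(rows[0], "tags")
```

Each element of expanded is a copy of the original row with a single tag, which is roughly what "expanded as multiple values/rows" means here.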
Tiny RML was originally part of rdfhelpers, but has since been split off as its own project. It does not depend on rdfhelpers.
Installation
Tiny RML can be installed from PyPI:
pip install tinyrml
Usage
Tiny RML exposes the class Mapper which is the basic implementation of the mapping functionality. Instances of Mapper represent individual mappings (i.e., specific mapping definitions). The class constructor takes the following parameters:
- mapping: a graph (an rdflib.Graph) containing the mapping, or a path to a file which, when parsed, yields the mapping graph. This is a required (positional) parameter; the rest are optional.
- triples_map_uri=, when provided (as a URIRef), identifies the actual triples map to be used. This is useful when the mapping graph contains several mappings. If the parameter is not provided, Mapper will pick the first triples map it finds, and because of the way RDF is parsed and subsequently accessed, this may or may not be the lexically first triples map in the source file.
- ignore_field_keys= is a set of names of keys/fields that are ignored when determining the likely candidate for a key in a template. It defaults to an empty set.
- empty_string_is_none=, when True (the default), makes the mapper treat empty strings as missing values.
- allow_expressions=, when True (the default), lets the mapper use Python expressions embedded in the mapping graph.
- global_bindings=, when provided, is passed to the eval() function (as the parameter globals=; see Python documentation) when embedded Python expressions are evaluated. If not provided, "global globals" (default global bindings) are used.
- allow_object_map_classes=, when True (the default), lets mappings specify rr:class properties for object maps as well (the R2RML specification only allows those for subject maps).
- input_is_json=, when True (it defaults to False), allows the processed input data to consist of JSON objects, say, objects from json.load(). The objects are "flattened" so that simplistic JSONPath references (e.g., a.b.c) can be used in mappings. The flattening is done using the method Mapper.flatten() (see below).
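To make the role of global_bindings= more concrete, here is a minimal sketch of evaluating an expression string with a row's keys as variables. It only illustrates how Python's eval() treats its globals and locals arguments; the function name evaluate_expression is hypothetical and this is not tinyrml's actual code.

```python
import math

def evaluate_expression(expression, row, global_bindings=None):
    """Evaluate a Python expression with the row's keys as variables.

    The row is passed as the local namespace, the optional bindings as
    the global namespace. The globals are copied because eval() adds a
    __builtins__ entry to the dict it is given.
    """
    return eval(expression, dict(global_bindings or {}), row)

row = {"first": "Ada", "last": "Lovelace", "radius": "2"}

# Keys of the row are visible as plain variable names.
full_name = evaluate_expression("first + ' ' + last", row)

# Extra names (here the math module) are supplied through the globals.
area = evaluate_expression(
    "round(math.pi * float(radius) ** 2, 2)", row, {"math": math}
)
```

Since eval() executes arbitrary Python code, expressions in a mapping should be treated like code; disabling them with allow_expressions=False is a reasonable precaution when a mapping comes from an untrusted source.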
The method Mapper.process(self, rows, result_graph=) invokes a mapper. The parameter rows is an iterable of dicts used as the "rows" to be mapped; dictionary keys take the role of column names. If provided, result_graph= is a graph where results are added; otherwise a new graph is created. Regardless, the result graph is returned.
The method Mapper.processCSVFile(self, source, result_graph=, skip_unicode_marker=) takes a CSV file (provided as the parameter source and passed to open) and maps its contents. The parameter result_graph is passed to process. If skip_unicode_marker is True (the default), the initial character in the source file is skipped (otherwise it becomes part of the name of the first column). The result graph is returned.
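The effect of the Unicode marker on column names can be seen with the standard library alone. The sketch below is an illustration, not the package's implementation: read_rows is a hypothetical helper, and it drops the first character only when it really is U+FEFF, which is a cautious assumption that may differ from what processCSVFile does.

```python
import csv
import io

BOM = "\ufeff"  # Unicode byte order mark as it appears in decoded text

def read_rows(stream, skip_unicode_marker=True):
    """Read CSV rows as dicts, optionally skipping a leading marker."""
    if skip_unicode_marker:
        first = stream.read(1)
        if first != BOM:
            # Not a marker, so rewind to keep the header intact.
            stream.seek(0)
    return list(csv.DictReader(stream))

data = BOM + "id,name\n1,Alice\n"

clean_rows = read_rows(io.StringIO(data))
raw_rows = read_rows(io.StringIO(data), skip_unicode_marker=False)

# Without skipping, the marker becomes part of the first column name.
first_raw_key = next(iter(raw_rows[0]))
```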
The package exposes RR, RML, and RRE as the namespaces (instances of rdflib.Namespace) for R2RML, RML, and the Tiny RML extensions, respectively. By convention, we use the prefixes rr:, rml:, and rre: for these.
Template Formatting
Template strings (values of rr:template) do not support full JSONPath references. Paths like a.b.c are supported (see below); other features of JSONPath will be added in the future. The template mechanism is currently implemented using the string.Formatter class, so technically the format string syntax is available; this is likely to change in the future, though.
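As an illustration of how string.Formatter can drive template expansion, the following sketch extracts field names from a template and substitutes values from a row. The helper names template_fields and expand_template are hypothetical, and looking up a dotted name such as a.b as a single dict key is an assumption about how such references could be resolved (plain str.format would interpret a.b as attribute access instead).

```python
import string

def template_fields(template):
    """Return the field names referenced in a template string."""
    parsed = string.Formatter().parse(template)
    return [field for _, field, _, _ in parsed if field is not None]

def expand_template(template, row):
    """Fill a template from a dict, treating each field name as one key."""
    parts = []
    for literal, field, spec, _ in string.Formatter().parse(template):
        parts.append(literal)
        if field is not None:
            # Apply the format spec (possibly empty) to the looked-up value.
            parts.append(format(row[field], spec))
    return "".join(parts)

template = "http://example.org/{a.b}/{c}"
fields = template_fields(template)
uri = expand_template(template, {"a.b": 1, "c": 2})
```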
JSON object "flattening"
JSON objects, when processed, are first "flattened" into non-nested dicts. For example, the object
{ "a": {"b": 1}, "c": 2 }
becomes
{ "a.b": 1, "c": 2 }
and now the simplistic JSONPath "a.b" could be used in templates as a field reference.
"Flattening" is done using the method Mapper.flatten() which subclasses of Mapper can override if they so choose.
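A minimal sketch of this kind of flattening is shown below. It is a standalone function written for illustration, not the code of Mapper.flatten(); leaving non-dict values (including lists) untouched is an assumption of the sketch.

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts into a single dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Recurse into nested objects, extending the dotted path.
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat

flattened = flatten({"a": {"b": 1}, "c": 2})
```

A subclass that needs different key naming could override the method with a variation of this recursion.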
Recipes
If you have an RDF source file (say, mappings.ttl) with multiple mappings (i.e., triples maps), you can parse the file and create multiple Mapper instances. For example, assuming triples maps ex:tm_1 and ex:tm_2 (corresponding to EX.tm_1 and EX.tm_2), you could do this:
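import rdflib
import tinyrml
EX = rdflib.Namespace("http://example.org/")  # assumed namespace for the ex: prefix
mappings = rdflib.Graph()
mappings.parse("mappings.ttl")  # parse once, then share the graph between mappers
mapping_1 = tinyrml.Mapper(mappings, triples_map_uri=EX.tm_1)
mapping_2 = tinyrml.Mapper(mappings, triples_map_uri=EX.tm_2)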
To create an rdfhelpers.Composable instance by mapping some tabular data, you can do the following (assuming mapper contains a Mapper instance and rows contains data to be mapped):
composable = rdfhelpers.Composable(mapper.process(rows))