Implementation of a subset of R2RML

Tiny RML

The package tinyrml is an implementation of a subset of RML/R2RML with some helpful extended features. It is intended to be used as a Python package/library, and accepts Python iterables (of dicts) as input. It has the following limitations:

  • Mappings cannot specify their sources (tables or SQL queries). Data sources are assigned externally when data is mapped.
  • None of the join-related features are supported. Only a single data source can be mapped at a time.

The package supports the following extensions to R2RML (note that a special namespace rre: is reserved for extensions):

  • A dict key whose value is a Python list is expanded as multiple values/rows.
  • Object maps accept the property rre:expandAsList; if true, the value (assumed to be a string) is split with re.split, using commas and semicolons as separators, and expanded as multiple values/rows. This makes it possible to, say, keep a comma-separated list in a CSV file, read the file with csv.DictReader, and expand the list into separate values. Splitting and expansion happen only if rr:template has a value in the object map.
  • Term maps accept the property rre:expression, the value of which is a string containing a Python expression. During the mapping process, this expression is evaluated with dict keys ("column names") as variables in the expression.
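
The first two extensions can be pictured with a small standalone sketch. It does not use Tiny RML itself; the helper name expand_row and the exact separator pattern are assumptions made for illustration, and the pattern the package actually passes to re.split may differ (for example, in how it treats whitespace).

```python
import re

# Assumed separator pattern: a comma or semicolon, with surrounding
# whitespace trimmed. The pattern Tiny RML actually uses may differ.
SEPARATORS = re.compile(r"\s*[,;]\s*")


def expand_row(row, key):
    """Yield one copy of `row` per value found under `key` (hypothetical helper)."""
    value = row.get(key)
    if isinstance(value, list):
        # A Python list is expanded as-is, one row per element.
        values = value
    elif isinstance(value, str):
        # A string is split on the separators; empty pieces are dropped.
        values = [v for v in SEPARATORS.split(value.strip()) if v]
    else:
        values = [value]
    for v in values:
        yield {**row, key: v}


rows = list(expand_row({"id": "1", "tags": "rdf; python,rml"}, "tags"))
```

Each resulting row keeps the other keys unchanged and carries a single value under the expanded key, which is the "multiple values/rows" behavior described above.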

Tiny RML was originally part of rdfhelpers but has since been split off into its own project. It does not depend on rdfhelpers.

Installation

Tiny RML can be installed from PyPI:

pip install tinyrml

Usage

Tiny RML exposes the class Mapper, which implements the basic mapping functionality. Each instance of Mapper represents an individual mapping (i.e., a specific mapping definition). The class constructor takes the following parameters:

  • mapping: a graph (an rdflib.Graph) containing the mapping, or a path to a file which, when parsed, yields the mapping graph. This is the only required (positional) parameter; the rest are optional.
  • triples_map_uri=, when provided (as a URIRef), identifies the actual triples map to be used. This is useful when the mapping graph contains several mappings.
  • ignore_field_keys= is a set of names of keys/fields that are ignored when determining the likely candidate for a key in a template. It defaults to an empty set.
  • empty_string_is_none=, when True (the default), makes the mapper treat empty strings as missing values.
  • allow_expressions=, when True (the default), lets the mapper use Python expressions embedded in the mapping graph.
  • global_bindings=, when provided, is passed to the eval() function (as the parameter globals=; see Python documentation) when embedded Python expressions are evaluated. If not provided, "global globals" (default global bindings) are used.
  • allow_object_map_classes=, when True (the default), lets mappings specify rr:class properties for object maps also (the R2RML specification only allows those for subject maps).
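
How row keys and global_bindings interact during expression evaluation can be illustrated with plain eval(). This is only a sketch of the idea, not the package's code: the function evaluate_expression is hypothetical, and, unlike the default described above, it falls back to an empty global namespace when no bindings are given.

```python
def evaluate_expression(expression, row, global_bindings=None):
    """Evaluate `expression` with the keys of `row` usable as variables.

    Hypothetical helper for illustration; not part of Tiny RML.
    """
    # Copy the bindings because eval() adds "__builtins__" to the globals dict.
    # Assumption: an empty namespace is used when no bindings are supplied.
    globals_ = {} if global_bindings is None else dict(global_bindings)
    # The row acts as the local namespace, so "column names" resolve to values.
    return eval(expression, globals_, row)


row = {"first": "Ada", "last": "Lovelace"}
full_name = evaluate_expression("first + ' ' + last", row)
shouted = evaluate_expression("shout(last)", row, {"shout": str.upper})
```

Because eval() runs arbitrary Python, mappings from untrusted sources are a reason to consider setting allow_expressions= to False.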

The method Mapper.process(self, rows, result_graph=) invokes a mapper. The parameter rows is an iterable of dicts used as the "rows" to be mapped; dictionary keys take the role of column names. If provided, result_graph= is a graph where results are added; otherwise a new graph is created. Regardless, the result graph is returned.
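
Since rows only needs to be an iterable of dicts, csv.DictReader is a natural way to produce it. The sketch below prepares such rows from inline CSV text; the data and the helper read_rows are made up for illustration. The call to Mapper is shown only as a comment, because the import path (from tinyrml import Mapper) and the file name mapping.ttl are assumptions rather than details confirmed above.

```python
import csv
import io

# Made-up sample data; the second row has an empty "tags" field.
CSV_TEXT = """id,name,tags
1,Ada,"math, computing"
2,Grace,
"""


def read_rows(text):
    """Return the CSV content as a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


rows = read_rows(CSV_TEXT)

# With Tiny RML installed, the rows could then be mapped roughly like this
# (not executed here; import path and file name are assumptions):
#
#     from tinyrml import Mapper
#     graph = Mapper("mapping.ttl").process(rows)
```

The empty tags field of the second row arrives as an empty string, which is the case the empty_string_is_none= parameter addresses.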

The package exposes RR and RRE as the namespaces for R2RML and the Tiny RML extensions, respectively. By convention, we use the prefixes rr: and rre: for these.
