
DLite

A lightweight data-centric framework for semantic interoperability



DLite is a lightweight interoperability framework for working with and sharing scientific data.

About DLite

DLite is a C implementation of the SINTEF Open Framework and Tools (SOFT), a set of concepts and tools for efficiently describing and working with scientific data.

All data in DLite is represented by an Instance, which is built on a simple data model. An Instance is identified by a unique UUID and has a set of named dimensions and properties. It is described by its Metadata. In the Metadata, each dimension is given a name and description (optional), and each property is given a name, type, shape (optional), unit (optional) and description (optional). The shape of a property refers to the named dimensions.

When an Instance is instantiated, you must supply a value for each named dimension. The shapes of the properties are then set accordingly. This ensures that the shapes of the properties are internally consistent.
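
For example, instantiating the Person metadata from the example below with N=4 gives a skills array of length 4. A minimal sketch of how this looks from Python (the exact constructor names, such as dlite.Instance.from_url() and dlite.Instance.from_metaid(), may differ slightly between DLite versions):

import dlite

# Load the Person metadata defined in the example below (Person.json).
# (Constructor names here follow recent versions of the Python bindings.)
Person = dlite.Instance.from_url('json://Person.json')

# Instantiate it, supplying a value for the dimension N.  The "skills"
# property has shape ["N"], so it becomes an array of length 4.
person = dlite.Instance.from_metaid(Person.uri, [4])
person.skills = ['observing', 'chemistry', 'violin', 'boxing']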

A Metadata is also an Instance, and hence described by its meta-metadata. By default, DLite defines four levels of metadata: instance, metadata, metadata schema and basic metadata schema. The basic metadata schema describes itself, so no further meta levels are needed. The idea is that if two different systems describe their data models in terms of the basic metadata schema, they can easily be made semantically interoperable.
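
Continuing the sketch above, these levels can be inspected by walking up the meta attribute of an instance (assuming the bindings expose the meta and uri attributes as shown):

# Each level is itself an instance, described by the level above it.
print(person.meta.uri)       # the Person metadata
print(person.meta.meta.uri)  # the schema describing the Person metadata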

Figure: The data model of DLite.

An alternative and more flexible way to enable interoperability is to use a common ontology. DLite provides a specialised Instance called Collection. A collection is essentially a container holding a set of Instances and relations between them. It can also relate an Instance, or even a dimension or property of an Instance, to a concept in an ontology. DLite allows you to transparently map an Instance whose Metadata corresponds to a concept in one ontology to an Instance whose Metadata corresponds to a concept in another ontology. Such mappings can easily be registered (in C or Python) and reused, providing a very powerful system for achieving interoperability.
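
Continuing the sketch above, a collection could be used like this (the label and the ontology IRI are made up for illustration, and the method names assume the Collection API of the Python bindings):

import dlite

# Create a collection, add the person instance under a label, and relate
# that label to a concept in an ontology (IRI made up for illustration).
coll = dlite.Collection()
coll.add('sherlock', person)
coll.add_relation('sherlock', 'rdf:type', 'http://example.com/onto#Person')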

DLite also provides a common and extensible API for loading/storing Instances from/to different storages. New storage plugins can be written in C or Python.
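
As a sketch of what a Python storage plugin may look like (the base class and method names follow the pattern of DLite's bundled Python plugins, but check the storage plugin documentation of your DLite version for the exact interface):

import dlite

class myformat(dlite.DLiteStorageBase):
    """Hypothetical storage plugin for a made-up 'myformat' driver."""

    def open(self, uri, options=None):
        """Opens the storage located at `uri`."""
        self.uri = uri

    def load(self, id):
        """Loads and returns the instance identified by `id`."""
        raise NotImplementedError  # read self.uri and return a dlite.Instance

    def save(self, inst):
        """Saves instance `inst` to the storage."""
        raise NotImplementedError  # write inst to self.uri

The plugin module is found by adding its directory to the plugin search path, e.g. via the DLITE_PYTHON_STORAGE_PLUGIN_DIRS environment variable.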

See doc/concepts.md for more details.

DLite is licensed under the MIT license.

Example

Let's say that you have the following Python class

class Person:
    def __init__(self, name, age, skills):
        self.name = name
        self.age = age
        self.skills = skills

that you want to describe semantically. We do that by defining the following metadata (using JSON), identifying the Python attributes with dlite properties. Here we define name to be a string, age to be a float and skills to be an array of N strings, where N is the name of a dimension. The metadata uniquely identifies itself with the "name", "version" and "namespace" fields, while "meta" refers to the metadata schema (meta-metadata) that this metadata is described by. Finally, human descriptions of the metadata itself, its dimensions and its properties are provided in the "description" fields.

{
  "name": "Person",
  "version": "0.1",
  "namespace": "http://onto-ns.com/meta",
  "meta": "http://onto-ns.com/meta/0.3/EntitySchema",
  "description": "A person.",
  "dimensions": [
    {
      "name": "N",
      "description": "Number of skills."
    }
  ],
  "properties": [
    {
      "name": "name",
      "type": "string",
      "description": "Full name."
    },
    {
      "name": "age",
      "type": "float",
      "unit": "year",
      "description": "Age of person."
    },
    {
      "name": "skills",
      "type": "string",
      "dims": ["N"],
      "description": "List of skills."
    }
  ]
}

We save the metadata in the file "Person.json". Back in Python we can now make a dlite-aware subclass of Person, instantiate it and serialise it to a storage:

import dlite

# Create a dlite-aware subclass of Person
DLitePerson = dlite.classfactory(Person, url='json://Person.json')

# Instantiate
person = DLitePerson('Sherlock Holmes', 34., ['observing', 'chemistry',
    'violin', 'boxing'])

# Write to storage (here a json file)
person.dlite_inst.save('json://homes.json?mode=w')
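
To check the round trip before moving to C, the instance can be read back from the same storage (a sketch continuing the example above; it uses the uuid attribute, assuming it is exposed by the bindings):

# Load the instance back via the UUID that dlite assigned to it
# (the same UUID is visible in homes.json).
inst = dlite.Instance.from_url(f'json://homes.json#{person.dlite_inst.uuid}')
print(inst.name)    # -> Sherlock Holmes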

To access this new instance from C, you can first generate a header file from the metadata

$ dlite-codegen -f c-header -o person.h Person.json

and then include it in your C program:

// homes.c -- sample program that loads instance from homes.json and prints it
#include <stdio.h>
#include <dlite.h>
#include "person.h"  // header generated with dlite-codegen

int main()
{
  /* URL of instance to load using the json driver.  The storage is
     here the file 'homes.json' and the instance we want to load in
     this file is identified with the UUID following the hash (#)
     sign. */
  char *url = "json://homes.json#315088f2-6ebd-4c53-b825-7a6ae5c9659b";

  Person *person = (Person *)dlite_instance_load_url(url);

  int i;
  printf("name:  %s\n", person->name);
  printf("age:   %g\n", person->age);
  printf("skills:\n");
  for (i=0; i<person->N; i++)
    printf("  - %s\n", person->skills[i]);

  return 0;
}

Now run the Python code above. It will create the file homes.json, which contains the serialised instance. Look up the UUID of the instance in homes.json and update the url variable in homes.c accordingly.

Since we are using dlite_instance_load_url() to load the instance, you must link to dlite when compiling this program. Assuming you are using Linux and dlite is installed in $HOME/.local, compiling with gcc would look like:

$ gcc homes.c -o homes -I$HOME/.local/include/dlite -L$HOME/.local/lib -ldlite -ldlite-utils

Or if you are using the development environment, you can compile using:

$ gcc -I/tmp/dlite-install/include/dlite -L/tmp/dlite-install/lib -o homes homes.c -ldlite -ldlite-utils

Finally you can run the program with

$ DLITE_STORAGES=*.json ./homes
name:  Sherlock Holmes
age:   34
skills:
  - observing
  - chemistry
  - violin
  - boxing

Note that in this case we have to define the environment variable DLITE_STORAGES in order to let dlite find the metadata we stored in 'Person.json'. There are ways to avoid this, e.g. by hardcoding the metadata in C using dlite-codegen -f c-source or by explicitly loading 'Person.json' before 'homes.json' in the C program.
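
From Python, the same search path can also be extended programmatically instead of via the environment (a sketch; it assumes dlite.storage_path supports append(), as in recent versions of the Python bindings):

import dlite

# Equivalent to DLITE_STORAGES=*.json: search JSON files in the current
# directory for metadata and instances.
dlite.storage_path.append('*.json')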

This was just a brief example. There is much more to dlite. Since the documentation is still not complete, the best source is the code itself, including the tests and examples.

Main features

See doc/features.md for a more detailed list.

  • Enables semantic interoperability via simple formalised metadata and data
  • Metadata can be linked to or generated from ontologies
  • Code generation for simple integration in existing code bases
  • Plugin API for data storages (json, hdf5, rdf, yaml, postgresql, blob, csv...)
  • Plugin API for mapping between metadata
  • Bindings to C, Python and Fortran

Installing DLite

Installing with pip

If you are using Python, the easiest way to install DLite is with pip:

pip install DLite-Python

Note that wheels are currently only available for Python 3.7, 3.8, 3.9 and 3.10 on Linux and Windows.

Docker image

A Docker image is available at https://github.com/SINTEF/dlite/packages.

Compile from sources

The sources can be cloned from GitHub

git clone git@github.com:SINTEF/dlite.git

Dependencies

Runtime dependencies

  • HDF5, optional, v1.10+ supported (needed by the HDF5 storage plugin)
  • librdf, optional (needed by RDF (Redland) storage plugin)
  • Python 3, optional (needed by Python bindings and some plugins)
    • tripper, required by the Python bindings
    • NumPy, required if Python is enabled
    • PyYAML, optional (used for generic YAML storage plugin)
  • psycopg2, optional (used for the generic PostgreSQL storage plugin). Note that in some cases a GSSAPI error is raised when psycopg2 has been installed via the psycopg2-binary wheel. This is solved by installing psycopg2 from source, as described in its documentation.
    • pandas, optional (used for csv storage plugin)
    • pymongo, optional, (used for mongodb storage plugin)
  • mongomock, optional (used for testing the mongodb storage plugin)

Build dependencies

  • cmake, required for building. Note that cmake installed from PyPI does not always work.
  • HDF5 development libraries, needed by HDF5 storage plugin.
  • Python 3 development libraries, needed by Python bindings.
  • NumPy development libraries, needed by Python bindings.
  • SWIG, needed for building the Python bindings.
  • Doxygen, used for documentation generation.
  • Graphviz, used for documentation generation.
  • valgrind, optional, used for memory checking (Linux only).
  • cppcheck, optional, used for static code analysis.
  • librdf development libraries, optional, needed by librdf storage plugin.

Compiling

Build and install with Python

Provided that you have a C compiler and Python correctly installed, you should be able to build and install dlite via the python/setup.py script:

cd python
python setup.py install

Build on Linux

Install dependencies (e.g. with apt-get install on Ubuntu or dnf install on Fedora)

Configure the build with:

mkdir build
cd build
cmake ..

Configuration options can be added to the cmake command. For example, you can change the installation directory by adding -DCMAKE_INSTALL_PREFIX=/path/to/new/install/dir. The default is ~/.local.

Alternatively, you can set configuration options interactively with `ccmake ..`.

If you use virtual environments for Python, you should activate your environment before running cmake and set CMAKE_INSTALL_PREFIX to the directory of the virtual environment. For example:

VIRTUAL_ENV=/path/to/virtual/env
source $VIRTUAL_ENV/bin/activate
cmake -DCMAKE_INSTALL_PREFIX=$VIRTUAL_ENV -DWITH_DOC=YES ..

Build with:

make

To run the tests, do

ctest            # same as running `make test`
make memcheck    # runs all tests with memory checking (requires
                 # valgrind)

To generate code documentation, do

make doc         # direct your browser to build/doc/html/index.html

To install dlite locally, do

make install

Note about VirtualEnvWrapper

By default, VirtualEnvWrapper does not set LD_LIBRARY_PATH. This will result in errors when running, for example, dlite-codegen in the example above. To fix this, after compiling and installing dlite, the user needs to prepend/append $VIRTUAL_ENV/lib/ to LD_LIBRARY_PATH. This can be done by modifying the activate shell file, located at $WORKON_HOME/<envs_name>/bin/activate. First, the user should add

if ! [ -z "${_OLD_LD_LIBRARY_PATH}" ] ; then
    LD_LIBRARY_PATH="$_OLD_LD_LIBRARY_PATH"
    export LD_LIBRARY_PATH
    unset _OLD_LD_LIBRARY_PATH
fi

at the end of the deactivate function in the activate shell file. Next, add

export _OLD_LD_LIBRARY_PATH=$LD_LIBRARY_PATH
export LD_LIBRARY_PATH="$VIRTUAL_ENV/lib/:$LD_LIBRARY_PATH"

at the end of activate.

Explanation: The value of LD_LIBRARY_PATH is exported (saved) into a new temporary environment variable, _OLD_LD_LIBRARY_PATH, and $VIRTUAL_ENV/lib/ is prepended to LD_LIBRARY_PATH. The if statement within the deactivate function checks whether _OLD_LD_LIBRARY_PATH has been declared. If so, the deactivate function sets LD_LIBRARY_PATH back to its original value and unsets the temporary variable.

Build with VS Code on Windows

See here for detailed instructions for building with Visual Studio.

Quick start with VS Code and Remote Container

Using Visual Studio Code it is possible to do development on the system defined in the Dockerfile.

  1. Download and install Visual Studio Code.
  2. Install the extension Remote Development.
  3. Clone dlite and initialize git modules: git submodule update --init.
  4. Open the dlite folder with VS Code.
  5. Start VS Code, run the Remote-Containers: Open Folder in Container... command from the Command Palette (F1) or quick actions Status bar item. This will build the container and restart VS Code in it. This may take some time the first time as the Docker image must be built. See Quick start: Open an existing folder in a container for more information and instructions.
  6. In the container terminal, perform the first build and tests with mkdir /workspace/build; cd /workspace/build; cmake ../dlite; make && make test.

Build documentation

In order to reduce build dependencies for the casual user, DLite does not build documentation by default. Provide the -DWITH_DOC=YES option to cmake to build the documentation.

Build Python Documentation

DLite uses Sphinx to generate documentation from Python source code. Ensure the correct virtual environment is set up and install the requirements with pip install -r requirements_doc.txt.

Build C Documentation

If you have Doxygen installed, the HTML documentation should be generated as a part of the build process. It can be browsed by opening the following file in your browser:

<build>/doc/html/index.html

where <build> is your build folder. To only build the documentation, you can do:

cd build
cmake --build . --target doc

If you have LaTeX and make installed, you can also build the LaTeX documentation with

cd build
cmake --build . --target latex

which will produce the file

<build>/doc/latex/refman.pdf

Setting up the environment

As a dlite user it should be enough to do 'pip install DLite-Python', or 'pip install .' from within the dlite/python directory.

As a developer it is more useful to install dlite from source. If dlite is installed in a non-default location, you may need to set the PATH, LD_LIBRARY_PATH, PYTHONPATH and DLITE_ROOT environment variables. See the documentation of environment variables for more details.

An example of how to install dlite as a developer within a Python environment on Linux is given below. Make sure that all required dependencies are installed within the environment.

First activate the environment, e.g.:

source </path/to/dedicated/pythonenvironment>/bin/activate

Set the Python variables. The following should automatically find the correct Python paths:

Python3_ROOT=$(python3 -c 'import sys; print(sys.exec_prefix)')
Python3_VERSION=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
Python3_EXECUTABLE=${Python3_ROOT}/bin/python${Python3_VERSION}

Python variables for the development libraries must be set manually:

Python3_LIBRARY=</path/to/system>/libpython${Python3_VERSION}.so
Python3_INCLUDE_DIR=</path/to/system>/include/python${Python3_VERSION}

You may run find . -name 'libpython*.so' to help find these paths.

Go into your dlite directory:

cd </path/to>/dlite

Build dlite:

mkdir build
cd build
cmake .. -DPython3_EXECUTABLE=$Python3_EXECUTABLE \
-DPython3_LIBRARY=$Python3_LIBRARY \
-DPython3_INCLUDE_DIR=$Python3_INCLUDE_DIR \
-DWITH_STATIC_PYTHON=FALSE \
-DCMAKE_INSTALL_PREFIX=$Python3_ROOT

Then install dlite

make
make install

Finally run tests

ctest

An example of how to use dlite is shown above. See also the examples in the examples directory for how to link to dlite from C and how to use the Fortran bindings.

Short vocabulary

The following terms have a special meaning in dlite:

  • Basic metadata schema: Toplevel meta-metadata which describes itself.
  • Collection: A specialised instance that contains references to a set of instances and relations between them. Within a collection, instances are labelled. See also the SOFT5 nomenclature.
  • Data instance: A "leaf" instance that is not metadata.
  • Entity: May be any kind of instance, including data instances, metadata instances or meta-metadata instances. However, for historical reasons it is often used for "standard" metadata that are instances of meta-metadata "http://onto-ns.com/meta/0.3/EntitySchema".
  • Instance: The basic data object in DLite. All instances are described by their metadata, which are themselves instances. Instances are identified by a UUID.
  • Mapping: A function that maps one or more input instances to an output instance. They are an important mechanism for interoperability. Mappings are called translators in SOFT5.
  • Metadata: A special type of instance that describes other instances. All metadata are immutable and have a unique URI in addition to their UUID.
  • Meta-metadata: metadata that describes metadata.
  • Relation: A subject-predicate-object triplet. Relations are immutable.
  • Storage: A generic handle encapsulating actual storage backends.
  • Transaction: An instance that has a reference to an immutable (frozen) parent instance is called a transaction. Transactions are very useful for ensuring data provenance and make it easy to work with time series. Conceptually, they share many similarities with git. See also the SOFT5 nomenclature.
  • uri: A uniform resource identifier (URI) is a generalisation of a URL, but follows the same syntax rules. In dlite, the term "uri" is used as a human-readable identifier for instances (optional for data instances) and has the form namespace/version/name.
  • url: A uniform resource locator (URL) is a reference to a web resource, like a file (on a given computer), database entry, web page, etc. In dlite, URLs refer to a storage, or even a specific instance in a storage, using the general syntax driver://location?options#fragment, where options and fragment are optional. If fragment is provided, it should be the uuid or uri of an instance.
  • uuid: A universally unique identifier (UUID) is commonly used to uniquely identify digital information. DLite uses the 36-character string representation of UUIDs to uniquely identify instances. The uuid is generated from the uri for instances that have a uri, otherwise it is randomly generated (see the sketch after this list).
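
As a small illustration of the uri-to-uuid mapping described above (a sketch assuming the Python bindings expose dlite.get_uuid()):

import dlite

# For instances with a uri, the UUID is derived deterministically from
# the uri, so the same uri always maps to the same UUID.
print(dlite.get_uuid('http://onto-ns.com/meta/0.1/Person'))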

Developer documentation

License

DLite is licensed under the MIT license. However, it includes a few third-party source files with other permissive licenses. All of these should allow dynamic and static linking against open and proprietary codes. A full list of included licenses can be found in LICENSES.txt.

Acknowledgment

In addition to internal funding from SINTEF and NTNU, this work has been supported by several projects, including:

  • AMPERE (2015-2020) funded by Forskningsrådet and Norwegian industry partners.
  • FICAL (2015-2020) funded by Forskningsrådet and Norwegian industry partners.
  • Rational alloy design (ALLDESIGN) (2018-2022), an NTNU internally funded project.
  • SFI Manufacturing (2015-2023) funded by Forskningsrådet and Norwegian industry partners.
  • SFI PhysMet (2020-2028) funded by Forskningsrådet and Norwegian industry partners.
  • OntoTrans (2020-2024) that receives funding from the European Union’s Horizon 2020 Research and Innovation Programme, under Grant Agreement n. 862136.
  • OpenModel (2021-2025) that receives funding from the European Union’s Horizon 2020 Research and Innovation Programme, under Grant Agreement n. 953167.
  • DOME 4.0 (2021-2025) that receives funding from the European Union’s Horizon 2020 Research and Innovation Programme, under Grant Agreement n. 953163.
  • VIPCOAT (2021-2025) that receives funding from the European Union’s Horizon 2020 Research and Innovation Programme, under Grant Agreement n. 952903.

DLite is developed with the hope that it will be a delight to work with.

