Utilities for working with the linked data service LINDAS of the Swiss Federal Administration. Includes modules for working with cubes.

pylindas

About

pylindas is a package for building and publishing linked data, such as cubes as defined by cube.link, a schema for describing structured tabular data in RDF. It offers an alternative to the Cube-Creator. Currently, this project is closely tied to LINDAS, the Swiss Federal Linked Data Service.

For further information, please refer to our Wiki

Installation

There are two ways to install this package: locally or through the Python Package Index (PyPI).

Locally

Clone this repository and cd into the directory. You can now install this package locally on your machine. We advise using a virtual environment to avoid conflicts with other projects. Additionally, install all dependencies listed in requirements.txt.

pip install -e .
pip install -r requirements.txt

Published Version

You can install this package through pip without cloning the repository.

pip install pylindas

Contributing and Suggestions

If you wish to contribute to this project, feel free to clone this repository and open a pull request to be reviewed and merged.

Alternatively, feel free to open an issue with a suggestion on what we could implement. We have laid out a rough roadmap for upcoming features in our Timetable.

Functionality and structure

This package consists of multiple submodules.

pycube

To avoid the feeling of a black box, our philosophy is to make the construction of cubes modular. The process will take place in multiple steps, outlined below.

  1. Initialization
from pylindas.pycube import Cube

# dataframe: pd.DataFrame, cube_yaml: dict, shape_yaml: dict
cube = Cube(dataframe=dataframe, cube_yaml=cube_yaml, shape_yaml=shape_yaml)

This step sets up the background information needed about the cube.

  2. Mapping
cube.prepare_data()

Adds observation URIs and applies the mappings as described in the shape yaml.

  3. Write cube:Cube
cube.write_cube()

Writes the cube:Cube.

  4. Write cube:Observation
cube.write_observations()

Writes the cube:Observations and the cube:ObservationSet. The URIs of the observations are written as <cube_URI/observations/[list_of_key_dimensions]>. Including the key dimensions in the URI should keep each observation URI unique.

  5. Write cube:ObservationConstraint
cube.write_shape()

Writes the cube:ObservationConstraint.
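
The observation URI pattern described under "Write cube:Observation" can be illustrated with a short sketch. This is not the pylindas implementation: the helper name observation_uri, the underscore used to join key values, and the percent-encoding are all assumptions chosen for illustration.

```python
from urllib.parse import quote


def observation_uri(cube_uri: str, key_values: list) -> str:
    """Build <cube_URI/observations/[list_of_key_dimensions]> (hypothetical helper)."""
    # Assumption: key dimension values are percent-encoded and joined with "_".
    key = "_".join(quote(str(value), safe="") for value in key_values)
    return f"{cube_uri.rstrip('/')}/observations/{key}"


# Two observations that differ in any key dimension get different URIs.
uri = observation_uri("https://example.org/cube/kita", [2020, "Zürich"])
print(uri)
```

Since every key dimension value ends up in the path, two rows only collide if all of their key values are identical, which is the uniqueness property the text refers to.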

The full work-flow

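from pylindas.pycube import Cube

# Write the cube
# dataframe: pd.DataFrame, cube_yaml: dict, shape_yaml: dict
cube = Cube(dataframe=dataframe, cube_yaml=cube_yaml, shape_yaml=shape_yaml)
cube.prepare_data()
cube.write_cube()
cube.write_observations()
cube.write_shape()

# Upload the cube
# endpoint: str, named_graph: str
cube.upload(endpoint=endpoint, named_graph=named_graph)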

For an upload, use cube.upload(endpoint: str, named_graph: str) with the proper endpoint as well as named_graph.

A lindas.ini file is read for this step. It contains this information as well as a password, and has the following structure:

[TEST]
endpoint=https://stardog-test.cluster.ldbar.ch
username=a-lindas-user-name
password=something-you-don't-need-to-see;)

Further sections hold the corresponding information for the other environments.
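
A file with this structure can be read with Python's standard configparser module. The sketch below only illustrates the file format; the helper name read_environment is hypothetical, the password is a placeholder, and pylindas may read the file differently.

```python
import configparser

# Sample content mirroring the structure above; all values are placeholders.
SAMPLE_INI = """
[TEST]
endpoint=https://stardog-test.cluster.ldbar.ch
username=a-lindas-user-name
password=placeholder-password
"""


def read_environment(ini_text: str, environment: str) -> dict:
    """Return the settings of one environment section (hypothetical helper)."""
    # interpolation=None keeps "%" characters in passwords untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    return dict(parser[environment])


settings = read_environment(SAMPLE_INI, "TEST")
print(settings["endpoint"])
```

For a file on disk, parser.read("lindas.ini") would replace the read_string call.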

Command line

A command line utility is also available. It expects the data and its description to be stored in a directory with a fixed layout, and then helps you perform common operations.

Directory Layout

The directory should be structured as follows:

  • data.csv: This file contains the observations.
  • description.json or description.yml: This file contains the cube and dimension descriptions.
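
As an illustration of this layout, the following sketch checks that a directory contains the expected files. The helper find_inputs is hypothetical and not part of cli.py; in particular, preferring description.json over description.yml when both exist is an assumption.

```python
import tempfile
from pathlib import Path

# Assumption: description.json is preferred if both description files exist.
DESCRIPTION_NAMES = ("description.json", "description.yml")


def find_inputs(directory: str) -> tuple:
    """Return (data_file, description_file) or raise FileNotFoundError."""
    base = Path(directory)
    data_file = base / "data.csv"
    if not data_file.is_file():
        raise FileNotFoundError(f"missing {data_file}")
    for name in DESCRIPTION_NAMES:
        candidate = base / name
        if candidate.is_file():
            return data_file, candidate
    raise FileNotFoundError(f"missing description.json or description.yml in {base}")


# Demo with a throwaway directory that follows the layout.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "data.csv").write_text("year,value\n2020,1\n")
    Path(tmp, "description.yml").write_text("Name: demo\n")
    data_file, description_file = find_inputs(tmp)
    print(data_file.name, description_file.name)
```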

Command Line Usage

For example, to serialize the data, use:

python cli.py serialize <input_directory> <output_ttl_file>

For additional help and options, you can use:

python cli.py --help

Fetching from data sources

Datasets can also be downloaded from other data sources. The functionality is currently basic and may be extended in the future.

  • It supports only datasets coming from data.europa.eu
  • It supports only datasets with a Frictionless datapackage

See Frictionless for more information on Frictionless.

python fetch.py 'https://data.europa.eu/data/datasets/fc49eebf-3750-4c9c-a29e-6696eb644362?locale=en' example/corona/
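
To make the first restriction concrete, here is a minimal sketch of how a dataset identifier might be pulled out of such a URL. It is not how fetch.py works internally; the helper name dataset_id and the expected path shape /data/datasets/<identifier> are assumptions derived from the example URL above.

```python
from urllib.parse import urlparse


def dataset_id(url: str) -> str:
    """Extract the identifier from a data.europa.eu dataset URL (hypothetical helper)."""
    parsed = urlparse(url)
    if parsed.netloc != "data.europa.eu":
        raise ValueError("only data.europa.eu datasets are supported")
    parts = [part for part in parsed.path.split("/") if part]
    # Assumed path shape: /data/datasets/<identifier>
    if len(parts) != 3 or parts[:2] != ["data", "datasets"]:
        raise ValueError(f"unexpected dataset path: {parsed.path}")
    return parts[2]


url = "https://data.europa.eu/data/datasets/fc49eebf-3750-4c9c-a29e-6696eb644362?locale=en"
identifier = dataset_id(url)
print(identifier)
```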

Examples

Multiple cube examples are available in the example directory.

$ python cli.py example list
corona: Corona Numbers Timeline
kita: Number of kids in day care facilities
wind: Wind turbines  operated WKA per year in Schleswig-Holstein

To load an example into a Fuseki database, you can use the load subcommand of the example command.

$ python cli.py example load kita

There is a start-fuseki command that can be used to start a Fuseki server containing data from the examples.

$ python cli.py example start-fuseki

About shared dimensions queries

When a data scientist wants to link a dimension to an existing Shared Dimension, they have to:

  • Find a suitable Shared Dimension
  • Use the URLs of the terms of that Shared Dimension to configure the dimension in the yml file through its "mapping" field

This is a first implementation of:

  • Basic queries to request shared dimensions information from LINDAS (including terms and their URLs)
  • Display the results, line by line

See the folder pylindas/shared_dimension_queries and its README for a detailed explanation.

For generating Shared Dimensions, see the section "About generation of shared dimensions" below.

About concept tables and multi-lingual concepts

This is a first implementation to handle:

  • concept tables
  • multilingual concepts

A concept table makes it possible to handle the values of a dimension as URLs pointing to new resources (concepts).
This is similar to an object that is the URL of a Shared Dimension's term, but here the concepts are created for the cube and uploaded with the cube.
Remark: if the resource/concept already exists, then the case is similar to the handling of a Shared Dimension mapping, which pyCube already handles with the "mapping" mechanism.
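
The idea of a concept with multilingual labels can be sketched as follows. This is only an illustration of the resulting RDF, not the pyCube implementation: the helper names, the example.org base URI, and the use of schema:name for the labels are assumptions.

```python
def turtle_literal(text: str, lang: str) -> str:
    """Format a language-tagged Turtle literal, escaping backslashes and quotes."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"@{lang}'


def concept_turtle(base_uri: str, key: str, labels: dict) -> str:
    """Serialize one concept with one name per language (hypothetical helper)."""
    subject = f"<{base_uri.rstrip('/')}/{key}>"
    # Assumption: labels are expressed with schema:name, one per language.
    names = [
        f"<http://schema.org/name> {turtle_literal(text, lang)}"
        for lang, text in sorted(labels.items())
    ]
    return subject + " " + " ;\n    ".join(names) + " ."


turtle = concept_turtle(
    "https://example.org/airport",
    "ZRH",
    {"en": "Zurich Airport", "de": "Flughafen Zürich"},
)
print(turtle)
```

The dimension value in each observation would then be the concept URL (here https://example.org/airport/ZRH) rather than a plain literal.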

See the folder example/Cubes/concept_table_airport and its README for a detailed explanation.

About generation of shared dimensions

This is a first implementation for generating a shared dimension. It follows an approach similar to pyCube, but transforms a .csv file into the corresponding RDF.

See the folder pylindas/pyshareddimension and its README for a detailed explanation.

Download files

Source Distribution

pylindas-0.4.23.tar.gz (36.5 kB)

Uploaded Source

Built Distribution

pylindas-0.4.23-py3-none-any.whl (36.0 kB)

Uploaded Python 3

File details

Details for the file pylindas-0.4.23.tar.gz.

File metadata

  • Download URL: pylindas-0.4.23.tar.gz
  • Size: 36.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for pylindas-0.4.23.tar.gz
Algorithm Hash digest
SHA256 0c424930f4552c8c9619843a99cfade146cf22861e13164a488363a6d7f6e88e
MD5 340196d893c82f52b66f4d8fd3eefed8
BLAKE2b-256 267154f3d1e045d877098e7ce23ada9dbe58d6784af9bb923558b4e3f4a9834d

File details

Details for the file pylindas-0.4.23-py3-none-any.whl.

File metadata

  • Download URL: pylindas-0.4.23-py3-none-any.whl
  • Size: 36.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for pylindas-0.4.23-py3-none-any.whl
Algorithm Hash digest
SHA256 52d3bad28a5f96b9a1b2a87951480af24993a860f7cfcb4d43e2799fb9731f76
MD5 cef46e0d8a35a585d626f1b38de4169f
BLAKE2b-256 8094b3404af3ec373ab4e84c8224e9b78d1dfbf0626be79eb298dcf66d84e8b9
