


A Command Line Interface to help orchestrate the integration of heterogeneous data sources under a common RDF Knowledge Graph, using Python, RML mappings, Bash, and GitHub Actions workflows (YAML).

You can find more information about the Data2Services project in the d2s documentation 📖 at https://d2s.semanticscience.org

Installation

Requirements: Python 3 with pip (or pipx) installed.

Install from PyPI

pip install d2s

Use pip, pip3 or pipx depending on your preferences.
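
To verify the installation, you can display the package details with pip:

pip show d2s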

Update

pip install --upgrade d2s 

Install from GitHub branch

You can also install it from the master branch, if you want the latest updates:

pip install git+https://github.com/MaastrichtU-IDS/d2s-cli.git@master

Uninstall

pip uninstall d2s

Use d2s

Display the default help message:

d2s

Generate metadata

Analyze a SPARQL endpoint to generate HCLS descriptive metadata for each graph:

d2s metadata analyze https://graphdb.dumontierlab.com/repositories/umids-kg -o metadata.ttl

Analyze a SPARQL endpoint to generate metadata specific to Bio2RDF for each graph:

d2s metadata analyze https://bio2rdf.137.120.31.102.nip.io/sparql -o metadata.ttl -m bio2rdf

You can also generate detailed HCLS metadata for the dataset version and distribution by answering the questions after running this command:

d2s metadata create -o metadata.ttl
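
The generated file is in the Turtle format; before loading it you can optionally validate it with any RDF parser. For example, assuming Apache Jena is installed (riot is not part of d2s):

# Parse the generated Turtle file and report syntax errors
riot --validate metadata.ttl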

Bootstrap a dataset conversion project

d2s can be used to help you convert datasets to RDF.

You will first need to initialize the current folder; it is highly recommended to do this at the root of the Git repository where the conversion will be stored:

d2s init

This command creates a datasets folder to store the dataset conversions and a .github/workflows folder for the workflows, if they do not already exist.
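
For illustration, a minimal sketch of bootstrapping a project from scratch (folder names as described above):

# Start from the root of a Git repository
git init my-project && cd my-project
# Create the datasets/ and .github/workflows/ folders
d2s init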

All d2s commands are designed to be run from the project folder.

You can create a new dataset conversion:

d2s new dataset

You will be asked a few questions about the dataset in the terminal; a folder will then be generated with:

  • Your dataset metadata
  • Example YARRRML and RML mappings
  • An example Python preprocessing script
  • An example Bash script to download the data to convert (see the sketch below)
  • A GitHub Actions workflow to run the different steps of the processing

You can now edit the files generated in the datasets folder to implement your data conversion.
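
As an example of the download step, a minimal Bash script could look like this sketch (the URL and file names are placeholders, not the generated content):

#!/bin/bash
# Download the source data to convert (placeholder URL)
mkdir -p data
wget -O data/source.csv.gz https://example.org/source.csv.gz
# Decompress it for the preprocessing step
gunzip -f data/source.csv.gz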

Run the RML mapper

Requirements: Java installed

This feature is still experimental.

d2s can be used to easily run the RML mapper:

d2s rml my-dataset

Enable autocompletion

Enable command-line autocompletion in the terminal.

This is recommended: it makes d2s much more user-friendly.

  • ZSH: add the autocompletion line to the ~/.zshrc file:
echo 'eval "$(_D2S_COMPLETE=source_zsh d2s)"' >> ~/.zshrc
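
Then restart your terminal, or reload the configuration in the current session:

source ~/.zshrc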

Set your terminal to use ZSH by default:

chsh -s /bin/zsh

An oh-my-zsh theme can easily be chosen for a personalized experience. See zsh-theme-biradate to install a simple theme and configure your terminal in a few minutes.

  • Bash: add the autocompletion line to the ~/.bashrc file:
echo 'eval "$(_D2S_COMPLETE=source d2s)"' >> ~/.bashrc
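
As with ZSH, reload the configuration for the change to take effect:

source ~/.bashrc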

Build and publish

Install d2s for development

Install d2s as an executable, so you can run it from the terminal.

Clone the repository:

git clone https://github.com/MaastrichtU-IDS/d2s-cli.git
cd d2s-cli

Install d2s:

pip install -e .

The d2s executable is updated automatically whenever the code changes.

Optional: isolate with a Virtual Environment

If you face conflicts with already-installed packages, you might want to use a virtual environment to isolate the installation in the current folder before installing d2s:

# Create the virtual environment folder in your workspace
python3 -m venv .venv
# Activate it using a script in the created folder
source .venv/bin/activate
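
Once the virtual environment is activated, install d2s inside it as described above, and leave it when you are done:

# Install d2s in the virtual environment (editable mode)
pip install -e .
# Exit the virtual environment
deactivate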

Publish using Docker

To publish a new version on PyPI:

  • upgrade the version in setup.py (e.g. from 0.2.1 to 0.2.2)
  • use the following script to build and publish automatically using Docker:
./publish_pip.sh

A test will be run using Docker before publishing to make sure d2s init works.

Build locally

Building and publishing can be done locally:

# Build packages in dist/ folder
python3 setup.py sdist bdist_wheel
# Publish packages previously built in the dist/ folder
twine upload --repository-url https://upload.pypi.org/legacy/ dist/*

Additional instructions to install twine locally (not needed when publishing with Docker):

pip install twine

If d2s is not found in Bash or ZSH after installing for development, add the editable install to your shell configuration, e.g. add pip install --editable develop/d2s-cli to ~/.zshrc (adjusting the path to your local clone).

You might need to install Python 3.7 for development (development with other Python 3 versions should also work):

sudo apt-get install python3.7 python3.7-venv python3.7-dev
# Set python3 to use 3.7 by default
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 1
sudo update-alternatives --config python3
# On Ubuntu, gnome-terminal may fail to start after changing the default python3:
# edit its launcher script and fix its shebang (e.g. #!/usr/bin/python3.7)
vim /usr/bin/gnome-terminal

If you face issues uploading the package to PyPI, check the built wheel:

twine check dist/d2s-*-py3-none-any.whl
