A Command Line Interface to orchestrate the integration of heterogeneous data and the deployment of services consuming the integrated data. See https://d2s.semanticscience.org
A Command Line Interface to help orchestrate the integration of heterogeneous data sources under a common RDF Knowledge Graph using Python, RML mappings, Bash, and GitHub Actions workflows (YAML).
Installation
Complete documentation about `d2s-cli` is available on the d2s documentation website 📖: https://d2s.semanticscience.org
Requirements:
- Python 3.6+ (built using python:3.6)
- Git
- Optional: Java 11+ to use `d2s sparql upload`
- Optional: the `oc` command line tool to deploy to the DSRI OpenShift cluster (for Maastricht University academics and students)
Install from pypi
pip install d2s
Update
pip install --upgrade d2s
Install from GitHub branch
You can also install it from the `master` branch, if you want the latest updates:
pip install git+https://github.com/MaastrichtU-IDS/d2s-cli.git@master
See those instructions to install d2s on Windows using the Chocolatey package manager and pipx.
Install d2s for development
Install `d2s` as a local executable for development.
Clone the repository:
git clone https://github.com/MaastrichtU-IDS/d2s-cli.git
Install in editable mode, so that `d2s` is updated directly when the code changes:

pip install -e .
Uninstall
pip uninstall d2s
Use d2s
Display the default help message:
d2s
Generate metadata
Analyze a SPARQL endpoint to generate HCLS descriptive metadata for each graph:
d2s metadata analyze https://graphdb.dumontierlab.com/repositories/test -o metadata.ttl
Analyze a SPARQL endpoint to generate metadata specific to Bio2RDF for each graph:
d2s metadata analyze https://bio2rdf.137.120.31.102.nip.io/sparql -o metadata.ttl -m bio2rdf
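HCLS descriptive metadata is built from statistics computed over each graph in the endpoint. Conceptually, the analysis runs SPARQL queries along these lines (an illustrative sketch, not the exact queries `d2s` executes):

```sparql
# Count the triples in each named graph of the endpoint
SELECT ?graph (COUNT(*) AS ?triples)
WHERE {
  GRAPH ?graph { ?s ?p ?o }
}
GROUP BY ?graph
```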
You can also generate detailed HCLS metadata for the dataset version and distribution by answering the questions after running this command:
d2s metadata create -o metadata.ttl
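For illustration, HCLS dataset descriptions are RDF statements about the dataset, its versions, and its distributions. Here is a minimal hand-written sketch; the IRIs and titles are placeholders, and the actual output of `d2s metadata create` may use additional vocabularies and properties:

```turtle
@prefix dct:  <http://purl.org/dc/terms/> .
@prefix dcat: <http://www.w3.org/ns/dcat#> .

<https://w3id.org/example/dataset> a dcat:Dataset ;
    dct:title "Example dataset" ;
    dct:description "An example dataset description." ;
    dcat:distribution <https://w3id.org/example/dataset/distribution/sparql> .

<https://w3id.org/example/dataset/distribution/sparql> a dcat:Distribution ;
    dcat:accessURL <https://example.org/sparql> .
```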
Bootstrap a dataset conversion project
`d2s` can be used to help you convert datasets to RDF.
You will need to initialize the current folder; it is highly recommended to do this at the root of a Git repository where the conversion will be stored:
d2s init
This command will create a `datasets` folder to store the dataset conversions and a `.github/workflows` folder for the workflows, if they do not already exist.
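For orientation, after initialization the project layout should look roughly like this (a sketch; the exact files generated may differ between versions):

```
.
├── datasets/
└── .github/
    └── workflows/
```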
All `d2s` commands are designed to be run from the project folder.
You can create a new dataset conversion:
d2s new dataset
You will be asked a few questions about the dataset via the terminal, then a folder will be generated with:
- Your dataset metadata
- Example YARRRML and RML mappings
- Example python preprocessing script
- Example bash script to download the data to convert
- A GitHub Action workflow to run the different steps of the processing
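As an illustration of what YARRRML mappings look like, here is a minimal generic example mapping a CSV file to RDF. The prefix, file name, and column names are placeholders for this sketch, not the ones generated by `d2s new dataset`:

```yaml
prefixes:
  ex: https://example.org/

mappings:
  person:
    sources:
      - ['data/persons.csv~csv']
    s: ex:person/$(id)
    po:
      - [ex:name, $(name)]
      - [ex:birthYear, $(birth_year)]
```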
You can now edit the files generated in the `datasets` folder to implement your data conversion.
Run the RML mapper
Requirements: Java installed
This feature is still experimental
`d2s` can be used to easily run the RML mapper:
d2s rml my-dataset
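This is roughly equivalent to calling the RMLMapper JAR directly, as in the following sketch. The JAR path and file names are placeholders, and `d2s` may pass different options:

```shell
# Run the mappings and write the resulting RDF to output.nq
java -jar rmlmapper.jar -m datasets/my-dataset/mapping.rml.ttl -o output.nq
```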
Enable autocompletion
Enable commandline autocompletion in the terminal
Recommended: it makes `d2s` much more user-friendly.
- ZSH: add the import autocomplete line to the `~/.zshrc` file:
echo 'eval "$(_D2S_COMPLETE=source_zsh d2s)"' >> ~/.zshrc
Set your terminal to use ZSH by default:
chsh -s /bin/zsh
An oh-my-zsh theme can easily be chosen for a personalized experience. See zsh-theme-biradate to install a simple theme and configure your terminal in a few minutes.
- Bash: add the import autocomplete line to the `~/.bashrc` file:
echo 'eval "$(_D2S_COMPLETE=source d2s)"' >> ~/.bashrc
To be tested.
Build and publish
Publish using Docker
To publish a new version on pypi:
- Upgrade the version in `setup.py` (e.g. from `0.2.1` to `0.2.2`)
- Use the following script to build and publish automatically using Docker:
./publish_pip.sh
A test will be run using Docker before publishing, to make sure `d2s init` works.
Build locally
Building and publishing can be done locally:
# Build packages in dist/ folder
python3 setup.py sdist bdist_wheel
# Publish packages previously built in the dist/ folder
twine upload --repository-url https://upload.pypi.org/legacy/ dist/*
To install `twine` locally (not needed when publishing with Docker):
pip install twine
If you experience issues with Bash or ZSH because `d2s` is not defined when installing for development, add `pip install --editable develop/d2s-cli` to your `~/.zshrc`.
You might need to install Python 3.6 for development (development with other Python 3 versions should work too):
sudo apt-get install python3.6 python3.6-venv python3.6-dev
# Set python3 to use 3.6
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.6 1
sudo update-alternatives --config python3
# If needed, edit the gnome-terminal shebang line to #!/usr/bin/python3.6
vim /usr/bin/gnome-terminal
If you face issues uploading the package to PyPI, check the built wheel:
twine check dist/d2s-*-py3-none-any.whl