A Command Line Interface to orchestrate the integration of heterogeneous data and the deployment of services consuming the integrated data. See https://d2s.semanticscience.org
A Command Line Interface to help orchestrate the integration of heterogeneous data sources under a common RDF Knowledge Graph, using Python, RML mappings, Bash, and GitHub Actions workflows (YAML).

You can find more information about the Data2Services project on the d2s documentation website 📖
## Installation

Requirements:

- Python 3.7+
- Git
- Optional: Java 11+, to use `d2s sparql upload`
- Optional: the `oc` command line tool, to deploy to the DSRI OpenShift cluster (for Maastricht University academics and students)
### Install d2s

Install `d2s` as an executable to run it from the terminal.

Clone the repository:

```bash
git clone https://github.com/MaastrichtU-IDS/d2s-cli.git
cd d2s-cli
```

Install `d2s`:

```bash
pip install -e .
```

With this editable install, `d2s` is updated directly when the code changes.
### Optional: isolate with a virtual environment

If you face conflicts with already installed packages, you might want to use a virtual environment to isolate the installation in the current folder before installing `d2s`:

```bash
# Create the virtual environment folder in your workspace
python3 -m venv .venv
# Activate it using the script in the created folder
source .venv/bin/activate
```
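When you are done working with `d2s`, leave the virtual environment with:

```bash
deactivate
```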
### Uninstall

```bash
pip uninstall d2s
```
## Use d2s

Display the default help, listing the available commands:

```bash
d2s
```
### Generate metadata

Analyze a SPARQL endpoint to generate HCLS descriptive metadata for each graph:

```bash
d2s metadata analyze https://graphdb.dumontierlab.com/repositories/umids-kg -o metadata.ttl
```

Analyze a SPARQL endpoint to generate metadata specific to Bio2RDF for each graph:

```bash
d2s metadata analyze https://bio2rdf.137.120.31.102.nip.io/sparql -o metadata.ttl -m bio2rdf
```

You can also generate detailed HCLS metadata for the dataset version and distribution by answering the questions asked after running this command:

```bash
d2s metadata create -o metadata.ttl
```
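The generated file is plain Turtle, so you can validate it with any RDF tool before publishing it, for example with `rapper` from the raptor2-utils package (an optional external tool, not required by `d2s`):

```bash
# Parse the generated Turtle file and print the number of triples found
rapper -i turtle -c metadata.ttl
```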
### Bootstrap a dataset conversion project

`d2s` can be used to help you convert datasets to RDF.

You first need to initialize the current folder; it is highly recommended to do this at the root of the Git repository where the conversion will be stored:

```bash
d2s init
```

This command creates a `datasets` folder to store the dataset conversions and a `.github/workflows` folder for the workflows, if they do not exist already. All `d2s` commands are designed to be run from the project folder.

You can then create a new dataset conversion:

```bash
d2s new dataset
```

You will be asked a few questions about the dataset in the terminal, then a folder is generated with:

- your dataset metadata
- example YARRRML and RML mappings
- an example Python preprocessing script
- an example Bash script to download the data to convert (sketched below)
- a GitHub Actions workflow to run the different steps of the processing

You can now edit the files generated in the `datasets` folder to implement your data conversion.
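As an illustration, the generated download script usually boils down to fetching the source files into the dataset folder. A minimal sketch with a hypothetical dataset name and source URL (the script actually generated by `d2s` may differ):

```bash
#!/bin/bash
# Hypothetical example: download the source file for "my-dataset".
# Replace the URL and paths with your dataset's actual source.
mkdir -p data/my-dataset
wget -O data/my-dataset/source.csv https://example.org/data/source.csv
```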
### Run the RML mapper

Requirements: Java installed.

This feature is still experimental.

`d2s` can be used to easily run the RML mapper:

```bash
d2s rml my-dataset
```
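If the `d2s` wrapper does not fit your setup, the RML mapper can also be run directly through its Java CLI. A sketch assuming you downloaded an RMLMapper release jar (the jar and mapping file names below are hypothetical):

```bash
# Run the RML mapper on a mapping file and write the resulting triples to output.nt
java -jar rmlmapper.jar -m datasets/my-dataset/mapping.rml.ttl -o output.nt
```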
## Enable autocompletion

Enable command-line autocompletion in the terminal. Recommended: it makes `d2s` much more user-friendly.

- ZSH: add the autocomplete import line to the `~/.zshrc` file:

  ```bash
  echo 'eval "$(_D2S_COMPLETE=source_zsh d2s)"' >> ~/.zshrc
  ```

  Set your terminal to use ZSH by default:

  ```bash
  chsh -s /bin/zsh
  ```

  An oh-my-zsh theme can easily be chosen for a personalized experience. See zsh-theme-biradate to install a simple theme and configure your terminal in a few minutes.

- Bash: add the autocomplete import line to the `~/.bashrc` file, something like:

  ```bash
  echo 'eval "$(_D2S_COMPLETE=source d2s)"' >> ~/.bashrc
  ```
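In both cases, reload your shell configuration (or open a new terminal) for autocompletion to take effect:

```bash
source ~/.zshrc
# or, for Bash:
source ~/.bashrc
```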
## Build and publish

### Publish using Docker

To publish a new version on PyPI:

- upgrade the version in `setup.py` (e.g. from `0.2.1` to `0.2.2`)
- use the following script to build and publish automatically using Docker:

```bash
./publish_pip.sh
```

A test will be run using Docker before publishing, to make sure `d2s init` works.
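The version bump itself can also be scripted; a hypothetical one-liner, assuming `setup.py` declares the version as `version="0.2.1"`:

```bash
# Bump the version declared in setup.py (hypothetical numbers and pattern),
# then build, test, and publish through Docker
sed -i 's/version="0.2.1"/version="0.2.2"/' setup.py
./publish_pip.sh
```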
### Build locally

Building and publishing can also be done locally:

```bash
# Build the packages in the dist/ folder
python3 setup.py sdist bdist_wheel
# Publish the packages previously built in the dist/ folder
twine upload --repository-url https://upload.pypi.org/legacy/ dist/*
```
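If you want to test a release without touching the real index, twine can also upload to TestPyPI first:

```bash
# Publish to TestPyPI to check that the package builds and renders correctly
twine upload --repository-url https://test.pypi.org/legacy/ dist/*
```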
To install twine locally (not needed if you publish using Docker):

```bash
pip install twine
```
If you experience issues with Bash or ZSH because `d2s` is not defined after installing for development, add `pip install --editable develop/d2s-cli` to your `.zshrc`.

You might need to install Python 3.7 for development:

```bash
sudo apt-get install python3.7 python3.7-venv python3.7-dev
# Set python3 to use 3.7
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.7 1
sudo update-alternatives --config python3
# If gnome-terminal fails to start after switching, edit its shebang:
vim /usr/bin/gnome-terminal   # first line: #!/usr/bin/python3.7
```

If you face issues uploading the package on PyPI, check the built wheel:

```bash
twine check dist/d2s-*-py3-none-any.whl
```