CLI tools for the creation of Kudaf Data Source components: 1) Variables Metadata, and 2) REST API Datasource back-end
Project description
KUDAF Datasource CLI Tools
This is a set of Command Line Interface (CLI) tools to facilitate the technical tasks required of Data Providers that want to make their data available on the KUDAF data-sharing platform.
It was developed by Sikt - Kunnskapssektorens tjenesteleverandør under the KUDAF initiative to enable a Data Producer to make small-file data available on the KUDAF data-sharing platform.
The CLI can create the following Kudaf Data Source components:
- Variables Metadata, and /or
- REST API Datasource back-end (including variables metadata and possibly even the data files themselves)
About KUDAF
KUDAF - Kunnskapssektorens datafellesskap (the knowledge sector's data community) aims to make data sharing safer, simpler and better. Read more about KUDAF.
High-level workflow for Data Source administrators (Beta version)
Fra dataprodusent til datatilbyder (From data producer to data provider, Norwegian)
Feide Kundeportal - Datadeling (Norsk)
Data Sharing and the Feide Customer Portal
Local installation instructions (Linux/Mac)
Make sure Python3 is installed on your computer (versions from 3.8 up to 3.11 should work fine)
$ python3 --version
Navigate to the folder chosen to contain this project
$ cd path/to/desired/folder
Create a Python virtual environment and activate it
$ python3 -m venv .venv
This creates the virtual environment in the hidden folder .venv. Activate it with:
$ source .venv/bin/activate
Install Kudaf Datasource Tools and other required Python packages
$ pip install kudaf_datasource_tools
Creating a YAML configuration file
Click here for a basic YAML syntax tutorial
Example YAML configuration file
The following file is included in the package and can be found in the datasource_tools/config folder:
config_example.yaml
---
projectInfo:
  organization: "my-short-organizations-name"
  datasourceName: "my-FeideKundeportal-Datasource-name"
  datasourceId: "my-FeideKundeportal-Datasource-UUID"
dataFiles:
  - dataFile: &mydatafile
      fileNameExt: mydatafile.csv
      csvParseDelimiter: ";"  # Optional (default ","). The character used in the CSV file to separate the values within each row
      fileDirectory: /path/to/my/datafiles/directory  # Only needed if different from the current directory
unitTypes:
  - MIN_ENHETSTYPE1: &min_enhetstype1  # Only needed if different from the global unit types: PERSON/VIRKSOMHET/KOMMUNE/FYLKE
      shortName: MIN_ENHETSTYPE1  # This shows as the Key indicator in the Front-end
      name: Short identification label  # This will label the Identifier blue box
      description: Detailed description of this unit type
      dataType: LONG  # One of STRING/DATE/LONG/DOUBLE
  - MIN_ENHETSTYPE2: &min_enhetstype2  # Only needed if different from the global unit types: PERSON/VIRKSOMHET/KOMMUNE/FYLKE
      shortName: MIN_ENHETSTYPE2  # This shows as the Key indicator in the Front-end
      name: Short identification label  # This will label the Identifier blue box
      description: Detailed description of this unit type
      dataType: LONG  # One of STRING/DATE/LONG/DOUBLE
variables:
  - name: VARIABELENS_NAVN
    temporalityType: FIXED  # One of FIXED/EVENT/STATUS/ACCUMULATED
    dataRetrievalUrl: https://my-datasource-api.no/api/v1/variables/VARIABELENS_NAVN
    sensitivityLevel: NONPUBLIC  # One of PUBLIC/NONPUBLIC
    populationDescription:
      - Description of the population that this variable measures
    spatialCoverageDescription:
      - Norge
      - Other geographic descriptions that apply to these data
    subjectFields:
      - Topics/concepts/terms that these data are about
    identifierVariables:
      - unitType: *min_enhetstype1  # Can also be one of the global unit types: PERSON/VIRKSOMHET/KOMMUNE/FYLKE
    measureVariables:
      - label: Short label for what this variable measures/shows
        description: Detailed description of what this variable measures/shows
        dataType: STRING  # One of STRING/LONG/DATE/DOUBLE
    dataMapping:
      dataFile: *mydatafile
      identifierColumns:
        - Min_Identificatorkolonne  # CSV file column for the identifier
      measureColumns:
        - Min_Målkolonnen  # CSV file column for what is measured
  - name: VARIABELENS_NAVN_ACCUM
    temporalityType: ACCUMULATED  # One of FIXED/EVENT/STATUS/ACCUMULATED
    dataRetrievalUrl: https://my-datasource-api.no/api/v1/variables/VARIABELENS_NAVN
    sensitivityLevel: NONPUBLIC  # One of PUBLIC/NONPUBLIC
    populationDescription:
      - Description of the population that this variable measures
    spatialCoverageDescription:
      - Norge
      - Other geographic descriptions that apply to these data
    subjectFields:
      - Topics/concepts/terms that these data are about
    identifierVariables:
      - unitType: *min_enhetstype2  # Can also be one of the global unit types: PERSON/VIRKSOMHET/KOMMUNE/FYLKE
    measureVariables:
      - label: Short label for what this variable measures/shows
        description: Detailed description of what this variable measures/shows
        dataType: LONG  # If accumulated data must be summed up, this should be one of LONG/DOUBLE
    dataMapping:
      dataFile: *mydatafile
      identifierColumns:
        - Min_Identificatorkolonne  # CSV file column for the identifier
      measureColumns:
        - Min_Målkolonnen  # CSV file column for what is measured
      measureColumnsAccumulated: False  # Optional (True/False, default False). Indicates whether the data in the file are already accumulated
      attributeColumns:  # Only needed for EVENT/STATUS/ACCUMULATED temporalityType(s)
        - Start_Time  # CSV file column for the start time
        - End_Time  # CSV file column for the end time
  - name: NØKKELVAR_ID-NØKKEL_MÅLE-NOKKEL
    temporalityType: FIXED  # One of FIXED/EVENT/STATUS/ACCUMULATED
    dataRetrievalUrl: https://my-datasource-api.no/api/v1/variables/NØKKELVAR_ID-NØKKEL_MÅLE-NOKKEL
    sensitivityLevel: PUBLIC  # One of PUBLIC/NONPUBLIC
    populationDescription:
      - Description of the population that this variable measures
    spatialCoverageDescription:
      - Norge
      - Other geographic descriptions that apply to these data
    subjectFields:
      - Topics/concepts/terms that these data are about
    identifierVariables:
      - unitType: *min_enhetstype1  # ID-NØKKEL (ID key) - Can also be one of the global unit types: PERSON/VIRKSOMHET/KOMMUNE/FYLKE
    measureVariables:
      - label: Short label for what this variable measures/shows
        description: Detailed description of what this variable measures/shows
        unitType: *min_enhetstype2  # MÅLE-NØKKEL (measure key) - Can also be one of the global unit types: PERSON/VIRKSOMHET/KOMMUNE/FYLKE
        dataType: LONG  # One of STRING/LONG/DATE/DOUBLE
    dataMapping:
      dataFile: *mydatafile
      identifierColumns:
        - Min_Identificatorkolonne  # CSV file column for the identifier
      measureColumns:
        - Min_Målkolonnen  # CSV file column for what is measured
...
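For reference, a data file matching the dataMapping sections above could look like the following. This is a hypothetical mydatafile.csv using the ";" delimiter set in csvParseDelimiter; the column names Min_Identificatorkolonne, Min_Målkolonnen, Start_Time and End_Time are just the placeholders from the example configuration, not required names, and the values are invented for illustration:
Min_Identificatorkolonne;Min_Målkolonnen;Start_Time;End_Time
12345;42;2023-01-01;2023-12-31
67890;17;2023-01-01;2023-12-31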
Kudaf Datasource Tools CLI operation
Navigate to the project directory and activate the virtual environment (only if not already activated):
$ source .venv/bin/activate
The kudaf-generate command should now be available. This is the main entry point to the CLI's functionality.
Displaying the help menus
$ kudaf-generate --help
Usage: kudaf-generate [OPTIONS] COMMAND [ARGS]...
Kudaf Datasource Tools
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion Install completion for the current shell. │
│ --show-completion Show completion for the current shell, to copy it or customize the installation. │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ api Generate a Kudaf Datasource REST API back-end │
│ metadata Generate Variables/UnitTypes Metadata │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
As we can see, there are two sub-commands available: api and metadata.
We can obtain help on them as well:
$ kudaf-generate api --help
Usage: kudaf-generate api [OPTIONS]
Generate a Kudaf Datasource REST API back-end
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --config-yaml-path PATH Absolute path to the YAML configuration file │
│ [default: /home/me/current/directory/config.yaml] │
│ --input-data-files-dir PATH Absolute path to the data files directory │
│ [default: /home/me/current/directory] │
│ --output-api-dir PATH Absolute path to directory where the Datasource API folder is to be written │
│ to │
│ [default: /current/directory] │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
$ kudaf-generate metadata --help
Usage: kudaf-generate metadata [OPTIONS]
Generate Variables/UnitTypes Metadata
JSON metadata files ('variables.json' and maybe 'unit_types.json') will be written to the
(optionally) given output directory.
If any of the optional directories is not specified, the current directory is used as default.
╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --config-yaml-path PATH Absolute path to the YAML configuration file │
│ [default: /home/me/current/directory/config.yaml] │
│ --output-metadata-dir PATH Absolute path to directory where the Metadata files are to be written to │
│ [default: /home/me/current/directory] │
│ --help Show this message and exit. │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
Generating metadata only from a YAML configuration file
$ kudaf-generate metadata --config-yaml-path /home/me/path/to/config.yaml --output-metadata-dir /home/me/path/to/metadata/folder
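If the command succeeds, the output directory should contain the JSON files described in the help text above (unit_types.json typically only appears when the configuration defines its own unitTypes). An illustrative listing:
$ ls /home/me/path/to/metadata/folder
unit_types.json  variables.json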
Generating an API
$ kudaf-generate api --config-yaml-path /home/me/path/to/config.yaml --output-api-dir /home/me/path/to/api/folder
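The generated folder is a self-contained Python project. Its exact layout may vary between versions, but based on the launch steps below it should at least contain the app package with the app.main:app application module used by uvicorn, a requirements.txt, and a Dockerfile. An illustrative listing:
$ ls /home/me/path/to/api/folder
Dockerfile  app/  requirements.txt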
Local API launch
Navigate to the folder containing the generated API:
$ cd /home/me/path/to/api/folder
Create a Python virtual environment and activate it
$ python3 -m venv .venv
This creates the virtual environment in the hidden folder .venv. Activate it with:
$ source .venv/bin/activate
Install needed Python packages
$ pip install -r requirements.txt
Launch the Kudaf Datasource API
$ uvicorn app.main:app
Browser: Navigate to the API's interactive documentation at:
http://localhost:8000/docs
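You can also query the running API directly from the command line. Assuming the generated endpoints follow the dataRetrievalUrl pattern used in the example configuration (/api/v1/variables/<VARIABLE_NAME>; the exact paths are listed in the interactive documentation), a request could look like:
$ curl http://localhost:8000/api/v1/variables/VARIABELENS_NAVN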
Docker API launch (Linux)
Install Docker for your desktop
Follow the instructions for your operating system at https://docs.docker.com/desktop/
Navigate to the folder containing the generated API:
$ cd /home/me/path/to/api/folder
Create a Docker image
$ sudo docker build -t my-kudaf-datasource-image .
(Note: the trailing . in the command above is important! Make sure it is included.)
Create and launch a Docker container for the generated image
$ sudo docker run -d --name my-kudaf-datasource-container -p 9000:8000 my-kudaf-datasource-image
Browser: Navigate to the API's interactive documentation at:
http://localhost:9000/docs
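If the documentation page does not load, you can check that the container is running and inspect its logs with standard Docker commands, using the container name chosen above:
$ sudo docker ps
$ sudo docker logs my-kudaf-datasource-container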
Developers
Download the package to your computer
Option A: Installation from repository:
Open up a Terminal window and clone the repo locally:
$ git clone https://gitlab.sikt.no/kudaf/kudaf-datasource-tools.git
Option B: Installation from source:
- Open up your browser and navigate to the project's GitLab page: https://gitlab.sikt.no/kudaf/kudaf-datasource-tools
- Once there, download a ZIP file with the source code
- Move the zipped file to whichever directory you want to use for this installation
- Open a Terminal window and navigate to the directory where the zipped file is
- Unzip the downloaded file; it will create a folder called kudaf-datasource-tools-main
- Switch to the newly created folder
$ cd path/to/kudaf-datasource-tools-main
Make sure Python3 is installed on your computer (versions from 3.8 up to 3.11 should work fine)
$ python3 --version
Install Poetry (Python package and dependency manager) on your computer
Full Poetry documentation can be found here: https://python-poetry.org/docs/
The official installer should work fine on the command line for Linux, macOS and Windows:
$ curl -sSL https://install.python-poetry.org | python3 -
If the installation was successful, configure this option:
$ poetry config virtualenvs.in-project true
Mac users: Troubleshooting
In case of errors installing Poetry on your Mac, you may have to try installing it with pipx. But to install that, we need to have Homebrew installed first.
$ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
(Homebrew documentation: https://brew.sh/)
Once Homebrew is installed, proceed to install pipx:
$ brew install pipx
$ pipx ensurepath
Finally, install Poetry:
$ pipx install poetry
Create a Python virtual environment and activate it
$ python3 -m venv .venv
This creates the virtual environment in the hidden folder .venv. Activate it with:
$ source .venv/bin/activate
Install Kudaf Datasource Tools and other required Python packages
$ poetry install
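With the dependencies installed, the CLI entry point can be exercised from the development environment to confirm the setup, for example:
$ kudaf-generate --help
(or, equivalently, $ poetry run kudaf-generate --help)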