
GCS driver for the Khiops tool


Khiops driver for Google Cloud Storage aka GCS

This repository hosts the source code for the Khiops filesystem driver, which enables transparent manipulation of data stored in GCS buckets.

Quickstart

If you just want to start using Khiops with your data located on GCS, simply install the driver package next to Khiops. If you installed Khiops the standard way, the driver package can be installed via conda like so:

conda install -c conda-forge khiops-driver-gcs

Or, if you installed Khiops with your system package manager, install the driver the same way. On Debian/Ubuntu:

CODENAME=$(lsb_release -cs) && \
TEMP_DEB="$(mktemp)" && \
wget -O "$TEMP_DEB" "https://github.com/KhiopsML/khiopsdriver-gcs/releases/download/0.0.14/khiops-driver-gcs_0.0.14-1-${CODENAME}.amd64.deb" && \
sudo dpkg -i "$TEMP_DEB" && \
rm -f "$TEMP_DEB"

Or, on Rocky Linux:

sudo yum update -y && sudo yum install wget -y && \
CENTOS_VERSION=$(rpm -E %{rhel}) && \
TEMP_RPM="$(mktemp).rpm" && \
wget -O "$TEMP_RPM" "https://github.com/KhiopsML/khiopsdriver-gcs/releases/download/0.0.14/khiops-driver-gcs_0.0.14-1.el${CENTOS_VERSION}.x86_64.rpm" && \
sudo yum install "$TEMP_RPM" -y && \
rm -f "$TEMP_RPM"

You can check that the driver is installed properly by running:

khiops -s

You should see an output similar to this:

Khiops 10.3.0

Drivers:
    GCS driver (0.0.14) for URI scheme 'gs'
Environment variables:
    None
Internal environment variables:
    None

which indicates that the driver was loaded properly and will be used for datafiles following the gs:// pattern.
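
For scripted checks (for example in a container build), the status output can be parsed instead of read by eye. The following is a minimal sketch that assumes the Drivers section keeps the line format of the sample output above; the function name parse_drivers and the regular expression are illustrative and not part of Khiops.

```python
import re

# Assumed line format, copied from the sample output above:
#     GCS driver (0.0.14) for URI scheme 'gs'
DRIVER_LINE = re.compile(
    r"^\s*(?P<name>.+?) \((?P<version>[^)]+)\) for URI scheme '(?P<scheme>[^']+)'\s*$"
)


def parse_drivers(status_text):
    """Map each URI scheme to the (name, version) of the driver handling it."""
    drivers = {}
    for line in status_text.splitlines():
        match = DRIVER_LINE.match(line)
        if match:
            drivers[match["scheme"]] = (match["name"], match["version"])
    return drivers


# Sample text taken from the output shown above.
sample = """Khiops 10.3.0

Drivers:
    GCS driver (0.0.14) for URI scheme 'gs'
Environment variables:
    None
"""
print(parse_drivers(sample))
```

To check a live installation, the text could be obtained with subprocess.run(["khiops", "-s"], capture_output=True, text=True).stdout and tested for the presence of the "gs" key.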

Authentication

To access data stored in a GCS bucket, valid authentication is required in most cases. By default, the Khiops GCS driver uses the standard Application Default Credentials (ADC) mechanism. This means that once valid credentials are set up in your environment, Khiops uses them exactly as your Python scripts or Google-provided tools such as gcloud or gsutil do.

To set up your local environment with these credentials (assuming you have installed the gcloud CLI), run the following:

gcloud init
gcloud auth application-default login

Voilà! You now have access to your data in GCS buckets! The exact same authentication mechanism will allow a containerized Khiops script to run on the Google infrastructure.
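
Before launching a long job, it can be useful to confirm that local credentials are actually in place. The sketch below only covers the two local file-based sources that ADC consults: the GOOGLE_APPLICATION_CREDENTIALS environment variable and the default file written by gcloud on Linux and macOS. It does not cover the metadata server used on Google infrastructure, and the helper name find_adc_file is hypothetical.

```python
import os
from pathlib import Path


def find_adc_file(env=None, home=None):
    """Return the credentials file ADC would pick up locally, or None."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else Path(home)

    # 1. An explicitly configured credentials file takes precedence.
    explicit = env.get("GOOGLE_APPLICATION_CREDENTIALS")
    if explicit:
        return Path(explicit) if Path(explicit).is_file() else None

    # 2. Default file written by `gcloud auth application-default login`
    #    on Linux/macOS (Windows stores it under %APPDATA%/gcloud instead).
    well_known = home / ".config" / "gcloud" / "application_default_credentials.json"
    return well_known if well_known.is_file() else None


print(find_adc_file() or "no local ADC file found")
```

A None result on a workstation usually means the gcloud commands above have not been run yet; inside a Google-hosted container it may simply mean that credentials come from the attached service account.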

Logging

You can log information, warnings, errors and debug traces to a file using the following environment variables (they must both be defined to log anything):

  • GCS_DRIVER_LOGLEVEL: available values are off, critical, error, warning, info, debug, trace (these are the level names used by the spdlog logging library)
  • GCS_DRIVER_LOGFILE: path to the log file (which does not need to already exist).

Tip: you can define GCS_DRIVER_LOGFILE to be /dev/stderr or /dev/stdout if you want to log to standard error or standard output, respectively.
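
Since both variables must be defined together, a small helper can prevent setting only one of them when Khiops is launched from a script. This is a sketch under that assumption; the helper name driver_logging_env and the log file path are illustrative.

```python
import os

# Levels listed above (spdlog level names).
LOG_LEVELS = ("off", "critical", "error", "warning", "info", "debug", "trace")


def driver_logging_env(level, logfile, base_env=None):
    """Return a copy of the environment with both driver logging variables set."""
    if level not in LOG_LEVELS:
        raise ValueError(f"unknown log level {level!r}; expected one of {LOG_LEVELS}")
    env = dict(os.environ if base_env is None else base_env)
    env["GCS_DRIVER_LOGLEVEL"] = level
    env["GCS_DRIVER_LOGFILE"] = str(logfile)
    return env


# Hypothetical log file location, for illustration only.
env = driver_logging_env("debug", "/tmp/khiops-gcs.log")
print(env["GCS_DRIVER_LOGLEVEL"], env["GCS_DRIVER_LOGFILE"])
```

The returned dictionary can be passed as the env argument of subprocess.run when starting khiops, so the logging settings apply to that run only and do not leak into the parent shell.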

Example usage

Khiops usage (low level)

khiops -b -i gs://mydatabucket/khiops_samples/scenario.kh

Python sample

# Imports
import os
from khiops import core as kh

# Set the file paths
dictionary_file_path = "gs://mydatabucket/khiops_samples/Adult/Adult.kdic"
data_table_path = "gs://mydatabucket/khiops_samples/Adult/Adult.txt"
results_dir = "khiops_output"

# Train the predictor
kh.train_predictor(
    dictionary_file_path,
    "Adult",
    data_table_path,
    "class",
    results_dir,
    max_trees=0,
)

Development: Coverage reports

Coverage targets are available on Linux in non-Release builds when BUILD_TESTS=ON. Tests are executed through ctest so coverage matches the test registry.

Configure and build in Debug mode:

cmake --preset ninja-dbg -DBUILD_TESTS=ON
cmake --build --preset ninja-dbg

Run tests directly with ctest (optional baseline check):

ctest --preset ninja-dbg --output-on-failure

Generate unit-only coverage (tests labeled unit):

cmake --build --preset ninja-dbg --target khiops-gcs_coverage_unit
cmake --build --preset ninja-dbg --target khiops-gcs_cobertura_unit

Generate full coverage (all tests known by ctest):

cmake --build --preset ninja-dbg --target khiops-gcs_coverage_full
cmake --build --preset ninja-dbg --target khiops-gcs_cobertura_full

Artifacts are generated under build/debug/:

  • HTML reports: build/debug/coverage-unit/index.html and build/debug/coverage-full/index.html
  • Cobertura XML: build/debug/coverage-unit.xml and build/debug/coverage-full.xml
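
To read the overall figure from the Cobertura files without opening the HTML report, something like the following can be used. It is a sketch that assumes the generated files follow the usual Cobertura layout, where the root coverage element carries a line-rate attribute between 0 and 1; the function name read_line_rate is illustrative.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def read_line_rate(cobertura_xml_path):
    """Return overall line coverage as a percentage from a Cobertura report."""
    root = ET.parse(cobertura_xml_path).getroot()
    # Assumption: the root <coverage> element stores the overall ratio
    # (0.0 to 1.0) in its `line-rate` attribute, as in standard Cobertura XML.
    return 100.0 * float(root.attrib["line-rate"])


# Report paths taken from the list above; missing files are skipped.
for name in ("coverage-unit.xml", "coverage-full.xml"):
    report = Path("build/debug") / name
    if report.is_file():
        print(f"{name}: {read_line_rate(report):.1f}% lines covered")
```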

Legacy targets are still available and map to full coverage:

cmake --build --preset ninja-dbg --target khiops-gcs_coverage
cmake --build --preset ninja-dbg --target khiops-gcs_cobertura

Development: GitHub CI coverage UX

Coverage reporting in CI uses only native GitHub capabilities (no external service).

On Linux workflow runs:

  • The workflow writes a Coverage Report section to the run summary.
  • Pull requests receive a single updatable comment with current coverage status.
  • Coverage artifacts are uploaded only when the expected reports are generated.

Artifact names in GitHub Actions:

  • coverage-unit-ubuntu-latest
  • coverage-full-ubuntu-latest

If coverage generation fails or is skipped, the upload is skipped as well, and the summary and the pull request comment explicitly state that the reports are missing.
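
The run summary mentioned above is a native GitHub Actions feature: a step appends Markdown to the file whose path is given by the GITHUB_STEP_SUMMARY environment variable. The sketch below shows that mechanism in isolation; it is not the repository's actual workflow code, and the function names and report labels are hypothetical.

```python
import os
from pathlib import Path


def coverage_summary(reports):
    """Build a Markdown section; `reports` maps a label to a report path."""
    lines = ["## Coverage Report", ""]
    for label, path in reports.items():
        status = "generated" if Path(path).is_file() else "missing"
        lines.append(f"- {label}: {status}")
    return "\n".join(lines) + "\n"


def append_to_run_summary(markdown, env=None):
    """Append Markdown to the file exposed through GITHUB_STEP_SUMMARY."""
    env = os.environ if env is None else env
    summary_path = env.get("GITHUB_STEP_SUMMARY")
    if not summary_path:
        return False  # variable not set, so nothing is written
    with open(summary_path, "a", encoding="utf-8") as handle:
        handle.write(markdown)
    return True


text = coverage_summary({
    "unit": "build/debug/coverage-unit.xml",
    "full": "build/debug/coverage-full.xml",
})
print(text)
```

Reporting a missing file as "missing" rather than failing mirrors the behavior described above, where absent reports are stated explicitly instead of being silently ignored.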


