
MLflow WebHDFS Plugins


mlflow-webhdfs

The mlflow-webhdfs package is an MLflow plugin that adds WebHDFS as an artifact store. It lets you use webhdfs:// URLs as the storage location for MLflow artifacts, so that artifacts are kept in HDFS (Hadoop Distributed File System) and accessed through its WebHDFS REST interface.

Features

  • WebHDFS integration as an MLflow artifact store.
  • Supports webhdfs:// as an artifact URL.
  • Uploads and downloads MLflow artifacts to and from HDFS.
  • Supports authentication by user name or delegation token through the MLFLOW_KERBEROS_USER and MLFLOW_KERBEROS_TOKEN environment variables.

Requirements

  • MLflow: version 2.4.0 or later.
  • hdfs: the Python hdfs package, used to communicate with WebHDFS.
  • Environment variables (optional): if you need authentication, set MLFLOW_KERBEROS_USER or MLFLOW_KERBEROS_TOKEN (see Environment Variables below).
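To confirm these requirements before using the plugin, you can read the installed versions with the standard library. The following is a minimal sketch, not part of the plugin: the helper names meets_minimum and check_requirements are hypothetical, and the numeric comparison drops pre-release suffixes, so it is only a rough check.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from typing import Dict, Optional, Tuple

MIN_MLFLOW = "2.4.0"  # minimum MLflow version from the Requirements list


def _numeric(ver: str) -> Tuple[int, ...]:
    """Turn '2.10.1' or '3.0.0rc1' into (major, minor, patch).

    Pre-release suffixes such as 'rc1' are dropped (rough check only).
    """
    parts = []
    for piece in ver.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))  # pad short versions like '2.4'
    return tuple(parts)


def meets_minimum(installed: str, minimum: str = MIN_MLFLOW) -> bool:
    """Hypothetical helper: True if installed >= minimum."""
    return _numeric(installed) >= _numeric(minimum)


def check_requirements() -> Dict[str, Optional[str]]:
    """Return installed versions of mlflow and hdfs (None if missing)."""
    report: Dict[str, Optional[str]] = {}
    for dist in ("mlflow", "hdfs"):
        try:
            report[dist] = version(dist)
        except PackageNotFoundError:
            report[dist] = None
    return report


if __name__ == "__main__":
    found = check_requirements()
    print(found)
    if found["mlflow"] and not meets_minimum(found["mlflow"]):
        print(f"mlflow {found['mlflow']} is older than {MIN_MLFLOW}")
```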

Installation

Install mlflow-webhdfs with pip, either from the Python Package Index (PyPI) or from source:

Install from PyPI

pip install mlflow-webhdfs

Install from Source

Clone the repository and install it in editable mode with pip:

git clone https://github.com/roach231428/mlflow-webhdfs.git
cd mlflow-webhdfs
pip install -e .

Setup

Once the package is installed, MLflow can use WebHDFS as an artifact store: specify a webhdfs:// URL wherever you configure an artifact location.
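MLflow discovers artifact-store plugins through the mlflow.artifact_repository entry-point group. To check that an installation registered a handler, you can list that group with importlib.metadata, as in this minimal sketch. The helper registered_artifact_schemes is hypothetical, and the assumption that this plugin registers under the name webhdfs is not confirmed by this page.

```python
from importlib.metadata import entry_points
from typing import List

GROUP = "mlflow.artifact_repository"  # MLflow's artifact-store plugin group


def registered_artifact_schemes() -> List[str]:
    """Hypothetical helper: list schemes that plugins register with MLflow."""
    eps = entry_points()
    if hasattr(eps, "select"):  # Python 3.10+
        selected = eps.select(group=GROUP)
    else:  # Python 3.8/3.9 return a dict of group -> entry points
        selected = eps.get(GROUP, [])
    return sorted(ep.name for ep in selected)


if __name__ == "__main__":
    schemes = registered_artifact_schemes()
    print(schemes)
    # Assumption: this plugin registers itself under the name "webhdfs"
    print("webhdfs registered:", "webhdfs" in schemes)
```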

Usage

In your MLflow script, use a webhdfs:// URI as the artifact location. The artifact location is set on the experiment, not through the tracking URI, as in this example:

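import mlflow

# WebHDFS location for artifacts (replace the placeholders)
artifact_uri = "webhdfs://<webhdfs_host>:<port>/path/to/store/artifacts"

# The artifact location belongs to the experiment, not to the tracking URI.
# "webhdfs-demo" is an example name; create_experiment fails if it already exists.
experiment_id = mlflow.create_experiment("webhdfs-demo", artifact_location=artifact_uri)

# Create a local file to upload
with open("my_artifact.txt", "w") as f:
    f.write("example artifact\n")

# Log a parameter, a metric, and the artifact (stored under artifact_uri)
with mlflow.start_run(experiment_id=experiment_id):
    mlflow.log_param("param1", 5)
    mlflow.log_metric("metric1", 0.92)
    mlflow.log_artifact("my_artifact.txt")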
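The placeholders in artifact_uri follow the usual scheme://host:port/path layout. A small helper can assemble and sanity-check such a URI with plain string handling; build_webhdfs_uri is hypothetical and not part of the plugin, and the host name and port 9870 (a common NameNode HTTP port in Hadoop 3) are assumptions for illustration.

```python
def build_webhdfs_uri(host: str, port: int, path: str) -> str:
    """Hypothetical helper: assemble a webhdfs:// artifact URI."""
    if not host:
        raise ValueError("host must not be empty")
    if not 0 < port < 65536:
        raise ValueError(f"invalid port: {port}")
    # HDFS paths in the URI must be absolute
    if not path.startswith("/"):
        path = "/" + path
    return f"webhdfs://{host}:{port}{path}"


if __name__ == "__main__":
    # Example values only; substitute your own NameNode host and port
    print(build_webhdfs_uri("namenode.example.com", 9870, "user/mlflow/artifacts"))
    # prints webhdfs://namenode.example.com:9870/user/mlflow/artifacts
```

The returned string can be passed as artifact_location when creating an experiment.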

Environment Variables

  • MLFLOW_KERBEROS_USER: the user name passed to hdfs.InsecureClient.
  • MLFLOW_KERBEROS_TOKEN: the token passed to hdfs.TokenClient. If it is set, the plugin uses hdfs.TokenClient instead of hdfs.InsecureClient.

If neither variable is set, the plugin falls back to hdfs.InsecureClient.
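The client-selection rules above can be sketched as follows. This is an illustrative reimplementation, not the plugin's source: the function names are hypothetical, and mapping webhdfs://host:port to an http://host:port base URL for the hdfs clients is an assumption (a cluster serving WebHDFS over HTTPS would need https instead).

```python
# Hypothetical sketch of the documented client-selection rules; not the plugin's code.
import os
from typing import Dict, Mapping, Optional, Tuple
from urllib.parse import urlparse


def choose_client(env: Mapping[str, str]) -> Tuple[str, Dict[str, str]]:
    """Pick an hdfs client class name and its keyword arguments.

    A token selects TokenClient; otherwise InsecureClient is used,
    with a user name if one is provided.
    """
    token = env.get("MLFLOW_KERBEROS_TOKEN")
    if token:
        return "TokenClient", {"token": token}
    user = env.get("MLFLOW_KERBEROS_USER")
    return "InsecureClient", ({"user": user} if user else {})


def webhdfs_to_http(artifact_uri: str) -> str:
    """Assumption: WebHDFS is served over plain HTTP at host:port."""
    return f"http://{urlparse(artifact_uri).netloc}"


def make_client(artifact_uri: str, env: Optional[Mapping[str, str]] = None):
    """Instantiate the chosen client (requires the `hdfs` package)."""
    import hdfs  # imported here so the helpers above run without it

    name, kwargs = choose_client(os.environ if env is None else env)
    client_cls = getattr(hdfs, name)  # hdfs.TokenClient or hdfs.InsecureClient
    return client_cls(webhdfs_to_http(artifact_uri), **kwargs)


if __name__ == "__main__":
    print(choose_client(os.environ))
```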

Configuration

No additional configuration is needed beyond specifying the webhdfs:// artifact URI and setting up the authentication environment variables if needed.

Development

To contribute or modify the plugin, follow these steps:

  1. Clone the repository.

  2. Install the development dependencies.

    pip install -r requirements.txt
    
  3. Make your changes and run the available tests, if any, to confirm the plugin still works.

  4. Create a pull request with your changes.

License

This project is licensed under the Apache License 2.0. See the LICENSE file for more details.

Contact

To report a problem, open an issue on the GitHub repository. For questions or comments, contact the author at roach231428@gmail.com.



Download files


Source Distribution

mlflow_webhdfs-0.0.1.tar.gz (8.1 kB)

Uploaded Source

Built Distribution


mlflow_webhdfs-0.0.1-py3-none-any.whl (8.9 kB)

Uploaded Python 3

File details

Details for the file mlflow_webhdfs-0.0.1.tar.gz.

File metadata

  • Download URL: mlflow_webhdfs-0.0.1.tar.gz
  • Upload date:
  • Size: 8.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

Hashes for mlflow_webhdfs-0.0.1.tar.gz
  • SHA256: 4981d3bb40a34e6a180fe8d77e392ca67dcb2b8d62068d8970ddbc5cb8c08a90
  • MD5: 59f30006114056de34c5cbdd60400086
  • BLAKE2b-256: 448f114a83fce57c053c7da2a6078ab9b1df2080af7056fa0ea948b1666b8652
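These digests can be checked locally after downloading a file. A minimal sketch using only hashlib follows; the helper names sha256_file and verify_sha256 are hypothetical. pip can also enforce hashes itself when a requirements file pins them with --hash and is installed with --require-hashes.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_file(path: Union[str, Path], chunk_size: int = 1 << 16) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: Union[str, Path], expected: str) -> bool:
    """Compare a file's digest with a published hex digest (case-insensitive)."""
    return sha256_file(path) == expected.strip().lower()


if __name__ == "__main__":
    sdist = Path("mlflow_webhdfs-0.0.1.tar.gz")
    published = "4981d3bb40a34e6a180fe8d77e392ca67dcb2b8d62068d8970ddbc5cb8c08a90"
    if sdist.exists():
        print("match" if verify_sha256(sdist, published) else "MISMATCH")
```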


File details

Details for the file mlflow_webhdfs-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: mlflow_webhdfs-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 8.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.12.3

File hashes

Hashes for mlflow_webhdfs-0.0.1-py3-none-any.whl
  • SHA256: e0128ce9236655423325b64ea787ae4b7308a69b184558ac206252b878caeddc
  • MD5: 9ace10558ebd7c23633759c137d338da
  • BLAKE2b-256: b6a83b3edfbd9b4604a312792ff579290d673a561ef197780acec83ac6e076e6

