MLflow WebHDFS Plugins
mlflow-webhdfs
The mlflow-webhdfs package is a plugin for MLflow that enables WebHDFS as an artifact store. It lets you use webhdfs:// URLs as the storage location for your MLflow artifacts, integrating HDFS (Hadoop Distributed File System) with MLflow.
Features
- Integrates WebHDFS with MLflow as an artifact store.
- Supports webhdfs:// as an artifact URL.
- Uploads and downloads MLflow artifacts to and from HDFS.
- Supports token- or user-based authentication via the MLFLOW_KERBEROS_TOKEN and MLFLOW_KERBEROS_USER environment variables.
Requirements
- MLflow: version 2.4.0 or later.
- HDFS: the Python hdfs library for interacting with WebHDFS.
- Environment variables (optional): if using authentication, set the appropriate environment variable (MLFLOW_KERBEROS_TOKEN or MLFLOW_KERBEROS_USER).
Installation
Install mlflow-webhdfs with pip, either from the Python Package Index (PyPI) or from source:
Install from PyPI
```shell
pip install mlflow-webhdfs
```
Install from Source
Clone the repository and install it in editable mode with pip:
```shell
git clone https://github.com/roach231428/mlflow-webhdfs.git
cd mlflow-webhdfs
pip install -e .
```
Setup
Once the package is installed, you can use WebHDFS as an artifact store in your MLflow project by specifying a webhdfs:// URL wherever an artifact location is configured.
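To confirm that MLflow can see the plugin after installation, you can list the artifact-repository plugins registered through Python entry points. This is a minimal sketch that assumes the package registers itself under the mlflow.artifact_repository entry-point group (the group MLflow's plugin system reads) with the scheme name webhdfs; the helper registered_artifact_schemes is hypothetical and not part of the package.

```python
from importlib.metadata import entry_points
from typing import List

# Entry-point group that MLflow scans for artifact-repository plugins
ARTIFACT_REPO_GROUP = "mlflow.artifact_repository"


def registered_artifact_schemes() -> List[str]:
    """Return the URI schemes that installed plugins register with MLflow."""
    eps = entry_points()
    # Python 3.10+ exposes .select(); older versions return a dict of groups
    if hasattr(eps, "select"):
        group = eps.select(group=ARTIFACT_REPO_GROUP)
    else:
        group = eps.get(ARTIFACT_REPO_GROUP, [])
    return sorted(ep.name for ep in group)


if __name__ == "__main__":
    schemes = registered_artifact_schemes()
    # Assumes the plugin's entry-point name is "webhdfs"
    print("webhdfs registered:", "webhdfs" in schemes)
```

If webhdfs is missing from the list, the package is most likely installed into a different Python environment than the one running MLflow.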
Usage
In your MLflow script, use a webhdfs:// URI as the artifact location. For example:
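```python
import mlflow

# WebHDFS location for this experiment's artifacts
artifact_uri = "webhdfs://<webhdfs_host>:<port>/path/to/store/artifacts"

# The artifact location is set per experiment; set_tracking_uri() points at the
# tracking server or backend store, not at the artifact location.
# "webhdfs-demo" is an example experiment name.
experiment_id = mlflow.create_experiment("webhdfs-demo", artifact_location=artifact_uri)

# Log params and metrics; log_artifact() uploads a local file to WebHDFS
with mlflow.start_run(experiment_id=experiment_id):
    mlflow.log_param("param1", 5)
    mlflow.log_metric("metric1", 0.92)
    mlflow.log_artifact("my_artifact.txt")
```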
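A WebHDFS client needs two things from an artifact URI: an HTTP base URL for the NameNode and a path inside HDFS. The sketch below shows one plausible way to split a webhdfs://<host>:<port>/path URI into those parts. It assumes plain HTTP (no TLS) and a URI that always includes a port, and the function split_webhdfs_uri is hypothetical rather than the plugin's actual code.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_webhdfs_uri(uri: str) -> Tuple[str, str]:
    """Split a webhdfs:// URI into an HTTP base URL and an HDFS path."""
    parsed = urlparse(uri)
    if parsed.scheme != "webhdfs":
        raise ValueError(f"not a webhdfs:// URI: {uri!r}")
    if not parsed.hostname or parsed.port is None:
        raise ValueError(f"host and port are required: {uri!r}")
    # Assumption: WebHDFS is reached over plain HTTP on the given port
    base_url = f"http://{parsed.hostname}:{parsed.port}"
    # Everything after the host:port authority is the path inside HDFS
    hdfs_path = parsed.path or "/"
    return base_url, hdfs_path
```

The returned base URL has the form that hdfs clients such as hdfs.InsecureClient take as their first argument, and the path can then be used for uploads and downloads relative to the HDFS root.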
Environment Variables
- MLFLOW_KERBEROS_USER: specifies the username for hdfs.InsecureClient.
- MLFLOW_KERBEROS_TOKEN: specifies the token for hdfs.TokenClient. If set, the plugin uses hdfs.TokenClient instead.
If neither of these environment variables is set, the plugin will fall back to using the hdfs.InsecureClient.
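The selection rule described above can be written as a small pure function. This is an illustrative sketch, not the plugin's source: the name choose_hdfs_client and its return shape (a client class name plus keyword arguments) are hypothetical, and it treats an empty MLFLOW_KERBEROS_TOKEN as unset.

```python
import os
from typing import Any, Dict, Mapping, Optional, Tuple


def choose_hdfs_client(env: Optional[Mapping[str, str]] = None) -> Tuple[str, Dict[str, Any]]:
    """Return the hdfs client class name and the keyword arguments for it."""
    env = os.environ if env is None else env
    token = env.get("MLFLOW_KERBEROS_TOKEN")
    if token:
        # A token takes precedence: hdfs.TokenClient(url, token=...)
        return "TokenClient", {"token": token}
    # Otherwise fall back to hdfs.InsecureClient(url, user=...); user may be None
    return "InsecureClient", {"user": env.get("MLFLOW_KERBEROS_USER")}
```

To build a real client from the result, look the class up on the hdfs module and pass the WebHDFS base URL, for example getattr(hdfs, name)(base_url, **kwargs). Keeping the decision separate from client construction makes the precedence rule easy to unit-test without a running HDFS cluster.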
Configuration
No additional configuration is needed beyond specifying the webhdfs:// artifact URI and setting up the authentication environment variables if needed.
Development
To contribute or modify the plugin, follow these steps:
- Clone the repository.
- Install the development dependencies:

  ```shell
  pip install -r requirements.txt
  ```
- Make your changes, then run the tests (if any) to confirm the plugin works correctly.
- Create a pull request with your changes.
License
This project is licensed under the Apache License 2.0. See the LICENSE file for more details.
Contact
To report a problem, please open an issue on the GitHub repository. For questions or comments, you can reach out to the author at roach231428@gmail.com.
Download files
File details
Details for the file mlflow_webhdfs-0.0.1.tar.gz.
File metadata
- Download URL: mlflow_webhdfs-0.0.1.tar.gz
- Upload date:
- Size: 8.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4981d3bb40a34e6a180fe8d77e392ca67dcb2b8d62068d8970ddbc5cb8c08a90 |
| MD5 | 59f30006114056de34c5cbdd60400086 |
| BLAKE2b-256 | 448f114a83fce57c053c7da2a6078ab9b1df2080af7056fa0ea948b1666b8652 |
File details
Details for the file mlflow_webhdfs-0.0.1-py3-none-any.whl.
File metadata
- Download URL: mlflow_webhdfs-0.0.1-py3-none-any.whl
- Upload date:
- Size: 8.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e0128ce9236655423325b64ea787ae4b7308a69b184558ac206252b878caeddc |
| MD5 | 9ace10558ebd7c23633759c137d338da |
| BLAKE2b-256 | b6a83b3edfbd9b4604a312792ff579290d673a561ef197780acec83ac6e076e6 |
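To check a downloaded file against the SHA256 digests listed above, you can hash it locally with Python's standard hashlib module. A minimal sketch (the helper names sha256_of and verify_download are hypothetical):

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Compute the hex SHA256 digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read() returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_download(path: str, expected_sha256: str) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For example, verify_download("mlflow_webhdfs-0.0.1.tar.gz", "4981d3bb40a34e6a180fe8d77e392ca67dcb2b8d62068d8970ddbc5cb8c08a90") should return True for an intact copy of the source distribution. Reading in chunks keeps memory use constant regardless of file size.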