Airflow provider package for Azure Machine Learning
Airflow Provider for Azure Machine Learning
Source Code | Package (PyPI) | Example DAGs | Example Docker Containers
This package enables you to submit workflows to Azure Machine Learning from Apache Airflow.
Prerequisites
- An Azure account and an Azure Machine Learning workspace.
- To verify that your workspace is set up correctly, open it in Azure Machine Learning Studio and try basic actions such as creating a compute cluster or submitting a training job.
- A running Apache Airflow instance.
Installation
In your Apache Airflow instance, run:
pip install airflow-provider-azure-machinelearning
Alternatively, try it out by following the examples in the dev folder, or follow Airflow's how-to guide to set up Airflow in Docker containers.
Configure Azure Machine Learning Connections in Airflow
To send workloads to your Azure Machine Learning workspace from Airflow, you need to set up an "Azure Machine Learning" connection in your Airflow instance:
1. Make sure this package is installed in your Airflow instance. Without it, "Azure Machine Learning" will not appear in the dropdown in step 3, and you will not be able to add this type of connection.
2. In the Airflow web UI, navigate to Admin --> Connections and click + to add a new connection.
3. From the "Connection Type" dropdown, select "Azure Machine Learning". A form for the connection details appears.
4. Connection Id is a unique identifier for your connection. You also need to pass this string to the Azure ML Airflow operators; see the example DAGs.
5. Description is optional. All other fields are required.
6. Tenant ID identifies your Azure Active Directory (Azure AD) tenant. Azure's documentation explains how to retrieve it.
7. Subscription ID, Resource Group Name, and Workspace Name together uniquely identify your workspace in Azure Machine Learning. Open Azure Machine Learning Studio, select the desired workspace, then click "Change workspace" in the upper-right corner of the page (to the left of the profile icon). The Workspace Name is shown there. Next, click "View All Properties in Azure Portal" to open the Azure resource page of your workspace, where you can find the Subscription ID and Resource Group Name.
8. Client ID and Secret form a pair: they are effectively the "username" and "password" for service principal-based authentication. You need to generate them in the Azure Portal and grant the service principal the Contributor role on the resource group of your workspace. This ensures your Airflow connection can read and write the Azure ML resources your workloads need. To create the service principal, follow these three steps:
   - Create a Client ID. Follow the "Register an application with Azure AD and create a service principal" section of the Azure guide howto-create-service-principal-portal. The Application ID, also known as the Client ID, is the unique identifier of this service principal.
   - Create a Secret. In the Azure Portal, create a secret under this application by following the "Option 2: Create a new application secret" section of Azure's service principal guide. The secret's value is shown only on the page where you create it; once you leave that page you cannot see it again, so copy it immediately. We recommend storing the secret in Azure Key Vault, following Azure's Key Vault instructions.
   - Give this service principal Contributor access to your Azure Machine Learning resource group. Repeat the instructions from item 7 above to reach your workspace's resource page, then click its Resource Group. From the left-hand panel, select Access Control (IAM) and assign the Contributor role to the application created above. This step is important: without it, Airflow will not have the write access it needs to create compute clusters, run training workloads, upload data, and so on. Azure's documentation on assigning roles describes this process in detail.
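Before filling in the form, it can help to sanity-check the values. Tenant ID, Subscription ID, and Client ID are all GUIDs, and the Contributor role in the last step is assigned at resource-group scope, whose Azure resource ID has the form /subscriptions/<subscription-id>/resourceGroups/<resource-group-name>. The following is a minimal sketch under those assumptions; `check_connection_fields` and `resource_group_scope` are hypothetical helpers, not part of this provider package, and the dictionary keys and placeholder values are illustrative only.

```python
import uuid

# Hypothetical helpers, not part of this provider package: flag obviously wrong
# values before pasting them into the "Azure Machine Learning" connection form.
GUID_FIELDS = ("tenant_id", "subscription_id", "client_id")
REQUIRED_TEXT_FIELDS = ("resource_group_name", "workspace_name", "secret")


def check_connection_fields(fields: dict) -> list:
    """Return a list of problems; an empty list means the values look plausible."""
    problems = []
    for name in GUID_FIELDS:
        value = str(fields.get(name) or "")
        try:
            uuid.UUID(value)  # raises ValueError if the string is not a GUID
        except ValueError:
            problems.append(f"{name} is not a GUID: {value!r}")
    for name in REQUIRED_TEXT_FIELDS:
        if not fields.get(name):
            problems.append(f"{name} is empty")
    return problems


def resource_group_scope(subscription_id: str, resource_group_name: str) -> str:
    """Azure resource ID of the resource group that the Contributor role is assigned on."""
    return f"/subscriptions/{subscription_id}/resourceGroups/{resource_group_name}"


if __name__ == "__main__":
    fields = {
        "tenant_id": "00000000-0000-0000-0000-000000000000",  # placeholder values
        "subscription_id": "11111111-1111-1111-1111-111111111111",
        "client_id": "not-a-guid",
        "resource_group_name": "my-resource-group",
        "workspace_name": "my-workspace",
        "secret": "",
    }
    for problem in check_connection_fields(fields):
        print(problem)
    print(resource_group_scope(fields["subscription_id"], fields["resource_group_name"]))
```

`uuid.UUID` is deliberately lenient (it also accepts braces or a missing hyphen), so this only catches values that are clearly not GUIDs, such as a workspace name pasted into the wrong field.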
Note
If "Azure Machine Learning" is missing from the dropdown in step 3 above, the airflow-provider-azure-machinelearning package was not installed successfully. Follow the Installation section to install it, and run a command such as `pip show airflow-provider-azure-machinelearning` in the Airflow webserver container or machine to verify that the package is installed correctly.
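If you prefer to check from Python, the standard library's importlib.metadata (Python 3.8+) can report whether a distribution is installed. This is a minimal sketch; `installed_version` is a hypothetical helper name, and the check is only meaningful when run with the same Python interpreter your Airflow webserver uses, since otherwise it inspects a different environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

DIST_NAME = "airflow-provider-azure-machinelearning"


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version(DIST_NAME)
    if found is None:
        print(f"{DIST_NAME} is NOT installed in this Python environment")
    else:
        print(f"{DIST_NAME} {found} is installed")
```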
You can have multiple connections in one Airflow instance for different Azure Machine Learning workspaces. This lets you:
- Orchestrate workloads across multiple workspaces or subscriptions from a single DAG.
- Isolate different engineers' workloads from each other.
- Isolate experimental and production environments.
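As a sketch of such a multi-workspace setup outside the UI: Airflow can also read connections from environment variables named AIRFLOW_CONN_<CONN_ID> (connection ID in upper case), and from Airflow 2.3 onward the value may be JSON. The helper below builds one variable per workspace. It assumes the field mapping matches this package's CLI example in this section (Client ID as login, Secret as password, "schema" as host, and the same extra__azure_machine_learning__* keys); we have not verified this form against the provider itself. The helper name `aml_connection_env`, the connection IDs `aml_dev`/`aml_prod`, and the bracketed placeholders are hypothetical.

```python
import json
import shlex


def aml_connection_env(conn_id, *, tenant_id, subscription_id, resource_group_name,
                       workspace_name, client_id, secret):
    """Hypothetical helper: return (env_var_name, json_value) for one Azure ML connection."""
    conn = {
        "conn_type": "azure_machine_learning",
        "host": "schema",  # same value as --conn-host in the CLI example
        "login": client_id,  # Client ID
        "password": secret,  # Secret
        "extra": {
            "extra__azure_machine_learning__tenantId": tenant_id,
            "extra__azure_machine_learning__subscriptionId": subscription_id,
            "extra__azure_machine_learning__resource_group_name": resource_group_name,
            "extra__azure_machine_learning__workspace_name": workspace_name,
        },
    }
    # Airflow looks up a connection ID in the environment as AIRFLOW_CONN_<CONN_ID upper-cased>.
    return "AIRFLOW_CONN_" + conn_id.upper(), json.dumps(conn)


if __name__ == "__main__":
    # One connection per workspace, e.g. to isolate experimental and production work.
    workspaces = {"aml_dev": "[Dev-Workspace-Name]", "aml_prod": "[Prod-Workspace-Name]"}
    for conn_id, workspace_name in workspaces.items():
        name, value = aml_connection_env(
            conn_id,
            tenant_id="[Tenant-ID]",
            subscription_id="[Subscription-ID]",
            resource_group_name="[Resource-Group-Name]",
            workspace_name=workspace_name,
            client_id="[Client-ID]",
            secret="[Secret]",
        )
        # shlex.quote makes the JSON safe to paste into a POSIX shell.
        print(f"export {name}={shlex.quote(value)}")
```

Connections defined through environment variables do not show up in the Airflow UI or in `airflow connections list`, so this fits deployments that manage configuration outside the UI.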
The instructions above add a connection via the Airflow UI. You can also add one with the Airflow CLI; the Airflow documentation has more CLI examples. Below is an example Airflow command:
airflow connections add \
--conn-type "azure_machine_learning" \
--conn-description "[Description]" \
--conn-host "schema" \
--conn-login "[Client-ID]" \
--conn-password "[Secret]" \
--conn-extra '{"extra__azure_machine_learning__tenantId": "[Tenant-ID]", "extra__azure_machine_learning__subscriptionId": "[Subscription-ID]", "extra__azure_machine_learning__resource_group_name": "[Resource-Group-Name]", "extra__azure_machine_learning__workspace_name": "[Workspace-Name]"}' \
"[Connection-ID]"
Examples
Check out example_dags to see how to use this provider package. If you do not have a running Airflow instance, refer to the example Docker containers or the Apache Airflow documentation (https://airflow.apache.org/).
Dev Environment
To build this package, run its tests, run its linting tools, etc., install the development dependencies in one of the following ways:
- Via pip:
pip install -r dev/requirements.txt
- Via conda:
conda env create -f dev/environment.yml
Running the tests and linters
- All tests are in the tests folder. To run them, run the following from this folder:
pytest
- This repo uses black, flake8, and isort to keep code formatting consistent. From this folder, run:
black .
isort .
flake8
Issues
Please submit issues and pull requests in our official repo: https://github.com/azure/airflow-provider-azure-machinelearning.
Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.
When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.
This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.
Trademarks
This project may contain trademarks or logos for projects, products, or services. Authorized use of Microsoft trademarks or logos is subject to and must follow Microsoft's Trademark & Brand Guidelines. Use of Microsoft trademarks or logos in modified versions of this project must not cause confusion or imply Microsoft sponsorship. Any use of third-party trademarks or logos are subject to those third-party's policies.
Release History
0.0.1
Features Added
- First preview.
File details
Details for the file airflow-provider-azure-machinelearning-0.0.1b2.tar.gz.
File metadata
- Download URL: airflow-provider-azure-machinelearning-0.0.1b2.tar.gz
- Upload date:
- Size: 25.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 283a3aec9f2f5f4a6a7d8ee2623631f4a0b497eddf65292f24c2a4fb41fdfb57
MD5 | 0a296748b7f6535f1da75236ac679fb4
BLAKE2b-256 | 26f6811c9244757ef64dfe7efa58f97bd7054e6d3f6e245e66e6984b4977ea41
File details
Details for the file airflow_provider_azure_machinelearning-0.0.1b2-py3-none-any.whl.
File metadata
- Download URL: airflow_provider_azure_machinelearning-0.0.1b2-py3-none-any.whl
- Upload date:
- Size: 37.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | ef304eca5b4204385566295a44bbcca190ad1bdb00d77718e2fdc2eadfbb653f
MD5 | 60d8f19799bce7a22f517acc14da8d88
BLAKE2b-256 | 0c24b2a883e5c543eeaf62b17c972fc61288a78891a198d4ef1ec4de5274e8b2