
Pulumi EKS ML Infrastructure

An opinionated library for multi-tenant, multi-region Machine Learning platforms on AWS.

This repository provides a modular set of Pulumi components (pulumi_eks_ml) to spin up multi-tenant, multi-region ML infrastructure with minimal pain.

💡 Philosophy

This project treats infrastructure as a composable library. Instead of one giant deployment, you get modular building blocks (VPC, EKS, GPU Node Pools) that you can assemble into your own topology.

Whether it's a single cluster for testing or a global mesh for distributed workloads, you can define your architecture once in Python, then deploy identical copies across different environments thanks to Pulumi stacks.
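Each stack is a separate instance of the same program. The quickstart builds every resource name from `pulumi.get_project()` and `pulumi.get_stack()`, which gives each stack's resources a distinct, recognizable prefix. The stdlib-only sketch below reproduces that naming scheme outside of Pulumi. `resource_names` is a hypothetical helper written for illustration, not part of pulumi_eks_ml, and `starter`, `dev` and `prod` are example values.

```python
# Hypothetical helper mirroring the quickstart's naming convention:
#   deployment_name = f"{pulumi.get_project()}-{pulumi.get_stack()}"
def resource_names(project: str, stack: str) -> dict[str, str]:
    """Return the logical resource names the starter program would use for one stack."""
    deployment_name = f"{project}-{stack}"
    return {
        "vpc": f"{deployment_name}-vpc",
        "cluster": f"{deployment_name}-cls",
        "addons": f"{deployment_name}-addons",
    }


# Same program, two stacks: no name is shared between them.
dev = resource_names("starter", "dev")
prod = resource_names("starter", "prod")
assert set(dev.values()).isdisjoint(prod.values())
print(dev["cluster"])   # starter-dev-cls
print(prod["cluster"])  # starter-prod-cls
```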

Architectural examples with pulumi_eks_ml

| Project | Description | Architecture |
|---|---|---|
| Starter | Single VPC, single EKS cluster with recommended addons. | diagram |
| EKS Multi-Region | Full-mesh VPC peering across regions, each with an EKS cluster. | diagram |
| SkyPilot Multi-Tenant | Hub-and-Spoke multi-region network with multi-tenant SkyPilot API server, Cognito auth, Tailscale VPN, and isolated data planes. | diagram |
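AWS VPC peering only works between VPCs whose CIDR blocks do not overlap, so every region in a multi-region layout needs its own address range. The starter uses `10.0.0.0/16` for its single VPC. The sketch below shows one way to hand out non-overlapping /16 blocks per region with Python's standard `ipaddress` module. The `allocate_region_cidrs` helper and the `10.0.0.0/8` supernet are assumptions for illustration; this is not necessarily how pulumi_eks_ml assigns addresses.

```python
import ipaddress


def allocate_region_cidrs(regions, supernet="10.0.0.0/8", prefix=16):
    """Hypothetical helper: give each region its own non-overlapping block from `supernet`."""
    parent = ipaddress.ip_network(supernet)
    # zip() would silently drop extra regions, so check capacity up front.
    capacity = 2 ** (prefix - parent.prefixlen)
    if len(regions) > capacity:
        raise ValueError(f"{supernet} holds only {capacity} /{prefix} blocks")
    blocks = parent.subnets(new_prefix=prefix)
    return {region: str(block) for region, block in zip(regions, blocks)}


cidrs = allocate_region_cidrs(["us-west-2", "us-east-1", "eu-west-1"])
print(cidrs)
# {'us-west-2': '10.0.0.0/16', 'us-east-1': '10.1.0.0/16', 'eu-west-1': '10.2.0.0/16'}

# Sanity check: no two regional blocks overlap, so any pair of VPCs can be peered.
nets = [ipaddress.ip_network(c) for c in cidrs.values()]
assert not any(a.overlaps(b) for i, a in enumerate(nets) for b in nets[i + 1:])
```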

⚡ Quickstart

Use the starter project as the fastest path to a working EKS cluster.

```python
# __main__.py
import pulumi

from pulumi_eks_ml import eks, eks_addons, vpc

# Stack configuration: the AWS region plus a list of node pool definitions.
main_region = pulumi.Config("aws").require("region")
cfg = pulumi.Config()
deployment_name = f"{pulumi.get_project()}-{pulumi.get_stack()}"
node_pools_config = cfg.require_object("node_pools")

node_pools = [eks.NodePoolConfig.from_dict(pool) for pool in node_pools_config]

# Network: a single VPC with internet egress enabled.
vpc_resource = vpc.VPC(
    name=f"{deployment_name}-vpc",
    cidr_block="10.0.0.0/16",
    setup_internet_egress=True,
)

# EKS cluster in the VPC's private subnets, with the configured node pools.
cluster = eks.EKSCluster(
    f"{deployment_name}-cls",
    vpc_id=vpc_resource.vpc_id,
    subnet_ids=vpc_resource.private_subnet_ids,
    node_pools=node_pools,
)

# Install the recommended add-on set on the cluster.
eks.cluster.EKSClusterAddonInstaller(
    f"{deployment_name}-addons",
    cluster=cluster,
    addon_types=eks_addons.recommended_addons(),
)

pulumi.export("vpc_id", vpc_resource.vpc_id)
pulumi.export("cluster_name", cluster.cluster_name)
```

Then install dependencies from the repository root and deploy the starter stack:

```shell
uv sync --dev
cd projects/starter
pulumi stack init dev
pulumi config set aws:region us-west-2
uv run pulumi up
```
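`cfg.require_object("node_pools")` reads a structured value from the stack configuration and returns it already parsed from JSON; the program then iterates over it, so it is a list of dicts, each turned into an `eks.NodePoolConfig`. The real `NodePoolConfig` schema is not documented here, so the stdlib sketch below uses a hypothetical `NodePoolSketch` dataclass with made-up fields (`name`, `instance_types`, `capacity_type`) purely to show the list-of-dicts to typed-config pattern. Check the library source for the actual field names.

```python
import json
from dataclasses import dataclass, field


# Hypothetical stand-in for eks.NodePoolConfig; field names are illustrative only.
@dataclass
class NodePoolSketch:
    name: str
    instance_types: list[str] = field(default_factory=list)
    # Mirrors the Spot/On-Demand choice mentioned under Key Features (illustrative values).
    capacity_type: str = "on-demand"

    @classmethod
    def from_dict(cls, raw: dict) -> "NodePoolSketch":
        if raw.get("capacity_type", "on-demand") not in ("spot", "on-demand"):
            raise ValueError(f"unsupported capacity_type in pool {raw.get('name')!r}")
        # Unknown keys raise TypeError here, which surfaces config typos early.
        return cls(**raw)


# Example of what require_object could return for a JSON-valued config entry.
raw_config = json.loads("""
[
  {"name": "cpu", "instance_types": ["m6i.xlarge"]},
  {"name": "gpu", "instance_types": ["g5.xlarge"], "capacity_type": "spot"}
]
""")
node_pools = [NodePoolSketch.from_dict(pool) for pool in raw_config]
print([(p.name, p.capacity_type) for p in node_pools])
# [('cpu', 'on-demand'), ('gpu', 'spot')]
```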

🚀 Key Features

  • ML-Optimized Compute: Pre-configured EKS clusters with Karpenter for autoscaling (Spot/On-Demand) and NVIDIA GPU drivers ready to go.
  • Global Networking: Multi-region connectivity with hub-and-spoke or full-mesh VPC peering topologies.
  • Opinionated Add-ons for ML: Built-in support for the ALB Controller, EBS/EFS CSI drivers, Fluent Bit, Metrics Server, and more.
  • Secure Networking with Tailscale: VPN access through Tailscale, in addition to public/private subnet isolation.
  • SkyPilot Multi-Tenant Platform: Opinionated deployment of SkyPilot for multi-tenant, multi-region AI workloads.
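The two peering topologies differ mainly in how many connections they create. In a full mesh every pair of VPCs is peered directly, so n regions need n(n-1)/2 connections; in hub-and-spoke each spoke peers only with the hub, so n VPCs need n-1. The stdlib sketch below enumerates the pairs for each. `full_mesh_pairs` and `hub_and_spoke_pairs` are illustrative helpers, not pulumi_eks_ml functions, and the region list is an example.

```python
from itertools import combinations


def full_mesh_pairs(regions):
    """Every region peers directly with every other region: n * (n - 1) / 2 links."""
    return list(combinations(regions, 2))


def hub_and_spoke_pairs(hub, spokes):
    """Each spoke peers only with the hub: len(spokes) links."""
    return [(hub, spoke) for spoke in spokes]


regions = ["us-west-2", "us-east-1", "eu-west-1", "ap-northeast-1"]
mesh = full_mesh_pairs(regions)
star = hub_and_spoke_pairs(regions[0], regions[1:])
print(len(mesh), len(star))  # 6 3
```

Design note: AWS VPC peering is not transitive, so in a hub-and-spoke layout two spokes cannot reach each other through the hub's peering connections. A full mesh avoids that limitation at the cost of more connections to manage.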

📂 Repository Structure

  • pulumi_eks_ml/: The core Python library containing reusable infrastructure components.
  • projects/: Reference implementations and live infrastructure code.
    • starter/: A simple single-region EKS cluster.
    • multi-region/: A full-mesh global network connecting clusters across regions.
    • skypilot-multi-tenant/: A SkyPilot platform with isolated data planes for multiple teams.

🛠 Getting Started

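Prerequisites

  • uv, used to install dependencies and run Pulumi (uv sync, uv run).
  • The Pulumi CLI.
  • AWS credentials for the account you deploy into.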

1. Install & Setup

```shell
# Clone the repo
git clone https://github.com/Roulbac/pulumi-eks-ml.git
cd pulumi-eks-ml

# Install dependencies
uv sync --dev
```

2. Deploy a Project

Navigate to one of the reference projects to see it in action.

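```shell
cd projects/starter

# Initialize your stack (e.g., dev)
pulumi stack init dev

# Set the AWS region the starter program requires (us-west-2 is an example)
pulumi config set aws:region us-west-2

# Deploy
uv run pulumi up
```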

For custom infrastructure, create a new folder in projects/, import pulumi_eks_ml, and define your topology (see projects/starter/__main__.py for a template).

🧪 Testing

We include both unit and integration tests (using LocalStack).

```shell
# Run Unit Tests
uv run pytest -vv tests/unit

# Run Integration Tests
uv run pytest -vv tests/integration
```

📄 License

MIT
