Rancher + Keycloak CRD-driven plugin for Waldur Site Agent
waldur-site-agent-rancher-kc-crd
CRD-driven Rancher + Keycloak membership-sync plugin for Waldur Site
Agent. Translates Waldur Resource + ResourceProject + UserRole
state into ManagedRancherProject Custom Resources; the
rancher-keycloak-operator running inside the target Kubernetes
cluster owns the actual Rancher project + Keycloak group lifecycle.
Expected operator
This plugin only writes CRs — it relies on
waldur/rancher-keycloak-operator to be running
in the target Kubernetes cluster. That operator owns the
ManagedRancherProject and RancherProjectInventory CRD definitions
and is the only thing that talks to Rancher and Keycloak APIs.
Minimum operator version compatible with the current plugin: 0.3.0.
Recommended: 0.3.1+, which fixes a metadata-sync gap in earlier 0.3.x
releases where description / organization / projectSlug changes never
reached Rancher after the initial create. The plugin no longer emits
spec.namespace, which 0.2.x operators relied on for namespace
creation; older versions also lack the audit fields and the
stale-project-ID cleanup fallback that the plugin assumes.
The operator's helm chart lives in its own repo under
helm/rancher-keycloak-operator/; install instructions are in the
Setup section below.
Scope
membership_sync_backend only. This plugin handles user-to-project
role bindings. It does not provision or terminate Rancher clusters,
does not process orders, and does not report usage. Order processing
and reporting are out of scope; if the offering needs them they have
to be wired through other backends.
Cluster pre-exists. Each Waldur Resource's backend_id is its
Rancher cluster ID; this plugin does not stand up Rancher clusters.
Opt-in per offering. Set
membership_sync_backend: rancher-kc-crd on the offering's
site-agent config to route its membership sync through this plugin.
Architecture
flowchart LR
subgraph Waldur ["Waldur Mastermind"]
WR[Resource]
WRP[ResourceProject]
WUR[UserRole]
WR --- WRP --- WUR
end
subgraph Agent ["Site Agent / this plugin"]
PR[pull_resource]
TR[translator.build_cr_spec]
CC[CrdClient.apply / list / delete]
SR[status_reader]
end
subgraph K8s ["Kubernetes cluster"]
CR[(ManagedRancherProject CR)]
OP[rancher-keycloak-operator]
CR -. watched by .-> OP
end
subgraph External ["External APIs"]
RA[Rancher Mgmt API]
KC[Keycloak Admin API]
end
Waldur -- "SDK GET<br/>(provider-resource-projects,<br/>list_users)" --> PR
PR --> TR --> CC -- "kube apply" --> CR
CR --> SR
SR -- "status.* + drift" --> Waldur
OP -- "v3 REST" --> RA
OP -- "Admin REST" --> KC
The plugin's responsibilities stop at writing CRs and reading their status. The operator owns:
- creating / adopting Rancher projects, namespaces, and resource quotas
- creating Keycloak parent + child groups and binding them to the Rancher project via ProjectRoleTemplateBinding (PRTB) with groupPrincipalId: keycloakoidc_group://<group-name>
- adding/removing users to/from Keycloak groups based on spec.keycloak.roleBindings[].members[]
- cascading cleanup on CR delete (@kopf.on.delete)
Sequence: end-to-end membership sync
This is what happens on a single pull_resource call (i.e. one
membership-sync cycle per Resource the offering owns).
sequenceDiagram
autonumber
participant SA as Site Agent
participant PG as plugin
participant W as Waldur API
participant K as Kubernetes API
participant OP as Operator
participant R as Rancher
participant KC as Keycloak
SA->>PG: pull_resource(WaldurResource)
PG->>W: GET /api/marketplace-provider-resource-projects/?resource_uuid=…
W-->>PG: [RP1, RP2, …]
loop per ResourceProject
PG->>W: GET /api/marketplace-provider-resource-projects/<uuid>/list_users/
W-->>PG: [UserRole, …]
PG->>PG: build_cr_spec(resource, RP, users)
PG->>K: kube apply ManagedRancherProject (server-side)
K-->>OP: watch event (create/update)
OP->>R: ensure project (create or adopt by name)
OP->>R: ensure namespace + ResourceQuota
OP->>KC: ensure parent group, child groups
OP->>R: ensure PRTB(s) bound to KC groups
OP->>KC: GET /users/{id} or /users?username=X<br/>(per member)
OP->>KC: add/remove user from KC group
OP->>K: patch status.conditions + status.keycloakRoleBindings.syncedMembers
PG->>K: kube get ManagedRancherProject
K-->>PG: status.* (synced members per binding)
end
PG->>K: list ManagedRancherProject by label waldur.io/resource-uuid
K-->>PG: [all CRs for this resource]
PG->>K: kube delete (CRs whose RP no longer in Waldur)
K-->>OP: watch event (delete)
OP->>R: delete PRTBs + project (with stale-ID find-by-name fallback)
OP->>KC: delete child groups, parent group if empty
PG-->>SA: BackendResourceInfo(users=union of synced members)
Key invariants:
- Idempotent at every step. Re-applying the same CR yields the same Rancher project and Keycloak groups. Re-syncing with the same user set is a no-op on the Rancher/Keycloak side.
- One CR per ResourceProject, named <resource.slug>-<rp.uuid[:8]>. Stable across renames.
- Orphan pruning by label. CRs are stamped with metadata.labels.waldur.io/resource-uuid; on each sync the plugin computes the set of expected CR names from the current Waldur RP list, then deletes any label-matching CR outside that set.
- Lookup-only for users. The operator never creates Keycloak users; it only binds existing ones. Users absent from Keycloak get a WARNING User <id> not found in Keycloak and are skipped (the PRTB and the group are still created; they're just empty for that user).
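The naming and labeling invariants above can be sketched in a few lines of Python. This is a minimal illustration, not the plugin's code: cr_name and cr_metadata are hypothetical helper names and the UUID values are made up. Only the name pattern <resource.slug>-<rp.uuid[:8]> and the waldur.io/resource-uuid label key come from the text above.

```python
RESOURCE_UUID_LABEL = "waldur.io/resource-uuid"  # label key named in the text above


def cr_name(resource_slug: str, rp_uuid: str) -> str:
    """Return <resource.slug>-<rp.uuid[:8]> (hypothetical helper)."""
    return f"{resource_slug}-{rp_uuid[:8]}"


def cr_metadata(resource_slug: str, resource_uuid: str, rp_uuid: str) -> dict:
    """Build CR metadata carrying the label that orphan pruning selects on."""
    return {
        "name": cr_name(resource_slug, rp_uuid),
        "labels": {RESOURCE_UUID_LABEL: resource_uuid},
    }


# Made-up identifiers, for illustration only.
meta = cr_metadata(
    "hpc-cluster",
    "11111111222233334444555555555555",
    "8706dd1a0f3e4c1b9a2d5e6f7a8b9c0d",
)
print(meta["name"])
```

Because the name is a pure function of its inputs, re-running a sync produces the same CR name and the apply stays idempotent.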
Sequence: orphan-CR pruning
sequenceDiagram
autonumber
participant SA as Site Agent
participant W as Waldur
participant K as Kubernetes
participant OP as Operator
participant R as Rancher
participant KC as Keycloak
Note over SA: pull_resource() loop completes;<br/>2 CRs applied for RP1, RP2
SA->>W: GET resource-projects?resource_uuid=…
W-->>SA: [RP1] (RP2 was deleted in Waldur)
SA->>K: list mrp -l waldur.io/resource-uuid=…
K-->>SA: [CR_RP1, CR_RP2]
Note over SA: expected={CR_RP1}<br/>found={CR_RP1, CR_RP2}<br/>orphans={CR_RP2}
SA->>K: kube delete CR_RP2
K-->>OP: watch event (delete + finalizer hold)
OP->>R: delete PRTBs for RP2's project
OP->>KC: remove members from KC groups
OP->>KC: delete KC child groups
OP->>KC: delete KC parent group (if empty)
OP->>R: delete Rancher project
alt stored project ID is stale (e.g. project recreated externally with new ID)
OP->>R: DELETE /v3/projects/<stored_id> -> 404
OP->>R: GET /v3/projects?clusterId=…&name=<projectName>
R-->>OP: {id: <current_id>}
OP->>R: DELETE /v3/projects/<current_id> -> 200
end
OP->>K: remove finalizer (CR fully gone)
Why pruning lives in the plugin, not the operator: the operator doesn't know about Waldur — the Waldur RP list is the source of truth, and only the plugin sees both sides. The label selector keeps pruning safe: CRs without the label (manually-created, or from a different source) are never touched.
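The pruning decision itself reduces to a set difference. The sketch below assumes the CR list has already been filtered by the waldur.io/resource-uuid label selector on the Kubernetes side; plan_prune and the CR dictionaries are hypothetical stand-ins, not the plugin's actual types.

```python
def cr_name(resource_slug: str, rp_uuid: str) -> str:
    """Expected CR name for one ResourceProject: <resource.slug>-<rp.uuid[:8]>."""
    return f"{resource_slug}-{rp_uuid[:8]}"


def plan_prune(resource_slug: str, waldur_rp_uuids: list, labelled_crs: list) -> list:
    """Return names of label-matching CRs that no longer have a Waldur RP.

    labelled_crs is assumed to be the result of a label-selected list call,
    so CRs without the label never reach this function.
    """
    expected = {cr_name(resource_slug, uuid) for uuid in waldur_rp_uuids}
    found = {cr["metadata"]["name"] for cr in labelled_crs}
    return sorted(found - expected)


# RP2 was deleted in Waldur; both CRs still exist in the cluster.
rp1 = "aaaaaaaa" + "0" * 24
rp2 = "bbbbbbbb" + "0" * 24
crs = [
    {"metadata": {"name": cr_name("demo", rp1)}},
    {"metadata": {"name": cr_name("demo", rp2)}},
]
orphans = plan_prune("demo", [rp1], crs)
print(orphans)
```

Computing the plan before deleting anything keeps the destructive step small and easy to log or dry-run.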
Sequence: cleanup on stale rancherProjectId
When a Rancher project is externally deleted and recreated between
two operator reconciles, the CR's status.rancherProjectId points
at a non-existent ID. The cleanup falls back to finding the live
project by clusterId+projectName:
sequenceDiagram
autonumber
participant U as User / Test
participant K as Kubernetes
participant OP as Operator
participant R as Rancher
Note over U: status.rancherProjectId = p-OLD<br/>(actual Rancher project: p-NEW, same name)
U->>K: kubectl delete mrp <name>
K-->>OP: on_delete handler fires
OP->>R: DELETE /v3/projects/p-OLD
R-->>OP: 404 Not Found
Note over OP: delete_project returned False<br/>-> stored ID is stale
OP->>R: GET /v3/projects?clusterId=…&name=<projectName>
R-->>OP: [{id: p-NEW, …}]
OP->>OP: log WARNING "Stored projectId p-OLD was stale;<br/>deleting current p-NEW found by name"
OP->>R: DELETE /v3/projects/p-NEW
R-->>OP: 200 OK
OP->>K: cleanup complete; release finalizer
This was the failure mode that left orphan Rancher projects after
external recreation; fix landed in operator 0.2.2.
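The fallback logic can be illustrated against an in-memory stand-in for Rancher. Everything named here (FakeRancher, cleanup_project, find_project_id) is hypothetical and exists only for the sketch; the operator's real client and function names are not shown in this document. What the sketch does take from the text is the order of operations: delete by stored ID, treat "not found" as stale, then look up by clusterId + projectName and delete that ID.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeRancher:
    """In-memory stand-in for the Rancher API (illustration only)."""

    projects: dict = field(default_factory=dict)  # id -> {"clusterId", "name"}

    def delete_project(self, project_id: str) -> bool:
        """Return False when the ID is unknown, mirroring a 404."""
        return self.projects.pop(project_id, None) is not None

    def find_project_id(self, cluster_id: str, name: str) -> str | None:
        for project_id, project in self.projects.items():
            if project["clusterId"] == cluster_id and project["name"] == name:
                return project_id
        return None


def cleanup_project(rancher: FakeRancher, stored_id: str, cluster_id: str, name: str) -> str | None:
    """Delete the project and return the ID actually deleted, or None."""
    if rancher.delete_project(stored_id):
        return stored_id
    # Stored ID was stale: fall back to a lookup by cluster and name.
    current_id = rancher.find_project_id(cluster_id, name)
    if current_id is None:
        return None  # nothing left to delete
    rancher.delete_project(current_id)
    return current_id


rancher = FakeRancher({"p-NEW": {"clusterId": "c-1", "name": "demo"}})
print(cleanup_project(rancher, "p-OLD", "c-1", "demo"))
```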
Setup
1. Operator: install in target cluster
The waldur/rancher-keycloak-operator must be
running in the cluster you point this plugin at. One operator
instance handles all ManagedRancherProject CRs in its
namespace; one operator can manage Rancher projects across multiple
downstream Rancher clusters (each CR specifies its own clusterId).
1a. CRDs
# Clone the operator repo (separate from waldur-site-agent):
git clone https://github.com/waldur/rancher-keycloak-operator.git
cd rancher-keycloak-operator
kubectl apply -f helm/rancher-keycloak-operator/templates/crds/
kubectl get crds | grep waldur.io
# expect:
# managedrancherprojects.waldur.io
# rancherprojectinventories.waldur.io
1b. Helm install (published image)
The published image is opennode/rancher-keycloak-operator:<version>
on Docker Hub. Pin a specific version (don't use :latest):
kubectl create namespace waldur-system
helm upgrade --install rko \
./helm/rancher-keycloak-operator \
--namespace waldur-system \
--set image.repository=opennode/rancher-keycloak-operator \
--set image.tag=0.3.1 \
--set image.pullPolicy=IfNotPresent \
--set "config.rancher.url=https://rancher.example.com" \
--set "config.rancher.bearerToken=<rancher-bearer-token>" \
--set "config.rancher.verifySsl=true" \
--set "config.keycloak.url=https://keycloak.example.com" \
--set "config.keycloak.realm=<realm>" \
--set "config.keycloak.userRealm=master" \
--set "config.keycloak.username=<kc-admin-user>" \
--set "config.keycloak.password=<kc-admin-password>" \
--set "config.keycloak.verifySsl=true"
kubectl rollout status deploy/rko-rancher-keycloak-operator -n waldur-system
1c. Smoke-test the operator (optional but recommended)
The operator repo ships a Tier-1 runbook at
docs/tier-1-runbook.md that walks through applying a hand-crafted
CR end-to-end against the configured Rancher and Keycloak before
wiring in the site-agent. Run it once per cluster to catch
credential / connectivity issues early.
1d. Required Rancher + Keycloak permissions
| System | Role / scope |
|---|---|
| Rancher token | unscoped admin OR cluster-owner across all clusters this operator instance will manage |
| Keycloak admin user | realm-admin on the target realm (group create/delete, group member add/remove, user lookup) |
2. Plugin: install on the site-agent host
The plugin is a workspace member of the
waldur-site-agent repo and is installed automatically when you run
uv sync --all-packages at the repo root. To verify it's discovered:
uv run python -c "from waldur_site_agent_rancher_kc_crd import backend; print(backend.RancherKcCrdBackend)"
2a. Site-agent host needs
- Network access to: Waldur Mastermind API, the Kubernetes API of the cluster running the operator.
- A kubeconfig file (or in-cluster service-account credentials if you run the agent inside the operator's cluster).
- The Kubernetes API user must be allowed to get/list/create/update/delete/patch managedrancherprojects.waldur.io in the chosen namespace.
2b. Configure the offering
Add a stanza to waldur-site-agent-config.yaml. Full reference at
examples/rancher-kc-crd-config.yaml.
offerings:
  - name: "my-rancher-offering"
    waldur_api_url: "https://waldur.example.com/api/"
    waldur_api_token: "${WALDUR_API_TOKEN}"
    waldur_offering_uuid: "<offering-uuid>"
    backend_type: "rancher-kc-crd"
    membership_sync_backend: "rancher-kc-crd"
    backend_settings:
      # SDK client (the plugin builds its own AuthenticatedClient).
      waldur_api_url: "https://waldur.example.com/api/"
      waldur_api_token: "${WALDUR_API_TOKEN}"
      waldur_verify_ssl: true
      # Where the operator listens.
      kubeconfig_path: "~/.kube/config" # or omit for in-cluster
      context: "my-cluster"
      namespace: "waldur-system"
      # Cluster ID comes from each Resource's `backend_id` (1:1 with
      # a Rancher downstream cluster). There is no offering-level
      # cluster_id setting -- if a resource's backend_id is empty,
      # CR build raises a clear error rather than emitting an
      # invalid CR.
      # Keycloak group naming -- see note A below for the full variable
      # list, including ${customer_slug}, ${project_slug}, ${resource_slug},
      # and ${rp_uuid_short}. The default keeps the names compact and
      # immutable; override `group_name_template` for human-readable
      # group names.
      parent_group_name: "c_${cluster_id}"
      group_name_template: "c_${cluster_id}_${rp_uuid}_${role_name}" # default
      # Map Waldur offering role names -> Rancher role template IDs.
      # Roles absent from this map are skipped (operator can't bind them).
      role_map:
        project_member: "project-member"
        project_admin: "project-owner"
        create_ns: "create-ns"
      # User-identity link to Keycloak. See "User identity matching" below.
      keycloak_use_user_id: false
2c. Run
rancher-kc-crd is a membership_sync_backend only. Order
processing and reporting still need other backends (or none).
uv run waldur_site_agent --mode membership_sync \
--config-file waldur-site-agent-config.yaml
Observability:
# Watch CRs being created/updated/deleted
kubectl get mrp -n waldur-system -L waldur.io/resource-uuid -w
# Inspect any CR's reconciliation state
kubectl describe mrp <name> -n waldur-system
# Operator logs (cleanup, member sync, drift)
kubectl logs deploy/rko-rancher-keycloak-operator -n waldur-system --tail=100 -f
User identity matching
The plugin can match Waldur users to Keycloak users either by username
(default) or by UUID. Choose with backend_settings.keycloak_use_user_id:
false (default) — match by username
: Plugin sends UserRole.user_username. Operator does
GET /admin/realms/<realm>/users?username=X&exact=true and uses
the resulting user.id for group membership operations. Works in
both OIDC and self-hosted Waldur as long as Waldur usernames
align with Keycloak usernames — the typical OIDC mapping does this
via the preferred_username claim.
true — match by UUID
: Plugin sends UserRole.user_uuid. Operator does
GET /admin/realms/<realm>/users/{uuid} (matches the Keycloak
internal user.id). Use this only when Waldur was OIDC-provisioned
AND its user UUIDs were seeded from the Keycloak sub claim, so
that Waldur.user.uuid == Keycloak.user.id. The username path is
preferred because it tolerates UUID divergence and works in more
topologies.
The operator never creates users. A user that doesn't exist in
Keycloak under the chosen identifier gets logged as
WARNING User <id> not found in Keycloak and is skipped — the
PRTB and the group are still created and bound, the user just isn't
a member yet. They become a member on the next reconcile after the
user appears in Keycloak (e.g. their first OIDC login).
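The two matching modes differ only in which UserRole field is sent and which Keycloak Admin endpoint resolves it. The sketch below builds the request paths quoted above; member_identifier and keycloak_lookup_path are hypothetical helpers, and representing a UserRole as a plain dict is an assumption made for brevity.

```python
from urllib.parse import quote, urlencode


def member_identifier(user_role: dict, use_user_id: bool) -> str:
    """Pick the identifier sent for one member (UUID or username)."""
    return user_role["user_uuid"] if use_user_id else user_role["user_username"]


def keycloak_lookup_path(realm: str, identifier: str, use_user_id: bool) -> str:
    """Return the Keycloak Admin REST path used to resolve the identifier."""
    base = f"/admin/realms/{quote(realm, safe='')}/users"
    if use_user_id:
        # Direct lookup by Keycloak-internal user.id.
        return f"{base}/{quote(identifier, safe='')}"
    # Exact-match search by username.
    return f"{base}?{urlencode({'username': identifier, 'exact': 'true'})}"


role = {"user_username": "alice", "user_uuid": "0a1b2c3d"}  # made-up values
for use_user_id in (False, True):
    ident = member_identifier(role, use_user_id)
    print(keycloak_lookup_path("waldur", ident, use_user_id))
```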
Configuration reference
| Key | Type | Required | Description |
|---|---|---|---|
| waldur_api_url | string | yes | Mastermind API root with /api/. Plugin strips trailing /api for the SDK. |
| waldur_api_token | string | yes | Long-lived token from /api/users/<uuid>/keys/. Don't use a session token. |
| waldur_verify_ssl | bool | no (default true) | TLS verify for Waldur calls. |
| kubeconfig_path | string | no | Path to a kubeconfig file. Omit to use in-cluster credentials. |
| context | string | no | kubeconfig context to use when kubeconfig_path is set. |
| namespace | string | yes | Namespace for ManagedRancherProject CRs (typically waldur-system). |
| parent_group_name | string | no | Top-level KC group; var ${cluster_id}. Default c_${cluster_id}. |
| group_name_template | string | no | Per-role child KC group; vars listed in note A. |
| role_map | dict | yes | Waldur role name → Rancher role template ID. Roles outside the map are skipped. |
| keycloak_use_user_id | bool | no | false (default) → match by username. true → match by UUID. See above. |
spec.clusterId is resolved from each Resource's backend_id (1:1
with a Rancher cluster) — there is no offering-level cluster_id
setting. An empty backend_id raises a clear KeyError rather than
silently emitting an invalid CR.
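Two of the rules above, the backend_id requirement and the role_map skip, can be sketched as follows. The function names are hypothetical, Waldur objects are modelled as plain dicts, and the output shape is an assumption that borrows only the rancherRole and members field names mentioned elsewhere in this README; the real CR spec likely carries more fields.

```python
def resolve_cluster_id(resource: dict) -> str:
    """Return the Rancher cluster ID, or raise KeyError if backend_id is empty."""
    backend_id = (resource.get("backend_id") or "").strip()
    if not backend_id:
        raise KeyError("Resource has an empty backend_id; cannot set spec.clusterId")
    return backend_id


def build_role_bindings(user_roles: list, role_map: dict) -> list:
    """Group members per Waldur role, skipping roles absent from role_map."""
    members_by_role = {}
    for user_role in user_roles:
        if user_role["role_name"] not in role_map:
            continue  # unmapped role: the operator could not bind it
        members_by_role.setdefault(user_role["role_name"], set()).add(
            user_role["user_username"]
        )
    return [
        {"rancherRole": role_map[role], "members": sorted(members)}
        for role, members in sorted(members_by_role.items())
    ]


role_map = {"project_member": "project-member", "project_admin": "project-owner"}
users = [
    {"role_name": "project_member", "user_username": "bob"},
    {"role_name": "project_member", "user_username": "alice"},
    {"role_name": "viewer", "user_username": "eve"},  # not in role_map
]
print(resolve_cluster_id({"backend_id": "c-m-abc12345"}))
print(build_role_bindings(users, role_map))
```

Sorting roles and members keeps the rendered spec deterministic, which helps repeated applies stay no-ops.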
Note A — group_name_template variables. Available substitutions:
| Variable | Source | Notes |
|---|---|---|
| ${cluster_id} | Resource.backend_id | Rancher cluster ID, opaque |
| ${role_name} | UserRole.role_name | Pre-mapping (before role_map) |
| ${rp_uuid} | ResourceProject.uuid | Full 32-char hex |
| ${rp_uuid_short} | first 8 chars of ${rp_uuid} | ~4B combos, collision-free; same as cr_name |
| ${customer_slug} | Resource.customer_slug | Waldur Customer (organization) slug |
| ${project_slug} | Resource.project_slug | Waldur Project slug (parent Project; RPs have no slug) |
| ${resource_slug} | Resource.slug | Waldur Resource slug (1:1 with cluster) |
| ${project_name} | ResourceProject.name | Human-readable; may contain spaces |
Default is c_${cluster_id}_${rp_uuid}_${role_name} — one Keycloak group
per (cluster × project × role), matching Rancher's per-project-PRTB
access model. The default uses ${rp_uuid} (immutable) for stability;
override only if you have a strong reason.
Recommended human-readable opt-in template:
c_${cluster_id}_${customer_slug}_${project_slug}_${rp_uuid_short}_${role_name}
Renders e.g. c_c-m-glwxdksp_hpc-demo-org_genomics-2026_8706dd1a_project_member.
Stays unique per RP via ${rp_uuid_short} while the slugs make the
group name self-explaining in the Keycloak admin UI.
Custom-template constraints.
- MUST include a per-project discriminator (${rp_uuid}, ${rp_uuid_short}, or ${project_name}); without it, multiple projects share one group and a user added to project A also gains access to B, C, … via the shared group's PRTBs.
- Slugs (customer_slug, project_slug, resource_slug, project_name) are mutable -- renaming the entity in Waldur creates a new Keycloak group on the next reconcile and orphans the old one (the operator adopts groups by name and never renames adopted groups). Memberships in the old group become stale.
- Switching the template after deployment has the same effect as a bulk rename: every existing CR re-renders, the operator creates fresh groups, the old groups linger with their stale members. Plan a one-time manual migration if you change the template against an existing deployment.
- Keycloak's GROUP.NAME column is varchar(255). The plugin guards this at render time and raises a ValueError (with a hint about ${rp_uuid_short}) if the rendered name would exceed 255 chars, so the operator never tries to apply a CR that Keycloak would reject with HTTP 500. Stay well under by preferring ${rp_uuid_short} over ${rp_uuid} in long templates and keeping Waldur slugs reasonably short (say ≤ 50 chars each).
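The ${var} syntax happens to match Python's string.Template, so the rendering rules can be sketched with the standard library. This is an illustration under assumptions, not the plugin's renderer: render_group_name and template_variables are hypothetical names, and the discriminator check is added here to demonstrate the first constraint (the text above does not say the plugin enforces it). The 255-character guard mirrors the behaviour described in the last bullet.

```python
from string import Template

MAX_GROUP_NAME_LEN = 255  # Keycloak GROUP.NAME is varchar(255)
PER_PROJECT_VARS = {"rp_uuid", "rp_uuid_short", "project_name"}


def template_variables(template: str) -> set:
    """Collect the placeholder names used in a ${var}-style template."""
    names = set()
    for match in Template.pattern.finditer(template):
        name = match.group("braced") or match.group("named")
        if name:
            names.add(name)
    return names


def render_group_name(template: str, variables: dict) -> str:
    """Render a group name, enforcing the constraints listed above."""
    if not template_variables(template) & PER_PROJECT_VARS:
        raise ValueError("template lacks a per-project discriminator")
    name = Template(template).substitute(variables)  # KeyError on unknown vars
    if len(name) > MAX_GROUP_NAME_LEN:
        raise ValueError(
            "rendered group name is %d chars (max %d); consider ${rp_uuid_short}"
            % (len(name), MAX_GROUP_NAME_LEN)
        )
    return name


variables = {
    "cluster_id": "c-m-glwxdksp",
    "customer_slug": "hpc-demo-org",
    "project_slug": "genomics-2026",
    "rp_uuid_short": "8706dd1a",
    "role_name": "project_member",
}
template = "c_${cluster_id}_${customer_slug}_${project_slug}_${rp_uuid_short}_${role_name}"
print(render_group_name(template, variables))
# prints c_c-m-glwxdksp_hpc-demo-org_genomics-2026_8706dd1a_project_member
```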
Troubleshooting
Plugin logs HTTP/1.1 401 Unauthorized from Waldur on every iteration
: waldur_api_token is a session token from /api-auth/password/ (rotates on each call). Use a long-lived
API token from /api/users/<uuid>/keys/.
pull_resource succeeds but no CRs are created
: The resource has an empty backend_id. The agent's resource fetcher
(waldur_site_agent/common/processors.py:_get_waldur_resources) drops resources without one. Fix:
POST /api/marketplace-provider-resources/<uuid>/set_backend_id/.
Plugin logs IndexError: list index out of range in processor __init__
: Customer is not registered as a service provider. Fix: POST /api/marketplace-service-providers/
with the customer URL.
Plugin logs GET .../api/api/marketplace-provider-resource-projects/... (doubled /api/)
: You're on a build older than the URL fix bundled with the orphan-pruning commit. Pull latest plugin code,
or as a workaround drop the trailing /api/ from waldur_api_url.
Operator logs WARNING User X not found in Keycloak for every user
: Identity mismatch — the chosen identifier (username by default, UUID with keycloak_use_user_id: true)
isn't resolvable in Keycloak. With the default username path: ensure Waldur usernames map to existing
Keycloak usernames. With the UUID path: align Waldur user UUIDs with the Keycloak OIDC sub.
kubectl delete mrp succeeds in 0s but the Rancher project remains
: status.rancherProjectId is stale; operator versions before 0.2.2 treated 404 on delete as success.
Upgrade the operator to 0.2.2+ — cleanup now falls back to find-by-name.
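For the doubled /api/ entry above, the normalization involved is small. The sketch below shows one way to strip a trailing /api from waldur_api_url before the SDK appends its own /api/... paths; sdk_base_url is a hypothetical name and this is not necessarily how the plugin implements its fix.

```python
def sdk_base_url(waldur_api_url: str) -> str:
    """Strip trailing slashes and a trailing /api segment, if present."""
    url = waldur_api_url.rstrip("/")
    if url.endswith("/api"):
        url = url[: -len("/api")]
    return url


base = sdk_base_url("https://waldur.example.com/api/")
print(f"{base}/api/marketplace-provider-resource-projects/")
# prints https://waldur.example.com/api/marketplace-provider-resource-projects/
```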
Operator status.conditions to check for any CR (kubectl describe mrp <name>):
| Condition | What status=False means |
|---|---|
| RancherProjectReady | Rancher create/adopt failed. Check operator log for httpx.HTTPStatusError. |
| ResourceQuotaReady | spec.resourceQuotas apply failed (only emitted by operator 0.3.0+). |
| KeycloakGroupsReady | Couldn't create/find parent or child KC groups. Check KC admin credentials. |
| RancherBindingsReady | PRTB creation failed, usually because of an invalid rancherRole in role_map. |
| MembershipSynced | Per-user add/remove failed; see User X not found in Keycloak warnings. |
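When checking many CRs, the same conditions can be read programmatically. The sketch assumes status.conditions follows the usual Kubernetes shape (a list of objects with type, status, and message, where status is the string "True"/"False"/"Unknown"); the exact fields the operator writes are not documented here, and failing_conditions is a hypothetical helper.

```python
def failing_conditions(cr: dict) -> list:
    """Return (type, message) pairs for every condition whose status is False."""
    conditions = cr.get("status", {}).get("conditions", [])
    return [
        (cond["type"], cond.get("message", ""))
        for cond in conditions
        # str() tolerates a boolean False as well as the string "False".
        if str(cond.get("status")) == "False"
    ]


# Made-up CR, e.g. parsed from `kubectl get mrp <name> -o json`.
cr = {
    "status": {
        "conditions": [
            {"type": "RancherProjectReady", "status": "True"},
            {"type": "KeycloakGroupsReady", "status": "False", "message": "401 from Keycloak"},
        ]
    }
}
print(failing_conditions(cr))
```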
Development
# Workspace install
cd <repo-root>
uv sync --all-packages
# Run unit tests for this plugin only
uv run pytest plugins/rancher-kc-crd/tests/
# Integration tests (requires a real K8s cluster with operator + CRDs)
K8S_CRD_TEST=1 \
RANCHER_CLUSTER_ID=c-m-abc12345 \
KUBE_CONTEXT=docker-desktop \
uv run pytest plugins/rancher-kc-crd/tests/test_backend_integration.py -v
# Lint + format
uvx prek run --all-files
tests/ layout:
| File | Coverage |
|---|---|
| test_translator.py | 15 pure tests: cr_name, group templates, role bindings, full CR build. |
| test_status_reader.py | 13 pure tests: status → BackendResourceInfo + drift detection. |
| test_backend_integration.py | 6 tests, K8S_CRD_TEST=1: quotas, no-client, orphan pruning. |
Companion components
| Component | Role |
|---|---|
| rancher-keycloak-operator (separate repo) | Reconciles ManagedRancherProject CRs. |
| ManagedRancherProject CRD (in operator helm chart) | API surface the plugin writes to. |