
Hack The Garden Sofia Edition 03/2026 Wrap Up

Group picture of the hackathon attendees on the rooftop of the SAP Center Sofia

🐳 [GEP-28] Self-Hosted Shoot: Gardener-in-Docker (gind)

TIP

You can find out more about Self-Hosted Shoot Clusters in GEP-28.

Tracking: hackathon#8

Problem Statement

It should be possible to create self-hosted shoot clusters using gardenadm and run Gardener inside such a cluster. Before introducing a tool like gind (which runs the self-hosted shoot directly in Docker), we first need to support hosting Gardener inside a self-hosted shoot cluster.

Achievements

  • Deployed gardener-operator into the self-hosted shoot.
  • Deployed a Garden resource: the self-hosted shoot now serves as the runtime cluster for the virtual garden.
  • Enabled the ManagedSeed controller in the shoot gardenlet, allowing the self-hosted Shoot itself to be referenced in the ManagedSeed.
  • Adapted the local setup for direct API access to both the self-hosted shoot API server and the virtual garden API server from the host machine (no port-forwarding).

Next Steps

  • Clean up the code and commits; adapt the documentation and make rules.
  • Open individual PRs for the different features and get them merged.
  • Introduce an e2e test for this scenario ("fully self-contained Gardener").
  • Try spinning up workerless and regular hosted shoots on this seed.

Code & Pull Requests

🕹️ [GEP-28] Ensure System Pods Run on Control Plane Nodes

Tracking: hackathon#16

Problem Statement

System components in a self-hosted shoot (pods in the garden namespace, extensions, and system pods in kube-system) are not guaranteed to run exclusively on control plane nodes. Over time, they might get rescheduled to worker nodes.

Achievements

  • Implemented placement enforcement so that system pods exclusively run on control plane nodes.
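The summary does not say how placement is enforced. One common Kubernetes pattern is to give each system pod a node selector on the well-known `node-role.kubernetes.io/control-plane` label plus a toleration for the matching `NoSchedule` taint. The Go sketch below assumes that pattern and assumes control plane nodes carry this label and taint; `PodSpec`, `Toleration`, and `enforceControlPlanePlacement` are hypothetical, simplified stand-ins for the `k8s.io/api/core/v1` types and for whatever mutation logic the actual PR uses.

```go
package main

import "fmt"

// Toleration and PodSpec are simplified, hypothetical stand-ins for the
// corresponding corev1 types; a real implementation would use k8s.io/api.
type Toleration struct {
	Key      string
	Operator string
	Effect   string
}

type PodSpec struct {
	NodeSelector map[string]string
	Tolerations  []Toleration
}

// Well-known Kubernetes label (and taint) key for control plane nodes.
const controlPlaneKey = "node-role.kubernetes.io/control-plane"

// enforceControlPlanePlacement (hypothetical) restricts a pod to control plane
// nodes via a node selector and lets it tolerate the control plane NoSchedule
// taint. Calling it repeatedly does not add duplicate tolerations.
func enforceControlPlanePlacement(spec *PodSpec) {
	if spec.NodeSelector == nil {
		spec.NodeSelector = map[string]string{}
	}
	// The label is usually set with an empty value, so select on "".
	spec.NodeSelector[controlPlaneKey] = ""

	for _, t := range spec.Tolerations {
		if t.Key == controlPlaneKey && t.Effect == "NoSchedule" {
			return // toleration already present
		}
	}
	spec.Tolerations = append(spec.Tolerations, Toleration{
		Key:      controlPlaneKey,
		Operator: "Exists",
		Effect:   "NoSchedule",
	})
}

func main() {
	spec := &PodSpec{}
	enforceControlPlanePlacement(spec)
	enforceControlPlanePlacement(spec) // second call must not duplicate the toleration
	fmt.Println(len(spec.Tolerations), spec.NodeSelector[controlPlaneKey] == "") // prints: 1 true
}
```

Keeping the mutation idempotent matters if it runs in a mutating admission webhook, because Kubernetes may pass the same pod through such webhooks more than once.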

Code & Pull Requests

β€οΈβ€πŸ©Ή [GEP-28] Self-Hosted Shoot Control Plane Restoration ​

Tracking: hackathon#22

Problem Statement

When a self-hosted shoot cluster loses its control plane, it must be possible to restore the secrets and control plane state from the ShootState resource.

Achievements

  • PoC branch using the ShootState for restoring secrets when restoring a control plane of a self-hosted shoot: poc/gep-28-dr.
  • Demo scripts: demo/restore-from-shootstate.
  • Fixed a bug when computing the ShootState for a self-hosted shoot cluster.

Next Steps

  • Eliminate hacks/workarounds: enable etcd encryption, adapt csrapprover for gardener-node-agent CSRs, fix pod network availability check, sanitize etcd data, eliminate second-phase restore.
  • Design and implement how to read/compute the ShootState (via gardenadm discover or from etcd backup).
  • Design and implement etcd backup restore.
  • Add support for restoring a self-hosted shoot with worker nodes.

Code & Pull Requests

🗝️ [GEP-28] Eliminate Static Admin Token After gardenadm connect

Tracking: hackathon#14

Problem Statement

After gardenadm init, the control plane components use a static token with cluster-admin privileges for bootstrapping. Once the cluster is fully connected (gardenadm connect), this should be replaced with short-lived tokens from gardener-resource-manager.

Outcome

Discussed and decided that this will be part of the shoot/shoot controller and should not be handled explicitly in gardenadm init or gardenadm connect. Closed in favor of "Experiment with shoot/shoot controller in Self-Hosted Shoot Clusters" (hackathon#45).

πŸ”‘ Functional Local Setup with Workload Identity ​

Tracking: hackathon#28

Problem Statement

The local development setup does not work with Workload Identity (WI), making it impossible to test WI-dependent scenarios locally.

Achievements

  • Initial implementation establishing trust between the local KinD cluster and the Gardener Workload Identity Issuer.
  • Identified that Machine Controller Manager is not deployed with minimal permissions, and opened a PR to address this.

Next Steps

  • Clean up the code in the linked branch.
  • Verify that all scenarios work as expected and that current tests pass.
  • Implement new e2e tests leveraging Workload Identity or enable it for existing ones.
  • Add support for Workload Identity in other local scenarios (etcd backups, DNS, etc.).

Code & Pull Requests

πŸ€– AGENTS.md / SKILLS.md for Gardener Repos ​

Tracking: hackathon#31

Problem Statement

AI-native development tools (Claude Code, Codex CLI, Gemini CLI) benefit from repository-level context files (AGENTS.md, SKILLS.md). The question is how to best leverage these for the Gardener ecosystem.

Achievements

  • Researched recent papers: Evaluating AGENTS.md (Feb 2026) and SkillsBench (Feb/Mar 2026).
  • Key findings: curated skills provide +16.2pp average improvement; LLM-generated context provides negligible or negative benefit; focused skills with 2–3 modules outperform comprehensive documentation.
  • Proposed a minimal AGENTS.md template focused on "common mistakes and confusion points" rather than comprehensive documentation.

Next Steps

  • Experiment with proposed AGENTS.md file in gardener org repos (not gardener/gardener).
  • If significant benefit is observed, present findings in a larger forum (Gardener Review Meeting).

πŸ“¦ PoC: Repo Tools Integration with Extension Repositories ​

Tracking: hackathon#18

Problem Statement

Extension repositories share ~10 almost identical make targets and hack scripts with gardener/gardener. Changes to these shared scripts result in copy-paste effort across all repositories (~20 PRs for a single fix).

Achievements

  • Explored a subtree approach for centralizing shared make targets and hack scripts into a separate repository.
  • Adapted gardener-extension-shoot-rsyslog-relp and pvc-autoscaler as PoC repositories.

Next Steps

  • Adapt additional repositories to validate the approach and catch problems early.
  • Gather feature requests based on newfound use cases.

✅ Diki as a Service

Tracking: hackathon#24

Problem Statement

Diki compliance checks should be schedulable, exportable, and operable as a service rather than one-off CLI runs.

Achievements

  • Merged previous work (first PoC and diki-exporter) into a working operator: hackathon-poc branch.
  • Implemented a Postgres exporter in diki-exporter.
  • Made the operator capable of running in a different cluster than the ComplianceScans (needed for the seed/shoot-namespace topology).
  • Built a PoC for ScheduledComplianceScans, which spawn scans based on a cron expression.

Next Steps

  • More testing and cleanup.
  • Turn the work into a Gardener extension.

🧩 Extension: Generic Shoot Pack (CloudNativePG et al.)

Tracking: hackathon#19

Problem Statement

Installing upstream operators into shoot clusters requires repetitive per-operator extension development. A generic packaging mechanism would reduce this overhead.

Achievements

  • Developed gardener-extension-shoot-pack, a generic Gardener extension that uses package specifications to install operators as managed resources.
  • Ships packages for: cert-manager, CloudNativePG, Prometheus Operator, and Valkey Operator.
  • Tooling available to inspect, view, and create new package specs.

Next Steps

  • Clean things up.
  • Add more tests.

πŸͺ£ Fix Leaking ValidatingWebhookConfigurations in (Virtual-)Garden ​

Tracking: hackathon#21

Problem Statement

When deploying extensions via gardener-operator using extensions.operator.gardener.cloud resources, ValidatingWebhookConfigurations remain in the virtual-garden cluster even after the extension is removed. The root cause: the --webhook-config-owner-namespace option defaults to the garden namespace, which prevents proper garbage collection.
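To see why the `garden` default leaks configurations, it helps to model owner-reference garbage collection. The Go sketch below assumes that `--webhook-config-owner-namespace` makes the named namespace the owner of the cluster-scoped ValidatingWebhookConfiguration, so the configuration is only collected once that namespace is deleted. `webhookConfig`, `collectOrphans`, and the per-extension namespace `extension-ext-a` are hypothetical and far simpler than the real Kubernetes garbage collector.

```go
package main

import (
	"fmt"
	"sort"
)

// webhookConfig is a simplified, hypothetical stand-in for a cluster-scoped
// ValidatingWebhookConfiguration whose owner is a namespace.
type webhookConfig struct {
	Name           string
	OwnerNamespace string
}

// collectOrphans models owner-reference garbage collection: a config whose
// owner namespace no longer exists is deleted, all others are kept.
func collectOrphans(configs []webhookConfig, namespaces map[string]bool) (kept, deleted []string) {
	for _, c := range configs {
		if namespaces[c.OwnerNamespace] {
			kept = append(kept, c.Name)
		} else {
			deleted = append(deleted, c.Name)
		}
	}
	sort.Strings(kept)
	sort.Strings(deleted)
	return kept, deleted
}

func main() {
	// The extension "ext-a" was removed, so only "garden" still exists.
	namespaces := map[string]bool{"garden": true}

	// Default: the owner is "garden", which outlives the extension.
	defaults := []webhookConfig{{Name: "ext-a-admission", OwnerNamespace: "garden"}}
	kept, _ := collectOrphans(defaults, namespaces)
	fmt.Println("default owner, kept:", kept) // default owner, kept: [ext-a-admission]

	// With a per-extension owner namespace, the config is collected.
	perExt := []webhookConfig{{Name: "ext-a-admission", OwnerNamespace: "extension-ext-a"}}
	_, deleted := collectOrphans(perExt, namespaces)
	fmt.Println("per-extension owner, deleted:", deleted) // per-extension owner, deleted: [ext-a-admission]
}
```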

Achievements

Next Steps

  • Apply the --webhook-config-owner-namespace option to each affected extension's admission deployment.

Code & Pull Requests

πŸ“‘ Resolve the Istio Metrics Leak ​

Tracking: hackathon#12

Problem Statement

Istio sidecar metrics for terminated pods accumulate indefinitely, leading to unbounded cardinality in Prometheus.

Achievements

  • Configured Istio metric rotation via environment variables (METRIC_ROTATION_INTERVAL, METRIC_GRACEFUL_DELETION_INTERVAL).
  • Verified correct behavior: after rotation interval, old pod metrics disappear and new pod metrics appear; long-lived connection metrics reset correctly (counters restart from 0, compatible with PromQL rate functions).
  • Fixed duplicate scraping caused by two Istio services (istio-gateway LoadBalancer and istio-gateway-internal ClusterIP) matching the same ServiceMonitor label selector.
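The claim that rotated counters stay compatible with PromQL rate functions rests on how Prometheus handles counter resets: whenever a sample is lower than its predecessor, the counter is assumed to have restarted from 0. The stdlib-only Go sketch below reproduces that reset adjustment in simplified form; `resetAdjustedIncrease` and `simpleRate` are hypothetical helpers, and unlike the real `rate()`/`increase()` they do not extrapolate to the edges of the range window.

```go
package main

import "fmt"

// sample is one scraped counter value at a unix timestamp (seconds).
type sample struct {
	ts    float64
	value float64
}

// resetAdjustedIncrease sums the growth of a counter across samples. When a
// value drops below its predecessor, the counter is assumed to have restarted
// from 0 (as after an Istio metric rotation), so the new value itself is the
// increase since the reset.
func resetAdjustedIncrease(samples []sample) float64 {
	var total float64
	for i := 1; i < len(samples); i++ {
		prev, cur := samples[i-1].value, samples[i].value
		if cur < prev {
			total += cur // reset detected
		} else {
			total += cur - prev
		}
	}
	return total
}

// simpleRate divides the reset-adjusted increase by the covered time span.
// Unlike PromQL rate(), it does not extrapolate to the window boundaries.
func simpleRate(samples []sample) float64 {
	if len(samples) < 2 {
		return 0
	}
	span := samples[len(samples)-1].ts - samples[0].ts
	if span <= 0 {
		return 0
	}
	return resetAdjustedIncrease(samples) / span
}

func main() {
	// Counter grows to 120, is rotated (restarts from 0), then grows to 30.
	s := []sample{{ts: 0, value: 100}, {ts: 15, value: 120}, {ts: 30, value: 10}, {ts: 45, value: 30}}
	fmt.Println(resetAdjustedIncrease(s)) // 20 + 10 + 20 = 50
	fmt.Println(simpleRate(s))
}
```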

Next Steps

  • The env-var approach is deprecated as of Istio v1.28.
  • Migration to annotation-based configuration (SidecarStatsEvictionInterval) will be needed when upgrading beyond v1.27.

Code & Pull Requests


Rooftop view from the SAP Center Sofia towards Vitosha mountain


ApeiroRA – Funded by the European Union NextGenerationEU