Resource Upgrade Control at Edge

Abstract

In edge computing scenarios such as drones, robotics, and autonomous vehicles, uncontrolled resource upgrades can cause serious safety and operational issues. The hold-and-release mechanism allows edge administrators to control when upgrades to edge resources occur, ensuring that resources cannot be upgraded without explicit confirmation from the edge.

This feature enables you to:

Hold resource upgrades at the edge using the annotation edge.kubeedge.io/hold-upgrade: "true"
Release the hold only when the edge system is in a safe state
Maintain operational safety for critical edge applications

Hold and Release Mechanism

Hold Upgrades: When a Pod with the edge.kubeedge.io/hold-upgrade: "true" annotation is created (during an upgrade), the edge node's edged component intercepts it and stores it in an internal queue instead of starting the Pod.
Status Reporting: A HeldUpgrade condition is added to the Pod status and reported to the cloud, indicating that the upgrade is intentionally held.
Persistence: If the edge node restarts, held Pod upgrades are automatically recovered from MetaManager and remain in the held state.
Release: When an unhold command is issued (via keadm ctl or API), the cached Pod upgrade is released to the container runtime, and the Pod starts normally.

See more details at Design Proposal for Resource Upgrade Control at Edge.

Use cases

Robotics / Robots
An automatic resource update in the middle of an operation or actuation could interrupt motion, possibly causing actuator lock-up or a crash, a production halt, or a safety hazard to nearby human operators. The robot system signals when the actuators are idle and in a safe pose; only then does edged apply the updated container, so that updates occur during defined maintenance windows or pause states.
Autonomous Car / AMR / AGV
If a resource update restarts the perception or control module mid-navigation, the vehicle may stop unexpectedly, risking a collision or a navigation failure as well as a loss of customer trust and service reliability. The local system inside the vehicle detects when it is parked or charging, and it toggles the flag or sends a signal to enable the resource update only when the vehicle is ready. This avoids disrupting the driving session.
Drone / Aerospace
If the update hits mid-flight, the Pod restart disconnects the telemetry stream or flight control interface. This could trigger an emergency landing or a flyaway condition; in the worst case, the drone crashes. The edge device onboard the drone (e.g. a PX4 companion computer) always knows the flight state and signals when the drone has landed or is in a safe altitude-hold mode. Resource updates are then applied only post-flight or during downtime.
Edge AI / Machine Learning
Controls when model upgrades are applied to prevent inference disruption. Allows model validation before deployment. Ensures smooth transition between model versions.

Getting Started

Prerequisites

Before using the edge.kubeedge.io/hold-upgrade: "true" annotation, ensure you have:

KubeEdge v1.22.0 or later installed
MetaServer component running on edge nodes (required for unhold operations)
kubectl configured to access your Kubernetes cluster
(recommended) keadm CLI tool available on edge nodes
Edge nodes running and connected to the cloud

Deploy Resource

First, create a Deployment without the hold-upgrade annotation.

Create a file named busybox-deployment.yaml:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-deployment
  labels:
    app: busybox
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
    spec:
      hostNetwork: true
      containers:
      - name: busybox
        image: busybox:1.36
        command: ["sh", "-c", "echo Running && sleep 3600"]
        imagePullPolicy: IfNotPresent

Apply the deployment:

kubectl apply -f busybox-deployment.yaml

Verify the deployment is running:

kubectl get pod -A -o wide

You should see the pod running on your edge node:

$ kubectl get pod -A -o wide
NAMESPACE            NAME                                         READY   STATUS    RESTARTS   AGE   IP              NODE                 NOMINATED NODE   READINESS GATES
default              busybox-deployment-74f696fddd-5mjxp          1/1     Running   0          10s   192.168.0.138   edge-node            <none>           <none>

Upgrade Resource and Hold

Now update busybox-deployment.yaml to add the hold-upgrade annotation and a new image version, then re-apply it with kubectl apply -f busybox-deployment.yaml. The upgrade will be held at the edge because of the edge.kubeedge.io/hold-upgrade: "true" annotation.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-deployment
  labels:
    app: busybox
spec:
  replicas: 1
  selector:
    matchLabels:
      app: busybox
  template:
    metadata:
      labels:
        app: busybox
      annotations:
        edge.kubeedge.io/hold-upgrade: "true"  # added hold-upgrade annotation
    spec:
      hostNetwork: true
      containers:
      - name: busybox
        image: busybox:1.37 # upgrade to new version
        command: ["sh", "-c", "echo Running && sleep 3600"]
        imagePullPolicy: IfNotPresent

Unhold Resource to Upgrade

When the edge system is ready to accept the upgrade (e.g., drone has landed, robot is idle, vehicle is parked), you can release the hold using one of the following methods:

Note that the following commands must be run on the edge node itself.

`keadm ctl unhold-upgrade` (recommended)

This releases all held upgrades on the specified node.

keadm ctl unhold-upgrade node edge-node
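In the use cases above, the release is triggered by the edge system once it reaches a safe state. One way this could be wired up is a small polling wrapper around the keadm command. This is only a sketch: release_when_safe is a hypothetical function, and the check command is a site-specific program you would supply (for example, a script that exits 0 only when the drone has landed or the robot is idle).

```shell
#!/bin/sh
# release_when_safe CHECK_CMD NODE [INTERVAL_SECONDS]
# Polls CHECK_CMD until it exits 0, then releases all held upgrades on NODE.
# CHECK_CMD is a hypothetical, site-specific safe-state probe; it is not
# provided by KubeEdge.
release_when_safe() {
  check_cmd="$1"
  node="$2"
  interval="${3:-5}"   # seconds between probes, default 5

  # Keep waiting while the system reports that it is not yet safe.
  until $check_cmd; do
    sleep "$interval"
  done

  # Same command as shown above, run only once the probe succeeds.
  keadm ctl unhold-upgrade node "$node"
}
```

For example, release_when_safe /usr/local/bin/robot-is-idle edge-node 10 would probe every 10 seconds, where /usr/local/bin/robot-is-idle stands in for your own check.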

`curl` command

To unhold the upgrade of a specific Pod:

curl -X POST \
  --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
  -H "Content-Type: text/plain" \
  --data "default/busybox-deployment-74f696fddd-5mjxp" \
  https://127.0.0.1:10550/api/v1/pods/unhold-upgrade

To unhold all upgrades on the node:

curl -X POST \
  --cacert /var/run/secrets/kubernetes.io/serviceaccount/ca.crt \
  -H "Authorization: Bearer $(cat /var/run/secrets/kubernetes.io/serviceaccount/token)" \
  -H "Content-Type: text/plain" \
  --data "edge-node" \
  https://127.0.0.1:10550/api/v1/nodes/unhold-upgrade

After releasing the hold, the new Pod will start running and the old Pod will be terminated:

$ kubectl get pod -A -o wide
NAMESPACE            NAME                                         READY   STATUS        RESTARTS   AGE     IP              NODE                 NOMINATED NODE   READINESS GATES
default              busybox-deployment-6dfb445476-5gpdv          1/1     Running       0          6m4s    192.168.0.138   edge-node            <none>           <none>
default              busybox-deployment-74f696fddd-5mjxp          1/1     Terminating   0          8m52s   192.168.0.138   edge-node            <none>           <none>

Supported Resource Types

The hold and release mechanism supports the following Kubernetes resources:

Resource	Supported	Description / Use Case
Pods	✅ Yes	Primary unit of runtime workload. Direct Pod upgrades can be held.
Deployments	✅ Yes	Pod upgrades triggered by Deployment changes are held at the edge.
StatefulSets	✅ Yes	Upgrades to stateful services are held to prevent data loss or service interruption.
DaemonSets	✅ Yes	Upgrades to edge agents and system daemons can be controlled.

Appendix

Unpredictable Upgrade Order

In hybrid environments with both cloud and edge nodes, the order in which Kubernetes selects Pods (and thus nodes) for upgrade is not predictable. The default rolling update strategy might lead to unexpected behavior if certain edge nodes hold upgrades while cloud nodes proceed normally. Therefore, it's essential for users to tune the rollingUpdate parameters appropriately to avoid upgrade bottlenecks and better align with custom upgrade control mechanisms like the hold-and-release feature.
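As an illustration of tuning these parameters, the rolling update strategy of a Deployment can be changed with kubectl patch. The helper below is a sketch: set_rolling_update is a hypothetical function name, it accepts integer values only, and the numbers in the usage note are placeholders rather than recommendations.

```shell
#!/bin/sh
# set_rolling_update DEPLOYMENT MAX_UNAVAILABLE MAX_SURGE
# Patches the Deployment's RollingUpdate strategy.
# Both limits must be integers here; percentage values such as "25%"
# would need to be quoted as JSON strings, which this sketch does not do.
set_rolling_update() {
  deployment="$1"
  max_unavailable="$2"
  max_surge="$3"
  kubectl patch deployment "$deployment" --type merge -p \
    "{\"spec\":{\"strategy\":{\"type\":\"RollingUpdate\",\"rollingUpdate\":{\"maxUnavailable\":$max_unavailable,\"maxSurge\":$max_surge}}}}"
}
```

For example, set_rolling_update busybox-deployment 0 2 keeps every old Pod running until its replacement is available, while allowing up to two new Pods to be created at once, so that one held Pod does not prevent other new Pods from being created. Suitable values depend on how many replicas run on edge nodes that hold upgrades.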
