
17 posts tagged "kubernetes edge computing"


· 15 min read

On January 23, 2024 (Beijing time), KubeEdge released v1.16. The new version adds multiple enhancements, with major improvements in cluster upgrades, cluster usability, and edge device management.

KubeEdge v1.16 introduces the following new features:

  • Cluster upgrade: automated upgrade of cloud and edge components
  • Image pre-download for edge nodes
  • Windows edge node installation with Keadm
  • Compatibility tests for multiple container runtimes
  • Overriding more Deployment fields in EdgeApplication
  • Mapper upgrades based on Mapper-Framework
  • Built-in Redis and TDengine integration in the DMI data plane
  • A USB-Camera Mapper implementation based on Mapper-Framework
  • Usability improvements: enhanced Keadm-based deployment capabilities
  • Kubernetes dependency upgraded to v1.27

Overview of New Features

Cluster Upgrade: Automated Upgrade of Cloud and Edge Components

As the KubeEdge community keeps growing and releasing new versions, upgrading the KubeEdge version in user environments has become a pressing need. To address the difficulty of the upgrade steps and the repetitive work on edge nodes, KubeEdge v1.16.0 supports automated upgrades of the cloud and edge components. Users can upgrade the cloud side with a single Keadm command and upgrade edge nodes in batches by creating the corresponding Kubernetes API objects.

  • Cloud upgrade

    The cloud upgrade command uses a three-level subcommand to distinguish it from the edge upgrade and gives users a more convenient way to upgrade the KubeEdge cloud components. After the upgrade finishes, the command prints the historical ConfigMap configuration; if you have modified the ConfigMap manually, you can use this history to restore your configuration file. The usage information can be viewed with the help flag:

    keadm upgrade cloud --help
    Upgrade the cloud components to the desired version, it uses helm to upgrade the installed release of cloudcore chart, which includes all the cloud components

    Usage:
    keadm upgrade cloud [flags]

    Flags:
    --advertise-address string Please set the same value as when you installed it, this value is only used to generate the configuration and does not regenerate the certificate. eg: 10.10.102.78,10.10.102.79
    -d, --dry-run Print the generated k8s resources on the stdout, not actual execute. Always use in debug mode
    --external-helm-root string Add external helm root path to keadm
    --force Forced upgrading the cloud components without waiting
    -h, --help help for cloud
    --kube-config string Use this key to update kube-config path, eg: $HOME/.kube/config (default "/root/.kube/config")
    --kubeedge-version string Use this key to set the upgrade image tag
    --print-final-values Print the final values configuration for debuging
    --profile string Sets profile on the command line. If '--values' is specified, this is ignored
    --reuse-values reuse the last release's values and merge in any overrides from the command line via --set and -f.
    --set stringArray Sets values on the command line (can specify multiple or separate values with commas: key1=val1,key2=val2)
    --values stringArray specify values in a YAML file (can specify multiple)

    Example upgrade command:

    keadm upgrade cloud --advertise-address=<value set during init> --kubeedge-version=v1.16.0
  • Edge upgrade

    KubeEdge v1.16.0 supports one-click, batch upgrades of edge nodes through the NodeUpgradeJob Kubernetes API. The API supports pre-upgrade checks, concurrent upgrades, failure thresholds, and timeout handling for edge nodes. To support this, KubeEdge introduces a cloud-edge task framework: community developers no longer need to implement task control and status reporting logic and can focus on the cloud-edge task itself.

    Example upgrade API object:

    apiVersion: operations.kubeedge.io/v1alpha1
    kind: NodeUpgradeJob
    metadata:
      name: upgrade-example
      labels:
        description: upgrade-label
    spec:
      version: "v1.16.0"
      checkItems:
        - "cpu"
        - "mem"
        - "disk"
      failureTolerate: "0.3"
      concurrency: 2
      timeoutSeconds: 180
      labelSelector:
        matchLabels:
          "node-role.kubernetes.io/edge": ""
          node-role.kubernetes.io/agent: ""
  • Compatibility testing

    The KubeEdge community provides thorough version compatibility testing. As long as the version skew between the cloud and the edge does not exceed two minor versions during an upgrade, problems caused by a cloud-edge version mismatch can be avoided.

For more information, see:

https://github.com/kubeedge/kubeedge/pull/5330 https://github.com/kubeedge/kubeedge/pull/5229 https://github.com/kubeedge/kubeedge/pull/5289

Image Pre-Download for Edge Nodes

The new version introduces image pre-download. With the ImagePrePullJob Kubernetes API, users can load images onto edge nodes ahead of time. The feature supports pre-downloading multiple images on batches of edge nodes or node groups, which helps reduce the high failure rate and low efficiency caused by image pulls during application deployment or upgrades, especially at large scale.

Example image pre-download API object:

apiVersion: operations.kubeedge.io/v1alpha1
kind: ImagePrePullJob
metadata:
  name: imageprepull-example
  labels:
    description: ImagePrePullLabel
spec:
  imagePrePullTemplate:
    images:
      - image1
      - image2
    nodes:
      - edgenode1
      - edgenode2
    checkItems:
      - "disk"
    failureTolerate: "0.3"
    concurrency: 2
    timeoutSeconds: 180
    retryTimes: 1

For more information, see:

https://github.com/kubeedge/kubeedge/pull/5310 https://github.com/kubeedge/kubeedge/pull/5331

Installing Windows Edge Nodes with Keadm

KubeEdge v1.15 added support for running edge nodes on Windows. In the new version, the Keadm installation tool can install Windows edge nodes directly, using the same commands as for Linux edge nodes, which simplifies edge node installation.

For more information, see:

https://github.com/kubeedge/kubeedge/pull/4968

Compatibility Tests for Multiple Container Runtimes

The new version adds compatibility tests for multiple container runtimes. Four mainstream container runtimes, containerd, Docker, iSulad, and CRI-O, are now covered, safeguarding the quality of KubeEdge releases. Users can also refer to the adapted installation scripts in the PR when installing a container runtime.

For more information, see:

https://github.com/kubeedge/kubeedge/pull/5321

Overriding More Deployment Fields in EdgeApplication

In the new version, we extend the differentiated configuration options (overriders) in EdgeApplication, mainly with environment variables, command arguments, and resources. When node groups in different regions need to connect to different middleware, you can use environment variables (env) or command arguments (command, args) to override the middleware connection information. When node resources differ between regions, you can use the resource configuration (resources) to override the CPU and memory settings.
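Below is a minimal sketch of what such a per-node-group override might look like. The overrider field names (envOverriders, resourcesOverriders) and their layout are assumptions inferred from the description above rather than taken from this post; refer to the PRs linked below for the authoritative schema.

apiVersion: apps.kubeedge.io/v1alpha1
kind: EdgeApplication
metadata:
  name: edge-app-sample
  namespace: default
spec:
  workloadTemplate:
    manifests:
      - apiVersion: apps/v1
        kind: Deployment
        metadata:
          name: middleware-client
        spec:
          replicas: 1
          selector:
            matchLabels:
              app: middleware-client
          template:
            metadata:
              labels:
                app: middleware-client
            spec:
              containers:
                - name: app
                  image: example.com/middleware-client:v1   # hypothetical image
  workloadScope:
    targetNodeGroups:
      - name: beijing
        overriders:
          envOverriders:          # assumed field name for per-node-group env overrides
            - containerName: app
              operator: add
              value:
                - name: MIDDLEWARE_ADDR
                  value: "middleware-beijing:6379"   # hypothetical middleware endpoint
          resourcesOverriders:    # assumed field name for per-node-group cpu/memory overrides
            - containerName: app
              value:
                limits:
                  cpu: 500m
                  memory: 512Mi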

For more information, see:

https://github.com/kubeedge/kubeedge/pull/5262 https://github.com/kubeedge/kubeedge/pull/5370

Mapper Upgrades Based on Mapper-Framework

In v1.16, the Mapper development framework, Mapper-Framework, gains the ability to upgrade Mapper components. Mapper projects generated by the new framework import part of the original Mapper-Framework functionality as a dependency, so an upgrade can be completed simply by bumping the dependency version, which simplifies the Mapper upgrade process.

  • Mapper-Framework code decoupling:

    In v1.16, the Mapper-Framework code is decoupled into a user layer and a business layer. The user layer contains the device drivers and the closely related parts of the management-plane and data-plane capabilities; it is still generated into the user's Mapper project by Mapper-Framework and can be modified as needed. The business layer contains capabilities such as Mapper registration with the cloud and delivery of the Device list from the cloud, and is maintained in the kubeedge/mapper-framework repository.

  • Mapper upgrade framework:

    User Mapper projects generated by Mapper-Framework in v1.16 use the business-layer functionality in the kubeedge/mapper-framework repository through dependency references to provide complete device management. Users can later upgrade a Mapper simply by upgrading the dependency version, without large-scale manual code changes.

For more information, see:

https://github.com/kubeedge/kubeedge/pull/5308 https://github.com/kubeedge/kubeedge/pull/5326

Built-in Redis and TDengine Integration in the DMI Data Plane

v1.16 further enhances the DMI data plane's ability to push data to user databases by adding Redis and TDengine as built-in databases. Users can define the relevant fields directly in the device-instance configuration file so that the Mapper automatically pushes device data to Redis or TDengine. The database fields are defined as follows:

type DBMethodRedis struct {
    // RedisClientConfig of redis database
    // +optional
    RedisClientConfig *RedisClientConfig `json:"redisClientConfig,omitempty"`
}

type RedisClientConfig struct {
    // Addr of Redis database
    // +optional
    Addr string `json:"addr,omitempty"`
    // Db of Redis database
    // +optional
    DB int `json:"db,omitempty"`
    // Poolsize of Redis database
    // +optional
    Poolsize int `json:"poolsize,omitempty"`
    // MinIdleConns of Redis database
    // +optional
    MinIdleConns int `json:"minIdleConns,omitempty"`
}

type DBMethodTDEngine struct {
    // tdengineClientConfig of tdengine database
    // +optional
    TDEngineClientConfig *TDEngineClientConfig `json:"TDEngineClientConfig,omitempty"`
}

type TDEngineClientConfig struct {
    // addr of tdEngine database
    // +optional
    Addr string `json:"addr,omitempty"`
    // dbname of tdEngine database
    // +optional
    DBName string `json:"dbName,omitempty"`
}
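Below is a minimal device-instance sketch showing where these fields might be set. The apiVersion and the placement of the Redis settings under the property's pushMethod.dbMethod are assumptions inferred from the struct definitions above, not an authoritative spec; see the PR linked below for the actual schema.

apiVersion: devices.kubeedge.io/v1beta1   # assumed API version
kind: Device
metadata:
  name: temperature-sensor
  namespace: default
spec:
  deviceModelRef:
    name: temperature-model
  nodeName: edgenode1
  properties:
    - name: temperature
      pushMethod:
        dbMethod:                 # assumed placement, mirroring the DBMethodRedis struct above
          redis:
            redisClientConfig:
              addr: 127.0.0.1:6379
              db: 0
              poolsize: 5
              minIdleConns: 1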

For more information, see:

https://github.com/kubeedge/kubeedge/pull/5064

A USB-Camera Mapper Implementation Based on Mapper-Framework

Based on KubeEdge's Mapper-Framework, the new version provides a USB-Camera Mapper sample. The Mapper is developed for cameras that use the USB protocol, and with this sample and Mapper-Framework users can develop their own business-specific Mappers more easily.

The sample includes a Helm chart. Users can deploy the USB-Camera Mapper by editing usbmapper-chart/values.yaml, mainly filling in the USB camera's device file, the nodeName, and the number of USB-Camera replicas; other settings can be adjusted as needed. The Mapper image is built with the Dockerfile in the sample directory.

global:
  replicaCounts:
    ......
    cameraUsbMapper:
      replicaCount: 2 # number of USB-Camera replicas
      namespace: default
    ......
  nodeSelectorAndDevPath:
    mapper:
      - edgeNode: "edgenode02" # nodeName of the edge node the USB camera is connected to
        devPath: "/dev/video0" # device file of the USB camera
      - edgeNode: "edgenode1"
        devPath: "/dev/video17"
    ......

The USB-Camera Mapper is deployed with the following command:

helm install usbmapper-chart ./usbmapper-chart

For more information, see:

https://github.com/kubeedge/mappers-go/pull/122

Usability Improvements: Enhanced Keadm-Based Deployment Capabilities

  • New parameter for the cloud-edge communication protocol

    In KubeEdge v1.16.0, when joining an edge node with keadm join, the cloud-edge communication protocol can be configured with --hub-protocol. KubeEdge currently supports the websocket and quic protocols, with websocket as the default.

    Example command:

    keadm join --cloudcore-ipport <cloud node IP>:10001 --hub-protocol=quic --kubeedge-version=v1.16.0 --token=xxxxxxxx

    Note: when --hub-protocol is set to quic, the port in --cloudcore-ipport must be set to 10001, and the quic switch must be turned on in the CloudCore ConfigMap, i.e. modules.quic.enable must be set to true.

    Example: run kubectl edit cm -n kubeedge cloudcore, set the enable property of quic to true, save the change, and then restart the CloudCore pod.

    modules:
      ......
      quic:
        address: 0.0.0.0
        enable: true # switch for the quic protocol
        maxIncomingStreams: 10000
        port: 10001

    For more information, see:

    https://github.com/kubeedge/kubeedge/pull/5156

  • keadm join decoupled from CNI plugins

    In the new version, keadm join no longer requires a CNI plugin to be installed in advance; edge node deployment has been decoupled from CNI plugins. The change has also been backported to v1.12 and later, so users are welcome to use the new version or upgrade older ones.

    Note: if applications deployed on the edge node need the container network, a CNI plugin still has to be installed after EdgeCore is deployed.

    For more information, see:

    https://github.com/kubeedge/kubeedge/pull/5196

Kubernetes Dependency Upgraded to v1.27

The new version upgrades the Kubernetes dependency to v1.27.7, so you can use the features of the new version on both the cloud and the edge.

For more information, see:

https://github.com/kubeedge/kubeedge/pull/5121

Upgrade Notes

In the new version, the edge MQTT broker, Eclipse Mosquitto, is managed by a DaemonSet, and whether the MQTT service is enabled can be set through the Helm values on the cloud side. Managing MQTT with a DaemonSet makes unified management of the edge MQTT brokers easy; for example, the edge broker can be replaced with EMQX by modifying the DaemonSet configuration.
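As a rough sketch, the snippet below shows the DaemonSet fields that would change to swap the broker image for EMQX. The DaemonSet name edge-eclipse-mosquitto, the kubeedge namespace, the container name, and the image tag are assumptions (check your Helm release for the actual names), and the rest of the DaemonSet spec is omitted.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: edge-eclipse-mosquitto   # assumed name of the MQTT DaemonSet installed by the Helm chart
  namespace: kubeedge            # assumed namespace
spec:
  template:
    spec:
      containers:
        - name: mqtt                # assumed container name
          image: emqx/emqx:5.3      # swap the default eclipse-mosquitto image for EMQX (tag is illustrative)
          ports:
            - containerPort: 1883   # keep the standard MQTT port so edge applications are unaffected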

However, if you upgrade from an older version, you need to consider compatibility: an MQTT broker managed by the original static Pod and one managed by the new DaemonSet would conflict on the same port. The compatibility steps are as follows:

  1. On the cloud side, add a custom label to all existing edge nodes:
kubectl label nodes --selector=node-role.kubernetes.io/edge without-mqtt-daemonset=""
  2. Modify the node affinity of the MQTT DaemonSet:
nodeAffinity:
  requiredDuringSchedulingIgnoredDuringExecution:
    nodeSelectorTerms:
      - matchExpressions:
          - ...
          - key: without-mqtt-daemonset
            operator: DoesNotExist
  3. Switch a node's MQTT broker to DaemonSet management:
# ------ Edge side ------
# Edit the edgecore systemd service file and set the environment variable DEPLOY_MQTT_CONTAINER to false.
# This can be done before upgrading EdgeCore, so EdgeCore does not need to be restarted again.
sed -i '/DEPLOY_MQTT_CONTAINER=/s/true/false/' /etc/systemd/system/edgecore.service

# Stop EdgeCore
systemctl daemon-reload && systemctl stop edgecore

# Remove the MQTT container; with containerd, replace docker with nerdctl
docker ps -a | grep mqtt-kubeedge | awk '{print $1}' | xargs docker rm -f

# Start EdgeCore
systemctl start edgecore

# ------ Cloud side ------
# Remove the node label
kubectl label nodes <NODE_NAME> without-mqtt-daemonset-

In the new version, the keadm join command hides the with-mqtt parameter and sets its default value to false. If you still want MQTT to be managed by a static Pod, you can set --with-mqtt explicitly. The with-mqtt parameter will be removed in v1.18.

· 13 min read
Wack Xu

Abstract

The growing popularity of KubeEdge has drawn community interest in its scalability and scale. Thorough testing has now shown that Kubernetes clusters powered by KubeEdge can stably support 100,000 concurrent edge nodes and manage more than one million pods. This report introduces the metrics used in the test, the test procedure, and the approach KubeEdge takes to connect such a large number of edge nodes.

Background

Fast-growing technologies, such as 5G networks, the industrial Internet, and AI, are giving edge computing an important role in driving digital transformation. Cities, transportation, healthcare, manufacturing, and many other fields are becoming smart thanks to edge computing. According to Gartner, by 2023 the number of intelligent edge devices may be more than 20 times that of traditional IT devices, and by 2028 the embedding of sensors, storage, computing, and advanced AI functions in edge devices will grow steadily. IoT devices come in many types and large quantities, and the ever-growing number of connected devices makes management and O&M challenging.

At the same time, users in the KubeEdge community are moving toward large-scale edge deployments, and there are already some notable use cases. One project manages unmanned toll stations across China with nearly 100,000 edge nodes and more than 500,000 edge applications, and the numbers keep growing. Another is a vehicle-cloud collaboration platform, the industry's first cloud-edge-device system for software-defined vehicles, which enables fast software upgrades and iteration. On this platform each vehicle is connected as an edge node, and the number of edge nodes will reach millions.

Introduction to KubeEdge

KubeEdge is the industry's first cloud native edge computing framework designed for edge-cloud collaboration. Complementing Kubernetes for container orchestration and scheduling, KubeEdge allows applications, resources, data, and devices to collaborate between edges and the cloud. Devices, edges, and the cloud are now fully connected in edge computing.

In the KubeEdge architecture, the cloud is a unified control plane, which includes native Kubernetes management components and KubeEdge-developed CloudCore components. It listens to cloud resource changes and provides reliable, efficient cloud-edge messaging. At the edge side lie the EdgeCore components, including Edged, MetaManager, and EdgeHub. They receive messages from the cloud and manage the lifecycle of containers. The device mapper and event bus are responsible for device access.

[Figure: KubeEdge architecture]

Based on the Kubernetes control plane, KubeEdge allows nodes to be deployed in more remote locations and thereby extends edge-cloud collaboration. Kubernetes supports 5,000 nodes and 150,000 pods, which is far from enough for edge computing in the Internet of Everything (IoE). The access of a large number of edge devices demands a scalable, centralized edge computing platform. To help users manage more at lower cost and with less effort, KubeEdge, fully compatible with Kubernetes, optimizes cloud-edge messaging and provides access support for massive numbers of edge nodes.

SLIs/SLOs

Scalability and performance are important features of Kubernetes clusters. Before performing the large-scale performance test, we need to define the measurement metrics. The Kubernetes community defines the following SLIs (Service Level Indicators) and SLOs (Service Level Objectives) to measure the cluster service quality.

  1. API Call Latency

| Status | SLI | SLO |
| --- | --- | --- |
| Official | Latency of mutating API calls for single objects for every (resource, verb) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, verb) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day <= 1s |
| Official | Latency of non-streaming read-only API calls for every (resource, scope) pair, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, for every (resource, scope) pair, excluding virtual and aggregated resources and Custom Resource Definitions, 99th percentile per cluster-day: (a) <= 1s if scope=resource (b) <= 30s otherwise (if scope=namespace or scope=cluster) |

  2. Pod Startup Latency

| Status | SLI | SLO |
| --- | --- | --- |
| Official | Startup latency of schedulable stateless pods, excluding time to pull images and run init containers, measured from pod creation timestamp to when all its containers are reported as started and observed via watch, measured as 99th percentile over last 5 minutes | In default Kubernetes installation, 99th percentile per cluster-day <= 5s |
| WIP | Startup latency of schedulable stateful pods, excluding time to pull images, run init containers, provision volumes (in delayed binding mode) and unmount/detach volumes (from previous pod if needed), measured from pod creation timestamp to when all its containers are reported as started and observed via watch, measured as 99th percentile over last 5 minutes | TBD |

The community also defines indicators such as in-cluster network programming latency (the latency for Service updates or changes in ready pods to be reflected in iptables/IPVS rules), in-cluster network latency, DNS programming latency (the latency for Service updates or changes in ready pods to be reflected in the DNS server), and DNS latency. These indicators have not yet been quantified, so this test was conducted against the SLIs/SLOs in the Official state.

Kubernetes Scalability Dimensions and Thresholds

Kubernetes scalability does not just mean the number of nodes (Scalability != #Nodes). Other dimensions include the number of namespaces, pods, Services, secrets, and ConfigMaps. Configurations that Kubernetes supports create the Scalability Envelope (which keeps evolving):

[Figure: Kubernetes scalability envelope]

Obviously, it is impossible for a Kubernetes cluster to expand resource objects without limitation while satisfying SLIs/SLOs. Therefore, the industry defines the upper limits of Kubernetes resource objects.

1. Pods/node <= 30
2. Backends <= 50k & Services <= 10k & Backends/service <= 250
3. Pod churn <= 20/s
4. Secrets & ConfigMaps/node <= 30
5. Namespaces <= 10k & Pods <= 150k & Pods/namespace <= 3k
6. ...

Dimensions are sometimes not independent: as you move farther along one dimension, your cross-section with respect to the other dimensions gets smaller. For example, if a cluster is expanded from 5,000 to 10,000 nodes, the limits in the other dimensions are affected. Testing every combination would require enormous effort, so this test focuses on typical scenarios: we manage to host 100,000 edge nodes and 1,000,000 pods in a single cluster while satisfying the SLIs/SLOs.

Test Tools

ClusterLoader2

ClusterLoader2 is an open source Kubernetes cluster performance test tool. It can test the Kubernetes SLIs/SLOs to check whether the cluster meets the service quality standards. It also visualizes data for locating cluster problems and optimizing cluster performance. After the test, users get a performance report with detailed test results.

ClusterLoader2 performance metrics:

  • APIResponsivenessPrometheusSimple
  • APIResponsivenessPrometheus
  • CPUProfile
  • EtcdMetrics
  • MemoryProfile
  • MetricsForE2E
  • PodStartupLatency
  • ResourceUsageSummary
  • SchedulingMetrics
  • SchedulingThroughput
  • WaitForControlledPodsRunning
  • WaitForRunningPods

Edgemark

Edgemark is a performance test tool similar to Kubemark. It is used in KubeEdge cluster scalability tests to simulate KubeEdge edge nodes, so that ultra-large, KubeEdge-powered Kubernetes clusters can be built with limited resources. The objective is to expose cluster control-plane problems that occur only in large-scale deployments. The following figure illustrates the Edgemark deployment:

[Figure: Edgemark deployment]

  • K8s master: the master node of the Kubernetes cluster
  • Edgemark master: the master node of the simulated Kubernetes cluster
  • CloudCore: the KubeEdge cloud management component, which is responsible for edge node access
  • hollow pod: a pod started in the actual cluster. It registers with the Edgemark master as a virtual edge node by starting Edgemark in it. The Edgemark master can schedule pods to this virtual edge node.
  • hollow edgeNode: a virtual node in the simulated cluster, registered from a hollow pod

Cluster Deployment Scheme for the Test

[Figure: test cluster deployment]

The Kubernetes control plane is deployed with one master node. etcd, kube-apiserver, kube-scheduler, and kube-controller-manager are deployed as single instances. The KubeEdge control plane is deployed with five CloudCore instances, which connect to kube-apiserver through the IP address of the master node and are exposed through a load balancer. Hollow EdgeNodes connect to a CloudCore instance selected by the load balancer's round-robin policy.

Test Environment Information

Control Plane OS Version

CentOS 7.9 64bit 3.10.0-1160.15.2.el7.x86_64

Kubernetes Version

Major:"1", Minor:"23", GitVersion:"v1.23.4", GitCommit:"e6c093d87ea4cbb530a7b2ae91e54c0842d8308a", GitTreeState:"clean", BuildDate:"2022-02-16T12:38:05Z", GoVersion:"go1.17.7", Compiler:"gc", Platform:"linux/amd64"

KubeEdge Version

KubeEdge v1.11.0-alpha.0

Master Node Configurations

  • CPU
Architecture:          x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Thread(s) per core: 2
Core(s) per socket: 32
Socket(s): 2
NUMA node(s): 2
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8378A CPU @ 3.00GHz
Stepping: 6
CPU MHz: 2999.998
  • MEMORY
Total online memory:   256G
  • ETCD DISK
Type:   SAS_SSD
Size: 300GB

CloudCore Node Configurations

  • CPU
Architecture:          x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 106
Model name: Intel(R) Xeon(R) Platinum 8378A CPU @ 3.00GHz
Stepping: 6
CPU MHz: 2999.998
  • MEMORY
Total online memory:   48G

Component Parameter Configurations

1. kube-apiserver

--max-requests-inflight=2000
--max-mutating-requests-inflight=1000

2. kube-controller-manager

--kube-api-qps=100
--kube-api-burst=100

3. kube-scheduler

--kube-api-qps=200
--kube-api-burst=400

4. CloudCore

apiVersion: cloudcore.config.kubeedge.io/v1alpha1
kind: CloudCore
kubeAPIConfig:
  kubeConfig: ""
  master: ""
  qps: 60000
  burst: 80000
modules:
  cloudHub:
    advertiseAddress:
      - xx.xx.xx.xx
    nodeLimit: 30000
    tlsCAFile: /etc/kubeedge/ca/rootCA.crt
    tlsCertFile: /etc/kubeedge/certs/server.crt
    tlsPrivateKeyFile: /etc/kubeedge/certs/server.key
    unixsocket:
      address: unix:///var/lib/kubeedge/kubeedge.sock
      enable: false
    websocket:
      address: 0.0.0.0
      enable: true
      port: 10000
  cloudStream:
    enable: false
  deviceController:
    enable: false
  dynamicController:
    enable: false
  edgeController:
    buffer:
      configMapEvent: 102400
      deletePod: 10240
      endpointsEvent: 1
      podEvent: 102400
      queryConfigMap: 10240
      queryEndpoints: 1
      queryNode: 10240
      queryPersistentVolume: 1
      queryPersistentVolumeClaim: 1
      querySecret: 10240
      queryService: 1
      queryVolumeAttachment: 1
      ruleEndpointsEvent: 1
      rulesEvent: 1
      secretEvent: 1
      serviceEvent: 10240
      updateNode: 15240
      updateNodeStatus: 30000
      updatePodStatus: 102400
    enable: true
    load:
      deletePodWorkers: 5000
      queryConfigMapWorkers: 1000
      queryEndpointsWorkers: 1
      queryNodeWorkers: 5000
      queryPersistentVolumeClaimWorkers: 1
      queryPersistentVolumeWorkers: 1
      querySecretWorkers: 1000
      queryServiceWorkers: 1
      queryVolumeAttachmentWorkers: 1
      updateNodeStatusWorkers: 10000
      updateNodeWorkers: 5000
      updatePodStatusWorkers: 20000
      ServiceAccountTokenWorkers: 10000
    nodeUpdateFrequency: 60
  router:
    enable: false
  syncController:
    enable: true

Density Test

Test Execution

Before using ClusterLoader2 to perform the performance test, we defined the test policy using the configuration file. In this test, we used the official Kubernetes density case. The configuration file we used can be obtained here:

https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/testing/density/config.yaml

The following table describes the detailed Kubernetes resource configurations:

| Maximum type | Maximum value |
| --- | --- |
| Number of Nodes | 100,000 |
| Number of Pods | 1,000,000 |
| Number of Pods per node | 10 |
| Number of Namespaces | 400 |
| Number of Pods per Namespace | 2,500 |

For details about the test method and procedure, see the following links:

https://github.com/kubeedge/kubeedge/tree/master/build/edgemark

https://github.com/kubernetes/perf-tests/blob/master/clusterloader2/docs/GETTING_STARTED.md

Test Results

APIResponsivenessPrometheusSimple

  1. Mutating API call latency (threshold = 1s):

    [Figure: mutating API call latency]

  2. Read-only API call latency (scope=resource, threshold = 1s):

    [Figure: read-only API call latency, scope=resource]

  3. Read-only API call latency (scope=namespace, threshold = 5s):

    [Figure: read-only API call latency, scope=namespace]

  4. Read-only API call latency (scope=cluster, threshold = 30s):

    [Figure: read-only API call latency, scope=cluster]

PodStartupLatency

| Metric | p50 (ms) | p90 (ms) | p99 (ms) | SLO (ms) |
| --- | --- | --- | --- | --- |
| pod_startup | 1688 | 2751 | 4087 | 5000 |
| create_to_schedule | 0 | 0 | 1000 | N/A |
| schedule_to_run | 1000 | 1000 | 1000 | N/A |
| run_to_watch | 1087 | 1674 | 2265 | N/A |
| schedule_to_watch | 1657 | 2724 | 3070 | N/A |

Note: Theoretically, the latency should always be greater than 0. Because kube-apiserver does not support the RFC3339Nano format, timestamp precision is limited to seconds. Therefore, when the latency is low, some values collected by ClusterLoader2 appear as 0 due to precision loss.

Conclusion and Analysis

From the preceding test results, the API call latency and pod startup latency meet the SLIs/SLOs defined by the Kubernetes community, so KubeEdge-powered Kubernetes clusters can stably support 100,000 concurrent edge nodes and more than one million pods. In production, for reasons such as network security and partition management, edge nodes connect to the cloud only as O&M requires, so a portion of them are offline at any given time; the number of edge nodes that a single cluster can manage therefore grows in proportion to the ratio of offline to online edge nodes. In addition, data sharding can be applied on the Kubernetes control plane to store different resources in separate etcd spaces, which allows for an even larger deployment scale.

KubeEdge's Support for Large-Scale Edge Node Access

1. Efficient Cloud-Edge Messaging

List-watch is a unified mechanism for asynchronous messaging of Kubernetes components. The list operation calls the list API of a resource to obtain full resource data through non-persistent HTTP connections. The watch operation calls the watch API of a resource to monitor resource change events and obtain incremental change data through persistent HTTP connections and block-based transmission encoding. In Kubernetes, in addition to the list-watch of a node, pods allocated to the node, and full service metadata, kubelet must also watch (by default) the running pods mounted with secrets and ConfigMaps as data volumes. The number of list-watch operations could explode with increasing nodes and pods, which heavily burdens kube-apiserver.

KubeEdge uses the two-way multiplexing edge-cloud message channel and supports the WebSocket (default) and QUIC protocols. EdgeCore at the edge initiates a connection request to CloudCore on the cloud. CloudCore list-watches Kubernetes resource changes, and delivers metadata to the edge through this two-way channel. EdgeCore uploads the metadata, such as edge node status and application status, to CloudCore through this channel. CloudCore reports the received metadata to kube-apiserver.

CloudCore aggregates the upstream and downstream data, so kube-apiserver processes only a handful of list-watch requests from CloudCore. This effectively unburdens kube-apiserver and improves cluster performance.

Memory usage when the native Kubernetes kube-apiserver is used under the same node and pod scales:

[Figure: kube-apiserver memory usage in native Kubernetes]

Memory usage when kube-apiserver is used in a KubeEdge-powered Kubernetes cluster:

[Figure: kube-apiserver memory usage in a KubeEdge-powered cluster]

2. Reliable Incremental Cloud-Edge Data Transmission

In the case of complex edge network topology or poor networking quality, cloud-edge communication may be compromised by high network latency, intermittent/frequent disconnection, and other issues. When the network recovers and edge nodes want to reconnect to the cloud, a large number of full list requests will be generated, pressuring kube-apiserver. Large-scale deployments may amplify this challenge to system stability. To solve it, KubeEdge records the version of the metadata successfully sent to the edge. When the cloud-edge network is reconnected, the cloud sends incremental metadata starting from the recorded metadata version.

3. Lightweight Edge + Edge-Cloud Messaging Optimization

EdgeCore removes native kubelet features that are not used in edge deployments, such as in-tree volume plugins and cloud providers, trims the status information reported by nodes, and optimizes the resource usage of the edge agent. EdgeCore can run in as little as 70 MB of memory, on edge devices with as little as 100 MB of memory. The WebSocket channel, edge-cloud message merging, and data trimming greatly reduce the communication pressure on the edge and the cloud and the access pressure on the control plane, ensuring that the system works properly even with high latency and jitter.

When 100,000 edge nodes are connected, the number of ELB connections is 100,000.

[Figure: number of ELB connections]

When 100,000 edge nodes and more than 1,000,000 pods are deployed, the inbound rate of the ELB network is about 3 MB/s, and the average uplink bandwidth to each edge node is about 0.25 kbit/s.

[Figure: ELB network traffic]

Next Steps

Targeted tests will be performed on edge devices, edge-cloud messaging, and edge service mesh. In addition, for some edge scenarios, such as large-scale node network disconnection and reconnection, high latency of edge networks, and intermittent disconnection, new SLIs/SLOs need to be introduced to measure the cluster service quality and perform large-scale tests.

· 2 min read
Vincent Lin

As the first cloud-native edge computing community, KubeEdge provides solutions for cloud-edge synergy and has been widely adopted in industries including Transportation, Energy, Internet, CDN, Manufacturing, Smart campus, etc. With the accelerated deployment of KubeEdge in this area based on cloud-edge synergy, the community will improve the security of KubeEdge continuously in cloud-native edge computing scenarios.

The KubeEdge community attaches great importance to security and has set up SIG Security and a security team to design KubeEdge system security and quickly respond to and handle security vulnerabilities. To conduct a more comprehensive security assessment of the KubeEdge project, the KubeEdge community cooperated with Ada Logics Ltd. and the Open Source Technology Improvement Fund (OSTIF) to perform a holistic security audit of KubeEdge and produce a security audit report covering the threat model and security issues of the project. Thanks to Adam Korczynski and David Korczynski of Ada Logics for their professional and comprehensive evaluation of the KubeEdge project, which provides important guidance for its security protection, and thanks to Amir Montazery and Derek Zimmer of OSTIF and the Cloud Native Computing Foundation (CNCF), who helped with this engagement.

The discovered security issues have been fixed and backported to the three latest minor releases (v1.11.1, v1.10.2, v1.9.4) by the KubeEdge maintainers in accordance with the KubeEdge security policy. Security advisories have been published here.

For more details of the threat model and the mitigations, please check KubeEdge Threat Model And Security Protection Analysis: https://github.com/kubeedge/community/tree/master/sig-security/sig-security-audit/KubeEdge-threat-model-and-security-protection-analysis.md.

References:

Audit report: https://github.com/kubeedge/community/tree/master/sig-security/sig-security-audit/KubeEdge-security-audit-2022.pdf

OSTIF Blogpost: https://ostif.org/our-audit-of-kubeedge-is-complete-multiple-security-issues-found-and-fixed

CNCF Blogpost:

KubeEdge Threat Model And Security Protection Analysis: https://github.com/kubeedge/community/tree/master/sig-security/sig-security-audit/KubeEdge-threat-model-and-security-protection-analysis.md

· 4 min read
Kevin Wang
Fei Xu

KubeEdge is an open source system extending native containerized application orchestration and device management to hosts at the Edge. It is built upon Kubernetes and provides core infrastructure support for networking, application deployment and metadata synchronization between cloud and edge. It also supports MQTT and allows developers to author custom logic and enable resource constrained device communication at the Edge.

On December 6th, the KubeEdge community is proud to announce the availability of KubeEdge 1.9. This release includes major upgrades: custom HTTP request routing from edge to cloud through ServiceBus for applications, CloudCore running independently of the Kubernetes master host with containerized deployment using Helm, TLS and encryption security for EdgeMesh, and RPM packaging. Highlights include:

  • Custom HTTP request routing from edge to cloud through ServiceBus for applications

  • CloudCore running independently of the Kubernetes master host

  • TLS and encryption security for EdgeMesh

  • Enhanced ease of use of EdgeMesh

  • Containerized deployment of CloudCore using Helm

  • Compilation into an RPM package that can be installed on OSs such as openEuler using the yum package manager

  • 40+ bug fixes and enhancements.

Please refer to CHANGELOG v1.9 for a full list of features in this release

Note

Release details - Release v1.9

Note

How to set up KubeEdge - Setup

Release Highlights

Support Custom HTTP Request Routing from Edge to Cloud through ServiceBus for Applications

An HTTP server has been added to ServiceBus to support custom HTTP request routing from edge to cloud for applications. This simplifies REST API access when the HTTP server runs in the cloud and the client is at the edge.

Refer to the links for more details. (#3254, #3301)

Support CloudCore to run independently of the Kubernetes Master host

CloudCore now supports running independently of the Kubernetes master host. The iptablesmanager has been added as an independent component; users only need to deploy it on the Kubernetes master host, and it adds the iptables rules for the cloud-edge tunnel automatically.

Refer to the links for more details. (#3265)

EdgeMesh adds TLS and encryption security

EdgeMesh's tunnel module adds TLS and encryption capabilities. These features bring stronger protection to the user's edgemesh-server component and reduce the risk of edgemesh-server being attacked.

Refer to the links for more details. (EdgeMesh#127)

Enhanced the ease of use of EdgeMesh

EdgeMesh has many ease-of-use improvements. Users can now deploy EdgeMesh's server and agent components with a single Helm command. The restriction on service port naming has been removed, as has the docker0 dependency, making EdgeMesh easier to use.

Refer to the links for more details. (EdgeMesh#123, EdgeMesh#126, EdgeMesh#136, EdgeMesh#175)

Support containerized deployment of CloudCore using Helm

CloudCore now supports containerized deployment using Helm, which provides a better deployment experience.

Refer to the links for more details. (#3265)

Support for RPM packaging and installation via yum on OSs such as openEuler

KubeEdge can now be compiled into an RPM package and installed on OSs such as openEuler using the yum package manager.

Refer to the links for more details. (#3089, #3171)

In addition to the above new features, KubeEdge v1.9 also includes the following enhancements:

  • Rpminstaller: add support for openEuler (#3089)

  • Replaced 'kubeedge/pause' with multi arch image (#3114)

  • Make meta server addr configurable (#3119)

  • Added iptables to Dockerfile and made cloudcore privileged (#3129)

  • Added CustomInterfaceEnabled and CustomInterfaceName for edgecore (#3130)

  • Add experimental feature (#3131)

  • Feat(edge): node ephemeral storage info (#3157)

  • Support envFrom configmap in edge pods (#3176)

  • Update golang to 1.16 (#3190)

  • Metaserver: support shutdown server graceful (#3239)

  • Support labelselector for metaserver (#3262)

Future Outlook

With the release of v1.9, KubeEdge supports custom HTTP request routing from edge to cloud through ServiceBus for applications, supports CloudCore running independently of the Kubernetes master host, supports containerized deployment of CloudCore using Helm, and adds TLS and encryption security to EdgeMesh along with ease-of-use improvements. Thanks to Huawei, China Unicom, DaoCloud, Zhejiang University SEL Lab, ARM, and other organizations for their contributions, as well as all community contributors for their support!

The community plans to further improve the user experience and the stability of KubeEdge in subsequent versions and create the best “open source” intelligent edge computing platform for everyone to freely use.

For more details regarding KubeEdge, please follow and join us here:

https://kubeedge.io

· 4 min read
Kevin Wang
Fei Xu

KubeEdge is an open source system extending native containerized application orchestration and device management to hosts at the Edge. It is built upon Kubernetes and provides core infrastructure support for networking, application deployment and metadata synchronization between cloud and edge. It also supports MQTT and allows developers to author custom logic and enable resource constrained device communication at the Edge.

On October 31st, the KubeEdge community is proud to announce the availability of KubeEdge 1.8. This release includes major upgrades: active-active HA support of CloudCore for large-scale clusters, an EdgeMesh architecture modification, EdgeMesh cross-LAN communication, and a Kubernetes dependency upgrade. Highlights include:

  • Active-Active HA Support of CloudCore for Large Scale Cluster [Beta]

  • EdgeMesh Architecture Modification

  • EdgeMesh Cross LAN Communication

  • Onvif Device Mapper

  • Kubernetes Dependencies Upgrade

  • 30+ bug fixes and enhancements.

Please refer to CHANGELOG v1.8 for a full list of features in this release

Note

Release details - Release v1.8

Note

How to set up KubeEdge - Setup

Release Highlights

Active-Active HA Support of CloudCore for Large Scale Cluster [Beta]

CloudCore now supports Active-Active HA mode deployment, which provides better scalability support for large scale clusters. Cloud-Edge tunnel can also work with multiple CloudCore instances. CloudCore now can add the iptable rules for Cloud-Edge tunnel automatically.

Refer to the links for more details. (#1560, #2999)

EdgeMesh Architecture Modification

EdgeMesh now has two parts: edgemesh-server and edgemesh-agent. The edgemesh-server requires a public IP address; for cross-LAN communication it can act as a relay server in LibP2P mode or assist the agents in establishing P2P hole punching. The edgemesh-agent proxies all application traffic on user nodes and acts as the agent for communication between pods in different locations.

Refer to the links for more details. (edgemesh#19)

EdgeMesh Cross LAN Communication

The cross-LAN communication feature enables edge-to-edge and edge-to-cloud application communication across LANs.

Refer to the links for more details. (edgemesh#26, edgemesh#37, edgemesh#57)

Onvif Device Mapper

An ONVIF device mapper implemented in Go is provided, based on the new device mapper standard. Users can now use the ONVIF device mapper to manage ONVIF IP cameras.

Refer to the links for more details. (mappers-go#48)

Kubernetes Dependencies Upgrade

The vendored Kubernetes version has been upgraded to v1.21.4, so users can now use the features of the new version on both the cloud and the edge.

Refer to the links for more details. (#3021, #3034)

In addition to the above new features, KubeEdge v1.8 also includes the following enhancements:

  • Refactor edgesite: import functions and structs instead of copying code (#2893)

  • Avoiding update cm after created a new cm (#2913)

  • Solved the checksum file download problem when ke was installed offline (#2909)

  • cloudcore support configmap dynamic update when the env of container inject from configmap or secret (#2931)

  • Remove edgemesh from edgecore (#2916)

  • keadm: support customsized labels when use join command (#2827)

  • support k8s v1.21.X (#3021)

  • Handling node/*/membership/detail (#3025)

  • sync the response message unconditionally (#3014)

  • support default NVIDIA SMI command (#2680)

Future Outlook

With the release of v1.8, KubeEdge supports active-active HA deployment of CloudCore, which provides better scalability for large-scale clusters, supports cross-LAN communication via EdgeMesh, and provides an ONVIF device mapper. Thanks to Huawei, China Unicom, DaoCloud, Zhejiang University SEL Lab, ARM, and other organizations for their contributions, as well as all community contributors for their support!

The community plans to further improve the user experience and the stability of KubeEdge in subsequent versions and create the best “open source” intelligent edge computing platform for everyone to freely use.

For more details regarding KubeEdge, please follow and join us here:

https://kubeedge.io