后浪笔记一零二四

This document applies to containerd >= v2.2.

| Configuration Version | Minimum containerd version |
|-----------------------|----------------------------|
| 1                     | v1.0.0                     |
| 2                     | v1.3.0                     |
| 3                     | v2.0.0                     |
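
The version table can be turned into a small lookup. The sketch below is only an illustration: the function names are hypothetical, and it assumes plain `vX.Y.Z` version strings (no pre-release suffixes such as `-rc.1`).

```python
# Minimum containerd release for each config.toml schema version,
# copied from the table in this document.
MIN_CONTAINERD = {1: (1, 0, 0), 2: (1, 3, 0), 3: (2, 0, 0)}


def parse_version(text):
    """Turn 'v2.1.3' or '2.1.3' into a tuple such as (2, 1, 3)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def newest_config_version(containerd_version):
    """Return the highest config version whose minimum release is satisfied."""
    release = parse_version(containerd_version)
    supported = [cfg for cfg, minimum in MIN_CONTAINERD.items() if release >= minimum]
    if not supported:
        raise ValueError(f"containerd {containerd_version} predates config version 1")
    return max(supported)


if __name__ == "__main__":
    for version in ("v1.2.9", "v1.7.0", "v2.1.3"):
        print(version, "->", newest_config_version(version))
```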

Features that Docker provides but containerd does not:

  • Image building
  • Built-in network management (bridge, host, and overlay networks); containerd relies on CNI plugins instead

https://github.com/containerd/containerd/releases/download/v2.1.3/containerd-static-2.1.3-linux-amd64.tar.gz

  • containerd containerd-shim-runc-v2 containerd-stress ctr

https://github.com/moby/buildkit/releases/download/v0.23.1/buildkit-v0.23.1.linux-amd64.tar.gz

  • buildctl buildkit-cni-bridge buildkit-cni-firewall buildkit-cni-host-local buildkit-cni-loopback
  • buildkit-qemu-aarch64 buildkit-qemu-arm buildkit-qemu-i386 buildkit-qemu-ppc64le buildkit-qemu-riscv64
  • buildkit-qemu-s390x buildkit-runc buildkitd

https://github.com/kubernetes-sigs/cri-tools/releases/download/v1.33.0/crictl-v1.33.0-linux-amd64.tar.gz

  • crictl

cni-bin: https://github.com/containernetworking/plugins/releases/download/v1.7.1/cni-plugins-linux-amd64-v1.7.1.tgz

  • bandwidth dhcp firewall host-local loopback portmap sbr tap vlan
  • bridge dummy host-device ipvlan macvlan ptp static tuning vrf

systemd:

  1. containerd.service: https://raw.githubusercontent.com/containerd/containerd/main/containerd.service
  2. buildkit.service and buildkit.socket: github.com/moby/buildkit/examples/systemd/system

containerd/config.toml

    # Note: the shim's socketRoot directory (/run/containerd/s) is hard-coded for security reasons and cannot be changed through configuration.
    version = 3
    disabled_plugins = ["io.containerd.differ.v1.erofs",
                   "io.containerd.snapshotter.v1.blockfile", "io.containerd.snapshotter.v1.btrfs",
                   "io.containerd.snapshotter.v1.devmapper", "io.containerd.snapshotter.v1.erofs",
                   "io.containerd.snapshotter.v1.zfs", "io.containerd.snapshotter.v1.native",
                   "io.containerd.tracing.processor.v1.otlp", "io.containerd.internal.v1.tracing",
                   "io.containerd.nri.v1.nri"]

    [plugins.'io.containerd.cri.v1.images'.registry]
      config_path = "/etc/containerd/certs.d"
    [plugins.'io.containerd.cri.v1.runtime']
      enable_cdi = false
      cdi_spec_dirs = ['/data/software/containerd/cdi']
    [plugins."io.containerd.cri.v1.runtime".containerd.runtimes.runc]
      runtime_type = "io.containerd.runc.v2"
      runtime_path = "/usr/local/bin/containerd-shim-runc-v2"
      cni_conf_dir = "/etc/cni/custom-net.d"  # overrides the global setting; the default "" means use the global CNI config directory
    [plugins."io.containerd.cri.v1.runtime".containerd.runtimes.runc.options]
      BinaryName = "/usr/local/sbin/runc"
      SystemdCgroup = true
      # containerd passes the Root parameter through the github.com/containerd/containerd/api/types/runc/options API; if the caller does not pass it, the Root value from config.toml is used.
      Root = "/path/to/custom/runc/root"  # if set to "", the path "/run/containerd/runc" is used
      CriuImagePath = ''
      CriuWorkPath = ''
    [plugins.'io.containerd.cri.v1.runtime'.cni]
      # since containerd v2.1, bin_dir is replaced by bin_dirs
      bin_dirs = ['/opt/cni/bin']
      conf_dir = "/etc/cni/net.d"  # global CNI config directory
    [plugins.'io.containerd.image-verifier.v1.bindir']
      bin_dir = '/opt/containerd/image-verifier/bin'
    [plugins.'io.containerd.internal.v1.opt']
      path = '/opt/containerd'

    [plugins.'io.containerd.snapshotter.v1.overlayfs']
      root_path = "/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
    [plugins.'io.containerd.transfer.v1.local']
      config_path = '/etc/containerd/certs.d'

    [stream_processors]
      [stream_processors.'io.containerd.ocicrypt.decoder.v1.tar']
        accepts = ['application/vnd.oci.image.layer.v1.tar+encrypted']
        returns = 'application/vnd.oci.image.layer.v1.tar'
        path = 'ctd-decoder'
        args = ['--decryption-keys-path', '/etc/containerd/ocicrypt/keys']
        env = ['OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf']

      [stream_processors.'io.containerd.ocicrypt.decoder.v1.tar.gzip']
        accepts = ['application/vnd.oci.image.layer.v1.tar+gzip+encrypted']
        returns = 'application/vnd.oci.image.layer.v1.tar+gzip'
        path = 'ctd-decoder'
        args = ['--decryption-keys-path', '/etc/containerd/ocicrypt/keys']
        env = ['OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf']

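Plugins

All plugins are documented under the github.com/containerd/containerd/plugins directory.

containerd no longer loads dynamic-library plugins at runtime, so the top-level plugin_dir setting in config.toml is obsolete.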

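Differences between io.containerd.grpc.v1.cri and io.containerd.cri.v1.runtime

| Aspect | io.containerd.grpc.v1.cri | io.containerd.cri.v1.runtime |
|--------|---------------------------|------------------------------|
| Role | Kubernetes CRI interface layer (talks to the kubelet) | Runtime operation layer (talks to the OCI runtime) |
| Core functions | Pod/container lifecycle management, image pulls, network setup | Container process management, shim interaction, runtime switching |
| Dependencies | Calls containerd core services and the io.containerd.cri.v1.runtime plugin | Called by the io.containerd.grpc.v1.cri plugin |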

Support for the following properties of [plugins."io.containerd.grpc.v1.cri".registry] is deprecated and will be removed in a future release.

  • The CRIRegistryMirrors (mirrors) property. Users should migrate to use config_path.
  • The CRIRegistryAuths (auths) property. Users should migrate to use ImagePullSecrets.
  • The CRIRegistryConfigs (configs) property. Users should migrate to use config_path.

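Plugins enabled by default

| Full name | Can be disabled | Purpose |
|-----------|-----------------|---------|
| io.containerd.differ.v1.erofs | Y | Depends on the io.containerd.snapshotter.v1.erofs plugin; handles diff operations for erofs |
| io.containerd.gc.v1.scheduler | | Supports configurable scheduling policies for efficient resource reclamation |
| io.containerd.grpc.v1.cri | | Kubernetes depends on this plugin |
| io.containerd.image-verifier.v1.bindir | [y] | Runs external executables to verify images when they are pulled or run. Both io.containerd.monitor.container.v1.restart and io.containerd.grpc.v1.cri depend on io.containerd.image-verifier.v1 |
| io.containerd.internal.v1.opt | | Controls where binary plugins are installed; default /opt/containerd |
| io.containerd.internal.v1.tracing | Y | Provides OpenTelemetry-compatible tracing for containerd's gRPC calls and internal operations |
| io.containerd.metadata.v1.bolt | | |
| io.containerd.monitor.container.v1.restart | [y] | Must be disabled when using Kubernetes |
| io.containerd.monitor.task.v1.cgroups | [y] | Resource monitoring |
| io.containerd.nri.v1.nri | Y | Implements the Node Resource Interface (NRI) |
| io.containerd.runtime.v2.task | | Core engine of containerd's runtime layer; provides lightweight, stable container process management through the shim v2 architecture |
| io.containerd.service.v1.diff-service | | |
| io.containerd.service.v1.tasks-service | | |
| io.containerd.shim.v1.manager | [y] | Disabling is recommended: shim v1 is obsolete, use shim v2. Both io.containerd.monitor.container.v1.restart and io.containerd.grpc.v1.cri depend on io.containerd.shim.v1.manager |
| io.containerd.snapshotter.v1.blockfile | Y | |
| io.containerd.snapshotter.v1.btrfs | Y | |
| io.containerd.snapshotter.v1.devmapper | Y | |
| io.containerd.snapshotter.v1.erofs | Y | |
| io.containerd.snapshotter.v1.native | Y | Builds the container rootfs directly on the host file system (such as ext4 or xfs) |
| io.containerd.snapshotter.v1.overlayfs | | |
| io.containerd.snapshotter.v1.zfs | Y | |
| io.containerd.tracing.processor.v1.otlp | Y | Supports both traces and metrics |
| io.containerd.transfer.v1.local | | Used for local image import and export |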

NRI is now enabled by default

NRI (Node Resource Interface) is a framework for plugging domain or vendor-specific logic into OCI-compatible container runtimes. It allows users to make changes to containers, perform extra actions, and improve the management of resources. NRI plugins are considered to be part of the container runtime, and access to NRI is controlled by restricting access to the systemwide NRI socket. See the “NRI” document for more details.

CDI is now enabled by default

CDI (Container Device Interface) provides a standard mechanism for device vendors to describe what is required to provide access to a specific resource such as a GPU beyond a simple device name. CDI is now part of the Kubernetes Device Plugin framework. See the Kubernetes Enhancement Proposal 4009.

Image verifier plugins

The transfer service now supports plugins that can verify that images are allowed to be pulled. Plugins like this can implement policy, such as enforcing that container images are signed, or that images must have particular names. Plugins are independent programs that communicate via command-line arguments and standard I/O. See more details in the image verifier plugin documentation.

criu

In containerd, CRIU (Checkpoint/Restore In Userspace) is the tool used to checkpoint and restore containers or processes.

Repository: https://github.com/checkpoint-restore/criu.git

Since containerd 2.0, the [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.*.options].CriuPath setting has been removed; containerd now looks up the criu binary through the PATH environment variable.
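
Because criu is now resolved through PATH, it can help to check whether the binary is discoverable before relying on checkpoint/restore. The sketch below is only an approximation of that lookup using Python's standard library. The helper name `find_criu` is hypothetical. It inspects the PATH of whatever process runs it, which may differ from the PATH that the containerd service sees under systemd.

```python
import shutil


def find_criu(search_path=None):
    """Return the absolute path of the criu binary, or None if it is not found.

    search_path=None means "use the PATH of the current process".
    """
    return shutil.which("criu", path=search_path)


if __name__ == "__main__":
    location = find_criu()
    if location is None:
        print("criu not found on PATH; checkpoint/restore will not be available")
    else:
        print(f"criu found at {location}")
```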

How to configure a registry host namespace in containerd

  1. Set up a mirror for the docker.io registry

Assume ctr's --hosts-dir flag is set to /data/software/containerd/certs.d

$ tree /data/software/containerd/certs.d/
/data/software/containerd/certs.d/
└── docker.io
    └── hosts.toml

$ cat /data/software/containerd/certs.d/docker.io/hosts.toml
server = "https://docker.io"

[host."https://055b251cd5000fb90fc3c01b214f2380.mirror.swr.myhuaweicloud.com"]
  capabilities = ["pull", "resolve"]
  2. Set up a default mirror for all registries
$ tree /etc/containerd/certs.d
/etc/containerd/certs.d
└── _default
    └── hosts.toml

$ cat /etc/containerd/certs.d/_default/hosts.toml
[host."https://registry.example.com"]
  capabilities = ["pull", "resolve"]
  3. How to run a registry service:
# 0. Deploy the registry
docker run -d -p 127.0.0.1:5000:5000 --restart always --name registry registry:2

# 1. List the repositories in the registry
curl http://localhost:5000/v2/_catalog

# 2. List the tags of an image
curl http://localhost:5000/v2/<name>/tags/list
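
The hosts.toml files from the first two steps follow one pattern: an optional `server` line, then one `[host."..."]` table per mirror. A minimal sketch that renders and writes such a file is shown below. The helper names `render_hosts_toml` and `write_hosts_toml` are hypothetical, and only the `server` and `capabilities` fields used in this document are covered.

```python
from pathlib import Path


def render_hosts_toml(mirror_url, server=None, capabilities=("pull", "resolve")):
    """Render the body of a hosts.toml file for a single mirror.

    server is the upstream registry URL; leave it as None for the _default
    namespace, whose example in this document has no server line.
    """
    lines = []
    if server is not None:
        lines.append(f'server = "{server}"')
        lines.append("")
    caps = ", ".join(f'"{cap}"' for cap in capabilities)
    lines.append(f'[host."{mirror_url}"]')
    lines.append(f"  capabilities = [{caps}]")
    return "\n".join(lines) + "\n"


def write_hosts_toml(certs_dir, namespace, body):
    """Write body to <certs_dir>/<namespace>/hosts.toml and return the path."""
    target = Path(certs_dir) / namespace / "hosts.toml"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(body, encoding="utf-8")
    return target


if __name__ == "__main__":
    # Only prints the rendered text; nothing is written to disk here.
    print(render_hosts_toml("https://registry.example.com"))
```

Calling `write_hosts_toml("/etc/containerd/certs.d", "_default", body)` would produce the layout shown in step 2; the directory must match the `config_path` (or `--hosts-dir`) that containerd is actually using.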

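Differences between the io.containerd.content.v1.content, io.containerd.snapshotter.v1.overlayfs, and io.containerd.runtime.v2.task directories

content: image layer data (blobs), read-only

snapshotter: the writable layer (upperdir) and the union-mount view

task: runtime state, IO pipes, and configuration files

When each is deleted:

  • content: garbage-collected after the image is deleted
  • snapshotter: garbage-collected after the container is deleted
  • task: garbage-collected after the container process exits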

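Differences between ctr and crictl

Note: the spec generated by runc does not mount /run on tmpfs, but the spec generated by ctr does.

ctr is a client tool for containerd.

crictl is the command-line tool for the CRI (Container Runtime Interface).

ctr -v prints the containerd version; crictl -v prints the crictl (cri-tools) version, whose numbering follows Kubernetes releases.

Common ctr commands: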

# Push an image to Harbor
$ ctr --namespace=k8s.io images push --skip-verify --user admin:Harbor12345 <image-ref>
# Tag an image
$ ctr -n k8s.io images tag A B
# Pull an image
$ ctr images pull --user admin:Harbor12345 --tlscacert=/etc/containerd/myharbor-test.com/ca.crt <image-ref>

Before using crictl, configure /etc/crictl.yaml as follows:

runtime-endpoint: unix:///run/containerd/containerd.sock
image-endpoint: unix:///run/containerd/containerd.sock
timeout: 10
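debug: false

| Task | docker | ctr (containerd) | crictl (Kubernetes) |
|------|--------|------------------|---------------------|
| List running containers | docker ps | ctr task ls / ctr container ls | crictl ps |
| List images | docker images | ctr image ls | crictl images |
| View container logs | docker logs | | crictl logs |
| Inspect a container | docker inspect | ctr container info | crictl inspect |
| View container resource usage | docker stats | | crictl stats |
| Start/stop an existing container | docker start/stop | ctr task start/kill | crictl start/stop |
| Run a new container | docker run | ctr run | none (the smallest unit is a pod) |
| Tag an image | docker tag | ctr image tag | |
| Create a new container | docker create | ctr container create | crictl create |
| Import an image | docker load | ctr image import | |
| Export an image | docker save | ctr image export | |
| Remove a container | docker rm | ctr container rm | crictl rm |
| Remove an image | docker rmi | ctr image rm | crictl rmi |
| Pull an image | docker pull | ctr image pull | crictl pull |
| Push an image | docker push | ctr image push | |
| Run a command inside a container | docker exec | ctr task exec --exec-id=<any unique value> | crictl exec |
| Remove unused images | docker image prune | | crictl rmi --prune |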

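containerd, containerd-shim-runc-v2, and runc

  1. When a client asks containerd to create a container, containerd does not operate on the container directly. Instead it starts a containerd-shim-runc-v2 process (whose parent is systemd(1)) and lets that process manage the container.

  2. /usr/bin/containerd-shim-runc-v2 then runs the runc binary to create and start the container. Once the container is started, runc itself exits, and containerd-shim-runc-v2 becomes the parent of the container process, collecting its state and reporting it to containerd.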

containerd-stress

containerd-stress is the stress-testing tool of the containerd project.

Measure how many containers per second the containerd daemon can create, start, and delete:

$ containerd-stress -c 10 -i docker.io/library/alpine:latest --runtime io.containerd.runc.v2
# > INFO[0000] pulling docker.io/library/alpine:latest
# > INFO[0002] starting stress test run...
# > INFO[0063] worker 0 finished
# > INFO[0063] worker 3 finished
# > INFO[0063] worker 1 finished
# > INFO[0063] worker 7 finished
# > INFO[0063] worker 8 finished
# > INFO[0063] worker 9 finished
# > INFO[0063] worker 6 finished
# > INFO[0063] worker 5 finished
# > INFO[0063] worker 4 finished
# > INFO[0063] worker 2 finished
# > INFO[0063] ending test run in 60.399 seconds
# > INFO[0063] create/start/delete 987 containers in 60.399 seconds (16.341 c/sec) or (0.061 sec/c)  failures=0

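Result: 987 containers were created, started, and deleted in about 60 seconds, an average of 16.34 per second. Monitoring showed that the system disk's write IOPS hit its limit during the run.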
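
The two figures in the last log line follow directly from the container count and the elapsed time. A short worked calculation, using only the numbers printed by containerd-stress above (the function name is hypothetical):

```python
def throughput(container_count, elapsed_seconds):
    """Return (containers per second, seconds per container)."""
    rate = container_count / elapsed_seconds
    latency = elapsed_seconds / container_count
    return rate, latency


if __name__ == "__main__":
    # Values taken from the final line of the stress test output.
    rate, latency = throughput(987, 60.399)
    print(f"{rate:.3f} c/sec, {latency:.3f} sec/c")
```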

iperf3 traffic benchmark

1. Node-to-node iperf3 test

# Run on node a5
firewall-cmd --add-port=5201/tcp
iperf3 -s
# Start the test from node a4
iperf3 -c 192.168.31.15 -n 100G -b 30G
# > Connecting to host 192.168.31.15, port 5201
# > [  4] local 10.14.0.4 port 39974 connected to 192.168.31.15 port 5201
# > [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
# > [  4]   0.00-1.00   sec  1.69 GBytes  19.5 Gbits/sec  191   2.01 MBytes
# > [  4]   1.00-2.00   sec  1.95 GBytes  19.7 Gbits/sec    0   2.01 MBytes
# > ...
# > ...
# > [  4]  46.00-47.00  sec  2.27 GBytes  19.5 Gbits/sec    0   2.90 MBytes
# > [  4]  47.00-48.00  sec  2.26 GBytes  19.4 Gbits/sec    0   2.90 MBytes
# > [  4]  48.00-48.15  sec   344 MBytes  19.4 Gbits/sec    0   2.90 MBytes
# > - - - - - - - - - - - - - - - - - - - - - - - - -
# > [ ID] Interval           Transfer     Bandwidth       Retr
# > [  4]   0.00-48.15  sec   100 GBytes  19.5 Gbits/sec  451             sender
# > [  4]   0.00-48.15  sec   100 GBytes  19.5 Gbits/sec                  receiver
# > iperf Done.

2. Container-to-container iperf3 test across nodes

# Start the test from a container on node a4
crictl exec -it 2369dff38e2db sh
iperf3 -c 10.15.0.3 -n 100G -b 30G
# > Connecting to host 10.15.0.3, port 5201
# > [  4] local 10.14.0.4 port 52846 connected to 10.15.0.3 port 5201
# > [ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
# > [  4]   0.00-1.00   sec  1.60 GBytes  15.7 Gbits/sec  204   1.98 MBytes
# > [  4]   1.00-2.00   sec  1.71 GBytes  16.7 Gbits/sec    0   2.00 MBytes
# > [  4]   2.00-3.00   sec  1.97 GBytes  16.9 Gbits/sec    0   2.01 MBytes
# > [  4]   3.00-4.00   sec  2.02 GBytes  17.3 Gbits/sec   34   2.01 MBytes
# > ...
# > ...
# > [  4]  45.00-46.00  sec  2.29 GBytes  19.7 Gbits/sec    0   2.02 MBytes
# > [  4]  46.00-47.00  sec  2.42 GBytes  18.8 Gbits/sec    0   2.02 MBytes
# > [  4]  47.00-47.20  sec   492 MBytes  20.8 Gbits/sec    0   2.02 MBytes
# > - - - - - - - - - - - - - - - - - - - - - - - - -
# > [ ID] Interval           Transfer     Bandwidth       Retr
# > [  4]   0.00-47.20  sec   100 GBytes  18.2 Gbits/sec  298             sender
# > [  4]   0.00-47.20  sec   100 GBytes  18.2 Gbits/sec                  receiver
# > iperf Done.

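3. Conclusion

Node-to-node throughput averaged 19.5 Gbits/sec and container-to-container throughput across nodes averaged 18.2 Gbits/sec, so containerization costs about (19.5 - 18.2) / 19.5 = 6.7% of bandwidth.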
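
The overhead figure is a relative difference against the node-to-node baseline. As a small worked example (the function name is hypothetical; the inputs are the two averages reported by iperf3 above):

```python
def overhead_percent(baseline_gbits, measured_gbits):
    """Relative bandwidth loss, in percent, compared with the baseline."""
    return (baseline_gbits - measured_gbits) / baseline_gbits * 100


if __name__ == "__main__":
    # 19.5 Gbits/sec between nodes, 18.2 Gbits/sec between containers.
    print(f"{overhead_percent(19.5, 18.2):.1f}%")
```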

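How to set up Windows and macOS environments on Linux

The Linux desktop leaves a lot to be desired, but that does not stop Linux from serving as a main operating system: Windows or macOS virtual machines running on Linux lose very little performance, whereas Linux virtual machines running on Windows or macOS lose a great deal.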

https://github.com/dockur/windows

https://github.com/dockur/macos

  1. Create pod-config.json
{
  "metadata": {
    "name": "windows-pod",
    "namespace": "default",
    "uid": "hdishd83djaidwnduwk28bcsb"
  },
  "log_directory": "/data/software/containerd/windows-pod/logs",
  "linux": {
    "security_context": {
      "privileged": true,
      "capabilities": {
        "add_capabilities": ["NET_ADMIN"]
      }
    }
  }
}
  2. Create container-config.json
{
  "metadata": {
    "name": "windows-container"
  },
  "image": {
    "image": "dockurr/windows"
  },
  "command": [],
  "envs": [
    {"key": "VERSION", "value": "11"}
  ],
  "devices": [
    {"host_path": "/dev/kvm", "container_path": "/dev/kvm"},
    {"host_path": "/dev/net/tun", "container_path": "/dev/net/tun"}
  ],
  "mounts": [
    {
      "host_path": "/data/software/containerd/windows-pod/storage",
      "container_path": "/storage",
      "read_only": false
    }
  ],
  "linux": {
    "security_context": {
      "privileged": true
    }
  },
  "log_path": "windows.log"
}
  3. Create windows.sh
#!/bin/bash

set -o errexit
set -o nounset
set -o pipefail
set -x

# Remove any existing pod named windows-pod
windows_pod=$(crictl pods --name windows-pod --quiet)
if [ -n "$windows_pod" ]; then
        # Left unquoted on purpose: there may be several pod IDs, one per line
        crictl rmp -f $windows_pod
fi
# Create the pod
POD_ID=$(crictl runp pod-config.json)

crictl pull dockurr/windows
# Create the container (attached to the pod)
CONTAINER_ID=$(crictl create "$POD_ID" container-config.json pod-config.json)

# Start the container
crictl start "$CONTAINER_ID"
  4. Port forwarding
# Look up the pod ID
crictl pods
# Forward the port
crictl port-forward pod-id 8006:8006

Published on 0001-01-01, last modified on 0001-01-01.

The permanent domain of this site is "jiavvc.top"; you can also find me by searching for "后浪笔记一零二四".

