极客油画

This document applies to containerd version >= v2.2

1. Portable installation of containerd (static binaries)

1) Download the installation files

https://github.com/containerd/containerd/releases/download/v2.1.3/containerd-static-2.1.3-linux-amd64.tar.gz
* containerd  containerd-shim-runc-v2  containerd-stress  ctr

https://github.com/opencontainers/runc/releases/download/v1.4.0/runc.amd64

https://github.com/containernetworking/plugins/releases/download/v1.7.1/cni-plugins-linux-amd64-v1.7.1.tgz
* bandwidth  dhcp   firewall     host-local  loopback  portmap  sbr     tap     vlan
* bridge     dummy  host-device  ipvlan      macvlan   ptp      static  tuning  vrf

https://raw.githubusercontent.com/containerd/containerd/main/containerd.service

go install github.com/containernetworking/cni/cnitool@latest
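The downloads above can be scripted. A minimal sketch, assuming linux/amd64 and the exact versions linked above (pin your own); the privileged install commands are left commented so the snippet is safe to run as-is:

```shell
# Component versions, matching the release links above.
CONTAINERD_VERSION=2.1.3
RUNC_VERSION=1.4.0
CNI_VERSION=1.7.1
ARCH=amd64

CONTAINERD_URL="https://github.com/containerd/containerd/releases/download/v${CONTAINERD_VERSION}/containerd-static-${CONTAINERD_VERSION}-linux-${ARCH}.tar.gz"
RUNC_URL="https://github.com/opencontainers/runc/releases/download/v${RUNC_VERSION}/runc.${ARCH}"
CNI_URL="https://github.com/containernetworking/plugins/releases/download/v${CNI_VERSION}/cni-plugins-linux-${ARCH}-v${CNI_VERSION}.tgz"

# Run these as root on the target host (the static tarball unpacks a bin/
# directory, hence --strip-components=1):
# curl -fsSL "$CONTAINERD_URL" | tar -xz --strip-components=1 -C /usr/local/bin
# curl -fsSLo /usr/local/sbin/runc "$RUNC_URL" && chmod +x /usr/local/sbin/runc
# mkdir -p /opt/cni/bin && curl -fsSL "$CNI_URL" | tar -xz -C /opt/cni/bin

echo "$CONTAINERD_URL"
```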

2) containerd/config.toml

Configuration version   Minimum containerd version
1                       v1.0.0
2                       v1.3.0
3                       v2.0.0
# Note: the shim's socketRoot directory (/run/containerd/s) is hard-coded (for security reasons) and cannot be changed via configuration.
version = 3
root = '/var/lib/containerd'
state = '/run/containerd'
# When using k8s, disable this plugin: io.containerd.monitor.container.v1.restart
disabled_plugins = ["io.containerd.differ.v1.erofs",
               "io.containerd.snapshotter.v1.blockfile", "io.containerd.snapshotter.v1.btrfs",
               "io.containerd.snapshotter.v1.devmapper", "io.containerd.snapshotter.v1.erofs",
               "io.containerd.snapshotter.v1.zfs", "io.containerd.snapshotter.v1.native",
               "io.containerd.tracing.processor.v1.otlp", "io.containerd.internal.v1.tracing",
               "io.containerd.nri.v1.nri"]

# The ctr plugins ls command lists all plugins
[plugins]

  # CRI configuration
  [plugins.'io.containerd.cri.v1.images'.registry]
    config_path = "/etc/containerd/certs.d"
  [plugins.'io.containerd.cri.v1.runtime']
    enable_cdi = false
    cdi_spec_dirs = ['/data/software/containerd/cdi']
    [plugins."io.containerd.cri.v1.runtime".containerd.runtimes.runc]
      runtime_type = "io.containerd.runc.v2"
      runtime_path = "/usr/local/bin/containerd-shim-runc-v2"
      cni_conf_dir = "/etc/cni/custom-net.d"  # overrides the global setting; the default "" means use the global CNI config directory
      [plugins."io.containerd.cri.v1.runtime".containerd.runtimes.runc.options]
        BinaryName = "/usr/local/sbin/runc"
        SystemdCgroup = true
        # containerd passes the Root parameter via the github.com/containerd/containerd/api/types/runc/options API; if it is not passed, the Root value from config.toml is used.
        Root = "/path/to/custom/runc/root"  # if this setting is "", the path "/run/containerd/runc" is used
        CriuImagePath = ''
        CriuWorkPath = ''
    [plugins.'io.containerd.cri.v1.runtime'.cni]
      # Since containerd v2.1, bin_dir is replaced by bin_dirs
      bin_dirs = ['/opt/cni/bin']
      conf_dir = "/etc/cni/net.d"  # global CNI config directory
  [plugins.'io.containerd.grpc.v1.cri']
    disable_tcp_service = true
    stream_server_address = '127.0.0.1'
    stream_server_port = '0'
    stream_idle_timeout = '4h0m0s'
    enable_tls_streaming = false
    [plugins.'io.containerd.grpc.v1.cri'.x509_key_pair_streaming]
      tls_cert_file = ''
      tls_key_file = ''

  # General configuration
  [plugins.'io.containerd.image-verifier.v1.bindir']
    bin_dir = '/opt/containerd/image-verifier/bin'
  [plugins.'io.containerd.internal.v1.opt']
    path = '/opt/containerd'
  [plugins.'io.containerd.snapshotter.v1.overlayfs']
    root_path = "/var/lib/containerd/io.containerd.snapshotter.v1.overlayfs"
  # Local transfer plugin: handles importing and exporting container images on the local filesystem, and can also handle image pull and push.
  [plugins.'io.containerd.transfer.v1.local']
    config_path = '/etc/containerd/certs.d'
    max_concurrent_downloads = 3
    concurrent_layer_fetch_buffer = 0
    max_concurrent_uploaded_layers = 3
    check_platform_supported = false

# Stream processor configuration
[stream_processors]
  [stream_processors.'io.containerd.ocicrypt.decoder.v1.tar']
    accepts = ['application/vnd.oci.image.layer.v1.tar+encrypted']
    returns = 'application/vnd.oci.image.layer.v1.tar'
    path = 'ctd-decoder'
    args = ['--decryption-keys-path', '/etc/containerd/ocicrypt/keys']
    env = ['OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf']

  [stream_processors.'io.containerd.ocicrypt.decoder.v1.tar.gzip']
    accepts = ['application/vnd.oci.image.layer.v1.tar+gzip+encrypted']
    returns = 'application/vnd.oci.image.layer.v1.tar+gzip'
    path = 'ctd-decoder'
    args = ['--decryption-keys-path', '/etc/containerd/ocicrypt/keys']
    env = ['OCICRYPT_KEYPROVIDER_CONFIG=/etc/containerd/ocicrypt/ocicrypt_keyprovider.conf']

2. Plugins

All plugins are registered under the github.com/containerd/containerd/plugins directory.

containerd no longer loads dynamic library plugins at runtime, so the top-level plugin_dir setting in config.toml is obsolete.

2.1 The difference between io.containerd.grpc.v1.cri and io.containerd.cri.v1.runtime

Feature io.containerd.grpc.v1.cri io.containerd.cri.v1.runtime
Role k8s CRI interface layer (talks to the kubelet) Runtime operations layer (talks to the OCI runtime)
Core functions Pod/container lifecycle management, image pulling, network setup Container process management, shim interaction, runtime switching
Dependencies Calls containerd core services and the io.containerd.cri.v1.runtime plugin Called by the io.containerd.grpc.v1.cri plugin

Support for the following properties of [plugins."io.containerd.grpc.v1.cri".registry] is deprecated and will be removed in a future release.

  • The CRIRegistryMirrors (mirrors) property. Users should migrate to use config_path.
  • The CRIRegistryAuths (auths) property. Users should migrate to use ImagePullSecrets.
  • The CRIRegistryConfigs (configs) property. Users should migrate to use config_path.

2.2 Plugins enabled by default

Full name Disableable Purpose
io.containerd.differ.v1.erofs Y Depends on the io.containerd.snapshotter.v1.erofs plugin; handles erofs diff operations
io.containerd.gc.v1.scheduler  Efficient resource garbage collection with a configurable scheduling policy
io.containerd.grpc.v1.cri  k8s depends on this plugin
io.containerd.image-verifier.v1.bindir Y Verifies container images via an external executable at image pull or run time
Both io.containerd.monitor.container.v1.restart and io.containerd.grpc.v1.cri depend on io.containerd.image-verifier.v1
io.containerd.internal.v1.opt  Controls where binary plugins are installed; defaults to /opt/containerd
io.containerd.internal.v1.tracing Y Provides OpenTelemetry-compatible tracing for containerd's gRPC calls and internal operations
io.containerd.metadata.v1.bolt
io.containerd.monitor.container.v1.restart Y Disable this plugin when using k8s
io.containerd.monitor.task.v1.cgroups Y Resource monitoring
io.containerd.nri.v1.nri Y Supports the Node Resource Interface (NRI)
io.containerd.runtime.v2.task  The core engine of containerd's runtime layer; lightweight, stable container process management via the shim v2 architecture
io.containerd.service.v1.diff-service
io.containerd.service.v1.tasks-service
io.containerd.shim.v1.manager Y Recommended to disable; shim v1 is obsolete, use shim v2
Both io.containerd.monitor.container.v1.restart and io.containerd.grpc.v1.cri depend on io.containerd.shim.v1.manager
io.containerd.snapshotter.v1.blockfile Y
io.containerd.snapshotter.v1.btrfs Y
io.containerd.snapshotter.v1.devmapper Y
io.containerd.snapshotter.v1.erofs Y
io.containerd.snapshotter.v1.native Y Builds the container rootfs directly on the host filesystem (e.g. ext4, xfs)
io.containerd.snapshotter.v1.overlayfs
io.containerd.snapshotter.v1.zfs Y
io.containerd.tracing.processor.v1.otlp Y Supports both traces and metrics
io.containerd.transfer.v1.local  Local image import/export

2.3 NRI is now enabled by default

NRI (Node Resource Interface) is a framework for plugging domain or vendor-specific logic into OCI-compatible container runtimes. It allows users to make changes to containers, perform extra actions, and improve the management of resources. NRI plugins are considered to be part of the container runtime, and access to NRI is controlled by restricting access to the systemwide NRI socket. See the “NRI” document for more details.

2.4 CDI is now enabled by default

CDI (Container Device Interface) provides a standard mechanism for device vendors to describe what is required to provide access to a specific resource such as a GPU beyond a simple device name. CDI is now part of the Kubernetes Device Plugin framework. See the Kubernetes Enhancement Proposal 4009.

2.5 Image verifier plugins

The transfer service now supports plugins that can verify that images are allowed to be pulled. Plugins like this can implement policy, such as enforcing that container images are signed, or that images must have particular names. Plugins are independent programs that communicate via command-line arguments and standard I/O. See more details in the image verifier plugin documentation.

2.6 criu

In containerd, CRIU (Checkpoint/Restore In Userspace) is the tool used to checkpoint and restore containers or processes.

Repository: https://github.com/checkpoint-restore/criu.git

Since containerd 2.0, the [plugins."io.containerd.grpc.v1.cri".containerd.runtimes.*.options].CriuPath setting has been removed; the criu binary is now looked up via the PATH environment variable.

3. Configuring registry mirrors in containerd

1) Set a mirror for the docker.io registry

Assume ctr's --hosts-dir parameter is set to /data/software/containerd/certs.d

$ tree /data/software/containerd/certs.d/
/data/software/containerd/certs.d/
└── docker.io
    └── hosts.toml

$ cat /data/software/containerd/certs.d/docker.io/hosts.toml
server = "https://docker.io"

[host."https://055b251cd5000fb90fc3c01b214f2380.mirror.swr.myhuaweicloud.com"]
  capabilities = ["pull", "resolve"]
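The layout above can be generated with a short script. A sketch that writes to a temporary directory so it is safe to try; point the target at your real --hosts-dir for actual use (the mirror URL is the one from this document):

```shell
# Target directory; replace with /etc/containerd/certs.d (or your --hosts-dir).
CERTS_DIR="$(mktemp -d)/certs.d"
MIRROR="https://055b251cd5000fb90fc3c01b214f2380.mirror.swr.myhuaweicloud.com"

# One subdirectory per registry, each containing a hosts.toml.
mkdir -p "$CERTS_DIR/docker.io"
cat > "$CERTS_DIR/docker.io/hosts.toml" <<EOF
server = "https://docker.io"

[host."$MIRROR"]
  capabilities = ["pull", "resolve"]
EOF

cat "$CERTS_DIR/docker.io/hosts.toml"
```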

2) Set up a default mirror for all registries

$ tree /etc/containerd/certs.d
/etc/containerd/certs.d
└── _default
    └── hosts.toml

$ cat /etc/containerd/certs.d/_default/hosts.toml
[host."https://registry.example.com"]
  capabilities = ["pull", "resolve"]

3) How to create a registry service:

# 0. Deploy the registry
docker run -d -p 127.0.0.1:5000:5000 --restart always --name registry registry:2

# 1. List the repositories in the remote registry
curl http://localhost:5000/v2/_catalog

# 2. List an image's tags
curl http://localhost:5000/v2/<name>/tags/list

4. Differences among the io.containerd.content.v1.content, io.containerd.snapshotter.v1.overlayfs, and io.containerd.runtime.v2.task directories

content: image layer data (blobs), read-only

snapshotter: writable layer (upperdir), union-mount view

task: runtime state, IO pipes, config files

When each is deleted:

  • content: GC'd after the image is deleted
  • snapshotter: GC'd after the container is deleted
  • task: GC'd after the container process exits

Both the state directory and the root directory contain io.containerd.runtime.v2.task; this separates the runtime's state data from the runtime's root filesystem data, which is containerd's standard design.
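Given the root and state values from the config.toml in section 1, the three plugin directories resolve to the paths below (a sketch; adjust if you changed root or state):

```shell
# root/state as configured in config.toml above.
ROOT=/var/lib/containerd
STATE=/run/containerd

echo "content (read-only image blobs): $ROOT/io.containerd.content.v1.content"
echo "snapshotter (writable upperdir): $ROOT/io.containerd.snapshotter.v1.overlayfs"
echo "task (runtime state, IO pipes):  $STATE/io.containerd.runtime.v2.task"
echo "task (runtime rootfs data):      $ROOT/io.containerd.runtime.v2.task"
```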

5. Differences between ctr and docker-cli

Note: runc's own spec does not mount /run on tmpfs, but the spec generated by ctr does ({state_dir}/io.containerd.runtime.v2.task/default/{container_id}/config.json).

Features that docker has but containerd lacks:

  • Image building: use https://github.com/moby/buildkit instead
    • buildkit.service and buildkit.socket: github.com/moby/buildkit/examples/systemd/system
  • Docker provides built-in network management (bridge, host, overlay networks); containerd relies on CNI plugins
Command docker ctr (containerd)
List running containers docker ps ctr task ls / ctr container ls
List images docker images ctr image ls
View container logs docker logs
Inspect a container docker inspect ctr container info
View container resource usage docker stats
Start/stop an existing container docker start/stop ctr task start/kill
Run a new container docker run ctr run
Tag an image docker tag ctr image tag
Create a new container docker create ctr container create
Import an image docker load ctr image import
Export an image docker save ctr image export
Remove a container docker rm ctr container rm
Remove an image docker rmi ctr image rm
Pull an image docker pull ctr image pull
Push an image docker push ctr image push
Execute a command in a container docker exec ctr task exec --exec-id=<any unique id>
Prune unused images docker image prune
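The docker-to-ctr mapping can be exercised as a minimal workflow. A sketch with the containerd-touching commands commented out (they need a running daemon); the image and container names are placeholders, and --exec-id merely has to be unique:

```shell
IMAGE=docker.io/library/alpine:latest
NAME=demo

# ctr image pull "$IMAGE"                 # docker pull
# ctr container create "$IMAGE" "$NAME"   # docker create
# ctr task start -d "$NAME"               # docker start

# --exec-id just needs to be unique; a timestamp-based id works:
EXEC_ID="exec-$(date +%s)"
# ctr task exec --exec-id "$EXEC_ID" "$NAME" sh   # docker exec

# ctr task kill "$NAME"       # docker stop
# ctr container rm "$NAME"    # docker rm
echo "$EXEC_ID"
```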

As the code below shows, when the ctr container create command's runtime is "io.containerd.runc.v2", the --runtime-config-path parameter is skipped and has no effect even if set.

# cmd/ctr/commands/run/run_unix.go
func NewContainer(ctx context.Context, client *containerd.Client, cliContext *cli.Context) (containerd.Container, error) {
	//...
	var (
		opts  []oci.SpecOpts
		cOpts []containerd.NewContainerOpts
		spec  containerd.NewContainerOpts
	)
	//...
		opts = append(opts, oci.WithDefaultSpecForPlatform(platform), oci.WithDefaultUnixDevices)
	//...
	runtimeOpts, err := commands.RuntimeOptions(cliContext)
	//...
	spec = containerd.WithSpec(&s, opts...)
	cOpts = append(cOpts, spec)
	return client.NewContainer(ctx, id, cOpts...)
}

# cmd/ctr/commands/commands_unix.go
func RuntimeOptions(cliContext *cli.Context) (interface{}, error) {
	// validate first
	if (cliContext.String("runc-binary") != "" || cliContext.Bool("runc-systemd-cgroup")) &&
		cliContext.String("runtime") != "io.containerd.runc.v2" {
		return nil, errors.New("specifying runc-binary and runc-systemd-cgroup is only supported for \"io.containerd.runc.v2\" runtime")
	}

	if cliContext.String("runtime") == "io.containerd.runc.v2" {
		return getRuncOptions(cliContext)
	}

	if configPath := cliContext.String("runtime-config-path"); configPath != "" {
		return &runtimeoptions.Options{
			ConfigPath: configPath,
		}, nil
	}

	return nil, nil
}

6. containerd, containerd-shim-runc-v2, and runc

  1. When a client asks containerd to create a container, containerd does not operate on the container directly after receiving the request; instead it spawns a process called containerd-shim-runc-v2 (whose parent is systemd, PID 1) and lets that process operate on the container;

  2. /usr/bin/containerd-shim-runc-v2 then runs the runc binary to create and start the container. Once the container has started, runc itself exits, and containerd-shim-runc-v2 becomes the parent of the container process, collecting its status and reporting it to containerd.
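The parenting described above can be checked on a live host with ps: the shim's PPID should be 1 (systemd), and the container's init process should have the shim's PID as its PPID. A sketch; the first command assumes a running shim, while the last lines are a runnable sample of the same pid/ppid columns:

```shell
# On a host with a running container (requires containerd + a container):
# ps -o pid,ppid,cmd -C containerd-shim-runc-v2   # PPID column should be 1

# Runnable sample of the same columns, for the current shell:
PIDS="$(ps -o pid=,ppid= -p $$)"
echo "$PIDS"
```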

7. containerd-stress

containerd-stress is the containerd project's stress-testing tool.

Measure how many containers per second the containerd daemon can create, start, and delete:

$ containerd-stress -c 10 -i docker.io/library/alpine:latest --runtime io.containerd.runc.v2
# > INFO[0000] pulling docker.io/library/alpine:latest
# > INFO[0002] starting stress test run...
# > INFO[0063] worker 0 finished
# > INFO[0063] worker 3 finished
# > INFO[0063] worker 1 finished
# > INFO[0063] worker 7 finished
# > INFO[0063] worker 8 finished
# > INFO[0063] worker 9 finished
# > INFO[0063] worker 6 finished
# > INFO[0063] worker 5 finished
# > INFO[0063] worker 4 finished
# > INFO[0063] worker 2 finished
# > INFO[0063] ending test run in 60.399 seconds
# > INFO[0063] create/start/delete 987 containers in 60.399 seconds (16.341 c/sec) or (0.061 sec/c)  failures=0

Interpreting the result: 987 containers were created, started, and deleted within 60 seconds, averaging 16.34 per second; monitoring showed the system's disk write IOPS hit its ceiling.
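The reported rate is just total containers over elapsed time; checking the arithmetic:

```shell
# 987 containers in 60.399 seconds => containers per second.
RATE="$(awk 'BEGIN { printf "%.3f", 987 / 60.399 }')"
echo "$RATE c/sec"   # matches the 16.341 c/sec in the log above
```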

Appendix A: Setting up Windows and macOS environments on Linux

Although the Linux desktop leaves much to be desired, that need not stop you from using Linux as your primary OS: running a Windows or macOS VM on Linux carries very little performance overhead, whereas running a Linux VM on Windows or macOS carries a lot.

Confirm that virtualization extensions (Intel VT-x or AMD SVM) are enabled in the BIOS.

Check for KVM support on Linux: sudo apt install cpu-checker; sudo kvm-ok

https://github.com/dockur/windows

https://github.com/dockur/macos

ctr images pull --hosts-dir /usr/local/containerd/etc/certs.d/ docker.io/dockurr/windows:latest

ctr images ls

ip netns add cni-containerd

export CNI_PATH=/usr/local/containerd/bin

export NETCONFPATH=/usr/local/containerd/etc/cni/net.d

The network name here (containerd-net) must match the "name" field in the CNI config file.
The CNI_NETNSDIR environment variable defaults to /var/run/netns.
cnitool add containerd-net /var/run/netns/cni-containerd

ctr containers delete my-windows

Contents of the hosts file:
127.0.0.1 localhost

ctr container create --runc-binary /usr/local/containerd/bin/runc \
--env VERSION=2022 --env LANGUAGE=Chinese --env REGION=zh-CN \
--env KEYBOARD=zh-CN --env RAM_SIZE=8G --env CPU_CORES=4 \
--env USERNAME=docker --env PASSWORD=admin \
--mount type=bind,src=`pwd`/storage,dst=/storage,options=rbind:rw \
--mount type=bind,src=`pwd`/shared,dst=/shared,options=rbind:rw \
--mount type=bind,src=`pwd`/run,dst=/run,options=rbind:rw  \
--mount type=bind,src=`pwd`/hosts,dst=/etc/hosts,options=rbind:rw \
--privileged --pid-file `pwd`/pid \
--with-ns "network:/var/run/netns/cni-containerd" \
--cap-add CAP_NET_ADMIN \
--device /dev/kvm --device /dev/net/tun \
docker.io/dockurr/windows:latest   my-windows


ctr tasks start my-windows

Run ctr task exec --exec-id=123 my-windows ip a to get the eth0 IP address, then open http://{eth0 IP}:8006 to access the Windows desktop.

Clean up with the following commands:

cnitool del containerd-net /var/run/netns/cni-containerd
ip netns del cni-containerd

Running capsh --print inside the container checks whether the --cap-add CAP_NET_ADMIN flag took effect.

Even if the container has the CAP_NET_ADMIN capability, a process inside the container must run as root (UID 0) to use it.

Security modules on the host (such as AppArmor or SELinux) may override or restrict container capabilities.

  • AppArmor: check the container's AppArmor profile. The default docker-default (or similar) profile may explicitly deny certain network-management operations.
  • SELinux: check the SELinux context and boolean settings.

Normal logs:

2026-02-09T23:45:39.391448776+08:00 stdout F ❯ Starting Windows for Docker v5.14...
2026-02-09T23:45:39.391465095+08:00 stdout F ❯ For support visit https://github.com/dockur/windows
2026-02-09T23:45:39.457716282+08:00 stdout F ❯ CPU: Intel Core i5 12500H | RAM: 13/16 GB | DISK: 841 GB (ext4) | KERNEL: 6.14.0-37...
2026-02-09T23:45:39.457742375+08:00 stdout F 
2026-02-09T23:45:39.843755643+08:00 stdout F ❯ Booting Windows using QEMU v10.0.6...
2026-02-09T23:45:40.487375804+08:00 stdout F BdsDxe: loading Boot0004 "Windows Boot Manager" from HD(1,GPT,03CE8108-75FA-4C01-9E37-88BDAB9FF528,0x800,0x40000)/\EFI\Microsoft\Boot\bootmgfw.efi
2026-02-09T23:45:40.489271782+08:00 stdout F BdsDxe: starting Boot0004 "Windows Boot Manager" from HD(1,GPT,03CE8108-75FA-4C01-9E37-88BDAB9FF528,0x800,0x40000)/\EFI\Microsoft\Boot\bootmgfw.efi
2026-02-09T23:46:09.916272501+08:00 stdout F ❯ Windows started successfully, visit http://127.0.0.1:8006/ to view the screen...

How nerdctl keeps a container's netns from being lost across restarts:

  • Create and mount: when a container starts, containerd's CNI integration calls ip netns add to create a new netns and immediately bind-mounts it at a fixed location on the host (usually under /var/run/netns/ or a subdirectory of /var/lib/cni/), producing a file-like mount point (e.g. /var/run/netns/cni-<container ID>).
  • Decoupled lifecycle: this mount is the key. It raises the netns's reference count, so even if the container's initial process exits, the kernel will not destroy the netns as long as the mount point exists.
  • State persistence: the CNI plugin (e.g. bridge) performs all network configuration inside this persistent netns (creating the veth pair, assigning IPs, setting routes, etc.), so the configuration is preserved in this "frozen" netns.
  • Reuse on restart: when the container is restarted (nerdctl restart) or recreated (via a --restart policy), containerd checks whether the container's netns mount point still exists. If it does, the new container process is placed directly into the existing, fully configured netns rather than a new one, so IP addresses, routes, firewall rules, and other network state are fully preserved.
