Rancher Cattle Cluster Agent Could not Resolve Host
起因 最近在使用rancher 导入外部k8s 集群时,遇到了一个问题: 在要导入的集群上执行命令后,创建的pod运行错误,错误日志大概如下: INFO: Environment: CATTLE_ADDRESS=100.66.209.198 CATTLE_CA_CHECKSUM= CATTLE_CLUSTER=true CATTLE_CLUSTER_AGENT_PORT=tcp://11.11.10.11:80 CATTLE_CLUSTER_AGENT_PORT_443_TCP=tcp://11.11.10.11:443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_ADDR=10.96.0.125 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PORT=443 CATTLE_CLUSTER_AGENT_PORT_443_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_PORT_80_TCP=tcp://10.96.0.125:80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_ADDR=10.96.0.125 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PORT=80 CATTLE_CLUSTER_AGENT_PORT_80_TCP_PROTO=tcp CATTLE_CLUSTER_AGENT_SERVICE_HOST=10.96.0.125 CATTLE_CLUSTER_AGENT_SERVICE_PORT=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTP=80 CATTLE_CLUSTER_AGENT_SERVICE_PORT_HTTPS_INTERNAL=443 CATTLE_CLUSTER_REGISTRY= CATTLE_INGRESS_IP_DOMAIN=sslip.io CATTLE_INSTALL_UUID=333850e4-f500-43a2-a359-e1dfd94e4f35 CATTLE_INTERNAL_ADDRESS= CATTLE_IS_RKE=false CATTLE_K8S_MANAGED=true CATTLE_NODE_NAME=cattle-cluster-agent-55b9954958-5679q CATTLE_SERVER=https://xx.xx.vip CATTLE_SERVER_VERSION=v2.6.6 INFO: Using resolv.conf: search cattle-system.svc.cluster.local svc.cluster.local cluster.local nameserver 10.96.0.10 options ndots:5 ERROR: https://xx.xx.vip/ping is not accessible (Could not resolve host: xx.xx.vip) 可以很好的看出错误的原因是域名dns解析错误。 解决 在发现问题后,可以根据错误日志进行相应的解决,我们先查看coredns 的日志 kubectl logs deployment/coredns -n kube-system [ERROR] plugin/errors: 2 XX.XX.vip. A: read udp 100.108.11.198:32988->100.100.2.136:53: i/o timeout [ERROR] plugin/errors: 2 XX.XX.vip. A: read udp 100.108.11.198:53477->100.100.2.136:53: i/o timeout [ERROR] plugin/errors: 2 XX.XX.vip. AAAA: read udp 100.108.11.198:40436->100.100.2.136:53: i/o timeout 可以看到相应域名解析错误,根据以前的经验,先重启coredns容器尝试是否能解决问题,重启后发现问题未能解决(可能需要重启相应节点的机器。 ...