Kubernetes/Monitoring
Monitor cluster resources
Metrics-server
To see cluster resource usage you need a metrics collector add-on. Heapster used to be the popular choice; it is now deprecated and has been replaced by metrics-server.
Install metrics-server
# General installation
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
kubectl get deployment metrics-server -n kube-system
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           6m

# Alternative
git clone https://github.com/kubernetes-incubator/metrics-server.git
kubectl apply -f ~/metrics-server/deploy/1.8+/
EKS Installation [1]
wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml -O metrics-server-0.3.6.yaml
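# Edit the metrics-server Deployment (in the downloaded metrics-server-0.3.6.yaml; deploy/kubernetes/metrics-server-deployment.yaml in the Git repository) and add the following command with arguments to the container: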
command:
- /metrics-server
- --logtostderr
- --kubelet-insecure-tls=true
- --kubelet-preferred-address-types=InternalIP
- --v=2
# kubelet-insecure-tls: do not verify the CA of the serving certificates presented by the kubelets on the nodes
# kubelet-preferred-address-types: which node address type to use when connecting to a kubelet: Hostname, InternalDNS,
# InternalIP, ExternalDNS or ExternalIP; for EKS set it to InternalIP
# v=2: log verbosity level
kubectl apply -f metrics-server-0.3.6.yaml
stern metrics-server -n kube-system
Get metrics. You may need to wait 1-2 minutes for the first metrics scrape to complete [2]
# verify metrics server API
kubectl get --raw /apis/metrics.k8s.io/
{"kind":"APIGroup","apiVersion":"v1","name":"metrics.k8s.io","versions":[{"groupVersion":"metrics.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"metrics.k8s.io/v1beta1","version":"v1beta1"}}
kubectl get apiservices | grep metrics
v1beta1.metrics.k8s.io kube-system/metrics-server True 30m
kubectl top node # CPU,memory utilization of the nodes in your cluster
kubectl top pods # CPU,memory utilization of the pods in your cluster
kubectl top pods -A # CPU,memory of pods in all namespaces
kubectl top pods -A --sort-by=memory
kubectl top pod -l run=<label> # CPU,memory of pods matching a label selector
kubectl top pod <pod-name> # CPU,memory of a specific pod
kubectl top pods --containers # CPU,memory of each container inside the pods
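The kubectl top output is plain whitespace-separated columns, so it is easy to post-process. A minimal sketch, assuming the NAMESPACE NAME CPU(cores) MEMORY(bytes) layout printed by kubectl top pods -A and memory values reported in Mi; the function name pods_over_mem and the 200 Mi threshold are illustrative and not part of kubectl:

```shell
#!/usr/bin/env bash
# Print "namespace/pod memory" for every pod using more than a given number
# of Mi. Reads `kubectl top pods -A` output on stdin; the header line is
# skipped automatically because its fourth column does not end in "Mi".
pods_over_mem() {
  local threshold_mi="$1"
  awk -v limit="$threshold_mi" '
    $4 ~ /Mi$/ {
      mem = $4
      sub(/Mi$/, "", mem)                 # strip the unit, keep the number
      if (mem + 0 > limit) print $1 "/" $2, $4
    }'
}

# Usage against a live cluster (requires a working metrics-server):
#   kubectl top pods -A | pods_over_mem 200
```

The function only reads stdin, so it can be tried on saved output without cluster access.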
[1] On EKS, an out-of-the-box installation fails with "unable to fully scrape metrics" errors:
metrics-server-aaaaaaaaaa-h64c5 metrics-server E0714 15:59:42.204640 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:ip-10-35-70-169.eu-west-1.compute.internal: unable to fetch metrics from Kubelet ip-10-35-70-169.eu-west-1.compute.internal (ip-10-35-70-169.dev.acme.com): Get https://ip-10-35-70-169.dev.acme.com:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup ip-10-35-70-169.dev.acme.com on 172.20.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:
[2] A working metrics-server scrapes metrics every 1 minute by default.
metrics-server-bbbbbbbbbb-s5pn5 metrics-server I0714 16:20:07.784364 1 manager.go:95] Scraping metrics from 4 sources
metrics-server-bbbbbbbbbb-s5pn5 metrics-server I0714 16:20:07.787577 1 manager.go:120] Querying source: kubelet_summary:ip-10-00-64-185.eu-west-1.compute.internal
metrics-server-bbbbbbbbbb-s5pn5 metrics-server I0714 16:20:07.812605 1 manager.go:120] Querying source: kubelet_summary:ip-10-00-70-169.eu-west-1.compute.internal
metrics-server-bbbbbbbbbb-s5pn5 metrics-server I0714 16:20:07.814077 1 manager.go:120] Querying source: kubelet_summary:ip-10-00-68-179.eu-west-1.compute.internal
metrics-server-bbbbbbbbbb-s5pn5 metrics-server I0714 16:20:07.814843 1 manager.go:120] Querying source: kubelet_summary:ip-10-00-69-23.eu-west-1.compute.internal
metrics-server-bbbbbbbbbb-s5pn5 metrics-server I0714 16:20:07.820754 1 manager.go:148] ScrapeMetrics: time: 36.362483ms, nodes: 4, pods: 57
cAdvisor (deprecated in v1.11)
Every node in a Kubernetes cluster runs a kubelet process, and cAdvisor is embedded in each kubelet. cAdvisor continuously gathers resource usage metrics for the containers running on its node, so these metrics are always available.
minikube start --extra-config=kubelet.CAdvisorPort=4194
kubectl proxy & # open a proxy to the Kubernetes API port
open http://$(minikube ip):4194 # cAdvisor also serves the metrics in a helpful HTML format
# Each node exposes statistics provided by cAdvisor. Access the node stats:
curl localhost:8001/api/v1/nodes/$(kubectl get nodes -o=jsonpath="{.items[0].metadata.name}")/proxy/stats/
# The Kubernetes API server exposes its own metrics at /metrics
curl localhost:8001/metrics
Liveness and Readiness probes
Check this Visual explanation
- readinessProbe: checks whether a pod is ready to receive client requests. When the probe passes, the pod is added to the service endpoints. When the probe fails, the pod is not restarted; it is removed from the endpoints instead.
- livenessProbe: when the probe fails, the container gets restarted.
Get service endpoints. Only healthy and ready pods are added to the endpoints.
kubectl get endpoints
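The effect of the readiness probe can be observed on the Endpoints object itself, which keeps ready and not-ready pod IPs in separate fields. A minimal sketch, assuming a service name passed as the first argument; the helper name endpoint_ips is hypothetical, while the jsonpath fields addresses and notReadyAddresses are part of the Endpoints API:

```shell
#!/usr/bin/env bash
# Show which pod IPs a service currently routes to (ready) and which are held
# back (not ready, for example because their readiness probe is failing).
endpoint_ips() {
  local svc="$1"
  echo "ready:     $(kubectl get endpoints "$svc" -o jsonpath='{.subsets[*].addresses[*].ip}')"
  echo "not ready: $(kubectl get endpoints "$svc" -o jsonpath='{.subsets[*].notReadyAddresses[*].ip}')"
}

# Usage: endpoint_ips <service-name>
```

A pod whose readiness probe starts failing should move from the first line to the second without being restarted.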
Liveness and readiness probes are defined per container, at the same level as image: under .spec.containers[] in a Pod manifest and under .spec.template.spec.containers[] in a Deployment manifest.
<syntaxhighlightjs lang=yaml>
apiVersion: v1
kind: Pod
metadata:
  name: liveness-readiness-pod
spec:
  containers:
  - image: nginx
    name: main
    livenessProbe:
      httpGet:                 # exec: or tcpSocket:
        path: /healthz         # not all containers have this endpoint
        port: 8081
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5   # tell kubelet to wait 5 seconds after the container starts before performing the first probe (default is 0)
      periodSeconds: 5         # tell kubelet to run the probe every 5 seconds (default is 10)
</syntaxhighlightjs>
Logs
Container logs
Containerized applications usually write their logs to STDOUT and STDERR instead of writing them to files. Docker then redirects those streams to files. You can retrieve their contents with the kubectl logs command.
These files are stored on the nodes under the /var/log/ directory and contain everything the containers send to STDOUT and STDERR.
- /var/log/containers/ contains the container logs; these are symlinks to ../pods/
- /var/log/pods/ contains a directory per pod, in the form <namespace>_<pod-name>_<pod-uid>/<container-name>/0.log (logfile)
- 0.log is itself a symlink to /var/lib/docker/containers/<container-id>/<container-id>-json.log
$ ls -l /var/log/containers
total 56
lrwxrwxrwx 1 root root 101 Oct 7 06:51 coredns-5644d7b6d9-hztth_kube-system_coredns-9de9395495186177f5112d795ca950dd0227e6f025f40c83ddf2a99c56802939.log -> /var/log/pods/kube-system_coredns-5644d7b6d9-hztth_5da159b3-64e7-48e4-b9f8-003f9623481d/coredns/0.log
...
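The symlink names in /var/log/containers encode the pod, namespace, container name and container ID as <pod>_<namespace>_<container>-<container-id>.log. A minimal sketch that splits such a name using only shell parameter expansion; the helper name parse_container_log_name is hypothetical, and it assumes the container ID contains no dash (so the last dash separates the container name from the ID):

```shell
#!/usr/bin/env bash
# Split a /var/log/containers file name into its parts. Pod and namespace
# names cannot contain underscores, so "_" is a safe separator.
parse_container_log_name() {
  local name="${1##*/}"          # drop any directory part
  name="${name%.log}"            # drop the .log suffix
  local pod="${name%%_*}"        # text before the first underscore
  local rest="${name#*_}"
  local namespace="${rest%%_*}"  # text before the second underscore
  rest="${rest#*_}"              # now <container>-<container-id>
  local container="${rest%-*}"   # everything before the last dash
  local id="${rest##*-}"         # everything after the last dash
  # %.12s prints only the first 12 characters of the container ID
  printf 'pod=%s namespace=%s container=%s id=%.12s\n' \
    "$pod" "$namespace" "$container" "$id"
}

# Usage on a node:
#   for f in /var/log/containers/*.log; do parse_container_log_name "$f"; done
```

For the coredns file listed above this yields pod=coredns-5644d7b6d9-hztth namespace=kube-system container=coredns id=9de939549518.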
If your container writes to multiple log files, it is difficult to distinguish them with the kubectl logs command. You can instead introduce sidecar containers that each tail an individual log file, and access them like this:
kubectl logs <pod> container-log-1
kubectl logs <pod> container-log-2
The kubelet runs as a process on the host rather than in a container, so it writes its logs to the system location:
/var/log
journalctl -u kubelet.service
Retrieve logs
kubectl logs <pod> <container>                       # container name is optional for single-container pods
kubectl logs <pod> <container> --previous            # or -p; logs of the previous instance, in case the container has crashed
kubectl logs <pod> --all-containers=true
kubectl logs --since=10m <pod>
kubectl logs deployment/<deployment> -c <container>  # view the logs from a container within a pod within a deployment
kubectl logs --tail=20 haproxy                       # tail x lines
kubectl logs -l app=haproxy                          # logs from containers matching a label
Kubernetes worker nodes docker log configuration - size
The Docker runtime log configuration is set up on each node in /etc/docker/daemon.json
{
  "bridge": "none",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "10"
  },
  "live-restore": true,
  "max-concurrent-downloads": 10
}
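With the json-file driver each container keeps at most max-file log files of at most max-size each, so the worst-case disk usage per container is max-size multiplied by max-file. A minimal sketch of that arithmetic; the helper name max_log_mib is hypothetical and it only understands the k, m and g size suffixes:

```shell
#!/usr/bin/env bash
# Worst-case log disk usage per container, in MiB, for the json-file driver.
# Arguments: max-size (for example 10m) and max-file (for example 10).
max_log_mib() {
  local size="$1" files="$2"
  local num="${size%[kmg]}"      # numeric part without the unit suffix
  local kib
  case "$size" in
    *k) kib=$num ;;
    *m) kib=$((num * 1024)) ;;
    *g) kib=$((num * 1024 * 1024)) ;;
    *)  kib=$((num / 1024)) ;;   # no suffix: plain bytes
  esac
  echo $(( kib * files / 1024 ))
}

max_log_mib 10m 10   # values from the daemon.json above, prints 100
```

Multiplying the result by the number of containers on a node gives a rough upper bound for the log space needed on that node.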
Kubernetes logs: where they come from
Logs from the STDOUT and STDERR of containers in the pod are captured and stored inside files in /var/log/containers. This is what is presented when kubectl logs is run. To understand why output from commands run by kubectl exec is not shown when running kubectl logs, let's have a look at how it all works with an example:
# Launch a pod running ubuntu that sleeps forever
kubectl run test --image=ubuntu --restart=Never -- sleep infinity
# Exec into it
kubectl exec -it test -- bash
Seen from inside the container it is the STDOUT and STDERR of PID 1 that are being captured. When you do a kubectl exec into the container a new process is created living alongside PID 1:
root@test:/# ps -auxf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         7  0.0  0.0  18504  3400 pts/0    Ss   10:02   0:00 bash
root        19  0.0  0.0  34396  2908 pts/0    R+   10:05   0:00  \_ ps -auxf
root         1  0.0  0.0   4528   836 ?        Ss   10:01   0:00 sleep infinity
Writing to /dev/stdout from the exec'd shell does not reach the container log, because /dev/stdout is a symlink to the STDOUT of whichever process accesses it (/proc/self/fd/1 rather than /proc/1/fd/1).
root@test:/# ls -lrt /dev/stdout
lrwxrwxrwx 1 root root 15 Nov 5 10:01 /dev/stdout -> /proc/self/fd/1
In order to see the logs from commands run with kubectl exec the logs need to be redirected to the streams that are captured by the kubelet (STDOUT and STDERR of pid 1). This can be done by redirecting output to /proc/1/fd/1.
root@test:/# echo "send-to-kubernetes-container-log" > /proc/1/fd/1
Exiting the interactive shell and checking the logs using kubectl logs should now show the output
$> kubectl logs test
send-to-kubernetes-container-log
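The same redirection can be wrapped so that any command run in a kubectl exec session sends both its STDOUT and STDERR to the streams of PID 1. This is only a sketch: the function name to_container_log and the CONTAINER_LOG_OUT and CONTAINER_LOG_ERR override variables are hypothetical (the overrides exist so the function can be tried outside a container), and it assumes the exec'd user is allowed to write to /proc/1/fd/1 and /proc/1/fd/2:

```shell
#!/usr/bin/env bash
# Run a command and send its output to the STDOUT/STDERR of PID 1, so that
# it shows up in `kubectl logs`. The targets can be overridden for testing.
to_container_log() {
  "$@" >>"${CONTAINER_LOG_OUT:-/proc/1/fd/1}" \
      2>>"${CONTAINER_LOG_ERR:-/proc/1/fd/2}"
}

# Inside the container:
#   to_container_log echo "send-to-kubernetes-container-log"
```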
Termination message
Kubernetes lets a container write a custom message to a custom file on termination. This message can be viewed directly using kubectl describe, under Last State: Terminated, Message: <custom message>
apiVersion: v1
kind: Pod
metadata:
  name: pod2
spec:
  containers:
  - image: busybox
    name: main
    command:
    - sh
    - -c
    - 'echo "I say that this container has been terminated at $(date)" > /var/termination-reason ; exit 1'
    terminationMessagePath: /var/termination-reason
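Besides kubectl describe, the message can be read straight from the pod status. A minimal sketch, assuming the pod has a single container whose previous instance has already terminated; the helper name termination_message is hypothetical, while lastState.terminated.message is a field of the container status:

```shell
#!/usr/bin/env bash
# Print the termination message of the last terminated instance of the first
# container in a pod (prints nothing if no instance has terminated yet).
termination_message() {
  kubectl get pod "$1" \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.message}'
}

# Usage with the manifest above: termination_message pod2
```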
Troubleshooting
# get a yaml without status information (almost clean yaml manifest)
# note: the --export flag was deprecated in kubectl v1.14 and removed in v1.18
kubectl -n web get pod <failing-pod> -oyaml --export