Kubernetes/Monitoring

Monitor cluster resources

Metrics-server

To read cluster resource usage you need a metrics collector add-on. Heapster used to be the popular choice; it is now deprecated and has been replaced by metrics-server.


Install metrics-server

# General installation
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
kubectl get deployment metrics-server -n kube-system
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
metrics-server   1/1     1            1           6m

# Alternative: clone the repository into ~/metrics-server and apply the bundled manifests
git clone https://github.com/kubernetes-incubator/metrics-server.git ~/metrics-server
kubectl apply -f ~/metrics-server/deploy/1.8+/


EKS Installation [1]

wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml -O metrics-server-0.3.6.yaml
# Edit the metrics-server Deployment in the downloaded file (deploy/kubernetes/metrics-server-deployment.yaml in the repository) and add the following command with arguments:
        command:
          - /metrics-server
          - --logtostderr
          - --kubelet-insecure-tls=true
          - --kubelet-preferred-address-types=InternalIP
          - --v=2

# kubelet-insecure-tls – do not verify the CA of the serving certificates presented by the kubelets on the nodes
# kubelet-preferred-address-types – which node address type to use when connecting to a kubelet: Hostname, InternalDNS,
#                                   InternalIP, ExternalDNS or ExternalIP; on EKS set it to InternalIP
# v=2 – log verbosity level

kubectl apply -f metrics-server-0.3.6.yaml
stern metrics-server -n kube-system


Get metrics. You may need to wait 1-2 minutes for the first metrics scrape to complete [2]

# verify metrics server API
kubectl get --raw /apis/metrics.k8s.io/
{"kind":"APIGroup","apiVersion":"v1","name":"metrics.k8s.io","versions":[{"groupVersion":"metrics.k8s.io/v1beta1","version":"v1beta1"}],"preferredVersion":{"groupVersion":"metrics.k8s.io/v1beta1","version":"v1beta1"}}

kubectl get apiservices | grep metrics
v1beta1.metrics.k8s.io                 kube-system/metrics-server   True        30m

kubectl top node               # CPU,memory utilization of the nodes in your cluster
kubectl top pods               # CPU,memory utilization of the pods in your cluster
kubectl top pods -A            # CPU,memory of pods in all namespaces
kubectl top pods -A --sort-by=memory
kubectl top pod -l run=<label> # CPU,memory of pods matching a label selector
kubectl top pod <pod-name>     # CPU,memory of a specific pod
kubectl top pods --containers  # CPU,memory of the containers inside the pod
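
The kubectl top output above is plain text, so it can be filtered with standard tools. A minimal sketch, assuming the four-column layout printed by kubectl top pods -A --no-headers (namespace, pod, CPU, memory) and memory values reported in Mi; the function name pods_over_mem is hypothetical:

```shell
# Print "namespace/pod memory" for every pod using more than a given amount of memory.
# Reads `kubectl top pods -A --no-headers` output on stdin:
#   NAMESPACE  NAME  CPU(cores)  MEMORY(bytes)
# Assumption: memory is reported with an Mi suffix; other units are skipped.
pods_over_mem() {
  threshold_mi="$1"
  awk -v limit="$threshold_mi" '
    $4 ~ /Mi$/ {
      mem = $4
      sub(/Mi$/, "", mem)                       # strip the unit suffix
      if (mem + 0 > limit + 0) print $1 "/" $2, $4
    }'
}

# Usage against a live cluster (requires a working metrics-server):
#   kubectl top pods -A --no-headers | pods_over_mem 200
```

Keeping the filter as a stdin-to-stdout function means it can be tried on saved output before pointing it at a cluster.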


[1] On EKS, an out-of-the-box installation fails with "unable to fully scrape metrics":

metrics-server-aaaaaaaaaa-h64c5 metrics-server E0714 15:59:42.204640       1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:ip-10-35-70-169.eu-west-1.compute.internal: unable to fetch metrics from Kubelet ip-10-35-70-169.eu-west-1.compute.internal (ip-10-35-70-169.dev.acme.com): Get https://ip-10-35-70-169.dev.acme.com:10250/stats/summary?only_cpu_and_memory=true: dial tcp: lookup ip-10-35-70-169.dev.acme.com on 172.20.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:


[2] A working metrics-server scrapes metrics every 1 minute by default.

metrics-server-bbbbbbbbbb-s5pn5 metrics-server I0714 16:20:07.784364       1 manager.go:95] Scraping metrics from 4 sources
metrics-server-bbbbbbbbbb-s5pn5 metrics-server I0714 16:20:07.787577       1 manager.go:120] Querying source: kubelet_summary:ip-10-00-64-185.eu-west-1.compute.internal
metrics-server-bbbbbbbbbb-s5pn5 metrics-server I0714 16:20:07.812605       1 manager.go:120] Querying source: kubelet_summary:ip-10-00-70-169.eu-west-1.compute.internal
metrics-server-bbbbbbbbbb-s5pn5 metrics-server I0714 16:20:07.814077       1 manager.go:120] Querying source: kubelet_summary:ip-10-00-68-179.eu-west-1.compute.internal
metrics-server-bbbbbbbbbb-s5pn5 metrics-server I0714 16:20:07.814843       1 manager.go:120] Querying source: kubelet_summary:ip-10-00-69-23.eu-west-1.compute.internal
metrics-server-bbbbbbbbbb-s5pn5 metrics-server I0714 16:20:07.820754       1 manager.go:148] ScrapeMetrics: time: 36.362483ms, nodes: 4, pods: 57
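
To confirm at a glance that scrapes are completing, the ScrapeMetrics summary line can be pulled out of the log stream. A rough sketch, assuming the log line format shown above (metrics-server v0.3.6 with --v=2); other versions may log differently, and the function name scrape_summary is hypothetical:

```shell
# Extract duration, node count and pod count from ScrapeMetrics log lines on stdin.
# Assumption: lines look like
#   "... ScrapeMetrics: time: 36.362483ms, nodes: 4, pods: 57"
scrape_summary() {
  sed -n 's/.*ScrapeMetrics: time: \([^,]*\), nodes: \([0-9]*\), pods: \([0-9]*\).*/time=\1 nodes=\2 pods=\3/p'
}

# Usage against a live cluster:
#   kubectl logs -n kube-system deployment/metrics-server | scrape_summary
```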

cAdvisor deprecated in v1.11

Every node in a Kubernetes cluster runs a kubelet process, and cAdvisor is embedded in each kubelet. cAdvisor continuously gathers resource usage metrics for the containers running on its node, so it is always available.

minikube start --extra-config=kubelet.CAdvisorPort=4194
kubectl proxy &          # open a proxy to the Kubernetes API port
open $(minikube ip):4194 # cAdvisor also serves the metrics in a helpful HTML format

# Each node provides statistics collected by cAdvisor. Access the node stats
curl localhost:8001/api/v1/nodes/$(kubectl get nodes -o=jsonpath="{.items[0].metadata.name}")/proxy/stats/

# The Kubernetes API also gathers the cAdvisor metrics at /metrics
curl localhost:8001/metrics

Liveness and Readiness probes

Check this Visual explanation

  • readinessProbe - checks whether a pod is ready to receive client requests. When the probe passes, the pod is added to the service endpoints. When the probe fails, the pod is not restarted; it is removed from the endpoints instead.
  • livenessProbe - when the probe fails, the kubelet restarts the container

Get service endpoints. Only healthy and ready pods will be added to the endpoint

kubectl get endpoints
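
To see which pod IPs a service currently routes to, the Endpoints object can be queried with jsonpath. A minimal sketch: ready pods appear under addresses and pods failing their readiness probe under notReadyAddresses; the helper names are hypothetical:

```shell
# List pod IPs that passed their readiness probe for a given service.
ready_endpoint_ips() {
  kubectl get endpoints "$1" -o jsonpath='{.subsets[*].addresses[*].ip}'
}

# List pod IPs that belong to the service but are currently not ready.
not_ready_endpoint_ips() {
  kubectl get endpoints "$1" -o jsonpath='{.subsets[*].notReadyAddresses[*].ip}'
}

# Usage:
#   ready_endpoint_ips <service-name>
#   not_ready_endpoint_ips <service-name>
```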


Liveness and readiness probes are defined per container, next to image and name: at .spec.containers[] in a Pod manifest and at .spec.template.spec.containers[] in a Deployment manifest

apiVersion: v1
kind: Pod
metadata:
  name: liveness-readiness-pod
spec:
  containers:
  - image: nginx
    name: main
    livenessProbe:
      httpGet:         # exec: or tcpSocket:
        path: /healthz # not all containers have this endpoint
        port: 8081
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5 # tell kubelet to wait 5 seconds after the container starts before performing the first probe (default is 0)
      periodSeconds: 5       # tell kubelet to run the probe every 5s (default is 10)
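
For comparison, here is a sketch of the same probes placed in a Deployment, where they sit under .spec.template.spec.containers[]. The file name, the Deployment name, the app label and the tcpSocket liveness check are illustrative assumptions, not taken from the manifest above:

```shell
# Write a hypothetical Deployment manifest to a local file.
# The probes are nested one level deeper than in a Pod: under spec.template.spec.containers[].
cat > probe-deployment.yaml <<'EOF'
apiVersion: apps/v1
kind: Deployment
metadata:
  name: liveness-readiness-deploy
spec:
  replicas: 2
  selector:
    matchLabels:
      app: liveness-readiness
  template:
    metadata:
      labels:
        app: liveness-readiness
    spec:
      containers:
      - image: nginx
        name: main
        livenessProbe:
          tcpSocket:        # restart the container if port 80 stops accepting connections
            port: 80
        readinessProbe:
          httpGet:          # only route traffic once / answers over HTTP
            path: /
            port: 80
          initialDelaySeconds: 5
          periodSeconds: 5
EOF

# Validate client-side without creating anything (needs kubectl 1.18+):
#   kubectl apply --dry-run=client -f probe-deployment.yaml
```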

Logs

Container logs

Containerized applications usually write their logs to STDOUT and STDERR instead of writing them to files. Docker then redirects those streams to files on the node. You can retrieve their contents with the kubectl logs command.


These are stored on nodes in /var/log/ directory and contain everything containers send to STDOUT.

  • /var/log/containers/ contains one log file per container; these are symlinks into /var/log/pods/
  • /var/log/pods/ contains one directory per pod, in the form <namespace>_<pod-name>_<pod-uid>/<container-name>/0.log (the log file)
  • 0.log is itself a symlink to /var/lib/docker/containers/<container-id>/<container-id>-json.log
$ ls -l /var/log/containers
total 56
lrwxrwxrwx 1 root root 101 Oct  7 06:51 coredns-5644d7b6d9-hztth_kube-system_coredns-9de9395495186177f5112d795ca950dd0227e6f025f40c83ddf2a99c56802939.log -> /var/log/pods/kube-system_coredns-5644d7b6d9-hztth_5da159b3-64e7-48e4-b9f8-003f9623481d/coredns/0.log
...


If your container writes to multiple log files, it is difficult to tell them apart with the kubectl logs command. In that case you can introduce sidecar containers that each tail an individual log file, and access them like this:

  • kubectl logs <pod> container-log-1
  • kubectl logs <pod> container-log-2
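
A sketch of such a pod, assuming an application that writes two files into a shared emptyDir volume. The pod name, file paths and the busybox loop are hypothetical stand-ins for a real application; the sidecar names match the commands above:

```shell
# Write a hypothetical pod manifest: one app container, two log-tailing sidecars.
cat > sidecar-logs-pod.yaml <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: multi-log-app
spec:
  containers:
  - name: main
    image: busybox
    command:
    - sh
    - -c
    - 'while true; do date >> /var/log/app/1.log; echo error >> /var/log/app/2.log; sleep 5; done'
    volumeMounts:
    - name: applogs
      mountPath: /var/log/app
  - name: container-log-1        # streams 1.log to its own STDOUT
    image: busybox
    command: ['sh', '-c', 'tail -n+1 -F /var/log/app/1.log']
    volumeMounts:
    - name: applogs
      mountPath: /var/log/app
  - name: container-log-2        # streams 2.log to its own STDOUT
    image: busybox
    command: ['sh', '-c', 'tail -n+1 -F /var/log/app/2.log']
    volumeMounts:
    - name: applogs
      mountPath: /var/log/app
  volumes:
  - name: applogs
    emptyDir: {}
EOF

# Usage against a live cluster:
#   kubectl apply -f sidecar-logs-pod.yaml
#   kubectl logs multi-log-app container-log-1
#   kubectl logs multi-log-app container-log-2
```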

kubelet runs as a system process rather than a container, so its logs go to the system location under /var/log or, on systemd hosts, to the journal:

journalctl -u kubelet.service


Retrieve logs

kubectl logs <pod> <container> # container name is optional for single-container pods
kubectl logs <pod> <container> --previous # or -p; logs of the previous instance, in case the container has crashed
kubectl logs <pod> --all-containers=true
kubectl logs --since=10m <pod>
kubectl logs deployment/<deployment> -c <container> # view the logs from a container within a pod within a deployment
kubectl logs --tail=20 haproxy               # tail x lines
kubectl logs -l app=haproxy                  # logs from containers matching a label


Kubernetes worker nodes Docker log configuration - size

The Docker runtime log configuration is set up on each node in /etc/docker/daemon.json

{
  "bridge": "none",
  "log-driver": "json-file",
  "log-opts": {
    "max-size": "10m",
    "max-file": "10"
  },
  "live-restore": true,
  "max-concurrent-downloads": 10
}
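
With the json-file driver, each container keeps at most max-file rotated files of max-size each, so the per-container upper bound on log disk usage is their product. A small sketch of that arithmetic; the function name is hypothetical and sizes are taken in whole megabytes:

```shell
# Upper bound on disk used by one container's json-file logs, in MB:
# max-size (MB per file) multiplied by max-file (number of rotated files kept).
max_container_log_mb() {
  echo $(( $1 * $2 ))
}

max_container_log_mb 10 10   # values from the daemon.json above; prints 100
```

Multiplying the result by the number of containers a node is expected to run gives a rough worst-case figure for sizing the node's log partition.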

Kubernetes logs: where are they coming from

Logs from the STDOUT and STDERR of containers in the pod are captured and stored in files under /var/log/containers. This is what is presented when kubectl logs is run. To understand why output from commands run by kubectl exec does not show up in kubectl logs, let's look at how it all works with an example:

# Launch a pod running ubuntu that sleeps forever
kubectl run test --image=ubuntu --restart=Never -- sleep infinity
# Exec into it
kubectl exec -it test -- bash


Seen from inside the container it is the STDOUT and STDERR of PID 1 that are being captured. When you do a kubectl exec into the container a new process is created living alongside PID 1:

root@test:/# ps -auxf
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         7  0.0  0.0  18504  3400 pts/0    Ss   10:02   0:00 bash
root        19  0.0  0.0  34396  2908 pts/0    R+   10:05   0:00  \_ ps -auxf
root         1  0.0  0.0   4528   836 ?        Ss   10:01   0:00 sleep infinity


Redirecting to /dev/stdout from the exec'd shell does not work, because /dev/stdout is a symlink to /proc/self/fd/1, the STDOUT of the process accessing it, rather than to /proc/1/fd/1, the STDOUT of PID 1.

root@test:/# ls -lrt /dev/stdout
lrwxrwxrwx 1 root root 15 Nov  5 10:01 /dev/stdout -> /proc/self/fd/1


In order to see the logs from commands run with kubectl exec the logs need to be redirected to the streams that are captured by the kubelet (STDOUT and STDERR of pid 1). This can be done by redirecting output to /proc/1/fd/1.

root@test:/# echo "send-to-kubernetes-container-log" > /proc/1/fd/1


Exiting the interactive shell and checking the logs using kubectl logs should now show the output

$> kubectl logs test
send-to-kubernetes-container-log
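
The same redirect can be done without an interactive shell. A minimal sketch, assuming the container image ships sh and that PID 1 is the process whose STDOUT is captured; the helper name send_to_container_log is hypothetical:

```shell
# Write one line into a container's log stream from outside the pod,
# by redirecting to the STDOUT of PID 1 inside the container.
send_to_container_log() {
  pod="$1"
  message="$2"
  # "_" fills $0 of the inner shell, so the message arrives as "$1"
  kubectl exec "$pod" -- sh -c 'printf "%s\n" "$1" > /proc/1/fd/1' _ "$message"
}

# Usage:
#   send_to_container_log test "send-to-kubernetes-container-log"
#   kubectl logs test
```

Passing the message as a positional argument instead of interpolating it into the sh -c string avoids quoting problems when the text contains spaces or quotes.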

Termination message

Kubernetes allows a container to write a custom message to a custom file on termination. This message can be viewed directly using kubectl describe, under Last State: Terminated, Message: <custom message>

apiVersion: v1
kind: Pod
metadata:
  name: pod2
spec:
  containers:
  - image: busybox
    name: main
    command:
    - sh
    - -c
    - 'echo "I say that this container has been terminated at $(date)" > /var/termination-reason ; exit 1'
    terminationMessagePath: /var/termination-reason
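
Besides kubectl describe, the message can be read straight from the pod status. A sketch assuming a single-container pod that has already terminated and been restarted at least once, so that lastState is populated; the helper name is hypothetical:

```shell
# Print the termination message of the first container's previous run.
last_termination_message() {
  kubectl get pod "$1" -o jsonpath='{.status.containerStatuses[0].lastState.terminated.message}'
}

# Usage:
#   last_termination_message pod2
```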

Troubleshooting

# get a yaml without status information (almost clean yaml manifest)
# note: --export was deprecated in kubectl 1.14 and removed in 1.18
kubectl -n web get pod <failing-pod> -o yaml --export
