Linux Namespaces and Control Groups

namespaces
provide security and isolation by controlling what a process can see
control groups
provide resource management and reporting by controlling what a process can use

Linux Namespaces

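Namespaces were added to the Linux kernel incrementally: mount namespaces arrived in version 2.4.19, and user namespaces, the last of the six types listed below, were completed in version 3.8.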

Namespaces
provide isolation so that other pieces of the system remain unaffected by whatever is within the namespace. Docker uses namespaces of various kinds to provide the isolation that containers need in order to remain portable and refrain from affecting the remainder of the host system. The purpose of each namespace is to wrap a particular global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.

Namespaces in the Linux kernel (the 6 types covered here; newer kernels also add cgroup and time namespaces):

User
Each user namespace can be given its own set of UIDs and GIDs. Docker (1.12+, experimental) uses this to map container users to host users. Enabling it can break other isolation features; user namespaces can be nested up to 32 levels deep.
IPC (Inter-Process Communication)
isolates IPC resources (such as System V IPC objects and POSIX message queues) from processes outside the namespace, while processes created in the same IPC namespace remain visible to each other and can exchange data. Each container gets its own message queues for such IPC communication. E.g. swarm services are allowed to communicate with containers but not outside.
UTS (UNIX Time-Sharing)
isolates the hostname and domain name for each container. This namespace determines which hostname and domain name the processes running inside it see, so a single system can present different names to different processes.
Mount (mnt)
controls the mount points that are visible to each container; allows processes to see different filesystem trees; similar to chroot
PID (Process ID)
provides processes with an independent set of process IDs (PIDs); avoids PID conflicts between containers
Network (net)
allows each container to have its own network stack, e.g. IP addresses, routing tables, iptables rules, network devices
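
The namespaces a process belongs to can be inspected without any privileges: each entry under /proc/<pid>/ns is a symbolic link whose target names the namespace type and its inode number. A minimal sketch, assuming a Linux host with /proc mounted; the helper name ns_id is hypothetical:

```shell
#!/bin/sh
# ns_id TYPE [PID] - print the namespace identifier link target,
# which has the form "type:[inode]". Hypothetical helper; it only reads /proc.
ns_id() {
    readlink "/proc/${2:-self}/ns/$1"
}

# Print the identifier of each of the six namespace types listed above.
for ns in user ipc uts mnt pid net; do
    printf '%-5s %s\n' "$ns" "$(ns_id "$ns")"
done
```

Two processes share a namespace when their link targets for that type are identical, so comparing the output for a host shell and a containerised process shows which namespaces the container has of its own.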

Namespace operations

Network namespace
sudo ip netns add test1_ns
sudo ip netns list

# list iptables rules within 'test1_ns' network namespace
sudo ip netns exec test1_ns iptables -L

# create iptables rules within 'test1_ns' network namespace
vagrant@u18cli-3:~$ sudo ip netns exec test1_ns bash # note: the prompt changes to root because of sudo
root@u18cli-3:~# iptables -A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
root@u18cli-3:~# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:http # <- only exists
                                                                           #    in 'test1_ns'
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
root@u18cli-3:~# exit          # leave the namespace
vagrant@u18cli-3:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         

Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         

Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination

  • pivot_root - makes new_root the new root file system of the current process and moves the old root file system to the directory put_old
  • unshare - run a program with some namespaces unshared from its parent
  • ip netns - manage network namespaces
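
The unshare command can demonstrate the UTS namespace without sudo, provided unprivileged user namespaces are enabled on the host (an assumption; some distributions and container sandboxes disable them). The sketch below sets a hostname inside freshly created user and UTS namespaces and then shows that the host's own name is untouched. The function name run_in_new_uts and the hostname demo-ns are hypothetical.

```shell
#!/bin/sh
# run_in_new_uts NAME - set hostname NAME inside new user+UTS namespaces
# and print the hostname seen there. Hypothetical helper for illustration.
run_in_new_uts() {
    # --user --map-root-user: become root inside a new user namespace
    # --uts: give the child its own hostname and domain name
    unshare --user --map-root-user --uts sh -c 'hostname "$1" && uname -n' sh "$1"
}

host_before=$(uname -n)
if inside=$(run_in_new_uts demo-ns 2>/dev/null); then
    echo "inside namespace: $inside"
else
    echo "could not create the namespaces here (unprivileged user namespaces may be disabled)"
fi
echo "host before: $host_before, host after: $(uname -n)"
```

Whether or not the namespace could be created, the last line should report the same hostname twice, since any change is confined to the child's UTS namespace.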

Control Groups

Cgroups were originally called process containers. The feature was renamed to control groups to avoid confusion with the other meanings of the word "container" in the Linux kernel context.

Control Groups (Cgroups)
provide resource limitation and reporting capability within the container space. They allow for granular control over what host resources are allocated to containers. Cgroups are a Linux kernel feature that limits the resource usage of a process or group of processes.
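
To see which cgroups a process belongs to, the kernel exposes /proc/<pid>/cgroup, with one line per hierarchy in the form hierarchy-ID:controller-list:path. A small sketch that splits such a line into its fields; the function name describe_cgroup_line is hypothetical:

```shell
#!/bin/sh
# describe_cgroup_line LINE - split "ID:controllers:path" into labelled fields.
describe_cgroup_line() {
    line=$1
    id=${line%%:*}            # text before the first colon
    rest=${line#*:}           # text after the first colon
    controllers=${rest%%:*}   # empty for the cgroups v2 unified hierarchy
    path=${rest#*:}
    printf 'hierarchy=%s controllers=%s path=%s\n' "$id" "${controllers:-none}" "$path"
}

# Describe every hierarchy the current shell is a member of.
if [ -r /proc/self/cgroup ]; then
    while IFS= read -r entry; do
        describe_cgroup_line "$entry"
    done < /proc/self/cgroup
fi
```

On a host that uses only cgroups v2 the loop prints a single line with hierarchy 0 and no controllers; on a v1 host it prints one line per mounted hierarchy.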

Common control group (cgroup) subsystems; the list is not exhaustive, as new subsystems continue to be developed

blkio (disk)
limits and measures I/O for each process group; sets limits on input/output access to and from block devices
cpu
uses the scheduler to control CPU access for a group of processes; enables you to set weights and keep track of usage per CPU
cpuacct (CPU accounting)
generates automatic reports on CPU resources used by tasks in a cgroup
cpuset
pins a process or group of processes to specific CPUs (and memory nodes)
devices
allows or denies access to devices by tasks in a cgroup
freezer
suspends or resumes tasks in a cgroup. The effect resembles sending SIGSTOP to the whole container, except that the frozen tasks cannot observe or intercept it
memory
memory is managed in blocks known as pages. With a page size of 4096 bytes, 1 MiB is equal to 256 pages of memory. The memory cgroup lets you track and limit memory use by a process or group of processes down to the specific memory page.
net_cls (network bandwidth)
tags network packets with a class identifier (classid), which allows packets originating from a particular cgroup task to be identified. This makes it possible to set QoS rules based on sources and destinations.
net_prio (network priority)
subsystem provides a way to set the priority of network traffic dynamically.
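
The page arithmetic in the memory entry can be checked directly. The sketch below converts a byte count into pages, rounding up; the page size is a parameter, so the 4096-byte figure from the text is an explicit input rather than an assumption about the host. The name bytes_to_pages is a hypothetical helper.

```shell
#!/bin/sh
# bytes_to_pages BYTES [PAGE_SIZE] - number of pages needed to hold BYTES,
# rounded up. PAGE_SIZE defaults to the host's value from getconf.
bytes_to_pages() {
    bytes=$1
    page_size=${2:-$(getconf PAGESIZE)}
    echo $(( (bytes + page_size - 1) / page_size ))
}

bytes_to_pages 1048576 4096    # 1 MiB in 4096-byte pages -> 256
bytes_to_pages 1048577 4096    # one extra byte needs one more page -> 257
```

Calling bytes_to_pages with only the byte count uses the page size reported by the running system, which may differ from 4096 on some architectures.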

  • Kernel 2.6.24 - Cgroups v1
  • Kernel 4.5 - Cgroups v2 (first official release)

In cgroups v1, each hierarchy is an independent tree of cgroups with its own set of subsystems attached. A process belongs to exactly one cgroup in each hierarchy, which allows one process to live in many trees at once.
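
Which cgroup version a host uses can usually be read from the filesystem type mounted at /sys/fs/cgroup. The sketch below assumes GNU coreutils stat and the conventional mount point; the function name cgroup_version_from_fstype is hypothetical, and the mapping is a heuristic rather than a kernel API.

```shell
#!/bin/sh
# cgroup_version_from_fstype FSTYPE - map the filesystem type mounted at
# /sys/fs/cgroup to a cgroup version label. Hypothetical helper.
cgroup_version_from_fstype() {
    case "$1" in
        cgroup2fs) echo "v2" ;;        # single unified hierarchy
        tmpfs)     echo "v1" ;;        # per-controller hierarchies mounted below (or a hybrid setup)
        *)         echo "unknown" ;;
    esac
}

# stat -f reports on the filesystem; %T prints its type in human-readable form.
fstype=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo "none")
echo "cgroups: $(cgroup_version_from_fstype "$fstype") ($fstype)"
```

On a v1 host the individual hierarchies then appear as subdirectories such as /sys/fs/cgroup/memory, one tree per mounted set of subsystems.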