Linux Namespaces and Control Groups
- namespaces
- provide security and isolation by controlling what a process can see
- control groups
- provide resource management and reporting by controlling how much of a resource a process can use
Linux Namespaces
Namespaces were added to the Linux kernel incrementally: mount namespaces came first in 2.4.19, and user namespaces were completed in 3.8.
- Namespaces provide
- isolation so that other pieces of the system remain unaffected by whatever is within the namespace. Docker uses namespaces of various kinds to provide the isolation that containers need in order to remain portable and refrain from affecting the remainder of the host system. The purpose of each namespace is to wrap a particular global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource.
Namespaces in the Linux kernel (the 6 original ns; cgroup and time namespaces were added later, in 4.6 and 5.6):
- User
- Each user namespace can be given its own set of UIDs and GIDs. (Docker 1.12+ experimental) maps container users to host users. Enabling it can interfere with other isolation features. User namespaces can be nested up to 32 levels deep.
- IPC (Inter-Process Communication)
- isolates IPC resources (System V IPC objects and POSIX message queues) from processes outside the namespace, while processes created in the same IPC namespace stay visible to each other and can exchange data. Each container gets its own message queues. E.g. swarm services are allowed to communicate with containers but not outside.
- UTS (Unix Time Sharing)
- isolates the hostname and domain name for each container, so a single system can appear under different hostnames and domain names to different processes. A process sees the hostname and domain name of the UTS namespace it runs in.
- Mount
- controls the mount points that are visible to each container; allows processes to see different filesystem trees; similar to chroot
- PID (Process ID)
- provides processes with an independent set of process IDs (PIDs), which avoids PID conflicts between containers
- Network
- gives each container its own network stack, e.g. IPs, routing tables, iptables rules, network devices
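The namespaces a process belongs to can be inspected under /proc. A minimal sketch, assuming a Linux system with /proc mounted; the function name list_namespaces is only illustrative:

```shell
# Print the namespace identifiers of a process (default: the current shell).
# Two processes share a namespace when the identifiers match.
list_namespaces() {
    pid="${1:-$$}"
    for ns in /proc/"$pid"/ns/*; do
        # Each entry is a symlink whose target looks like 'uts:[4026531838]'
        printf '%s -> %s\n' "${ns##*/}" "$(readlink "$ns")"
    done
}

list_namespaces
```

Running it on the host and again inside a container should show different identifiers for every namespace the container does not share with the host.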
Namespaces operations
- Network namespace
sudo ip netns add test1_ns
sudo ip netns list
# list iptables rules within 'test1_ns' network namespace
sudo ip netns exec test1_ns iptables -L
# create iptables rules within 'test1_ns' network namespace
vagrant@u18cli-3:~$ sudo ip netns exec test1_ns bash # note the user change because of sudo
root@u18cli-3:~# iptables -A INPUT -p tcp -m tcp --dport 80 -j ACCEPT
root@u18cli-3:~# iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:http # <- only exists
                                                                           #    in 'test1_ns'
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
root@u18cli-3:~# exit          # leave the namespace
vagrant@u18cli-3:~$ sudo iptables -L
Chain INPUT (policy ACCEPT)
target     prot opt source               destination         
Chain FORWARD (policy ACCEPT)
target     prot opt source               destination         
Chain OUTPUT (policy ACCEPT)
target     prot opt source               destination
- Tooling
- pivot_root: moves the root file system of the current process to the directory put_old and makes new_root the new root file system
- unshare: runs a program with some namespaces unshared from its parent
- ip netns: manages network namespaces
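As a rough illustration of unshare, the sketch below changes the hostname inside a new UTS namespace and shows that the host keeps its own. It assumes util-linux unshare, the hostname command, and a kernel that permits unprivileged user namespaces (otherwise run it with sudo and drop the --user --map-root-user flags). The function name and the name demo-ns are only illustrative.

```shell
# Run a command in a new UTS namespace and print the hostname seen inside it.
# --user --map-root-user also creates a user namespace, so no sudo is needed
# (assumption: unprivileged user namespaces are enabled on this host).
hostname_in_new_uts() {
    new_name="$1"
    unshare --user --map-root-user --uts \
        sh -c 'hostname "$1" && uname -n' sh "$new_name"
}

echo "host before: $(uname -n)"
echo "inside ns:   $(hostname_in_new_uts demo-ns)"
echo "host after:  $(uname -n)"   # unchanged, the rename stayed in the namespace
```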
Control Groups
Originally called process containers; renamed to control groups in 2007 to avoid confusion with other uses of the term "container" in the Linux kernel.
- Control Groups (Cgroups)
- provide resource limitation and reporting capability within the container space. They allow for granular control over what host resources are allocated to container(s). It is a Linux kernel feature that limits the resource usage of a process or group of processes.
Common control groups (cgroups) subsystems; the list is not exhaustive, as new subsystems are constantly developed
- blkio (disk)
- subsystem that limits and measures I/O for each process group
- cpu
- subsystem that controls CPU scheduling for a group of processes; lets you set relative weights and keep track of usage per CPU
- cpuacct
- (CPU accounting) generates automatic reports on CPU resources used by tasks in a cgroup
- cpuset
- pins a process or group of processes to specific CPUs (and memory nodes)
- devices
- allows or denies access to devices by tasks in a cgroup
- freezer
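- suspends or resumes tasks in a cgroup, so a whole container can be frozen at once. Tasks are frozen by the kernel rather than by a signal such as SIGSTOP, so the freeze is not visible to the tasks themselves.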
- memory
- memory is managed in blocks known as pages. A page is typically 4096 bytes, which means 1 MiB equals 256 pages of memory. The memory cgroup lets you track memory use by a process or group of processes down to the specific memory page.
- net_cls (network bandwidth)
- tags network packets with a classid, which identifies packets originating from a particular cgroup task. This allows QoS rules to be set based on sources and destinations.
- net_prio (network priority)
- subsystem provides a way to set the priority of network traffic dynamically.
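The page arithmetic from the memory subsystem note can be checked with a small helper. This is a sketch only; mib_to_pages is an illustrative name, and the page size is read with getconf rather than assumed to be 4096:

```shell
# Convert a size in MiB to a number of memory pages.
# Optional second argument: page size in bytes (default: the system page size).
mib_to_pages() {
    mib="$1"
    page_size="${2:-$(getconf PAGESIZE)}"   # typically 4096 on x86_64
    echo $(( mib * 1024 * 1024 / page_size ))
}

mib_to_pages 1 4096    # prints 256
mib_to_pages 512       # result depends on the system page size
```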
- Implementations
- Kernel 2.6.24 - Cgroups v1
- Kernel 4.5 - Cgroups v2 (first official release)
In cgroups v1 each subsystem can be mounted as its own hierarchy, independent of the others, so one process can live in many trees at once (one cgroup per hierarchy). Cgroups v2 uses a single unified hierarchy instead.
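Which trees a process lives in can be read from /proc/<pid>/cgroup, which has one line per hierarchy in the form ID:controllers:path. A minimal sketch, assuming a Linux system with /proc mounted; show_cgroups is an illustrative name:

```shell
# Show the cgroup membership of a process (default: the current shell).
# v1 lines name their controllers; the v2 unified hierarchy appears as "0::path".
show_cgroups() {
    pid="${1:-$$}"
    while IFS=: read -r id controllers path; do
        if [ -z "$controllers" ]; then
            echo "v2 unified hierarchy: $path"
        else
            echo "v1 hierarchy $id ($controllers): $path"
        fi
    done < "/proc/$pid/cgroup"
}

show_cgroups
```

On a cgroups v1 host this should print several lines for the same process, one per hierarchy; on a pure v2 host it prints a single line.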