Typically, we use a container program like Docker to work with containers. However, basic containers can also be implemented using a combination of Linux commands. In this article, we will take a look at how containers are isolated and operated through Linux commands.
When studying containers, there is one command that you are bound to hear about: chroot. If you check the manual to see what chroot is, it looks like this.
NAME
chroot - run command or interactive shell with special root directory
SYNOPSIS
chroot [OPTION] NEWROOT [COMMAND [ARG]...]
chroot OPTION
DESCRIPTION
Run COMMAND with root directory set to NEWROOT.
In other words, chroot runs a command, or launches an interactive shell, with a different root directory, using the form `chroot <directory> <command>`. Now that we know what chroot does, let's see it in practice by creating a new folder and running chroot against it.
test@jungnas:~/container$ mkdir new_root
test@jungnas:~/container$ sudo chroot new_root ls
chroot: failed to run command ‘ls’: No such file or directory
It does not work as we expected: the newly created root is an empty folder, so there is no `ls` executable inside it to run. To fix this, we need a root filesystem. We will use Alpine Linux, the lightweight distribution most popular in container environments; its minimal root filesystem is distributed as a tar.gz archive, which makes it a good fit for this example.
test@jungnas:~/container$ mkdir alpinelinux
test@jungnas:~/container$ cd alpinelinux
test@jungnas:~/container/alpinelinux$ ls
test@jungnas:~/container/alpinelinux$ wget https://dl-cdn.alpinelinux.org/alpine/latest-stable/releases/x86_64/alpine-minirootfs-3.17.3-x86_64.tar.gz
After downloading, extract the archive, go back up to the parent directory, and run chroot against the alpinelinux directory.
test@jungnas:~/container/alpinelinux$ tar -xzf alpine-minirootfs-3.17.3-x86_64.tar.gz
test@jungnas:~/container/alpinelinux$ cd ..
test@jungnas:~/container$ sudo chroot alpinelinux ls /
bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var
Running `ls /` shows the root directory of the extracted filesystem. Why not actually run a shell inside it? Alpine Linux does not ship bash, so we use sh instead.
test@jungnas:~/container$ sudo chroot alpinelinux sh
/ # ls
bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var
/ # cat /etc/hostname
localhost
/ # cat /etc/passwd
root:x:0:0:root:/root:/bin/ash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/mail:/sbin/nologin
news:x:9:13:news:/usr/lib/news:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucppublic:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
man:x:13:15:man:/usr/man:/sbin/nologin
postmaster:x:14:12:postmaster:/var/mail:/sbin/nologin
cron:x:16:16:cron:/var/spool/cron:/sbin/nologin
ftp:x:21:21::/var/lib/ftp:/sbin/nologin
sshd:x:22:22:sshd:/dev/null:/sbin/nologin
at:x:25:25:at:/var/spool/cron/atjobs:/sbin/nologin
squid:x:31:31:Squid:/var/cache/squid:/sbin/nologin
xfs:x:33:33:X Font Server:/etc/X11/fs:/sbin/nologin
games:x:35:35:games:/usr/games:/sbin/nologin
cyrus:x:85:12::/usr/cyrus:/sbin/nologin
vpopmail:x:89:89::/var/vpopmail:/sbin/nologin
ntp:x:123:123:NTP:/var/empty:/sbin/nologin
smmsp:x:209:209:smmsp:/var/spool/mqueue:/sbin/nologin
guest:x:405:100:guest:/dev/null:/sbin/nologin
nobody:x:65534:65534:nobody:/:/sbin/nologin
/ # whoami
root
/ # exit
test@jungnas:~/container/alpinelinux$
As you can see, inside the chroot it behaves like a distribution separate from the host operating system. However, a sufficiently privileged process can still escape a chroot and see the host's root, which is why real-world container implementations prefer pivot_root over chroot.
Linux supports namespaces, a feature that allows you to put a specific process into a namespace and only see what that namespace allows. If you want to see the namespaces currently running on Linux, you can do so with the following command.
test@jungnas:~$ lsns
NS TYPE NPROCS PID USER COMMAND
4026531834 time 2 2949039 test bash
4026531835 cgroup 2 2949039 test bash
4026531836 pid 2 2949039 test bash
4026531837 user 2 2949039 test bash
4026531838 uts 2 2949039 test bash
4026531839 ipc 2 2949039 test bash
4026531840 net 2 2949039 test bash
4026531841 mnt 2 2949039 test bash
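The namespace IDs that lsns prints can also be read directly from /proc, where every process exposes its namespace memberships as symlinks. A quick way to inspect the current shell's namespaces:

```shell
# Every process exposes its namespaces as symlinks under /proc/<pid>/ns;
# lsns is essentially an aggregated view of these files.
readlink /proc/self/ns/pid    # e.g. pid:[4026531836]
readlink /proc/self/ns/mnt    # e.g. mnt:[4026531841]
```

Two processes belong to the same namespace exactly when these symlinks point to the same inode number.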
This feature lets you isolate each of these aspects of Linux independently. To create new namespaces, we use the `unshare` command. Let's see what it does with `man unshare`.
NAME
unshare - run program in new namespaces
SYNOPSIS
unshare [options] [program [arguments]]
DESCRIPTION
The unshare command creates new namespaces (as specified by the command-line options described
below) and then executes the specified program. If program is not given, then "${SHELL}" is run (default: /bin/sh).
As described, the unshare command creates new namespaces as specified by its command-line options and then executes the given program inside them. Almost anything can be isolated, from PIDs to the network, but in this example we will isolate PIDs and then use cgroups to turn the result into something container-like.
If you have ever run the ps command inside a container, you will have noticed that the container's main process has PID 1, as shown below.
/ # ps -ef
PID   USER     TIME  COMMAND
    1 root      0:00 nginx: master process nginx -g daemon off;
   30 nginx     0:08 nginx: worker process
   31 nginx     0:08 nginx: worker process
   32 root      0:00 sh
   38 root      0:00 ps -ef
This is because the container's PID namespace is isolated from the host's, so process numbering starts again at 1. The command to create a new PID namespace with unshare is as follows.
sudo unshare --pid <command>
Based on the description above, it looks like sh should run in the new namespace immediately, with PID 1. Let's test whether that is actually the case.
ubuntu@ip-10-1-1-227:~/container$ sudo unshare --pid sh
# ls
alpinelinux
# ls
sh: 2: Cannot fork
# ls
sh: 3: Cannot fork
Something is wrong: we get a strange error before we can even check the PID. The cause lies in how unshare starts sh. By default unshare does not fork; it execs sh directly, so sh itself remains in the original PID namespace and only its children are placed in the new one. The first child (our first ls) becomes PID 1 of the new namespace, and when it exits the namespace loses its init process, after which nothing else can be forked into it. To fix this, add the --fork option so that unshare forks sh as a child, making sh itself PID 1 inside the new namespace.
ubuntu@ip-10-1-1-227:~/container$ sudo unshare --pid --fork sh
# ps
PID TTY          TIME CMD
1339 pts/1    00:00:00 sudo
1340 pts/1    00:00:00 unshare
1341 pts/1    00:00:00 sh
1342 pts/1    00:00:00 ps
# ps
PID TTY          TIME CMD
1339 pts/1    00:00:00 sudo
1340 pts/1    00:00:00 unshare
1341 pts/1    00:00:00 sh
1343 pts/1    00:00:00 ps
The fork issue is resolved, but something is still off: we want PIDs starting from 1, as in a container, yet ps reports PIDs like 1339. The ps man page explains why.
This ps works by reading the virtual files in /proc. This ps does not need to be setuid kmem or have any privileges to run. Do not give this ps any special authorizations.
This is because ps builds its output by reading the /proc directory. We already know how to change the root directory with chroot, so let's chroot into the new root and mount a fresh /proc there before running ps.
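To see for ourselves that ps is nothing more than a pretty-printer over /proc, we can read the same virtual files directly with ordinary file tools:

```shell
# ps has no special privileges: everything it prints comes from virtual
# files under /proc. Reading them directly yields the same information.
cat /proc/self/comm              # name of the current process (here: cat)
grep '^Pid:' /proc/self/status   # its PID, as ps would report it
```

Each numeric directory under /proc corresponds to one running process, which is why mounting /proc inside the new root changes what ps can see.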
ubuntu@ip-10-1-1-227:~/container$ sudo unshare --pid --fork chroot alpinelinux sh
/ # mount -t proc proc proc
/ # ps
PID   USER     TIME  COMMAND
    1 root      0:00 sh
    3 root      0:00 ps
You can see that the pid has been successfully isolated.
cgroup is short for *control groups*, a feature that limits the resources available to the processes in a given group. Ubuntu 22.04, the host of the container we created above, uses cgroup v2. We can therefore limit CPU usage with the following steps.
1. First, install the package needed to work with cgroups.
sudo apt-get install cgroup-tools
2. Verify that the cpu and cpuset controllers are listed in the /sys/fs/cgroup/cgroup.controllers file.
cat /sys/fs/cgroup/cgroup.controllers
If the command prints something like the following, everything is in order.
cpuset cpu io memory hugetlb pids rdma
3. Enable CPU-specific controllers
echo "+cpu" >> /sys/fs/cgroup/cgroup.subtree_control
echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control
These commands make the cpu and cpuset controllers available to subgroups of /sys/fs/cgroup.
4. Create a subgroup directory.
mkdir /sys/fs/cgroup/Example/
After creating the folder, you will see that the kernel automatically populates it with a set of interface files.
ubuntu@ip-10-1-1-227:~/container$ ls /sys/fs/cgroup/Example/
cgroup.controllers cpu.max.burst io.prio.class memory.reclaim
cgroup.events cpu.pressure io.stat memory.stat
cgroup.freeze cpu.stat io.weight memory.swap.current
cgroup.kill cpu.uclamp.max memory.current memory.swap.events
cgroup.max.depth cpu.uclamp.min memory.events memory.swap.high
cgroup.max.descendants cpu.weight memory.events.local memory.swap.max
cgroup.pressure cpu.weight.nice memory.high memory.zswap.current
cgroup.procs cpuset.cpus memory.low memory.zswap.max
cgroup.stat cpuset.cpus.effective memory.max pids.current
cgroup.subtree_control cpuset.cpus.partition memory.min pids.events
cgroup.threads cpuset.mems memory.numa_stat pids.max
cgroup.type cpuset.mems.effective memory.oom.group pids.peak
cpu.idle io.max memory.peak
cpu.max io.pressure memory.pressure
These files correspond to the enabled controllers; by default, a newly created subgroup inherits the system's full CPU and memory resources without any restriction.
5. Enable the CPU-related controllers inside Example, so that its child groups get CPU controllers as well.
echo "+cpu" >> /sys/fs/cgroup/Example/cgroup.subtree_control
echo "+cpuset" >> /sys/fs/cgroup/Example/cgroup.subtree_control
These commands make the CPU-time controllers available in the child groups of Example.
6. Create the Example/tasks/ directory, which will hold the processes we actually want to restrict, then configure its CPU limits.
mkdir /sys/fs/cgroup/Example/tasks/
echo "1" > /sys/fs/cgroup/Example/tasks/cpuset.cpus
echo "200000 1000000" > /sys/fs/cgroup/Example/tasks/cpu.max
The cpuset.cpus value pins the group to CPU 1, and cpu.max grants the group at most 200,000 µs of CPU time per 1,000,000 µs period.
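The two numbers written to cpu.max are a quota and a period, both in microseconds: the group may consume at most quota microseconds of CPU time per period. A quick check of what our values allow:

```shell
# cpu.max holds "<quota> <period>" in microseconds; the group gets at most
# quota/period of a single CPU. With 200000/1000000 that is a 20% cap.
quota=200000
period=1000000
awk -v q="$quota" -v p="$period" 'BEGIN { printf "%g%% of one CPU\n", 100 * q / p }'
# prints: 20% of one CPU
```

Writing "max 1000000" instead of a numeric quota removes the bandwidth limit entirely.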