Typically, we use a container program like Docker to work with containers. However, basic containers can also be implemented using a combination of Linux commands. In this article, we will take a look at how containers are isolated and operated through Linux commands.
When studying containers, there is one command that you are bound to hear about: chroot. If you check the manual to see what chroot is, it looks like this.
NAME
chroot - run command or interactive shell with special root directory
SYNOPSIS
chroot [OPTION] NEWROOT [COMMAND [ARG]...]
chroot OPTION
DESCRIPTION
Run COMMAND with root directory set to NEWROOT.
In other words, chroot runs a command, or launches an interactive shell, with a different root directory, using the form `chroot <directory> <command>`. Now that we know what chroot does, let's see it in practice by creating a new folder and running chroot against it.
test@jungnas:~/container$ mkdir new_root
test@jungnas:~/container$ sudo chroot new_root ls
chroot: failed to run command ‘ls’: No such file or directory
It does not work as we expected: the newly created root is an empty folder, so there is no `ls` executable inside it to run. To fix this, we need a root filesystem. We will use Alpine Linux, the lightweight distribution most popular in container environments; its minimal root filesystem is distributed as a tar.gz archive, which makes it a good fit for this example.
test@jungnas:~/container$ mkdir alpinelinux
test@jungnas:~/container$ cd alpinelinux
test@jungnas:~/container/alpinelinux$ ls
test@jungnas:~/container/alpinelinux$ wget https://dl-cdn.alpinelinux.org/alpine/latest-stable/releases/x86_64/alpine-minirootfs-3.17.3-x86_64.tar.gz
After downloading, extract the archive, go back up to the parent directory, and run chroot against the alpinelinux directory.
test@jungnas:~/container/alpinelinux$ tar -xzf alpine-minirootfs-3.17.3-x86_64.tar.gz
test@jungnas:~/container/alpinelinux$ cd ..
test@jungnas:~/container$ sudo chroot alpinelinux ls /
bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var
Running `ls /` shows the root directory of the extracted filesystem. Why not actually run a shell inside it? Alpine Linux does not ship bash, so we use sh instead.
test@jungnas:~/container$ sudo chroot alpinelinux sh
/ # ls
bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var
/ # cat /etc/hostname
localhost
/ # cat /etc/passwd
root:x:0:0:root:/root:/bin/ash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/mail:/sbin/nologin
news:x:9:13:news:/usr/lib/news:/sbin/nologin
uucp:x:10:14:uucp:/var/spool/uucppublic:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
man:x:13:15:man:/usr/man:/sbin/nologin
postmaster:x:14:12:postmaster:/var/mail:/sbin/nologin
cron:x:16:16:cron:/var/spool/cron:/sbin/nologin
ftp:x:21:21::/var/lib/ftp:/sbin/nologin
sshd:x:22:22:sshd:/dev/null:/sbin/nologin
at:x:25:25:at:/var/spool/cron/atjobs:/sbin/nologin
squid:x:31:31:Squid:/var/cache/squid:/sbin/nologin
xfs:x:33:33:X Font Server:/etc/X11/fs:/sbin/nologin
games:x:35:35:games:/usr/games:/sbin/nologin
cyrus:x:85:12::/usr/cyrus:/sbin/nologin
vpopmail:x:89:89::/var/vpopmail:/sbin/nologin
ntp:x:123:123:NTP:/var/empty:/sbin/nologin
smmsp:x:209:209:smmsp:/var/spool/mqueue:/sbin/nologin
guest:x:405:100:guest:/dev/null:/sbin/nologin
nobody:x:65534:65534:nobody:/:/sbin/nologin
/ # whoami
root
/ # exit
test@jungnas:~/container/alpinelinux$
As you can see, inside the chroot it behaves like a distribution separate from the host operating system. However, a sufficiently privileged process can still escape a chroot and see the host's root, which is why real-world container implementations prefer pivot_root over chroot.
Linux supports namespaces, a feature that allows you to put a specific process into a namespace and only see what that namespace allows. If you want to see the namespaces currently running on Linux, you can do so with the following command.
test@jungnas:~$ lsns
NS TYPE NPROCS PID USER COMMAND
4026531834 time 2 2949039 test bash
4026531835 cgroup 2 2949039 test bash
4026531836 pid 2 2949039 test bash
4026531837 user 2 2949039 test bash
4026531838 uts 2 2949039 test bash
4026531839 ipc 2 2949039 test bash
4026531840 net 2 2949039 test bash
4026531841 mnt 2 2949039 test bash
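The namespace IDs that lsns prints can also be read directly from /proc, where every process exposes its namespace memberships as symlinks. A quick way to inspect the current shell's namespaces:

```shell
# Every process exposes its namespaces as symlinks under /proc/<pid>/ns;
# lsns is essentially an aggregated view of these files.
readlink /proc/self/ns/pid    # e.g. pid:[4026531836]
readlink /proc/self/ns/mnt    # e.g. mnt:[4026531841]
```

Two processes belong to the same namespace exactly when these symlinks point to the same inode number.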
This feature lets you isolate each of these aspects of Linux independently. To create new namespaces, we use the `unshare` command. Let's see what it does with `man unshare`.
NAME
unshare - run program in new namespaces
SYNOPSIS
unshare [options] [program [arguments]]
DESCRIPTION
The unshare command creates new namespaces (as specified by the command-line options described
below) and then executes the specified program. If program is not given, then "${SHELL}" is run (default: /bin/sh).
As described, the unshare command creates new namespaces as specified by its command-line options and then executes the given program inside them. Almost anything can be isolated, from PIDs to the network, but in this example we will isolate PIDs and then use cgroups to turn the result into something container-like.
If you have ever run the ps command inside a container, you will have noticed that the container's main process has PID 1, as shown below.
/ # ps -ef
PID   USER     TIME  COMMAND
    1 root      0:00 nginx: master process nginx -g daemon off;
   30 nginx     0:08 nginx: worker process
   31 nginx     0:08 nginx: worker process
   32 root      0:00 sh
   38 root      0:00 ps -ef
This is because the container's PID namespace is isolated from the host's, so process numbering starts again at 1. The command to create a new PID namespace with unshare is as follows.
sudo unshare --pid <command>
Based on the description above, it looks like sh should run in the new namespace immediately, with PID 1. Let's test whether that is actually the case.
ubuntu@ip-10-1-1-227:~/container$ sudo unshare --pid sh
# ls
alpinelinux
# ls
sh: 2: Cannot fork
# ls
sh: 3: Cannot fork
Something is wrong: we get a strange error before we can even check the PID. The cause lies in how unshare starts sh. By default unshare does not fork; it execs sh directly, so sh itself remains in the original PID namespace and only its children are placed in the new one. The first child (our first ls) becomes PID 1 of the new namespace, and when it exits the namespace loses its init process, after which nothing else can be forked into it. To fix this, add the --fork option so that unshare forks sh as a child, making sh itself PID 1 inside the new namespace.
ubuntu@ip-10-1-1-227:~/container$ sudo unshare --pid --fork sh
# ps
PID TTY          TIME CMD
1339 pts/1    00:00:00 sudo
1340 pts/1    00:00:00 unshare
1341 pts/1    00:00:00 sh
1342 pts/1    00:00:00 ps
# ps
PID TTY          TIME CMD
1339 pts/1    00:00:00 sudo
1340 pts/1    00:00:00 unshare
1341 pts/1    00:00:00 sh
1343 pts/1    00:00:00 ps
The fork issue is resolved, but something is still off: we want PIDs starting from 1, as in a container, yet ps reports PIDs like 1339. The ps man page explains why.
This ps works by reading the virtual files in /proc. This ps does not need to be setuid kmem or have any privileges to run. Do not give this ps any special authorizations.
This is because ps builds its output by reading the /proc directory. We already know how to change the root directory with chroot, so let's chroot into the new root and mount a fresh /proc there before running ps.
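To see for ourselves that ps is nothing more than a pretty-printer over /proc, we can read the same virtual files directly with ordinary file tools:

```shell
# ps has no special privileges: everything it prints comes from virtual
# files under /proc. Reading them directly yields the same information.
cat /proc/self/comm              # name of the current process (here: cat)
grep '^Pid:' /proc/self/status   # its PID, as ps would report it
```

Each numeric directory under /proc corresponds to one running process, which is why mounting /proc inside the new root changes what ps can see.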
ubuntu@ip-10-1-1-227:~/container$ sudo unshare --pid --fork chroot alpinelinux sh
/ # mount -t proc proc proc
/ # ps
PID   USER     TIME  COMMAND
    1 root      0:00 sh
    3 root      0:00 ps
You can see that the pid has been successfully isolated.
cgroup is short for *control groups*, a feature that limits the resources available to the processes in a given group. Ubuntu 22.04, the host of the container we created above, uses cgroup v2. We can therefore limit CPU usage with the following steps.
1. First, install the package needed to work with cgroups.
sudo apt-get install cgroup-tools
2. Verify that the cpu and cpuset controllers are listed in the /sys/fs/cgroup/cgroup.controllers file.
cat /sys/fs/cgroup/cgroup.controllers
If the command prints something like the following, everything is in order.
cpuset cpu io memory hugetlb pids rdma
3. Enable CPU-specific controllers
echo "+cpu" >> /sys/fs/cgroup/cgroup.subtree_control
echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control
These commands make the cpu and cpuset controllers available to subgroups of /sys/fs/cgroup.
4. Create a subgroup directory.
mkdir /sys/fs/cgroup/Example/
After creating the folder, you will see that the kernel automatically populates it with a set of interface files.
ubuntu@ip-10-1-1-227:~/container$ ls /sys/fs/cgroup/Example/
cgroup.controllers cpu.max.burst io.prio.class memory.reclaim
cgroup.events cpu.pressure io.stat memory.stat
cgroup.freeze cpu.stat io.weight memory.swap.current
cgroup.kill cpu.uclamp.max memory.current memory.swap.events
cgroup.max.depth cpu.uclamp.min memory.events memory.swap.high
cgroup.max.descendants cpu.weight memory.events.local memory.swap.max
cgroup.pressure cpu.weight.nice memory.high memory.zswap.current
cgroup.procs cpuset.cpus memory.low memory.zswap.max
cgroup.stat cpuset.cpus.effective memory.max pids.current
cgroup.subtree_control cpuset.cpus.partition memory.min pids.events
cgroup.threads cpuset.mems memory.numa_stat pids.max
cgroup.type cpuset.mems.effective memory.oom.group pids.peak
cpu.idle io.max memory.peak
cpu.max io.pressure memory.pressure
These files correspond to the enabled controllers; by default, a newly created subgroup inherits the system's full CPU and memory resources without any restriction.
5. Enable the CPU-related controllers inside Example, so that its child groups get CPU controllers as well.
echo "+cpu" >> /sys/fs/cgroup/Example/cgroup.subtree_control
echo "+cpuset" >> /sys/fs/cgroup/Example/cgroup.subtree_control
These commands make the CPU-time controllers available in the child groups of Example.
6. Create the Example/tasks/ directory, which will hold the processes we actually want to restrict, then configure its CPU limits.
mkdir /sys/fs/cgroup/Example/tasks/
echo "1" > /sys/fs/cgroup/Example/tasks/cpuset.cpus
echo "200000 1000000" > /sys/fs/cgroup/Example/tasks/cpu.max
The cpuset.cpus value pins the group to CPU 1, and cpu.max grants the group at most 200,000 µs of CPU time per 1,000,000 µs period.
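The two numbers written to cpu.max are a quota and a period, both in microseconds: the group may consume at most quota microseconds of CPU time per period. A quick check of what our values allow:

```shell
# cpu.max holds "<quota> <period>" in microseconds; the group gets at most
# quota/period of a single CPU. With 200000/1000000 that is a 20% cap.
quota=200000
period=1000000
awk -v q="$quota" -v p="$period" 'BEGIN { printf "%g%% of one CPU\n", 100 * q / p }'
# prints: 20% of one CPU
```

Writing "max 1000000" instead of a numeric quota removes the bandwidth limit entirely.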