Tech
2023-10-06
Let’s Make a Simple Container Using Commands

Typically, we use a container program like Docker to work with containers. However, basic containers can also be implemented using a combination of Linux commands. In this article, we will take a look at how containers are isolated and operated through Linux commands.

Changing the root directory with chroot

When studying containers, there is one command that you are bound to hear about: chroot. If you check the manual to see what chroot is, it looks like this.
 
NAME
       chroot - run command or interactive shell with special root directory

SYNOPSIS
       chroot [OPTION] NEWROOT [COMMAND [ARG]...]
       chroot OPTION

DESCRIPTION
       Run COMMAND with root directory set to NEWROOT.

In short, chroot runs a command (or an interactive shell) with a different root directory, using the form chroot <new_root> <command>. Now that we know what chroot does, let's see how it works in practice by creating a new folder and running chroot against it.



test@jungnas:~/container$ mkdir new_root

test@jungnas:~/container$ sudo chroot new_root ls

chroot: failed to run command ‘ls’: No such file or directory


It does not work as expected: the new root is an empty directory, so there is no ls executable inside it to run. To get a working root filesystem, we will use Alpine Linux, one of the lightest and most popular distributions for container environments. Alpine also ships its root filesystem as a tar.gz archive, which makes it a good distribution for this example.

test@jungnas:~/container$ mkdir alpinelinux

test@jungnas:~/container$ cd alpinelinux

test@jungnas:~/container/alpinelinux$ ls

test@jungnas:~/container/alpinelinux$ wget https://dl-cdn.alpinelinux.org/alpine/latest-stable/releases/x86_64/alpine-minirootfs-3.17.3-x86_64.tar.gz


Next, extract the archive with tar -xzf alpine-minirootfs-3.17.3-x86_64.tar.gz, then run chroot against the alpinelinux directory.



test@jungnas:~/container$ sudo chroot alpinelinux ls /

bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var


Running ls / shows what the root directory of the extracted filesystem looks like. Let's go a step further and launch a shell inside it. Alpine Linux does not ship bash, so we use sh instead.



test@jungnas:~/container$ sudo chroot alpinelinux sh

/ # ls

bin dev etc home lib media mnt opt proc root run sbin srv sys tmp usr var

/ # cat /etc/hostname

localhost

/ # cat /etc/passwd

root:x:0:0:root:/root:/bin/ash

bin:x:1:1:bin:/bin:/sbin/nologin

daemon:x:2:2:daemon:/sbin:/sbin/nologin

adm:x:3:4:adm:/var/adm:/sbin/nologin

lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

sync:x:5:0:sync:/sbin:/bin/sync

shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown

halt:x:7:0:halt:/sbin:/sbin/halt

mail:x:8:12:mail:/var/mail:/sbin/nologin

news:x:9:13:news:/usr/lib/news:/sbin/nologin

uucp:x:10:14:uucp:/var/spool/uucppublic:/sbin/nologin

operator:x:11:0:operator:/root:/sbin/nologin

man:x:13:15:man:/usr/man:/sbin/nologin

postmaster:x:14:12:postmaster:/var/mail:/sbin/nologin

cron:x:16:16:cron:/var/spool/cron:/sbin/nologin

ftp:x:21:21::/var/lib/ftp:/sbin/nologin

sshd:x:22:22:sshd:/dev/null:/sbin/nologin

at:x:25:25:at:/var/spool/cron/atjobs:/sbin/nologin

squid:x:31:31:Squid:/var/cache/squid:/sbin/nologin

xfs:x:33:33:X Font Server:/etc/X11/fs:/sbin/nologin

games:x:35:35:games:/usr/games:/sbin/nologin

cyrus:x:85:12::/usr/cyrus:/sbin/nologin

vpopmail:x:89:89::/var/vpopmail:/sbin/nologin

ntp:x:123:123:NTP:/var/empty:/sbin/nologin

smmsp:x:209:209:smmsp:/var/spool/mqueue:/sbin/nologin

guest:x:405:100:guest:/dev/null:/sbin/nologin

nobody:x:65534:65534:nobody:/:/sbin/nologin

/ # whoami

root

/ # exit

test@jungnas:~/container$


As you can see, it behaves like a separate distribution from the actual host operating system. However, chroot only changes the apparent root for path resolution; a privileged process can still escape it and reach the host's real root (for example, by keeping a directory file descriptor open outside the new root before chrooting). For this reason, real-world container implementations prefer pivot_root, which actually replaces the root of the mount namespace, over chroot.

Linux Namespace

Linux supports namespaces, a feature that places a process inside an isolated view of a particular system resource, so the process sees only what its namespace allows. You can list the namespaces currently in use on a Linux system with the following command.



test@jungnas:~$ lsns

NS TYPE NPROCS PID USER COMMAND

4026531834 time 2 2949039 test bash

4026531835 cgroup 2 2949039 test bash

4026531836 pid 2 2949039 test bash

4026531837 user 2 2949039 test bash

4026531838 uts 2 2949039 test bash

4026531839 ipc 2 2949039 test bash

4026531840 net 2 2949039 test bash

4026531841 mnt 2 2949039 test bash
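lsns gets this information from /proc: every process exposes its namespace memberships as symlinks under /proc/<pid>/ns, and the inode number embedded in each link target matches the NS column above. A quick check that should work on any modern Linux:

```shell
# List the current shell's namespace memberships; each symlink target
# looks like "pid:[4026531836]" -- the type plus the namespace inode number.
ls -l /proc/self/ns
readlink /proc/self/ns/pid
```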


This feature allows you to isolate different features of Linux. To change the namespace, we use the `unshare` command. Let's see what this looks like with the `man unshare` command.

NAME
       unshare - run program in new namespaces

SYNOPSIS
       unshare [options] [program [arguments]]

DESCRIPTION
       The unshare command creates new namespaces (as specified by the command-line options described
       below) and then executes the specified program. If program is not given, then "${SHELL}" is run
       (default: /bin/sh).

As described, unshare creates new namespaces according to its command-line options and then executes the specified program inside them. Almost anything can be isolated this way, from PIDs to the network stack; in this example, we will isolate PIDs and apply cgroups to build our container.

PID Namespaces

If you have ever run the ps command inside a container, you will have noticed that the first process always has PID 1, as shown below.



/ # ps -ef

PID   USER     TIME  COMMAND
    1 root      0:00 nginx: master process nginx -g daemon off;
   30 nginx     0:08 nginx: worker process
   31 nginx     0:08 nginx: worker process
   32 root      0:00 sh
   38 root      0:00 ps -ef


This is because the container's processes have their own PID namespace, isolated from the host's. Using unshare, the command to create a new PID namespace and run a program in it is as follows.



sudo unshare --pid <command>

Based on the description above, it looks like the sh command should run in the new namespace immediately with pid 1. Let's test to see if that is actually the case.



ubuntu@ip-10-1-1-227:~/container$ sudo unshare --pid sh

# ls

alpinelinux

# ls

sh: 2: Cannot fork

# ls

sh: 3: Cannot fork

Something is wrong: we get a strange error before we even check the PID. The cause lies in how unshare enters the new namespace. With `sudo unshare --pid sh`, unshare does not fork; it execs sh directly, so sh itself remains in the original PID namespace and only its children are placed in the new one. The first child becomes PID 1 of the new namespace, and once it exits, the namespace's init process is gone, so every subsequent fork fails with "Cannot fork". The fix is to add the --fork option, so that unshare forks a child (sh) that starts life inside the new namespace.
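Parent-child relationships like the ones involved here are easy to inspect: for any PID, ps can print its parent PID. A quick check on the current shell:

```shell
# Print the current shell's PID, its parent's PID, and its command name.
# The parent is whatever process launched this shell.
ps -o pid=,ppid=,comm= -p $$
```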



ubuntu@ip-10-1-1-227:~/container$ sudo unshare --pid --fork sh

# ps

  PID TTY          TIME CMD
 1339 pts/1    00:00:00 sudo
 1340 pts/1    00:00:00 unshare
 1341 pts/1    00:00:00 sh
 1342 pts/1    00:00:00 ps

# ps

  PID TTY          TIME CMD
 1339 pts/1    00:00:00 sudo
 1340 pts/1    00:00:00 unshare
 1341 pts/1    00:00:00 sh
 1343 pts/1    00:00:00 ps


The fork issue is resolved, but something is still off. We expect sh to appear as PID 1, as in a container, yet ps shows PIDs like 1339. The ps man page explains why.

This ps works by reading the virtual files in /proc. This ps does not need to be setuid kmem or have any privileges to run. Do not give this ps any special authorizations.
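In other words, ps builds its process list by enumerating the numeric directories under /proc, which is easy to confirm from any shell:

```shell
# Every running process appears as a numeric directory under /proc.
# The "Name:" line of /proc/<pid>/status identifies the process.
ls /proc | grep -E '^[0-9]+$' | head -n 5
head -n 1 /proc/self/status
```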

This is because ps gets its information by reading /proc, and our new namespace is still looking at the host's /proc. Since we already know how to change the root directory with chroot, let's chroot into the Alpine filesystem and mount a fresh /proc there before running ps.



ubuntu@ip-10-1-1-227:~/container$ sudo unshare --pid --fork chroot alpinelinux sh

/ # mount -t proc proc proc

/ # ps

PID   USER     TIME  COMMAND
    1 root      0:00 sh
    3 root      0:00 ps


You can see that the pid has been successfully isolated.

Cgroup

cgroup is short for *control groups*, a feature that limits the resources available to the processes belonging to a given group. Ubuntu 22.04, the host of the container we created above, uses cgroup v2 by default, so we can limit CPU usage with the following steps.
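You can check which cgroup version a system uses by looking at the filesystem type mounted at /sys/fs/cgroup; on a cgroup v2 system it is the unified cgroup2 filesystem:

```shell
# Print the filesystem type mounted at /sys/fs/cgroup.
# "cgroup2fs" indicates the unified cgroup v2 hierarchy;
# a legacy cgroup v1 host shows "tmpfs" here instead.
stat -fc %T /sys/fs/cgroup
```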

1. First, install the package needed to work with cgroups.

sudo apt-get install cgroup-tools 

2. Verify that the cpu and cpuset controllers are available in the /sys/fs/cgroup/cgroup.controllers file.

cat /sys/fs/cgroup/cgroup.controllers           

If the command prints output like the following, everything is in order.

cpuset cpu io memory hugetlb pids rdma             

3. Enable CPU-specific controllers





echo "+cpu" >> /sys/fs/cgroup/cgroup.subtree_control

echo "+cpuset" >> /sys/fs/cgroup/cgroup.subtree_control


These commands enable the cpu and cpuset controllers for the subgroups of /sys/fs/cgroup.

4. Create a subgroup named Example under /sys/fs/cgroup.



mkdir /sys/fs/cgroup/Example/

As soon as the directory is created, the kernel automatically populates it with a number of interface files.



ubuntu@ip-10-1-1-227:~/container$ ls /sys/fs/cgroup/Example/

cgroup.controllers cpu.max.burst io.prio.class memory.reclaim

cgroup.events cpu.pressure io.stat memory.stat

cgroup.freeze cpu.stat io.weight memory.swap.current

cgroup.kill cpu.uclamp.max memory.current memory.swap.events

cgroup.max.depth cpu.uclamp.min memory.events memory.swap.high

cgroup.max.descendants cpu.weight memory.events.local memory.swap.max

cgroup.pressure cpu.weight.nice memory.high memory.zswap.current

cgroup.procs cpuset.cpus memory.low memory.zswap.max

cgroup.stat cpuset.cpus.effective memory.max pids.current

cgroup.subtree_control cpuset.cpus.partition memory.min pids.events

cgroup.threads cpuset.mems memory.numa_stat pids.max

cgroup.type cpuset.mems.effective memory.oom.group pids.peak

cpu.idle io.max memory.peak

cpu.max io.pressure memory.pressure


These files are the interface to the enabled controllers. By default, a newly created subgroup inherits unrestricted access to the system's CPU and memory resources.

5. Enable the CPU-specific controllers inside Example as well.



echo "+cpu" >> /sys/fs/cgroup/Example/cgroup.subtree_control

echo "+cpuset" >> /sys/fs/cgroup/Example/cgroup.subtree_control


These commands make only the CPU-time controllers available to Example's subgroups.

6. Create the /sys/fs/cgroup/Example/tasks/ directory and assign it a CPU core.



mkdir /sys/fs/cgroup/Example/tasks/

echo "1" > /sys/fs/cgroup/Example/tasks/cpuset.cpus


Tasks whose CPU usage we want to limit will be placed in this directory; the echo above restricts them to CPU core 1.

7. Set the CPU time distribution control so that all processes in the /sys/fs/cgroup/Example/tasks subgroup can run on the CPU for at most 0.2 seconds out of every 1 second, i.e. one fifth of the available CPU time.



echo "200000 1000000" > /sys/fs/cgroup/Example/tasks/cpu.max
