Run the Docker daemon as a non-root user (Rootless mode)

Estimated reading time: 17 minutes

Rootless mode allows running the Docker daemon and containers as a non-root user to mitigate potential vulnerabilities in the daemon and the container runtime.

In most cases, you will only interact with the Docker CLI. However, in a conventional (rootful) installation, the Docker daemon runs with root privileges. The daemon listens on a Unix socket instead of a TCP port, and because that socket is owned by the root user, users can by default access it only with sudo.

Rootless mode does not require root privileges even during the installation of the Docker daemon, as long as the prerequisites are met.

Rootless mode was introduced in Docker Engine v19.03 as an experimental feature. Rootless mode graduated from experimental in Docker Engine v20.10.

How it works

Rootless mode executes the Docker daemon and containers inside a user namespace. This is very similar to userns-remap mode, except that with userns-remap mode, the daemon itself is running with root privileges, whereas in rootless mode, both the daemon and the container are running without root privileges.

Rootless mode does not use binaries with SETUID bits or file capabilities, except newuidmap and newgidmap, which are needed to allow multiple UIDs/GIDs to be used in the user namespace.

Prerequisites

  • You must install newuidmap and newgidmap on the host. These commands are provided by the uidmap package on most distros.

  • /etc/subuid and /etc/subgid should contain at least 65,536 subordinate UIDs/GIDs for the user. For example, if both files contain the entry testuser:231072:65536, the user testuser has 65,536 subordinate UIDs/GIDs (231072-296607).
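Both prerequisites can be checked from the shell before installing. The function below is a hypothetical helper and is not shipped with Docker. It looks up an entry by user name only (entries keyed by numeric UID are not handled). It prints the resulting ID range and fails when fewer than 65,536 IDs are granted.

```shell
# Hypothetical helper (not part of Docker): check that FILE grants USER at
# least 65,536 subordinate IDs. Entries have the form <user>:<first-id>:<count>.
check_subid_range() {
  file=$1
  user=$2
  # Take the first entry for this user name; the trailing colon avoids
  # matching longer names that share the same prefix.
  entry=$(grep "^${user}:" "$file" | head -n 1)
  if [ -z "$entry" ]; then
    echo "no entry for ${user} in ${file}" >&2
    return 1
  fi
  start=$(echo "$entry" | cut -d: -f2)
  count=$(echo "$entry" | cut -d: -f3)
  # Print the inclusive range, e.g. "testuser: 231072-296607 (65536 IDs)".
  echo "${user}: ${start}-$((start + count - 1)) (${count} IDs)"
  [ "$count" -ge 65536 ]
}
```

For example, run check_subid_range /etc/subuid "$(whoami)" and then the same call with /etc/subgid. Use command -v newuidmap newgidmap to confirm that both binaries are installed.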

Distribution-specific hint

Note: We recommend that you use the Ubuntu kernel.

Ubuntu

  • No preparation is needed.

  • The overlay2 storage driver is enabled by default (Ubuntu-specific kernel patch).

  • Known to work on Ubuntu 16.04, 18.04, and 20.04.

Debian GNU/Linux

  • Add kernel.unprivileged_userns_clone=1 to /etc/sysctl.conf (or /etc/sysctl.d) and run sudo sysctl --system.

  • To use the overlay2 storage driver (recommended), run sudo modprobe overlay permit_mounts_in_userns=1 (Debian-specific kernel patch, introduced in Debian 10). Add the configuration to /etc/modprobe.d for persistence.

Arch Linux

  • Installing fuse-overlayfs is recommended. Run sudo pacman -S fuse-overlayfs.

  • Add kernel.unprivileged_userns_clone=1 to /etc/sysctl.conf (or /etc/sysctl.d) and run sudo sysctl --system.

openSUSE

  • Installing fuse-overlayfs is recommended. Run sudo zypper install -y fuse-overlayfs.

  • sudo modprobe ip_tables iptable_mangle iptable_nat iptable_filter is required. This might be required on other distros as well, depending on the configuration.

  • Known to work on openSUSE 15.

Fedora and CentOS 8

  • Installing fuse-overlayfs is recommended. Run sudo dnf install -y fuse-overlayfs.

  • You might need sudo dnf install -y iptables.

  • When SELinux is enabled, you may encounter the error can't open lock file /run/xtables.lock: Permission denied. A workaround is to run sudo dnf install -y policycoreutils-python-utils && sudo semanage permissive -a iptables_t. This issue is tracked in moby/moby#41230.

  • Known to work on CentOS 8 and Fedora 33.

CentOS 7

  • Add user.max_user_namespaces=28633 to /etc/sysctl.conf (or /etc/sysctl.d) and run sudo sysctl --system.

  • systemctl --user does not work by default. Run dockerd-rootless.sh directly without systemd.
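Several of the hints above, and later sections, come down to the same step: add one kernel parameter and reload. Below is a minimal sketch of that step. It assumes sudo access and a drop-in file named 99-rootless-docker.conf; that file name is an arbitrary choice. SYSCTL_DIR is a hypothetical override that exists only so the helper can be tried against a scratch directory.

```shell
# Hypothetical helper: persist one sysctl setting in a drop-in file under
# /etc/sysctl.d, then reload all sysctl configuration files.
add_sysctl_setting() {
  setting=$1                      # e.g. kernel.unprivileged_userns_clone=1
  conf="${SYSCTL_DIR:-/etc/sysctl.d}/99-rootless-docker.conf"
  # Append the line only if it is not already present, so re-runs are harmless.
  if ! grep -qxF "$setting" "$conf" 2>/dev/null; then
    printf '%s\n' "$setting" | sudo tee -a "$conf" >/dev/null
  fi
  sudo sysctl --system
}
```

For example, use add_sysctl_setting kernel.unprivileged_userns_clone=1 on Debian or Arch Linux, or add_sysctl_setting user.max_user_namespaces=28633 on CentOS 7.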

Known limitations

  • Only the following storage drivers are supported:
    • overlay2 (only if running with kernel 5.11 or later, or Ubuntu-flavored kernel, or Debian-flavored kernel)
    • fuse-overlayfs (only if running with kernel 4.18 or later, and fuse-overlayfs is installed)
    • vfs
  • Cgroup is supported only when running with cgroup v2 and systemd. See Limiting resources.
  • The following features are not supported:
    • AppArmor
    • Checkpoint
    • Overlay network
    • Exposing SCTP ports
  • To use the ping command, see Routing ping packets.
  • To expose privileged TCP/UDP ports (< 1024), see Exposing privileged ports.
  • The IPAddress shown in docker inspect is namespaced inside RootlessKit’s network namespace. This means the IP address is not reachable from the host without nsenter-ing into the network namespace.
  • Host network (docker run --net=host) is also namespaced inside RootlessKit.

Install

Note

If the system-wide Docker daemon is already running, consider disabling it with sudo systemctl disable --now docker.service.

If you installed Docker 20.10 or later with RPM/DEB packages, you should have dockerd-rootless-setuptool.sh in /usr/bin.

Run dockerd-rootless-setuptool.sh install as a non-root user to set up the daemon.

If dockerd-rootless-setuptool.sh is not present, you may need to install the docker-ce-rootless-extras package manually, e.g., sudo apt-get install -y docker-ce-rootless-extras.
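As a sketch, the install step can be guarded in two ways. It refuses to run as root, and it stops an active system-wide daemon first, as the note above recommends. The function name is hypothetical; the commands inside are the ones this page describes.

```shell
# Hypothetical wrapper around the documented install steps.
rootless_install() {
  # The rootless daemon must belong to a regular user, so refuse to run as root.
  if [ "$(id -u)" -eq 0 ]; then
    echo "run this as a non-root user, not with sudo" >&2
    return 1
  fi
  # Disable the system-wide daemon first if it is currently running.
  if systemctl is-active --quiet docker.service; then
    sudo systemctl disable --now docker.service
  fi
  dockerd-rootless-setuptool.sh install
}
```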

Install without packages

If you do not have permission to run package managers like apt-get and dnf, consider using the installation script available at https://get.docker.com/rootless, e.g., curl -fsSL https://get.docker.com/rootless | sh.

The binaries will be installed at ~/bin.

See Troubleshooting if you encounter an error.

Uninstall

To remove the systemd service of the Docker daemon, run dockerd-rootless-setuptool.sh uninstall.

To remove the data directory, run rootlesskit rm -rf ~/.local/share/docker.

To remove the binaries, remove the docker-ce-rootless-extras package if you installed Docker with package managers. If you installed Docker with https://get.docker.com/rootless (Install without packages), remove the binary files under ~/bin.
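For a script-based install, the three removal steps can be run in order. First the service, then the data directory, then the binaries. The data directory is removed through rootlesskit because files created inside the user namespace may be owned by subordinate IDs. The function name is hypothetical. The list of binary names is an assumption about what the installation script places in ~/bin, so compare it with the actual contents of ~/bin before deleting anything.

```shell
# Hypothetical helper for uninstalling a script-based (non-package) install.
rootless_uninstall() {
  # 1. Stop and remove the systemd user service.
  dockerd-rootless-setuptool.sh uninstall
  # 2. Remove the data directory from inside the user namespace, where files
  #    owned by subordinate UIDs/GIDs can be deleted.
  rootlesskit rm -rf "$HOME/.local/share/docker"
  # 3. Remove the binaries. ASSUMED file list: check ~/bin before running.
  (
    cd "$HOME/bin" || exit 1
    rm -f containerd containerd-shim containerd-shim-runc-v2 ctr docker \
      docker-init docker-proxy dockerd dockerd-rootless-setuptool.sh \
      dockerd-rootless.sh rootlesskit rootlesskit-docker-proxy runc vpnkit
  )
}
```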

Usage

Daemon

The systemd unit file is installed as ~/.config/systemd/user/docker.service.

Use systemctl --user to manage the lifecycle of the daemon, e.g., systemctl --user start docker.

To launch the daemon on system startup, enable the systemd service and lingering with systemctl --user enable docker and sudo loginctl enable-linger $(whoami).
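The start, enable, and linger steps can be combined into one helper. This is a minimal sketch with a hypothetical function name. The final check assumes a systemd version whose loginctl show-user supports --property and --value.

```shell
# Hypothetical helper: start the rootless daemon now and at every boot.
enable_rootless_docker() {
  systemctl --user start docker
  systemctl --user enable docker
  # Lingering starts the user's systemd instance at boot and keeps it (and
  # the daemon) running without an active login session.
  sudo loginctl enable-linger "$(whoami)"
  # Print the Linger property; "yes" means lingering is enabled.
  loginctl show-user "$(whoami)" --property=Linger --value
}
```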

Starting Rootless Docker as a system-wide service (/etc/systemd/system/docker.service) is not supported, even with the User= directive.

To run the daemon directly without systemd, you need to run dockerd-rootless.sh instead of dockerd.

The following environment variables must be set:

  • $HOME: the home directory
  • $XDG_RUNTIME_DIR: an ephemeral directory that is only accessible by the expected user, e.g., ~/.docker/run. The directory should be removed on every host shutdown. It can be on tmpfs, but it should not be under /tmp, because locating it there might expose it to a TOCTOU (time-of-check to time-of-use) attack.
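Without systemd, both variables can be prepared right before the daemon starts. This is a minimal sketch with a hypothetical function name. It always uses ~/.docker/run, the example directory mentioned above, so it never wipes an XDG_RUNTIME_DIR that systemd manages. The directory is recreated with mode 700 so that only the current user can access it. It still needs to be removed at every shutdown.

```shell
# Hypothetical launcher for hosts without systemd.
start_rootless_without_systemd() {
  if [ -z "$HOME" ]; then
    echo "HOME must be set" >&2
    return 1
  fi
  export XDG_RUNTIME_DIR="$HOME/.docker/run"
  # Start from an empty directory that only the current user can access.
  rm -rf "$XDG_RUNTIME_DIR"
  mkdir -p "$XDG_RUNTIME_DIR"
  chmod 700 "$XDG_RUNTIME_DIR"
  # dockerd-rootless.sh (not dockerd) sets up RootlessKit, then runs dockerd
  # with any extra arguments passed through.
  dockerd-rootless.sh "$@"
}
```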

Remarks about directory paths:

  • The socket path is set to $XDG_RUNTIME_DIR/docker.sock by default. $XDG_RUNTIME_DIR is typically set to /run/user/$UID.
  • The data dir is set to ~/.local/share/docker by default. The data dir should not be on NFS.
  • The daemon config dir is set to ~/.config/docker by default. This directory is different from ~/.docker that is used by the client.

Client

You need to specify the socket path explicitly.

To specify the socket path using $DOCKER_HOST, run export DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock.

To select the rootless daemon using docker context, run docker context use rootless.
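In scripts, the socket URL for $DOCKER_HOST can be derived with a one-line helper (the name is hypothetical). It falls back to /run/user/$UID, the typical value mentioned above, when $XDG_RUNTIME_DIR is unset.

```shell
# Hypothetical helper: print the socket URL of the rootless daemon.
rootless_docker_host() {
  echo "unix://${XDG_RUNTIME_DIR:-/run/user/$(id -u)}/docker.sock"
}
```

Usage: export DOCKER_HOST="$(rootless_docker_host)", then run docker commands as usual, e.g., docker run -d -p 8080:80 nginx.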

Best practices

Rootless Docker in Docker

To run Rootless Docker inside “rootful” Docker, use the docker:<version>-dind-rootless image instead of docker:<version>-dind.

The docker:<version>-dind-rootless image runs as a non-root user (UID 1000). However, --privileged is required for disabling seccomp, AppArmor, and mount masks.
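For example, the helper below starts the rootless dind image from the rootful daemon. The version tag is passed as an argument, e.g. 20.10; replace it with the tag you need. The container name dind-rootless and the function name are arbitrary choices.

```shell
# Hypothetical helper: start rootless Docker-in-Docker inside a rootful daemon.
run_dind_rootless() {
  if [ -z "$1" ]; then
    echo "usage: run_dind_rootless <version>" >&2
    return 1
  fi
  # --privileged is still required to disable seccomp, AppArmor, and mount masks.
  docker run -d --name dind-rootless --privileged "docker:$1-dind-rootless"
}
```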

Expose Docker API socket through TCP

To expose the Docker API socket through TCP, you need to launch dockerd-rootless.sh with DOCKERD_ROOTLESS_ROOTLESSKIT_FLAGS='-p 0.0.0.0:2376:2376/tcp'.
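A TCP listener should be protected with TLS. This sketch assumes server certificate files named ca.pem, cert.pem, and key.pem in the current directory; those names and the function name are hypothetical. The arguments after dockerd-rootless.sh are passed through to dockerd.

```shell
# Hypothetical launcher: expose the rootless API on TCP port 2376 with TLS.
start_rootless_tcp() {
  # RootlessKit forwards host port 2376 into its network namespace, where
  # dockerd listens on the same port.
  DOCKERD_ROOTLESS_ROOTLESSKIT_FLAGS='-p 0.0.0.0:2376:2376/tcp' \
    dockerd-rootless.sh \
    -H tcp://0.0.0.0:2376 \
    --tlsverify --tlscacert=ca.pem --tlscert=cert.pem --tlskey=key.pem
}
```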

Expose Docker API socket through SSH

To expose the Docker API socket through SSH, you need to make sure $DOCKER_HOST is set on the remote host.
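One way to verify this before use is to ask the remote shell for $DOCKER_HOST over the same kind of non-interactive SSH session that docker -H ssh:// relies on. This is a minimal sketch. The function name and the target testuser@remotehost are hypothetical.

```shell
# Hypothetical helper: run a docker command against a remote rootless daemon,
# after checking that DOCKER_HOST is set for non-interactive SSH sessions.
remote_rootless_docker() {
  target=$1                       # e.g. testuser@remotehost
  shift
  # Single quotes make the remote shell, not the local one, expand $DOCKER_HOST.
  if [ -z "$(ssh "$target" 'echo $DOCKER_HOST')" ]; then
    echo "DOCKER_HOST is not set for $target" >&2
    return 1
  fi
  docker -H "ssh://$target" "$@"
}
```

For example, remote_rootless_docker testuser@remotehost ps lists the containers of the remote rootless daemon.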

Routing ping packets

On some distributions, ping does not work by default.

Add net.ipv4.ping_group_range = 0 2147483647 to /etc/sysctl.conf (or /etc/sysctl.d) and run sudo sysctl --system to allow using ping.

Exposing privileged ports

To expose privileged ports (< 1024), set CAP_NET_BIND_SERVICE on the rootlesskit binary and restart the daemon.

Or add net.ipv4.ip_unprivileged_port_start=0 to /etc/sysctl.conf (or /etc/sysctl.d) and run sudo sysctl --system.
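The capability approach can be sketched as below; the function name is hypothetical. It assumes rootlesskit is on PATH and that the daemon is managed with systemctl --user. A file capability is attached to the binary file itself, so reinstalling or upgrading rootlesskit may require running this again.

```shell
# Hypothetical helper: allow the rootless daemon to publish ports below 1024
# by granting CAP_NET_BIND_SERVICE to the rootlesskit binary.
allow_privileged_ports() {
  rootlesskit_path=$(command -v rootlesskit) || return 1
  sudo setcap cap_net_bind_service=ep "$rootlesskit_path"
  # Restart so that the daemon runs under the updated binary.
  systemctl --user restart docker
}
```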

Limiting resources

Limiting resources with cgroup-related docker run flags such as --cpus, --memory, --pids-limit is supported only when running with cgroup v2 and systemd. See Changing cgroup version to enable cgroup v2.

If docker info shows none as Cgroup Driver, the conditions are not satisfied. When these conditions are not satisfied, rootless mode ignores the cgroup-related docker run flags. See Limiting resources without cgroup for workarounds.

If docker info shows systemd as Cgroup Driver, the conditions are satisfied. However, typically, only memory and pids controllers are delegated to non-root users by default.

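To allow delegation of all controllers, you need to change the systemd configuration: add Delegate=cpu cpuset io memory pids to the [Service] section of a drop-in file for user@.service (e.g., /etc/systemd/system/user@.service.d/delegate.conf), and then run sudo systemctl daemon-reload.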
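The same change as a small script. The function name is hypothetical, and delegate.conf is an arbitrary file name. SYSTEMD_DIR is a hypothetical override used only for testing. Because delegating cpuset needs systemd 244 or later (see the note below), drop cpuset from the list on older versions.

```shell
# Hypothetical helper: delegate the cpu, cpuset, io, memory and pids
# controllers to every user's systemd instance via a user@.service drop-in.
delegate_cgroup_controllers() {
  dir="${SYSTEMD_DIR:-/etc/systemd/system}/user@.service.d"
  sudo mkdir -p "$dir"
  printf '[Service]\nDelegate=cpu cpuset io memory pids\n' |
    sudo tee "$dir/delegate.conf" >/dev/null
  # Make systemd pick up the new drop-in.
  sudo systemctl daemon-reload
}
```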

Note

Delegating cpuset requires systemd 244 or later.

Limiting resources without cgroup

Even when cgroup is not available, you can still use the traditional ulimit and cpulimit, though they work in process-granularity rather than in container-granularity, and can be arbitrarily disabled by the container process.

For example:

  • To limit CPU usage to 0.5 cores (similar to docker run --cpus 0.5): docker run <IMAGE> cpulimit --limit=50 --include-children <COMMAND>
  • To limit max VSZ to 64MiB (similar to docker run --memory 64m): docker run <IMAGE> sh -c 'ulimit -v 65536; <COMMAND>'
  • To limit max number of processes to 100 per namespaced UID 2000 (similar to docker run --pids-limit=100): docker run --user 2000 --ulimit nproc=100 <IMAGE> <COMMAND>

Troubleshooting

Errors when starting the Docker daemon

[rootlesskit:parent] error: failed to start the child: fork/exec /proc/self/exe: operation not permitted

This error occurs mostly when the value of /proc/sys/kernel/unprivileged_userns_clone is set to 0 (check it with cat /proc/sys/kernel/unprivileged_userns_clone).

To fix this issue, add kernel.unprivileged_userns_clone=1 to /etc/sysctl.conf (or /etc/sysctl.d) and run sudo sysctl --system.

[rootlesskit:parent] error: failed to start the child: fork/exec /proc/self/exe: no space left on device

This error occurs mostly when the value of /proc/sys/user/max_user_namespaces is too small (check it with cat /proc/sys/user/max_user_namespaces).

To fix this issue, add user.max_user_namespaces=28633 to /etc/sysctl.conf (or /etc/sysctl.d) and run sudo sysctl --system.
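A small diagnostic for the two errors above, reading the same /proc/sys files. It is a hypothetical helper, not a Docker tool. PROC_SYS exists only so it can be tested against a fake tree. Any value below 28633 is treated as too small; that threshold is an assumption taken from the suggested fix, not a documented minimum.

```shell
# Hypothetical diagnostic for the "fork/exec /proc/self/exe" errors.
check_userns_sysctls() {
  root=${PROC_SYS:-/proc/sys}
  status=0
  clone="$root/kernel/unprivileged_userns_clone"
  # This file only exists on kernels that carry the corresponding patch
  # (e.g. Debian's); when it is absent, there is nothing to check.
  if [ -r "$clone" ] && [ "$(cat "$clone")" = 0 ]; then
    echo "kernel.unprivileged_userns_clone is 0: set it to 1"
    status=1
  fi
  max=$(cat "$root/user/max_user_namespaces" 2>/dev/null || echo 0)
  if [ "$max" -lt 28633 ]; then
    echo "user.max_user_namespaces is $max: consider raising it to 28633"
    status=1
  fi
  return "$status"
}
```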

[rootlesskit:parent] error: failed to setup UID/GID map: failed to compute uid/gid map: No subuid ranges found for user 1001 (“testuser”)

This error occurs when /etc/subuid and /etc/subgid are not configured. See Prerequisites.

could not get XDG_RUNTIME_DIR

This error occurs when $XDG_RUNTIME_DIR is not set.

On a non-systemd host, you need to create a directory (e.g., ~/.docker/run), set $XDG_RUNTIME_DIR to its path, and then run dockerd-rootless.sh.

Note: You must remove the directory every time you log out.

On a systemd host, log into the host using pam_systemd (see below). The value is automatically set to /run/user/$UID and cleaned up on every logout.

systemctl --user fails with “Failed to connect to bus: No such file or directory”

This error occurs mostly when you switch from the root user to a non-root user with sudo (for example, sudo -iu <USERNAME>).

Instead of sudo -iu <USERNAME>, you need to log in using pam_systemd. For example:

  • Log in through the graphic console
  • ssh <USERNAME>@localhost
  • machinectl shell <USERNAME>@

The daemon does not start up automatically

You need to run sudo loginctl enable-linger $(whoami) to enable the daemon to start up automatically. See Usage.

iptables failed: iptables -t nat -N DOCKER: Fatal: can’t open lock file /run/xtables.lock: Permission denied

This error may happen when SELinux is enabled on the host.

A known workaround is to make the iptables SELinux domain permissive by running sudo dnf install -y policycoreutils-python-utils && sudo semanage permissive -a iptables_t.

This issue is tracked in moby/moby#41230.

docker pull errors

docker: failed to register layer: Error processing tar file(exit status 1): lchown <FILE>: invalid argument

This error occurs when the number of available entries in /etc/subuid or /etc/subgid is not sufficient. The number of entries required varies across images. However, 65,536 entries are sufficient for most images. See Prerequisites.

docker: failed to register layer: ApplyLayer exit status 1 stdout: stderr: lchown <FILE>: operation not permitted

This error occurs mostly when ~/.local/share/docker is located on NFS.

A workaround is to specify a non-NFS data-root directory in ~/.config/docker/daemon.json, e.g., {"data-root":"/somewhere-out-of-nfs"}, and restart the daemon.
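As a sketch, the helper below (the name is hypothetical) writes that file and restarts the rootless daemon. The target directory is yours to choose and must be on a local, non-NFS filesystem. The helper refuses to overwrite an existing daemon.json, since that file may hold other settings that need merging by hand. It does not escape quotes in the path.

```shell
# Hypothetical helper: move the rootless data-root off NFS.
set_rootless_data_root() {
  target=$1                       # a local (non-NFS) directory
  conf="$HOME/.config/docker/daemon.json"
  if [ -e "$conf" ]; then
    echo "$conf already exists; add \"data-root\" to it by hand" >&2
    return 1
  fi
  mkdir -p "$(dirname "$conf")"
  printf '{"data-root": "%s"}\n' "$target" > "$conf"
  # The daemon reads daemon.json only at startup.
  systemctl --user restart docker
}
```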

docker run errors

--cpus, --memory, and --pids-limit are ignored

This is expected behavior in cgroup v1 mode. To use these flags, the host needs to be configured to enable cgroup v2. For more information, see Limiting resources.

Networking errors

docker run -p fails with cannot expose privileged port

docker run -p fails with this error when a privileged port (< 1024) is specified as the host port.

When you experience this error, consider using an unprivileged port instead. For example, 8080 instead of 80.

To allow exposing privileged ports, see Exposing privileged ports.

ping doesn’t work

Ping does not work when /proc/sys/net/ipv4/ping_group_range is set to 1 0 (check it with cat /proc/sys/net/ipv4/ping_group_range).

For details, see Routing ping packets.

IPAddress shown in docker inspect is unreachable

This is expected behavior, as the daemon is namespaced inside RootlessKit’s network namespace. Use docker run -p instead.

--net=host doesn’t listen on ports in the host network namespace

This is expected behavior, as the daemon is namespaced inside RootlessKit’s network namespace. Use docker run -p instead.

Network is slow

Docker with rootless mode uses slirp4netns as the default network stack if slirp4netns v0.4.0 or later is installed. If slirp4netns is not installed, Docker falls back to VPNKit.

Installing slirp4netns may improve the network throughput. See RootlessKit documentation for the benchmark result.

Also, changing the MTU value may improve the throughput. The MTU value can be specified by adding Environment='DOCKERD_ROOTLESS_ROOTLESSKIT_MTU=<INTEGER>' to ~/.config/systemd/user/docker.service and then running systemctl --user daemon-reload.

docker run -p does not propagate source IP addresses

This is because Docker with rootless mode uses RootlessKit’s builtin port driver by default.

The source IP addresses can be propagated by adding Environment='DOCKERD_ROOTLESS_ROOTLESSKIT_PORT_DRIVER=slirp4netns' to ~/.config/systemd/user/docker.service and then running systemctl --user daemon-reload.

Note that this configuration decreases throughput. See RootlessKit documentation for the benchmark result.
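Both the MTU and port-driver settings are Environment= lines in the user unit. Below is a minimal sketch with a hypothetical function name. It inserts such a line right after the [Service] header and reloads systemd. It assumes the unit exists at the path above, and it does not check for an existing line for the same variable. The value is written without the single quotes shown above, which systemd accepts for values without spaces. The restart at the end is there because a running daemon keeps its old environment.

```shell
# Hypothetical helper: add Environment=<VAR=value> to the rootless docker unit.
add_rootless_env() {
  unit="$HOME/.config/systemd/user/docker.service"
  if [ ! -f "$unit" ]; then
    echo "$unit not found" >&2
    return 1
  fi
  # Copy every line and, right after the [Service] header, emit the new line.
  awk -v line="Environment=$1" '{ print } /^\[Service\]$/ { print line }' \
    "$unit" > "$unit.tmp" || return 1
  mv "$unit.tmp" "$unit"
  systemctl --user daemon-reload
  systemctl --user restart docker
}
```

For example, add_rootless_env DOCKERD_ROOTLESS_ROOTLESSKIT_PORT_DRIVER=slirp4netns, or add_rootless_env DOCKERD_ROOTLESS_ROOTLESSKIT_MTU=<INTEGER> with an MTU of your choice.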

Tips for debugging

Entering into dockerd namespaces

The dockerd-rootless.sh script executes dockerd in its own user, mount, and network namespaces.

For debugging, you can enter the namespaces by running nsenter -U --preserve-credentials -n -m -t $(cat $XDG_RUNTIME_DIR/docker.pid).
