volume idmap fails on rootless containers but succeeds with sudo (should work with both) #1632

Closed
james-lawrence opened this issue Jan 1, 2025 · 5 comments

james-lawrence commented Jan 1, 2025

Some background here, just in case it's needed.

Problem: I want to run a rootless container that starts systemd via /usr/sbin/init, with a volume mounted using a specific idmap. Currently this fails. However, running the same command with sudo succeeds and does what I want, and running the command without the idmap also succeeds. This implies the following:

  • The mount itself is valid and host permissions are fine.
  • The idmap does what I want, as confirmed by the sudo version.

The problem is that the rootless command with the idmap fails with a mount_setattr EPERM error. From what I've read about the mount_setattr system call, what I'm trying to do should work.

Minimal example, rootless and without the idmap:

```
podman run -it -v "$(pwd)/test:/opt/test:rw" --userns host ubuntu:latest bash -c 'cat /proc/self/uid_map /proc/self/gid_map && ls -lha /opt/test'
         0       1000          1
         1     100000      65536
         0       1000          1
         1     100000      65536
total 8.0K
drwxr-xr-x 2 root root 4.0K Dec 31 18:29 .
drwxr-xr-x 3 root root 4.0K Jan  1 17:08 ..
```

Mounting the volume is clearly possible in the rootless context.
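For reading the maps above: each line of `/proc/self/uid_map` (and `gid_map`) has the form `<first ID inside the namespace> <first ID in the parent namespace> <length>` (see user_namespaces(7)). In the rootless run, container UID 0 is therefore host UID 1000, and container UIDs 1 to 65536 fall into the subordinate range starting at host UID 100000. A minimal sketch of that lookup; `to_host_id` is a hypothetical helper name, not something podman provides:

```shell
#!/bin/sh
# to_host_id ID < uid_map
# Hypothetical helper: translate an ID inside a user namespace into the
# parent namespace's ID, using uid_map/gid_map lines of the form
#   <inside-start> <outside-start> <length>
# Prints nothing and returns 1 if no range covers the ID.
to_host_id() {
  awk -v id="$1" '
    id >= $1 && id < $1 + $3 { print $2 + (id - $1); found = 1; exit }
    END { if (!found) exit 1 }
  '
}

# The rootless map printed above.
rootless_map='         0       1000          1
         1     100000      65536'
printf '%s\n' "$rootless_map" | to_host_id 0     # container root -> 1000
printf '%s\n' "$rootless_map" | to_host_id 1000  # container UID 1000 -> 100999
```

On a live system the same lookup can be run as `to_host_id 0 < /proc/self/uid_map` from inside the container.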

With sudo and the idmap:

```
sudo podman run -it -v "$(pwd)/test:/opt/test:rw,idmap=uids=1000-1000-1;gids=1000-1000-1" --userns host ubuntu:latest bash -c 'cat /proc/self/uid_map /proc/self/gid_map && ls -lha /opt/test'
         0          0 4294967295
         0          0 4294967295
total 8.0K
drwxr-xr-x 2 ubuntu ubuntu 4.0K Dec 31 18:29 .
drwxr-xr-x 1 root   root   4.0K Jan  1 17:05 ..
```

The idmap is clearly workable.
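One difference between the two runs is visible in the maps themselves: the sudo run prints `0 0 4294967295`, the full identity mapping that the initial user namespace reports (user_namespaces(7)), while the rootless run with `--userns host` still prints podman's rootless mapping. So under rootless podman, `--userns host` does not place the container in the host's initial user namespace, which is relevant to the EPERM conditions that mention the initial user namespace. A small heuristic check, using a hypothetical helper name:

```shell
#!/bin/sh
# is_identity_map < uid_map
# Hypothetical helper: succeed if the map is exactly the single full-range
# identity line "0 0 4294967295", which is what the initial user namespace
# reports. Heuristic only: a privileged process could also give a child
# user namespace this map.
is_identity_map() {
  awk '
    NR == 1 && $1 == 0 && $2 == 0 && $3 == 4294967295 { ok = 1 }
    END { exit !(ok && NR == 1) }
  '
}

if [ -r /proc/self/uid_map ]; then
  if is_identity_map < /proc/self/uid_map; then
    echo "identity map: likely the initial user namespace"
  else
    echo "remapped: inside a child user namespace"
  fi
fi
```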

Rootless with the idmap:

```
podman run -it -v "$(pwd)/test:/opt/test:rw,idmap=uids=1000-1000-1;gids=1000-1000-1" --userns host ubuntu:latest bash -c 'cat /proc/self/uid_map /proc/self/gid_map && ls -lha /opt/test'
Error: crun: mount_setattr `/opt/test`: Operation not permitted: OCI permission denied
```

Either the volume can be mounted or it can't, and the mapping is either valid or it isn't. Adding an idmap shouldn't suddenly make it choke.
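For reference, the idmap value used in these commands contains a single one-ID range for UIDs and one for GIDs, so only ID 1000 is covered by the mapping. The splitting below assumes the format as I understand it from the podman-run man page (`uids=` and `gids=` parts separated by `;`, each range written as three `-`-separated numbers ending in the length, multiple ranges separated by `#`); verify this against your podman version. `parse_idmap` is a hypothetical helper for inspecting the value, not part of podman:

```shell
#!/bin/sh
# parse_idmap VALUE
# Hypothetical helper: split an idmap value such as
#   uids=1000-1000-1;gids=1000-1000-1
# into one "<kind> <first> <second> <length>" line per range.
# Assumes ';' separates the uids/gids parts and '#' separates ranges.
parse_idmap() {
  printf '%s\n' "$1" | tr ';' '\n' | while IFS='=' read -r kind ranges; do
    [ -n "$kind" ] || continue
    printf '%s\n' "$ranges" | tr '#' '\n' | while IFS='-' read -r a b n; do
      printf '%s %s %s %s\n' "$kind" "$a" "$b" "$n"
    done
  done
}

parse_idmap 'uids=1000-1000-1;gids=1000-1000-1'
# uids 1000 1000 1
# gids 1000 1000 1
```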

Looking at mount_setattr(2), there are four reasons for it to return EPERM:

One of the mounts had at least one of MOUNT_ATTR_NOATIME, MOUNT_ATTR_NODEV, MOUNT_ATTR_NODIRATIME, MOUNT_ATTR_NOEXEC, MOUNT_ATTR_NOSUID, or MOUNT_ATTR_RDONLY set and the flag is locked. Mount attributes become locked on a mount if: ...

MOUNT_ATTR_NOATIME - I'm assuming this is unset.
MOUNT_ATTR_NODIRATIME - I'm assuming this is unset.
MOUNT_ATTR_NOEXEC - per the docs, this is not set by default.
MOUNT_ATTR_NOSUID - per the docs, this is set by default.
MOUNT_ATTR_RDONLY - per the docs, volumes are mounted read-write by default.

Given these defaults, setting the idmap should hit this case, since NOSUID is set by default. But since it doesn't fail in the sudo case, I'm guessing that running as root prevents the flag from getting locked.
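To see which of those lockable attributes the source mount actually carries on the host, one option is to filter its mount options. This is a hedged sketch: `lockable_flags` is a hypothetical helper, and an options string only shows which flags are set, not whether the kernel has locked them (per mount_setattr(2), locking happens when mounts are copied or propagated into a mount namespace owned by a different user namespace).

```shell
#!/bin/sh
# lockable_flags OPTIONS
# Hypothetical helper: print the options from a comma-separated mount option
# list (e.g. "rw,nosuid,nodev,relatime") that correspond to the attributes
# mount_setattr(2) says can be locked: ro, nodev, noexec, nosuid, noatime,
# nodiratime.
lockable_flags() {
  printf '%s\n' "$1" \
    | tr ',' '\n' \
    | grep -x -E 'ro|nodev|noexec|nosuid|noatime|nodiratime' \
    | tr '\n' ' ' \
    | sed 's/ $//'
}

# Example against the host mount backing the volume (needs util-linux findmnt):
# lockable_flags "$(findmnt -no OPTIONS --target "$(pwd)/test")"
```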

A valid file descriptor value was specified in userns_fd, but the file descriptor refers to the initial user namespace.

I'm hoping this isn't the case.

An attempt was made to add an ID mapping to a mount that is already ID mapped.

Also possible, but I'm guessing it's not the case since sudo works.

The caller does not have CAP_SYS_ADMIN in the initial user namespace.

This is possible, and running as root would resolve it.

Applying what we've learned:

```
# Check that my current user has the sys_admin capability.
capsh --print | grep -i current
# output:
#  Current: cap_sys_admin=i
#  Current IAB: cap_sys_admin

# Ensure podman has the sys_admin capability (purely for debugging this).
sudo setcap -r /usr/bin/podman
sudo setcap 'cap_sys_admin=ep' /usr/bin/podman
getcap /usr/bin/podman
# output: /usr/bin/podman cap_sys_admin=ep

# Ensure noexec, nosuid, and rdonly are not set, and give the container the sys_admin cap just in case.
podman run -it -v "$(pwd)/test:/opt/test:rw,exec,suid,idmap=uids=1000-1000-1;gids=1000-1000-1" --userns host --cap-add=sys_admin ubuntu:latest bash -c 'cat /proc/self/uid_map /proc/self/gid_map && ls -lha /opt/test'
# output: Error: crun: mount_setattr `/opt/test`: Operation not permitted: OCI permission denied
```

Any guidance would be appreciated. My current guess is that it's the lack of the sys_admin capability for my current user, and that I didn't properly give my user the capability in the initial user namespace. I'm not 100% sure how to do that.
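One way to check this: in the `capsh` output, `cap_sys_admin=i` means the capability is only in the inheritable set, which by itself does not let a process use it; the kernel checks the effective set (capabilities(7)). Also, per user_namespaces(7), capabilities held inside a child user namespace (such as the one rootless podman runs in) only apply to resources governed by that namespace, not the initial one. The sketch below tests one bit of a capability mask in the hex format of the `CapEff:` line in `/proc/<pid>/status`; CAP_SYS_ADMIN is capability number 21, and `has_cap` is a hypothetical helper.

```shell
#!/bin/sh
# has_cap HEXMASK BIT
# Hypothetical helper: succeed if capability number BIT is set in HEXMASK,
# the hex format used by the CapInh/CapPrm/CapEff lines of /proc/<pid>/status.
CAP_SYS_ADMIN=21   # see capabilities(7)

has_cap() {
  [ $(( (0x$1 >> $2) & 1 )) -eq 1 ]
}

# Check the effective set of this shell process, if /proc is available.
if [ -r "/proc/$$/status" ]; then
  eff=$(awk '/^CapEff:/ { print $2 }' "/proc/$$/status")
  if has_cap "$eff" "$CAP_SYS_ADMIN"; then
    echo "CAP_SYS_ADMIN is effective"
  else
    echo "CAP_SYS_ADMIN is not effective"
  fi
fi
```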

Edit:

Additional notes: the ID-mapped mounts notes section of mount_setattr(2) implies the following:

  • The notes say that the caller must have the CAP_SYS_ADMIN capability in the user namespace the filesystem was mounted in. This is different from the EPERM statement about the initial user namespace. Is it possible crun just isn't setting this capability in the user namespace for the current user?
  • The notes mention an overflow user ID, which I believe would be incredibly useful to make settable as an option, defaulting to the default user ID (typically root unless overridden by the --user option). Currently it looks like it's set to the end of the UID/GID space; I'm assuming for security reasons?
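On the overflow ID: as the notes mention, IDs that an idmapped mount cannot map are reported as the overflow user or group ID. On Linux that value comes from the kernel-wide `kernel.overflowuid` / `kernel.overflowgid` sysctls (readable at `/proc/sys/kernel/overflowuid`), which default to 65534. A sketch of that behaviour for a single range; `map_or_overflow` is hypothetical, and treating the range as a simple "from" to "to" translation is a simplifying assumption:

```shell
#!/bin/sh
# map_or_overflow ID FROM TO LENGTH [OVERFLOW]
# Hypothetical helper: map ID through a single range FROM..FROM+LENGTH-1 onto
# TO..TO+LENGTH-1; anything outside the range is reported as OVERFLOW
# (default 65534, the usual kernel.overflowuid value).
map_or_overflow() {
  id=$1 from=$2 to=$3 len=$4 overflow=${5:-65534}
  if [ "$id" -ge "$from" ] && [ "$id" -lt $((from + len)) ]; then
    echo $((to + id - from))
  else
    echo "$overflow"
  fi
}

# With the issue's idmap (uids=1000-1000-1):
map_or_overflow 1000 1000 1000 1   # 1000
map_or_overflow 0    1000 1000 1   # 65534

# Using the running kernel's overflow value instead of the default:
# map_or_overflow 0 1000 1000 1 "$(cat /proc/sys/kernel/overflowuid)"
```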
james-lawrence (Author)

As an aside, I'm happy to do this work if it's agreeable and someone points me in the right direction.

rhatdan (Member) commented Jan 2, 2025

The kernel does not allow rootless users to do idmap. There is nothing crun can do about it.

james-lawrence (Author) commented Jan 2, 2025

The documentation I linked to says otherwise.

The notes say that the caller must have the CAP_SYS_ADMIN capability in the user namespace the filesystem was mounted in. This is different from the EPERM statement about the initial user namespace. Is it possible crun just isn't setting this capability in the user namespace for the current user?

rhatdan (Member) commented Jan 2, 2025

I am not sure Podman is assuming the CAP_SYS_ADMIN permissions. You could try this directly with the crun command rather than Podman, which would get you closer to what you expect.
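To try this with crun directly, the volume has to be described in the bundle's `config.json`. The sketch below prints one mount entry. It assumes the OCI runtime-spec per-mount `uidMappings`/`gidMappings` fields and the `idmap` mount option, which recent runtime-spec versions define and crun supports as far as I know; check both against the installed versions. `idmap_mount_json` is a hypothetical helper, and the IDs mirror the issue's `uids=1000-1000-1;gids=1000-1000-1`, so the containerID/hostID ordering question does not arise here.

```shell
#!/bin/sh
# idmap_mount_json SOURCE DESTINATION ID
# Hypothetical helper: print one OCI "mounts" entry that bind-mounts SOURCE at
# DESTINATION as an idmapped mount covering the single ID given.
# Assumes the runtime-spec uidMappings/gidMappings mount fields and the
# "idmap" option; paths containing quotes would need JSON escaping.
idmap_mount_json() {
  src=$1 dst=$2 id=$3
  cat <<EOF
{
  "destination": "$dst",
  "type": "bind",
  "source": "$src",
  "options": ["rbind", "rw", "idmap"],
  "uidMappings": [{ "containerID": $id, "hostID": $id, "size": 1 }],
  "gidMappings": [{ "containerID": $id, "hostID": $id, "size": 1 }]
}
EOF
}

idmap_mount_json "$(pwd)/test" /opt/test 1000
```

If your crun supports it, `crun spec --rootless` can generate a base rootless `config.json`, and the printed entry would then go into its `mounts` array.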

james-lawrence (Author)

@rhatdan I'm not sure either; I didn't realize podman sets up a user namespace before calling crun. I'll backtrack. Thanks.

2 participants