SELinux "denied" errors #697
Comments
@honarkhah Good catch. Please SSH into your node and execute that command with sudo. Ideally, before opening a PR, apply the generated .pp to double-check that it works. If it does not, re-run the command and apply the .pp again until you get Vault working. Then either open a PR with the content of the .te added to the policy section in locals.tf, or just copy it here so that I can add it myself. |
@honarkhah FYI, you can apply the .pp with |
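For readers following along, here is a minimal sketch of the generate-and-apply loop discussed above, using the standard `audit2allow` and `semodule` tools. The module name `my_custom_policy` is hypothetical, and the audit log path is the usual default, which may differ on your node. The exact command the maintainer posted is not preserved in this thread, so treat this as an illustration rather than the project's official procedure.

```shell
#!/usr/bin/env bash
# Sketch: turn recorded SELinux denials into a loadable policy module.
# The module name and log path below are assumptions, not project defaults.

# Print only the denial lines from an audit log file.
denied_lines() {
  grep 'denied' "${1:-/var/log/audit/audit.log}"
}

# Build <name>.te and <name>.pp from the denials, then load the .pp.
# Run as root (or through sudo), since both steps need elevated privileges.
build_and_load_policy() {
  name="${1:-my_custom_policy}"            # hypothetical module name
  log="${2:-/var/log/audit/audit.log}"
  denied_lines "$log" | audit2allow -M "$name" || return 1
  semodule -i "${name}.pp"                 # install the compiled module
}
```

After `build_and_load_policy` succeeds, the generated `.te` file in the current directory is the text to share or add to the policy. If the workload still fails, run it again so that newly logged denials are included.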
@honarkhah Any update on this? I'm waiting for the content of your .te so that I can add it to our kube-hetzner policy and others can run Vault and similar software too. |
I have tried it a couple of times, but I still get permission denied!
policy.te file:
But I see that the error in the container has changed to
|
Ah, finally got it working. The issue was that the socket had been created before the new policy was applied. After deleting it and letting it be recreated under the new policy, it worked. I will create a PR.
|
I have a similar problem with Filebeat.
Pod's manifest:
|
Hi! I'm also getting these
Any workarounds? Edit: I think it would be great if local-storage were supported by kube-hetzner from the get-go! It's an important feature for getting minimal file latency, e.g. for databases. |
Have you guys followed the instructions? |
ah, sorry - I must have skipped that part. will try |
I use this project to set up cheaper dev/test/staging Kubernetes clusters. After reinstalling an older cluster I got the following problems:
These issues were solved by disabling SELinux. Probably not the smartest idea, but I don't mind doing it on throwaway clusters.
From this use case of mine I conclude that it is probably best to expose the SELinux settings in kube.tf.example, similar to how extra_firewall_rules are, preferably with documentation on how to configure them, as in the Examples section of the readme.md. This is necessary because different users have different needs and it's not possible to cover everything in locals.tf. |
Never mind, some permissions are missing. Back to the drawing board.
I suspect, though, that this is highly workload dependent! @mysticaltech Basically, we need to allow every kind of file/folder access to support any workload that makes use of local-storage. So, any suggestions as to what's missing in the above |
@honarkhah Good job, will merge later on tonight. @maaft Test it by applying the generated .pp file. If it works, great; if not, re-run the command so that it outputs the latest errors. Normally, it will cover all use cases, so if the above is your final .te and it is proven to work, I will integrate it. @michailkorenev Perfect, .te noted, will integrate; also make sure to test with the .pp as described above. @phinx110 Please grab your .te and test your .pp; if all is good, post your .te. Don't worry, there is no need to disable SELinux. I need your help to make this project compatible with SELinux while being as secure as possible, so please just generate and share your .te file; I will merge later tonight. |
@mysticaltech It's a very time-consuming process:
What I want to say is, wouldn't it be easier to allow just everything regarding files and folders? If yes, where to find such a list? |
@maaft No worries then, I will include a quick flag "enable_selinux" to turn off SELinux at the k3s level, as it was before. It has always been active at the OS level, but not at the k3s level. |
Now, for best security, you would leave it active, and after you run your cluster, issue the above command to pick up the denied operations, submit a PR, and voilà. |
@maaft My theory is that applications are actually not that different. The newly updated policy with the values above will probably fix most issues, and after a few iterations we will have covered the whole playing field. So my advice to both @maaft and @phinx110: see the command above, generate your .te and .pp files, test the .pp, and if everything works, submit your final .te content here, as it is just text. That way we will really map all possible use cases; there aren't that many. |
@mysticaltech I can help if there is WIP or a list of features that need to develop! |
Thanks @honarkhah, really appreciate it! |
@maaft @phinx110 @michailkorenev Along with the recently merged SELinux rule additions by @honarkhah, I have merged the respective rules you provided above. Please give v2.0.4 a shot. You do not need to create the snapshot, but you do need to update the module and at least create new nodepools (while bringing the count of the old nodepools to 0 after draining the old nodes), or, if you prefer, completely recreate the cluster. If that works, great! If not, let's iterate a little bit: either add your rules to locals.tf directly via a PR, or create a discussion about additional SELinux rules and add them there, tagging me explicitly so that I add them ASAP. I believe we are close to mapping all the possible use cases; I also proactively added permissions based on GPT-4 recommendations for common use cases. If that approach works, great! If, however, we see the need for custom rules in the future, we will provide a variable for you to add them, and if necessary, we will provide the ability to turn off SELinux at the k3s level (not the OS level). Hopefully that will not be necessary, so that clusters deployed with this tool stay as secure as possible. Again, to create your needed policy if you see denied errors:
|
@mysticaltech
I need this to load the "tun" (openvpn) and "conntrack" (nftables) kernel modules:
Regarding the fluentbit-cluster-monitoring it was a mess. I got the following 3 configs:
After loading all these modules with I understand the desire to have a fully locked-down k3s installation. However, even with good instructions provided, it remains a tedious, iterative process, especially if you have to do this separately for multiple individual failing components in order to keep the semodules separate as well, like I did, and in the end it still did not work for me. With the older version of this project I had a working setup, but after a reinstall I suddenly had to spend quite some time trying to get it working the correct way. In the end I decided to give up on this and disable SELinux altogether to get on with it, because of time pressure. I simply cannot afford to keep working on this any longer than I already have, and I assume there are quite a few other developers who end up in the same position. The main reason to have "selinux at the k3s level" is if you run your production on Hetzner. In my case we run our production on AWS and use Hetzner as a cheaper environment to run different versions/copies of our software for different departments (dev/test/staging). I would strongly consider the ability to disable "selinux at the k3s level" a valid "feature", and I would like to request it. |
@maaft Apparently the automatic node upgrade sometimes breaks the networking changes; we are currently investigating. Please see if you can reapply cloud-init manually; maybe it's saved on the machine. Or look at the code to see what it does and do that manually. If it works for you, we will be able to create a boot script that does that automatically. |
Okay, thanks, will try it later. What are the current workarounds? Disable automatic upgrades? Edit: Also, this must be something new. My 1.5-year-old cluster that I created with your repo back in the day continues to do node upgrades successfully. Edit2: As suggested, I executed:
on the failing node. Still no network connection afterwards. I noticed that the commands issued by cloud-init actually do nothing! I executed the commands from This is highly unstable though, and I'm very much interested in a more stable solution if you have one.
Logs
Cloud-init v. 23.3-8.1 running 'init' at Tue, 23 Apr 2024 10:12:22 +0000. Up 215.38 seconds.
ci-info: ++++++++++++++++++++++++++++++++++++++++Net device info++++++++++++++++++++++++++++++++++++++++
ci-info: +--------+------+------------------------------+-----------------+--------+-------------------+
ci-info: | Device | Up   | Address                      | Mask            | Scope  | Hw-Address        |
ci-info: +--------+------+------------------------------+-----------------+--------+-------------------+
ci-info: | enp7s0 | True | 10.255.0.101                 | 255.255.255.255 | global | 86:00:00:7e:2d:fc |
ci-info: | enp7s0 | True | fe80::f88a:c1f2:363d:31b5/64 | .               | link   | 86:00:00:7e:2d:fc |
ci-info: | eth0   | True | 49.13.59.226                 | 255.255.255.255 | global | 96:00:03:1b:cd:01 |
ci-info: | eth0   | True | fe80::8f1f:5511:4709:5e88/64 | .               | link   | 96:00:03:1b:cd:01 |
ci-info: | lo     | True | 127.0.0.1                    | 255.0.0.0       | host   | .                 |
ci-info: | lo     | True | ::1/128                      | .               | host   | .                 |
ci-info: +--------+------+------------------------------+-----------------+--------+-------------------+
ci-info: +++++++++++++++++++++++++++++Route IPv4 info++++++++++++++++++++++++++++++
ci-info: +-------+-------------+------------+-----------------+-----------+-------+
ci-info: | Route | Destination | Gateway    | Genmask         | Interface | Flags |
ci-info: +-------+-------------+------------+-----------------+-----------+-------+
ci-info: | 0     | 0.0.0.0     | 10.0.0.1   | 0.0.0.0         | enp7s0    | UG    |
ci-info: | 1     | 0.0.0.0     | 172.31.1.1 | 0.0.0.0         | eth0      | UG    |
ci-info: | 2     | 10.0.0.0    | 10.0.0.1   | 255.0.0.0       | enp7s0    | UG    |
ci-info: | 3     | 10.0.0.1    | 0.0.0.0    | 255.255.255.255 | enp7s0    | UH    |
ci-info: | 4     | 172.31.1.1  | 0.0.0.0    | 255.255.255.255 | eth0      | UH    |
ci-info: | 5     | 172.31.1.1  | 0.0.0.0    | 255.255.255.255 | eth0      | UH    |
ci-info: +-------+-------------+------------+-----------------+-----------+-------+
ci-info: +++++++++++++++++++Route IPv6 info+++++++++++++++++++
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | Route | Destination | Gateway | Interface | Flags |
ci-info: +-------+-------------+---------+-----------+-------+
ci-info: | 0     | fe80::/64   | ::      | enp7s0    | U     |
ci-info: | 1     | fe80::/64   | ::      | eth0      | U     |
ci-info: | 3     | local       | ::      | eth0      | U     |
ci-info: | 4     | local       | ::      | enp7s0    | U     |
ci-info: | 5     | multicast   | ::      | enp7s0    | U     |
ci-info: | 6     | multicast   | ::      | eth0      | U     |
ci-info: +-------+-------------+---------+-----------+-------+
Cloud-init v. 23.3-8.1 running 'modules:config' at Tue, 23 Apr 2024 10:12:28 +0000. Up 221.53 seconds.
Cloud-init v. 23.3-8.1 running 'modules:final' at Tue, 23 Apr 2024 10:12:31 +0000. Up 224.38 seconds.
#cloud-config from 1 files
init.cfg
debug: true
|
@maaft Thanks for confirming and trying it out. With that I can work on a permanent fix. Will do ASAP 🙏 @kube-hetzner/maintainers FYI |
And yes, turning off upgrades would solve that, but it is not necessary; I will create a fix. Probably a systemctl-managed shell script that runs after each reboot to make sure everything is kosher. |
Forcing a re-run of However, that seems rather brittle because 1. cloud-init files are never updated after node creation (changing I guess one approach to work around this somewhat is to reduce the cloud-init config to the bare minimum of just invoking a script that does the setup, and have that script reprovisioned. That should also help with updating the SELinux rules, I guess. Nonetheless, the automatic "atomic" updates somehow rolling the system back to before the first cloud-init run (without rerunning cloud-init, since the state for that is in a stateful part of the system) seems like a deeper issue that shouldn't just be hacked around :/ I've had to completely remove and re-add nodes due to this, because even re-initializing the node through a combination of forced cloud-init reruns and manually tainting Terraform state didn't help, since I sometimes couldn't recover the kubelet's node key in that state. |
@jhass totally agree, the underlying issue has to be fixed |
@mysticaltech Hey friend! Did you have time to investigate this issue? Anything I can do to help? |
@maaft Today will put my head on this, it will be fixed. Keep you posted 🙏 |
@jhass Thanks for sharing this. The nodes are pretty stateless and unchanging apart from the updates. There is indeed a deeper bug in MicroOS, but it will probably get fixed (someone was asked to report it and I tagged the project lead too). Even if it is not, running cloud-init after each reboot is a quick "backup" workaround. It does not pull the new user-data, but that is not actually needed, as the user-data pretty much does not change. |
Are you sure it is MicroOS? My old clusters (provisioned with a version of this repo about 1.5 years ago) continue to run and upgrade their OS flawlessly. If the reason really were a bug in MicroOS, all my clusters would be non-functional by now... which would be an absolute disaster, since we're running prod on this. So the possibility that it might be a MicroOS bug gives me some serious shivers, and I'll probably not sleep very well until this is resolved :3 |
@maaft Just turn off automatic upgrades for production until we get to the bottom of this. See the readme. My guess is that one upgrade went wrong, or that a particular upgrade edge case is being hit. So in prod, to have peace of mind, it is best to turn them off for now. I will update the docs. |
@mysticaltech Kube-Hetzner version used: 2.0.2
Commands for removing floating IP config:
|
@mysticaltech, another one here, this time from Prometheus node exporter.
|
@mysticaltech Found some more:
How can I persist these rules so that new nodepools (and auto-scaled nodes) also get the new rules? |
Currently they are persisted when added to the module, but you can do that as well locally until it's merged in the repo. |
New ones, required by SigNoz to mount hostfs (/) to read the host metrics:
|
I believe we should consider disabling SELinux by default, at least for agent nodes. The whole point is for users to make it secure for their own cluster, not to include a lot of default rules, since that defeats the purpose. This could be a topic of discussion for v3. I plan to open a discussion on this repo soon so we can talk about defaults and behaviors, since the rewrite is going very well and is nearing completion. |
@aleksasiriski You have a point, will run it through o1, see what it says |
New ones for juicefs:
|
Another one here for node-exporter:
|
And another one:
This time for |
Have you disabled all three of the provided CNIs to have Istio or is it running alongside one of the supported ones? |
With Calico |
Vector failed to run
Trying to follow the steps from a comment #697#issuecomment-1496607924
After performing all the steps, everything started to work on that node.
And some questions
|
And by the way ...
cat kube.tf | grep selinux
disable_selinux = true
sestatus
SELinux status: enabled
SELinuxfs mount: /sys/fs/selinux
SELinux root directory: /etc/selinux
Loaded policy name: targeted
Current mode: enforcing
Mode from config file: enforcing
Policy MLS status: enabled
Policy deny_unknown status: allowed
Memory protection checking: actual (secure)
Max kernel policy version: 33 |
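The output above looks consistent with what was said earlier in the thread: the flag is meant to turn SELinux off at the k3s level, not at the OS level, so `sestatus` can still report enforcing. A minimal sketch for checking the two layers separately follows. The config path is the standard k3s location, but whether the module writes a `selinux:` key there on your nodes is an assumption, and both function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: compare the OS-level SELinux mode with the k3s-level SELinux flag.

# Read sestatus output on stdin and print the value of "Current mode".
selinux_mode() {
  awk -F': *' '/^Current mode/ { print $2 }'
}

# Print the selinux line from a k3s config file, or a note if it is absent.
# Default path is the standard k3s config location (assumed to be in use).
k3s_selinux_flag() {
  grep -E '^selinux:' "${1:-/etc/rancher/k3s/config.yaml}" 2>/dev/null \
    || echo "selinux: (not set)"
}
```

On a node, `sestatus | selinux_mode` gives the OS-level mode, and `k3s_selinux_flag` shows what k3s was configured with. Seeing enforcing from the first together with a disabled or absent flag from the second would match the k3s-level-only behavior described earlier.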
Description
Hi, I am trying to integrate the vault-csi-provider into my cluster, but I got an error!
I checked on the nodes and realized it is not able to create that file on the host OS. To test whether SELinux was causing it, I disabled policy enforcement, and the Vault CSI provider was able to run on the node where enforcement was disabled with setenforce, as you can see below.
Is there any way to attach a customized policy as needed?
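The diagnostic step described above (switching enforcement off to see whether SELinux is the cause) can be sketched as a small wrapper that restores the previous mode afterwards. It assumes `getenforce`, `setenforce` and `sudo` are available on the node; the function name `with_permissive` is made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: run one command with SELinux temporarily permissive, then restore.
# Assumes getenforce/setenforce are installed and sudo is available.

with_permissive() {
  previous=$(getenforce) || return 1   # "Enforcing", "Permissive" or "Disabled"
  if [ "$previous" = "Enforcing" ]; then
    sudo setenforce 0 || return 1      # switch to permissive for the test
  fi
  "$@"                                 # the command under test
  status=$?
  if [ "$previous" = "Enforcing" ]; then
    sudo setenforce 1                  # put the node back into enforcing mode
  fi
  return "$status"
}
```

If the wrapped command only succeeds inside `with_permissive`, SELinux is the likely culprit, and the denials logged while in permissive mode are what a custom policy would need to allow.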
Kube.tf file
Screenshots
No response
Platform
Linux