Introducing rkt’s ability to detect privilege escalation attacks on containers

Intel's Clear Containers technology allows admins to benefit from the ease of container-based deployment without giving up the security of virtualization. For more than a year, rkt's KVM stage1 has supported VM-based container isolation, but we can build more advanced security features atop it. Using introspection technology, we can automatically detect a wide range of privilege escalation attacks on containers and provide appropriate remediation, making it significantly more difficult for attackers to make a single compromised container the beachhead for an infrastructure-wide assault.

Today we announce rkt’s ability to automatically detect privilege escalation attacks on containers. If such an attack is detected, the container will automatically shut down and a new instance will be started. Direct integration with rkt means users will benefit from this detection and remediation technology with minimal local configuration changes and without having to modify their application containers in any way.

Background: Security separation on Linux

The Unix security model inherited by Linux separates users into two classes: privileged (the root user) and unprivileged (any other user on the system), with the kernel enforcing the separation. If an attacker is able to exploit a vulnerability in the kernel, they can bypass that separation and cause a process running as an unprivileged user to gain root privileges. From there they can attack the rest of the system with ease.

This scenario is difficult to detect and handle, since in a traditional environment the kernel is the most privileged component in a system. If the kernel has been tampered with, the kernel can no longer be trusted to provide accurate information. Virtualized environments add another level of privilege in the form of the hypervisor, and by integrating with the hypervisor, it becomes possible to obtain a more accurate view of the actual system state of virtualized guests.

Implementation

In our implementation, the kernel notifies the hypervisor each time a process is created or destroyed. The permissions associated with that process are stored at the hypervisor level and verified to ensure that they are internally consistent. For instance, if a process is running as an unprivileged user, it should not be able to directly create a child process that is running as root. An attack on the kernel may be able to modify the kernel’s internal representation of this state, but will not be able to affect the hypervisor’s state.
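To make the bookkeeping concrete, here is a minimal sketch of a hypervisor-side table that records credentials as processes come and go. It is an illustration under stated assumptions, not rkt's actual code: the names ShadowTable, Creds, on_boot, on_create and on_destroy are hypothetical, credentials are reduced to a single uid, and the only consistency rule modelled is the one mentioned above (an unprivileged parent must not directly create a root child).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Creds:
    uid: int  # 0 means root; real credentials carry far more than a uid


class ShadowTable:
    """Hypothetical hypervisor-side record of guest process credentials."""

    def __init__(self):
        self._procs = {}  # pid -> Creds

    def on_boot(self, pid, creds):
        # Seed the table with the first process in the guest (for example, init).
        self._procs[pid] = creds

    def on_create(self, parent_pid, child_pid, child_creds):
        """Record a new process; return False if the creation is inconsistent."""
        parent = self._procs.get(parent_pid)
        if parent is None:
            # A parent the hypervisor never heard about is itself suspicious.
            return False
        # Rule from the text: an unprivileged parent must not directly
        # create a child that runs as root.
        if parent.uid != 0 and child_creds.uid == 0:
            return False
        self._procs[child_pid] = child_creds
        return True

    def on_destroy(self, pid):
        # Forget the process once the kernel reports that it has exited.
        self._procs.pop(pid, None)
```

Because the table lives outside the guest, an attacker who rewrites the kernel's own credential structures leaves this copy untouched.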

This state can then be verified whenever a process performs an action requiring a permissions check. For example, when a process requests that a file be opened, the kernel now calls out to the hypervisor. The hypervisor examines the process state the kernel reports and checks that it remains consistent with the hypervisor’s own representation of that process. If the two agree, execution is allowed to continue. If they do not, the kernel’s internal process state has been modified, and the administrator can be alerted that the container has been compromised. The container state can then be saved to disk and the container either terminated or restarted in a clean state.
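The check itself can be sketched in a few lines. This is a simplified model rather than the real implementation: the function name check_permission_request and the Verdict type are hypothetical, the hypervisor's record is represented as a plain dict from pid to uid, and the alerting and remediation steps (saving state, terminating or restarting the container) are left to the caller.

```python
from enum import Enum


class Verdict(Enum):
    ALLOW = "allow"
    COMPROMISED = "compromised"


def check_permission_request(shadow, pid, kernel_uid):
    """Compare the uid the kernel reports for pid with the hypervisor's copy.

    shadow is a dict mapping pid -> uid, recorded when the process was created.
    """
    recorded_uid = shadow.get(pid)
    if recorded_uid is None:
        # The kernel is asking about a process the hypervisor never recorded.
        return Verdict.COMPROMISED
    if recorded_uid != kernel_uid:
        # The kernel's view has diverged from the recorded state.
        return Verdict.COMPROMISED
    return Verdict.ALLOW


if __name__ == "__main__":
    shadow = {42: 1000}
    # Kernel and hypervisor agree: the file open may proceed.
    print(check_permission_request(shadow, 42, 1000))
    # The kernel now claims pid 42 is root: treat the container as compromised.
    print(check_permission_request(shadow, 42, 0))
```

Only the comparison happens on the hot path, which is in keeping with limiting the examination to points where a permissions check is performed.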

By isolating examination to cases where a permissions check is performed, the overhead of this approach is minimised to the point where most real-world use cases will see no measurable performance impact.

What about SUID binaries?

The SUID flag on an executable file indicates that the executable should run with the privileges of the file’s owner, no matter who executes it. This is most commonly used to allow users to execute a subset of binaries as root even if they are themselves unprivileged. This violates the expectations outlined above – it becomes legitimate for an unprivileged process to gain root privileges.

This false positive can be avoided by simply having the kernel notify the hypervisor that such a transition has occurred, allowing the hypervisor to update its internal state. However, if an attacker is able to influence the kernel’s control flow, it is potentially possible for them to trigger the same state update for illegitimate transitions. We avoid this by storing process-specific data in an otherwise unused CPU register when entering the SUID execution path via legitimate means. When the hypervisor receives a notification that the kernel wishes to update a process’s credentials, it reads this CPU register and verifies that the state matches. If so, the update is allowed to proceed. If not, it is treated as any other unauthorized privilege escalation.
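The register check might look something like the sketch below. It rests on assumptions: the post does not say what the process-specific data is or how the hypervisor knows what to expect, so the expected value is modelled as an opaque integer the hypervisor already holds for each process, and the name approve_credential_update is hypothetical.

```python
def approve_credential_update(expected_by_pid, pid, register_value):
    """Decide whether a credential update requested by the kernel is legitimate.

    expected_by_pid maps pid -> the value the hypervisor expects to find in the
    spare CPU register (an assumption; the real derivation is not specified).
    register_value is what the hypervisor actually read from that register.
    """
    expected = expected_by_pid.get(pid)
    if expected is None:
        # No legitimate SUID transition is known for this process.
        return False
    # The update is approved only if the register holds the expected value.
    return register_value == expected


if __name__ == "__main__":
    expected_by_pid = {7: 0x5EED1234}
    # Legitimate SUID path: the register holds the expected value.
    print(approve_credential_update(expected_by_pid, 7, 0x5EED1234))
    # Hijacked control flow: the register holds something else.
    print(approve_credential_update(expected_by_pid, 7, 0))
```

A rejected update is then handled like any other unauthorized privilege escalation.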

Why rkt?

rkt incorporates a pluggable architecture, allowing for multiple “stage1” modules that are responsible for configuring and starting a container. This makes it straightforward to incorporate additional container management and monitoring code with minimal modification to rkt itself. This feature was implemented with under 30 altered lines of code in the core rkt runtime, all additional modifications being isolated in the stage1.

Does this solve all security issues?

Privilege escalation detection doesn’t solve all security issues, but plays an important role in container security. It can identify privilege escalation attacks that are triggered by modification of existing kernel state. Vulnerabilities such as “Dirty COW”, which instead rely on injecting new code into legitimate SUID applications, would not be caught by this monitoring. Attacks that occur entirely in userspace will also not be identified – a vulnerability in a privileged component within a container would still allow an attacker to gain that component’s privileges.

Next steps

The Kernel Self Protection Project continues to develop Linux features that will help mitigate many kernel vulnerabilities before they even reach the point of being exploitable, and the GRSecurity project already provides patches that will block many of these attacks. We see these as complementary to our work, and hope that eventually the kernel will become secure enough that it can be entirely trusted without needing additional monitoring. Until then, we hope that this feature will help identify new exploits earlier.

The bleeding edge of this work, along with more details, can be found in the pull request that adds privilege escalation detection to rkt and the KVM stage1. Check it out today and join us in this new approach to securing the internet.

FAQ

How is this different from SELinux?

SELinux is a kernel feature that allows fine-grained policies restricting the behaviour of applications, and rkt makes use of SELinux to increase isolation between containers running on a shared system. However, SELinux suffers from the same issue as traditional Unix privilege separation – it relies on the kernel to impose those restrictions. If the kernel can be tampered with, SELinux isolation can be disabled.

What about overhead?

Some specific workloads, such as serving significant quantities of static content, may be impacted, but most workloads will see negligible overhead.
