CVE-2026-31431 Copy Fail: A 4-Byte Kernel Write That Escapes Containers
If you run Linux containers in production, the answer to "are we exposed?" is almost certainly yes. CVE-2026-31431, nicknamed Copy Fail, is a privilege escalation in the Linux kernel's algif_aead crypto code that gives any unprivileged process a 4-byte write into the page cache of any readable file. From there, it is a clean container-to-host escape on Kubernetes, and the seccomp profile most platform teams trust does not stop it.
Disclosure landed on May 1, 2026. The PoC is on GitHub. The page-cache trick that turns a 4-byte write into root execution depends on a property every Kubernetes node has by default, which means most clusters running unpatched kernels are exposed today.
Here is what is going on and what to do about it.
TLDR
| Detail | Info |
|---|---|
| CVE | CVE-2026-31431 |
| Nickname | Copy Fail |
| Class | Linux kernel privilege escalation, container escape |
| Subsystem | Crypto API, algif_aead |
| Disclosed | May 1, 2026 |
| Found by | Xint |
| Patch | Mainline commit a664bf3d603d |
| Affected | Most Linux distros on unpatched kernels: Ubuntu, RHEL 8/9, Debian, Fedora, SUSE, Amazon Linux, Arch, CloudLinux |
| What runtime-default seccomp does | Nothing |
| What you do | Patch the host kernel, drop in a custom seccomp profile blocking AF_ALG, audit privileged DaemonSets |
What Happened
Xint disclosed CVE-2026-31431 on May 1, 2026 with a working proof of concept. The bug lives in an in-place optimization the Linux kernel added to its AEAD crypto path back in 2017, where the kernel reuses the source buffer as the destination during cryptographic operations to avoid an allocation.
The optimization is unsafe when it is driven by the userspace crypto API (AF_ALG sockets) and combined with the splice() syscall. By racing those two, an unprivileged process can persuade the kernel to perform a deterministic 4-byte write into the page cache of any file the process can read.
Four bytes does not sound like much. The trick is that the page cache is shared across the whole node. Every container on a node that uses the same base image layer is reading from the same physical pages. So is the host. So is kube-proxy. So is every privileged DaemonSet pod on that node that pulled the same image.
The published Kubernetes PoC (Percivalll/Copy-Fail-CVE-2026-31431-Kubernetes-PoC) targets /usr/sbin/ipset, which kube-proxy invokes as root. An unprivileged pod corrupts the page-cache copy of ipset, then waits for kube-proxy to run it. When the DaemonSet executes the binary, it pulls the corrupted bytes from the cache and the attacker gets root execution on the node.
How the Exploit Works
The exploit chains three things: the AF_ALG userspace crypto API, the splice() syscall, and the kernel's page cache. Here is the sequence.
Step 1: Open an AF_ALG socket
The userspace crypto API lets any process ask the kernel to do crypto. You do not need root, and you do not need any capability. A plain socket() call is enough:
int s = socket(AF_ALG, SOCK_SEQPACKET, 0);
struct sockaddr_alg sa = {
    .salg_family = AF_ALG,
    .salg_type = "aead",
    .salg_name = "authencesn(hmac(sha256),cbc(aes))",
};
bind(s, (struct sockaddr *)&sa, sizeof(sa));
That is the surface area. The algif_aead template is what enables the in-place optimization. Beyond ordinary file and pipe calls such as open and pipe, the exploit needs only socket, bind, setsockopt, and splice, and none of them requires privileges.
Step 2: Splice a target page in
splice() lets you move bytes between a file descriptor and a pipe without copying through userspace. The exploit uses it to point the kernel's AEAD operation at a page from the target file (a setuid binary, or a binary like ipset that a privileged process will execute):
int pipefd[2];
pipe(pipefd);
int target = open("/usr/sbin/ipset", O_RDONLY);
splice(target, NULL, pipefd[1], NULL, 4096, 0);
splice(pipefd[0], NULL, alg_fd, NULL, 4096, 0);
The page cache now holds the file content. Because the kernel reuses the source as the destination during the AEAD transform, the encrypted output gets written back over the same page the file was read from.
Step 3: Race the bounds check and write 4 bytes
The exploit forces the AEAD operation into a path where the scatter-gather list bounds check runs before an attacker-controlled length change, but the actual copy runs after it. That race produces a 4-byte write at a controlled offset into the page that is now serving as the kernel's view of /usr/sbin/ipset.
Four bytes is enough to plant a short relative jump or to redirect a function pointer inside an ELF binary. The exploit picks an offset that turns the binary into a small loader for the attacker payload.
Step 4: Wait for the privileged process
Now the attacker waits. As soon as a privileged process on the host or in another container reads the file, it reads the corrupted bytes. On a Kubernetes node, kube-proxy runs /usr/sbin/ipset regularly to manage iptables rules, so the wait is measured in seconds.
When kube-proxy runs the corrupted binary, the attacker pivots from an unprivileged pod to root execution on the node.
Why runtime-default seccomp does not save you
Most platform teams assume seccomp=runtime-default keeps userspace-crypto-API tricks like this out of containers. It does not.
The juliet.sh test write-up confirmed this on both Talos v1.12.2 (containerd 2.1.6) and Amazon EKS (containerd 2.2.1). A non-root pod with all capabilities dropped and seccompProfile.type: RuntimeDefault opened an AF_ALG socket on every distro tested. Pod Security Standards restricted did not block it either.
The reason is that the default profiles deny socket(AF_VSOCK, ...) but not socket(AF_ALG, ...). AF_ALG is considered a normal userspace API. Until the kernel patches roll out, "default seccomp" effectively means "no protection against this CVE."
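To see why a rule aimed at AF_VSOCK lets AF_ALG through, it can help to evaluate the argument filter by hand. The sketch below is a deliberately simplified model, not the logic of libseccomp or of any container runtime: it assumes rules are checked in order, that every condition in a rule must match, and it understands only SCMP_CMP_EQ and SCMP_CMP_NE. DEFAULT_LIKE is a hypothetical stand-in for the shape described above, not a copy of any runtime's real profile.

```python
AF_INET, AF_ALG, AF_VSOCK = 2, 38, 40  # Linux address-family numbers

# Hypothetical stand-in: deny by default, allow socket() unless the family is AF_VSOCK.
DEFAULT_LIKE = {
    "defaultAction": "SCMP_ACT_ERRNO",
    "syscalls": [
        {
            "names": ["socket"],
            "action": "SCMP_ACT_ALLOW",
            "args": [{"index": 0, "value": AF_VSOCK, "op": "SCMP_CMP_NE"}],
        }
    ],
}


def _arg_matches(cond, args):
    """Evaluate one argument condition against the syscall arguments."""
    actual = args[cond["index"]]
    if cond["op"] == "SCMP_CMP_EQ":
        return actual == cond["value"]
    if cond["op"] == "SCMP_CMP_NE":
        return actual != cond["value"]
    raise ValueError(f"unsupported op in this sketch: {cond['op']}")


def decide(profile, syscall, args):
    """Return the action this simplified model picks for one syscall."""
    for rule in profile.get("syscalls", []):
        if syscall not in rule["names"]:
            continue
        if all(_arg_matches(cond, args) for cond in rule.get("args", [])):
            return rule["action"]
    return profile["defaultAction"]


for name, family in [("AF_INET", AF_INET), ("AF_ALG", AF_ALG), ("AF_VSOCK", AF_VSOCK)]:
    print(name, decide(DEFAULT_LIKE, "socket", [family, 5, 0]))
```

Under this model only AF_VSOCK falls through to the default deny. AF_ALG satisfies the "not equal to 40" condition exactly as AF_INET does, which is the gap the section describes.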
Are You Affected?
If you run any modern Linux distro and have not picked up the kernel update from May 1, 2026 or later, assume yes.
Check your kernel version
# Host kernel
uname -r
# Patch landed in mainline. The fix is the cherry-pick of commit a664bf3d603d.
# Distro CVE trackers will tell you the first patched package version.
# Ubuntu: ubuntu.com/security/CVE-2026-31431
# RHEL: access.redhat.com/security/cve/CVE-2026-31431
# Debian: security-tracker.debian.org/tracker/CVE-2026-31431
# Amazon Linux: alas.aws.amazon.com (search for CVE-2026-31431)
# SUSE: suse.com/security/cve/CVE-2026-31431
Check whether AF_ALG is reachable from your pods
Drop this into a debug pod in a non-production cluster to confirm exposure:
kubectl run alg-check --rm -it --restart=Never \
  --image=alpine:3.20 -- sh -c '
apk add --no-cache python3
python3 -c "
import socket
try:
    s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0)
    s.bind((\"aead\", \"authencesn(hmac(sha256),cbc(aes))\"))
    print(\"VULNERABLE: AF_ALG bind succeeded\")
except OSError as e:
    print(f\"BLOCKED: {e}\")
"
'
If you see VULNERABLE: AF_ALG bind succeeded, your pods can reach the kernel surface that Copy Fail needs.
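A failure can mean different things: a seccomp filter refusing the call is not the same as a kernel built without AF_ALG. The sketch below separates those cases by errno. It is an assumption-laden variant of the check above, not part of the published write-ups: probe_af_alg and classify are hypothetical names, it only creates the socket (no bind), and the mapping from errno to cause is a best guess, since an LSM or another policy layer can return the same codes.

```python
import errno
import socket


def classify(code):
    """Map an errno from socket() to a likely cause."""
    if code in (errno.EPERM, errno.EACCES):
        return "blocked (likely seccomp or an LSM)"
    if code == errno.EAFNOSUPPORT:
        return "unavailable (kernel reports no AF_ALG support)"
    return f"failed ({errno.errorcode.get(code, code)})"


def probe_af_alg():
    """Try to create an AF_ALG socket and describe what happened."""
    family = getattr(socket, "AF_ALG", 38)  # 38 is AF_ALG on Linux
    try:
        sock = socket.socket(family, socket.SOCK_SEQPACKET, 0)
    except OSError as exc:
        return classify(exc.errno)
    sock.close()
    return "reachable"


print(probe_af_alg())
```

A result of "reachable" matches the VULNERABLE case above as far as exposure of the socket family goes. It says nothing about whether the node's kernel is patched.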
Check for known IoCs
The published PoC writes a marker file under /tmp on the host after pivot. Search your nodes:
# On each node
sudo find / -name "copyfail-*" -mtime -7 2>/dev/null
# Audit recent ipset binary modifications
sudo stat /usr/sbin/ipset
If you see binaries with modification times that do not match your distro package install date, treat the node as compromised.
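Because the write described earlier lands in the page cache, the on-disk timestamps may not change at all, so a content check is a useful complement to stat. The sketch below hashes files through a normal buffered read, which is served from the page cache and is therefore what a privileged process would actually execute, and compares the result against digests you supply. It assumes you can obtain known-good SHA-256 values from your distro package or a trusted image; sha256_of and find_mismatches are hypothetical helper names.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks using ordinary buffered reads."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_mismatches(expected):
    """expected maps path -> known-good hex digest; return the paths that differ."""
    mismatched = []
    for path, good in expected.items():
        try:
            actual = sha256_of(path)
        except OSError:
            actual = None  # a missing or unreadable file counts as a mismatch
        if actual != good.lower():
            mismatched.append(path)
    return mismatched
```

A call such as find_mismatches({"/usr/sbin/ipset": known_good_digest}) returns an empty list when the bytes the kernel serves match the digest you trust. A mismatch is a reason to investigate the node, not proof of compromise on its own, since a legitimate package update also changes the digest.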
What to Do Right Now
1. Patch the host kernel
This is the only real fix. The mainline commit is a664bf3d603d. As of May 3, 2026 distros are at varying states of patch availability:
| Distro | Status |
|---|---|
| Ubuntu | Most kernels not yet patched, monitor USN |
| Debian sid/unstable | Patched |
| Debian stable/bookworm | Not patched |
| RHEL 8/9 | Patches in progress |
| Fedora | Patches in progress |
| SUSE/SLES | Patches in progress |
| Amazon Linux | Patches in progress |
| CloudLinux | Not patched |
| Arch Linux | Likely patched on linux package update |
Apply the kernel update and reboot the nodes through your normal node-maintenance flow. If you are running a managed Kubernetes service, the cloud vendor will roll out patched node images on its usual cadence. AWS, GCP, and Azure all have advisories tied to this CVE; check their status pages for your cluster's node image SKU.
2. Block AF_ALG with a custom seccomp profile
A custom Localhost seccomp profile that denies socket(AF_ALG, ...) blocks the syscall path the exploit needs. This is your "in the meantime" mitigation while you wait for the kernel patch to roll across all your nodes.
Save this as /var/lib/kubelet/seccomp/no-af-alg.json on every node:
{
  "defaultAction": "SCMP_ACT_ALLOW",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
  "syscalls": [
    {
      "names": ["socket", "socketpair"],
      "action": "SCMP_ACT_ERRNO",
      "errnoRet": 1,
      "args": [
        {
          "index": 0,
          "value": 38,
          "op": "SCMP_CMP_EQ"
        }
      ]
    }
  ]
}
38 is AF_ALG. The SCMP_ACT_ERRNO action with errnoRet 1 returns EPERM to the caller, which is what you want: the exploit's socket() call fails, so it never reaches bind() or the splice race.
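If the profile is generated rather than hand-copied, every node gets the same bytes and adding another family later is a one-line change. A minimal sketch, assuming you want one deny rule per address family and the same architecture list as the file above; build_profile is a hypothetical helper, not part of any Kubernetes or runtime tooling.

```python
import json

AF_ALG = 38  # Linux address-family number for the userspace crypto API


def build_profile(blocked_families, errno_ret=1):
    """Build an allow-by-default profile that denies socket() for each listed family."""
    return {
        "defaultAction": "SCMP_ACT_ALLOW",
        "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
        "syscalls": [
            {
                "names": ["socket", "socketpair"],
                "action": "SCMP_ACT_ERRNO",
                "errnoRet": errno_ret,  # 1 is EPERM
                "args": [{"index": 0, "value": family, "op": "SCMP_CMP_EQ"}],
            }
            for family in sorted(set(blocked_families))
        ],
    }


print(json.dumps(build_profile([AF_ALG]), indent=2))
```

Each family gets its own rule here so that no rule ever carries two conditions on the same argument, which keeps the intent unambiguous. The printed JSON can be written to /var/lib/kubelet/seccomp/no-af-alg.json by whatever mechanism you already use to place files on nodes.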
Apply it to your workloads with a pod spec like this:
apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: no-af-alg.json
  containers:
    - name: app
      image: your-image:tag
For org-wide rollout, plug it into your admission controller (Kyverno, OPA Gatekeeper, or Pod Security Standards via seccompProfile.type=Localhost) so pods cannot be scheduled without it.
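Whatever admission tool you pick, the rule it has to enforce is small: every container's effective seccomp profile must be the Localhost profile. The sketch below expresses that rule in plain Python so it can be run against exported pod manifests before a policy goes live. It is not Kyverno or Gatekeeper syntax, effective_profiles and violations are hypothetical names, and it relies on the Kubernetes behaviour that a container-level seccompProfile overrides the pod-level one.

```python
def effective_profiles(pod):
    """Yield (container_name, effective seccompProfile or None) for every container."""
    spec = pod.get("spec") or {}
    pod_level = (spec.get("securityContext") or {}).get("seccompProfile")
    for key in ("initContainers", "containers", "ephemeralContainers"):
        for container in spec.get(key) or []:
            own = (container.get("securityContext") or {}).get("seccompProfile")
            # A container-level profile takes precedence over the pod-level one.
            yield container["name"], own or pod_level


def violations(pod, required="no-af-alg.json"):
    """Return names of containers that do not run under the required Localhost profile."""
    wanted = {"type": "Localhost", "localhostProfile": required}
    return [name for name, profile in effective_profiles(pod) if profile != wanted]
```

An empty list from violations(pod) means every container in that manifest would run under no-af-alg.json. A sidecar that sets its own RuntimeDefault profile shows up by name even when the pod-level setting is correct.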
3. Audit your privileged DaemonSets
Copy Fail needs a privileged process that re-reads a file from the page cache after corruption. On a stock Kubernetes node, kube-proxy running ipset is the easy target. Take a pass over every DaemonSet in kube-system and your platform namespaces:
kubectl get daemonsets --all-namespaces -o json \
| jq -r '.items[] | select(.spec.template.spec.containers[]?.securityContext.privileged == true)
| "\(.metadata.namespace)/\(.metadata.name)"'
For each one:
- Identify which host binaries it executes.
- Decide whether it actually needs privileged: true or whether targeted capabilities would do.
- Where you can, run those binaries from the container image rather than the host filesystem so a host-side page-cache poison cannot reach them.
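The checklist above is easier to work through with the host paths in front of you. As a rough complement to the jq query, the sketch below takes the parsed output of kubectl get daemonsets --all-namespaces -o json and, for each DaemonSet with a privileged container, lists its hostPath volumes. privileged_daemonsets is a hypothetical helper, and hostPath mounts are only a hint at which host files a DaemonSet can reach, not a complete answer.

```python
def privileged_daemonsets(ds_list):
    """Return {"namespace/name": [hostPath paths]} for DaemonSets with a privileged container."""
    findings = {}
    for ds in ds_list.get("items", []):
        pod_spec = ds["spec"]["template"]["spec"]
        containers = (pod_spec.get("initContainers") or []) + (pod_spec.get("containers") or [])
        is_privileged = any(
            (c.get("securityContext") or {}).get("privileged") is True for c in containers
        )
        if not is_privileged:
            continue
        host_paths = sorted(
            v["hostPath"]["path"] for v in pod_spec.get("volumes") or [] if "hostPath" in v
        )
        findings[f"{ds['metadata']['namespace']}/{ds['metadata']['name']}"] = host_paths
    return findings
```

Unlike the jq one-liner, this reports each DaemonSet once even when several of its containers are privileged, and it also looks at init containers.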
4. Tighten image-layer overlap on shared nodes
The PoC works because base layers are deduplicated. Two pods running the same image share the same physical page in the kernel's page cache. A poison from one pod is what the other pod reads.
Multi-tenancy mitigations that already help here:
- Run untrusted workloads in their own node pool with sandboxing (gVisor, Kata, Firecracker). All three move the kernel out of reach.
- Pin sensitive privileged DaemonSets to dedicated nodes with nodeSelector and taints, so pods from less trusted namespaces never share a node with them.
- For high-blast-radius nodes (control plane, ingress, Vault, secrets operators), set spec.runtimeClassName to a sandboxed runtime class.
5. Rotate node-bound secrets if you found IoCs
If a node looked compromised, treat anything that has been on it as exposed:
- Service account tokens mounted into pods on the node
- kubelet client certificate
- Secrets mounted as volumes in any pod scheduled on that node
- Cloud instance role credentials (force a new instance, do not just rotate the role)
- etcd certificates if the node was a control-plane node
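To scope the rotation, it helps to list what the pods on the suspect node actually reference. The sketch below takes the parsed output of kubectl get pods --all-namespaces -o json and collects service accounts and Secret names for pods scheduled on one node. secrets_on_node is a hypothetical helper. It covers Secrets referenced through volumes, projected volumes, env, and envFrom, and it only sees pods that are still present, so anything that ran on the node earlier and has since been deleted has to come from audit logs instead.

```python
def secrets_on_node(pod_list, node_name):
    """Return sorted (namespace, kind, name) tuples referenced by pods on one node."""
    found = set()
    for pod in pod_list.get("items", []):
        spec = pod.get("spec") or {}
        if spec.get("nodeName") != node_name:
            continue
        ns = pod["metadata"]["namespace"]
        found.add((ns, "serviceaccount", spec.get("serviceAccountName", "default")))
        for volume in spec.get("volumes") or []:
            if "secret" in volume:
                found.add((ns, "secret", volume["secret"]["secretName"]))
            for source in (volume.get("projected") or {}).get("sources") or []:
                if "secret" in source:
                    found.add((ns, "secret", source["secret"]["name"]))
        for container in (spec.get("initContainers") or []) + (spec.get("containers") or []):
            for env in container.get("env") or []:
                ref = (env.get("valueFrom") or {}).get("secretKeyRef")
                if ref:
                    found.add((ns, "secret", ref["name"]))
            for env_from in container.get("envFrom") or []:
                if "secretRef" in env_from:
                    found.add((ns, "secret", env_from["secretRef"]["name"]))
    return sorted(found)
```

The result is a starting list for rotation. The kubelet certificate, cloud instance credentials, and etcd certificates from the list above are node-level and do not appear in pod specs, so they still need to be handled separately.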
Why This Matters for DevOps Teams
A few things stand out about Copy Fail beyond the immediate CVE:
Default seccomp is a marketing default, not a security default. "We use runtime-default seccomp" is something most teams have written into their compliance docs. Copy Fail is the latest demonstration that this profile is permissive by design, not restrictive. AF_ALG joins a small list of socket address families that pop up in CVE write-ups every few years. Build a habit of layering a custom profile that blocks what you do not need.
Page-cache sharing is a multi-tenancy boundary you probably forgot existed. The kernel's page cache is shared, and that sharing is what turns a 4-byte write into a privilege escalation. If you treat every node as a single security domain, your blast radius is "the entire node and every pod on it" the moment any pod gets the kernel to misbehave. Sandboxed runtimes are no longer a niche concern.
Your privileged DaemonSets are the targets. kube-proxy, CSI drivers, CNI plugins, log collectors, monitoring agents. The pattern is the same: a high-privilege process re-reading a file from the page cache. Take inventory, and prefer images that ship their own copies of any binary they execute.
Kernel CVEs are part of the platform team's job again. For most of the container era, "the kernel" was a thing the cloud handled for you. Copy Fail is a reminder that the kernel sits underneath every abstraction you have built, and that an unpatched node's exposure is not bounded by your application security posture.
Key Takeaways
- Patch the host kernel. The mainline fix is a664bf3d603d. Until it lands on your nodes, every unpatched Linux node is exposed.
- Drop in a custom seccomp profile that blocks socket(AF_ALG, ...). Do not assume runtime-default or PSS Restricted has you covered.
- Audit privileged DaemonSets. They are the targets that turn a 4-byte write into root.
- Run untrusted workloads on sandboxed runtimes (gVisor, Kata, Firecracker) on dedicated node pools.
- Rotate node-scoped secrets if you find evidence of compromise.
- Layer your defenses. Kernel patch + custom seccomp + sandboxed runtimes + pinned privileged DaemonSets is the picture, not any one of those alone.
The 4-byte write is the easy part to fix. The page-cache sharing it exploits is going to be there for a long time.
Sources: Microsoft Security Blog, Wiz, juliet.sh, Kubernetes PoC repo, OVHcloud