2026-05-03

CVE-2026-31431 Copy Fail: A 4-Byte Kernel Write That Escapes Containers

If you run Linux containers in production, the answer to "are we exposed?" is almost certainly yes. CVE-2026-31431, nicknamed Copy Fail, is a privilege escalation in the Linux kernel's algif_aead crypto code that gives any unprivileged process a 4-byte write into the page cache of any readable file. From there, it is a clean container-to-host escape on Kubernetes, and the seccomp profile most platform teams trust does not stop it.

Disclosure landed on May 1, 2026. The PoC is on GitHub. The page-cache trick that turns a 4-byte write into root execution depends on a property every Kubernetes node has by default, which means most clusters running unpatched kernels are exposed today.

Here is what is going on and what to do about it.

TLDR

Detail	Info
CVE	CVE-2026-31431
Nickname	Copy Fail
Class	Linux kernel privilege escalation, container escape
Subsystem	Crypto API, `algif_aead`
Disclosed	May 1, 2026
Found by	Xint
Patch	Mainline commit `a664bf3d603d`
Affected	Most Linux distros on unpatched kernels: Ubuntu, RHEL 8/9, Debian, Fedora, SUSE, Amazon Linux, Arch, CloudLinux
What runtime-default seccomp does	Nothing
What you do	Patch the host kernel, drop a custom seccomp profile blocking AF_ALG, audit privileged DaemonSets

What Happened

Xint disclosed CVE-2026-31431 on May 1, 2026 with a working proof of concept. The bug lives in an in-place optimization the Linux kernel added to its AEAD crypto path back in 2017, where the kernel reuses the source buffer as the destination during cryptographic operations to avoid an allocation.

The optimization is unsafe when it is driven by the userspace crypto API (AF_ALG sockets) and combined with the splice() syscall. By racing those two, an unprivileged process can persuade the kernel to perform a deterministic 4-byte write into the page cache of any file the process can read.

Four bytes does not sound like much. The trick is that the page cache is shared. Every container on a node that uses the same base image layer is reading from the same physical pages. So is the host. So is kube-proxy. So is any privileged DaemonSet pod on that node running from the same layers, and the same holds independently on every other node that pulled the image.

The published Kubernetes PoC (Percivalll/Copy-Fail-CVE-2026-31431-Kubernetes-PoC) targets /usr/sbin/ipset, which kube-proxy invokes as root. An unprivileged pod corrupts the page-cache copy of ipset, then waits for kube-proxy to run it. When the DaemonSet executes the binary, it pulls the corrupted bytes from the cache and the attacker gets root execution on the node.

How the Exploit Works

The exploit chains three things: the AF_ALG userspace crypto API, the splice() syscall, and the kernel's page cache. Here is the sequence.

Step 1: Open an AF_ALG socket

The userspace crypto API lets any process ask the kernel to do crypto. You do not need root, and you do not need any capability. A plain socket() call is enough:

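#include <sys/socket.h>
#include <linux/if_alg.h>

/* AF_ALG needs no privileges; the "aead" type selects the algif_aead interface. */
int s = socket(AF_ALG, SOCK_SEQPACKET, 0);
struct sockaddr_alg sa = {
    .salg_family = AF_ALG,
    .salg_type   = "aead",
    .salg_name   = "authencesn(hmac(sha256),cbc(aes))",
};
bind(s, (struct sockaddr *)&sa, sizeof(sa));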

That is the surface area. The algif_aead template is what enables the in-place optimization. Beyond ordinary file and pipe calls, the only syscalls required are socket, bind, setsockopt, accept, and splice.

Step 2: Splice a target page in

splice() lets you move bytes between a file descriptor and a pipe without copying through userspace. The exploit uses it to point the kernel's AEAD operation at a page from the target file (a setuid binary, or a binary like ipset that a privileged process will execute):

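/* alg_fd: operation socket from accept() on the bound AF_ALG socket s */
int alg_fd = accept(s, NULL, NULL);
int pipefd[2];
pipe(pipefd);
int target = open("/usr/sbin/ipset", O_RDONLY);
/* file -> pipe -> AEAD socket, with no copy through userspace */
splice(target, NULL, pipefd[1], NULL, 4096, 0);
splice(pipefd[0], NULL, alg_fd, NULL, 4096, 0);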

The page cache now holds the file content. Because the kernel reuses the source as the destination during the AEAD transform, the encrypted output gets written back over the same page the file was read from.

Step 3: Race the bounds check and write 4 bytes

The exploit steers the AEAD operation into a path where the scatter-gather list bounds check runs before an attacker-controlled length change, but the actual copy runs after it. Winning that race produces a 4-byte write at a controlled offset into the page that is now serving as the kernel's view of /usr/sbin/ipset.

Four bytes is enough to plant a short relative jump or to redirect a function pointer inside an ELF binary. The exploit picks an offset that turns the binary into a small loader for the attacker payload.

Step 4: Wait for the privileged process

Now the attacker waits. As soon as a privileged process on the host or in another container reads the file, it reads the corrupted bytes. On a Kubernetes node, kube-proxy runs /usr/sbin/ipset regularly to maintain the IP sets its iptables rules reference, so the wait is measured in seconds.

When kube-proxy runs the corrupted binary, the attacker pivots from an unprivileged pod to root execution on the node.

Why runtime-default seccomp does not save you

Most platform teams assume seccomp=runtime-default keeps userspace-crypto-API tricks like this out of containers. It does not.

The juliet.sh test write-up confirmed this on both Talos v1.12.2 (containerd 2.1.6) and Amazon EKS (containerd 2.2.1). A non-root pod with all capabilities dropped and seccompProfile.type: RuntimeDefault opened an AF_ALG socket on both platforms. The Pod Security Standards restricted profile did not block it either.

The reason is that the default profiles deny socket(AF_VSOCK, ...) but not socket(AF_ALG, ...). AF_ALG is considered a normal userspace API. Until the kernel patches roll out, "default seccomp" effectively means "no protection against this CVE."

Are You Affected?

If you run any modern Linux distro and have not picked up the kernel update from May 1, 2026 or later, assume yes.

Check your kernel version

# Host kernel
uname -r

# The fix landed in mainline as commit a664bf3d603d; distro kernels are fixed once they cherry-pick it.
# Distro CVE trackers will tell you the first patched package version.
# Ubuntu:        ubuntu.com/security/CVE-2026-31431
# RHEL:          access.redhat.com/security/cve/CVE-2026-31431
# Debian:        security-tracker.debian.org/tracker/CVE-2026-31431
# Amazon Linux:  alas.aws.amazon.com (search for CVE-2026-31431)
# SUSE:          suse.com/security/cve/CVE-2026-31431
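
If you have more than a handful of nodes, running uname -r by hand does not scale. The kubelet reports each node's kernel in .status.nodeInfo.kernelVersion, so a rough inventory can be built from the output of kubectl get nodes -o json. The following is a minimal sketch, not something from the advisory: the function name kernels_by_node is made up for illustration, and it only groups nodes by version. Deciding which versions are patched still means checking your distro's tracker.

```python
from collections import defaultdict


def kernels_by_node(node_list: dict) -> dict:
    """Group node names by the kernel version the kubelet reports.

    `node_list` is the parsed JSON from `kubectl get nodes -o json`.
    Nodes that report no kernel version are grouped under "unknown".
    """
    groups = defaultdict(list)
    for node in node_list.get("items", []):
        name = node["metadata"]["name"]
        node_info = node.get("status", {}).get("nodeInfo", {})
        groups[node_info.get("kernelVersion", "unknown")].append(name)
    # Sort node names so the report is stable between runs.
    return {kernel: sorted(names) for kernel, names in groups.items()}
```

To use it, parse the kubectl output with json.load and pass the result in; each key of the returned dict is a kernel version you can look up against the trackers listed above.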

Check whether AF_ALG is reachable from your pods

Drop this into a debug pod in a non-production cluster to confirm exposure:

kubectl run alg-check --rm -it --restart=Never \
  --image=alpine:3.20 -- sh -c '
apk add --no-cache python3
python3 -c "
import socket
try:
    s = socket.socket(socket.AF_ALG, socket.SOCK_SEQPACKET, 0)
    s.bind((\"aead\", \"authencesn(hmac(sha256),cbc(aes))\"))
    print(\"VULNERABLE: AF_ALG bind succeeded\")
except OSError as e:
    print(f\"BLOCKED: {e}\")
"
'

If you see VULNERABLE: AF_ALG bind succeeded, your pods can reach the kernel surface that Copy Fail needs. Whether that surface is actually exploitable still depends on the node's kernel patch level.

Check for known IoCs

The published PoC writes a marker file under /tmp on the host after pivot. Search your nodes:

# On each node
sudo find / -name "copyfail-*" -mtime -7 2>/dev/null

# Audit recent ipset binary modifications
sudo stat /usr/sbin/ipset

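If you see binaries with modification times that do not match your distro package install date, treat the node as compromised. A clean timestamp is not proof of a clean node, though: a write that lands only in the page cache may leave the on-disk file and its modification time untouched.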
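
A content hash is a useful complement to the timestamp check. The sketch below assumes that an ordinary read of the file is served from the page cache, so a poisoned page would show up in the digest, and that you can get a trusted SHA-256 for the binary from somewhere else (your distro package metadata or image manifest, for example). The function names are hypothetical, and this is an illustration rather than a complete integrity tool.

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_known_good(path: str, expected_sha256: str) -> bool:
    """Compare a file's digest with a digest obtained from a trusted source.

    `expected_sha256` is supplied by the caller; this sketch does not
    know where your trusted digests live.
    """
    return sha256_file(path) == expected_sha256.strip().lower()
```

A mismatch on a binary such as /usr/sbin/ipset is a strong signal to pull the node out of rotation. A match only tells you the bytes looked right at the moment you read them.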

What to Do Right Now

1. Patch the host kernel

This is the only real fix. The mainline commit is a664bf3d603d. As of May 3, 2026, distros are in varying states of patch availability:

Distro	Status
Ubuntu	Most kernels not yet patched, monitor USN
Debian sid/unstable	Patched
Debian stable/bookworm	Not patched
RHEL 8/9	Patches in progress
Fedora	Patches in progress
SUSE/SLES	Patches in progress
Amazon Linux	Patches in progress
CloudLinux	Not patched
Arch Linux	Likely patched on `linux` package update

Apply the kernel update and reboot the nodes through your normal node-maintenance flow. If you are running a managed Kubernetes service, the cloud vendor will roll out node images on its usual cadence. AWS, GCP, and Azure all have advisories tied to this CVE; check their status pages for your cluster's node image SKU.

2. Block AF_ALG with a custom seccomp profile

A custom Localhost seccomp profile that denies socket(AF_ALG, ...) blocks the syscall path the exploit needs. This is your "in the meantime" mitigation while you wait for the kernel patch to roll across all your nodes.

Save this as /var/lib/kubelet/seccomp/no-af-alg.json on every node:

{
  "defaultAction": "SCMP_ACT_ALLOW",
  "architectures": ["SCMP_ARCH_X86_64", "SCMP_ARCH_X86", "SCMP_ARCH_X32"],
  "syscalls": [
    {
      "names": ["socket", "socketpair"],
      "action": "SCMP_ACT_ERRNO",
      "errnoRet": 1,
      "args": [
        {
          "index": 0,
          "value": 38,
          "op": "SCMP_CMP_EQ"
        }
      ]
    }
  ]
}

38 is AF_ALG. The SCMP_ACT_ERRNO action returns EPERM to the caller, which is what you want: the exploit's socket() call fails, so it never reaches bind() or the splice race.
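
Before you push a profile like this to every node, it is worth checking mechanically that it contains the rule you think it does. The sketch below is a deliberately simplified reading of the profile JSON, not a seccomp evaluator: it only understands deny-style rules layered on an allow default, as in the profile above, and the function name denies_socket_family is made up for illustration.

```python
AF_ALG = 38  # Linux address family number for the userspace crypto API


def denies_socket_family(profile: dict, family: int = AF_ALG) -> bool:
    """Return True if some non-allow rule on socket() covers `family`.

    Simplified on purpose: a rule counts if it has no argument filter
    (it denies every socket() call) or exactly one filter requiring
    argument 0 to equal `family`.
    """
    for rule in profile.get("syscalls", []):
        if "socket" not in rule.get("names", []):
            continue
        if rule.get("action") == "SCMP_ACT_ALLOW":
            continue
        args = rule.get("args") or []
        if not args:
            return True  # unconditional deny of socket()
        if len(args) == 1:
            arg = args[0]
            if (arg.get("index") == 0
                    and arg.get("op") == "SCMP_CMP_EQ"
                    and arg.get("value") == family):
                return True
    return False
```

Load the file with json.load and pass the resulting dict in. A False result on the profile you are about to ship means the AF_ALG rule is missing or malformed.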

Apply it to your workloads with a pod spec like this:

apiVersion: v1
kind: Pod
metadata:
  name: app
spec:
  securityContext:
    seccompProfile:
      type: Localhost
      localhostProfile: no-af-alg.json
  containers:
    - name: app
      image: your-image:tag

For org-wide rollout, enforce it in your admission controller (Kyverno or OPA Gatekeeper) so pods cannot be admitted without it. Pod Security Standards allow seccompProfile.type=Localhost but cannot require a specific profile, so they are not enough on their own here.
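
The check an admission policy has to make is small enough to sketch. The function below is illustrative only: it is not a Kyverno or Gatekeeper policy, the name containers_missing_profile is made up, and it assumes the profile file name used above. It relies on the documented Kubernetes behaviour that a container-level seccompProfile overrides the pod-level one.

```python
REQUIRED_PROFILE = "no-af-alg.json"  # file name from the example above


def containers_missing_profile(pod: dict, required: str = REQUIRED_PROFILE) -> list:
    """List containers whose effective seccomp profile is not the required one.

    `pod` is a parsed Pod manifest. A container-level seccompProfile
    takes precedence over the pod-level setting.
    """
    spec = pod.get("spec", {})
    pod_level = (spec.get("securityContext") or {}).get("seccompProfile")
    missing = []
    for key in ("initContainers", "containers"):
        for container in spec.get(key) or []:
            own = (container.get("securityContext") or {}).get("seccompProfile")
            profile = own or pod_level or {}
            if (profile.get("type") != "Localhost"
                    or profile.get("localhostProfile") != required):
                missing.append(container["name"])
    return missing
```

An empty list means every container in the pod would run under the custom profile; anything else is the set of containers a policy should reject.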

3. Audit your privileged DaemonSets

Copy Fail needs a privileged process that re-reads a file from the page cache after corruption. On a stock Kubernetes node, kube-proxy running ipset is the easy target. Take a pass over every DaemonSet in kube-system and your platform namespaces:

kubectl get daemonsets --all-namespaces -o json \
  | jq -r '.items[]
           | select(any(.spec.template.spec.containers[]; .securityContext.privileged == true))
           | "\(.metadata.namespace)/\(.metadata.name)"'

For each one:

Identify which host binaries it executes.
Decide whether it actually needs privileged: true or whether targeted capabilities would do.
Where you can, run those binaries from the container image rather than the host filesystem so a host-side page-cache poison cannot reach them.
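
For the first two questions it helps to see the hostPath mounts next to the privileged flag, since those mounts hint at which host files a DaemonSet can touch. Here is a rough sketch that works on the same kubectl get daemonsets --all-namespaces -o json output. The function name and the shape of the result are my own choices, and it cannot tell you which binaries a container actually executes; that part is still a manual review.

```python
def privileged_daemonsets(ds_list: dict) -> list:
    """Summarise DaemonSets that run at least one privileged container.

    `ds_list` is the parsed JSON from
    `kubectl get daemonsets --all-namespaces -o json`.
    """
    findings = []
    for ds in ds_list.get("items", []):
        pod_spec = ds["spec"]["template"]["spec"]
        privileged = sorted(
            c["name"]
            for c in pod_spec.get("containers") or []
            if (c.get("securityContext") or {}).get("privileged") is True
        )
        if not privileged:
            continue
        # hostPath volumes show which parts of the host filesystem are mounted.
        host_paths = sorted(
            v["hostPath"]["path"]
            for v in pod_spec.get("volumes") or []
            if v.get("hostPath")
        )
        findings.append({
            "daemonset": f'{ds["metadata"]["namespace"]}/{ds["metadata"]["name"]}',
            "privileged_containers": privileged,
            "host_paths": host_paths,
        })
    return findings
```

Each entry gives you a starting point for the review: the DaemonSet, the containers that hold privileged: true, and the host paths they can see.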

4. Tighten image-layer overlap on shared nodes

The PoC works because base layers are deduplicated. Two pods running the same image share the same physical pages in the kernel's page cache, so bytes poisoned by one pod are exactly what the other pod reads.

Multi-tenancy mitigations that already help here:

Run untrusted workloads in their own node pool with sandboxing (gVisor, Kata, Firecracker). All three keep the host kernel out of the workload's direct reach.
Pin sensitive privileged DaemonSets to dedicated nodes with nodeSelector and taints, so pods from less trusted namespaces never share a node with them.
For high-blast-radius nodes (control plane, ingress, Vault, secrets operators), set spec.runtimeClassName to a sandboxed runtime class.

5. Rotate node-bound secrets if you found IoCs

If a node looked compromised, treat anything that has been on it as exposed:

Service account tokens mounted into pods on the node
kubelet client certificate
Secrets mounted as volumes in any pod scheduled on that node
Cloud instance role credentials (force a new instance, do not just rotate the role)
etcd certificates if the node was a control-plane node
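
To build the rotation list for the first and third items, you can start from kubectl get pods --all-namespaces -o json and filter by node. This is a minimal sketch under a few assumptions: the function name exposure_on_node is hypothetical, it only sees pods that are still scheduled on the node, and it covers secret and projected volumes but not secrets injected as environment variables.

```python
def exposure_on_node(pod_list: dict, node_name: str) -> dict:
    """Collect secrets and service accounts used by pods on one node.

    `pod_list` is the parsed JSON from
    `kubectl get pods --all-namespaces -o json`.
    Names are returned as "namespace/name" strings.
    """
    secrets, service_accounts = set(), set()
    for pod in pod_list.get("items", []):
        spec = pod.get("spec", {})
        if spec.get("nodeName") != node_name:
            continue
        ns = pod["metadata"]["namespace"]
        service_accounts.add(f'{ns}/{spec.get("serviceAccountName", "default")}')
        for vol in spec.get("volumes") or []:
            if "secret" in vol:
                secrets.add(f'{ns}/{vol["secret"]["secretName"]}')
            # Projected volumes can bundle secrets alongside tokens.
            for src in (vol.get("projected") or {}).get("sources") or []:
                if "secret" in src:
                    secrets.add(f'{ns}/{src["secret"]["name"]}')
    return {"secrets": secrets, "service_accounts": service_accounts}
```

Treat the result as a lower bound. Pods that already left the node, and credentials that never appear in a pod spec such as the kubelet certificate or instance role, still need to go on the list by hand.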

Why This Matters for DevOps Teams

A few things stand out about Copy Fail beyond the immediate CVE:

Default seccomp is a marketing default, not a security default. "We use runtime-default seccomp" is something most teams have written into their compliance docs. Copy Fail is the latest demonstration that this profile is permissive by design, not restrictive. AF_ALG joins a small list of socket address families that pop up in CVE write-ups every few years. Build a habit of layering a custom profile that blocks what you do not need.

Page-cache sharing is a multi-tenancy boundary you probably forgot existed. The kernel's page cache is shared, and that sharing is what turns a 4-byte write into a privilege escalation. If you treat every node as a single security domain, your blast radius is "the entire node and every pod on it" the moment any pod gets the kernel to misbehave. Sandboxed runtimes are no longer a niche concern.

Your privileged DaemonSets are the targets. kube-proxy, CSI drivers, CNI plugins, log collectors, monitoring agents. The pattern is the same: a high-privilege process re-reading a file from the page cache. Take inventory, and prefer images that ship their own copies of any binary they execute.

Kernel CVEs are part of the platform team's job again. For most of the container era, "the kernel" was a thing the cloud handled for you. Copy Fail is a reminder that the kernel sits underneath every abstraction you have built, and that an unpatched node's exposure is not bounded by your application security posture.

Key Takeaways

Patch the host kernel. The mainline fix is a664bf3d603d. Until it lands on your nodes, treat every unpatched Linux node as exposed.
Drop a custom seccomp profile that blocks socket(AF_ALG, ...). Do not assume runtime-default or PSS Restricted has you covered.
Audit privileged DaemonSets. They are the targets that turn a 4-byte write into root.
Run untrusted workloads on sandboxed runtimes (gVisor, Kata, Firecracker) on dedicated node pools.
Rotate node-scoped secrets if you find evidence of compromise.
Layer your defenses. Kernel patch + custom seccomp + sandboxed runtimes + pinned privileged DaemonSets is the picture, not any one of those alone.

The 4-byte write is the easy part to fix. The page-cache sharing it exploits is going to be there for a long time.

Sources: Microsoft Security Blog, Wiz, juliet.sh, Kubernetes PoC repo, OVHcloud

Published: 2026-05-03|Last updated: 2026-05-03T13:30:00Z
