Incident Report: CodeOracle memory page divergence

the_matter_labs · May 27, 2025, 11:57am

Incident summary

On Wednesday, May 14, 2025, a bug was found in the implementation of the ZKsync sequencer’s virtual machine (VM) that caused an unprovable batch to be generated on Abstract. This delayed finalization and withdrawals on the chain while the incident was being resolved. The sequencer VM was patched, and the fix was rolled out to all chains in the Elastic Network.

An emergency protocol upgrade was prepared and deployed to finalize the unprovable batch. The ZKsync Security Council manually validated the batch that originally contained the issue.

No user funds were ever at risk and transactions were still processed normally during the incident.

Root cause

ZKsync has two different implementations of EraVM. The sequencer VM executes transactions quickly and returns results to users, while the prover VM later re-executes the same batch inside a ZK circuit to generate the proof. Both VMs must produce identical state roots: any behavioural drift between them makes the batch unprovable, and an unprovable batch cannot be finalized.
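The consistency requirement can be illustrated with a toy model. This is a hypothetical Python sketch, not ZKsync code: `state_root` stands in for the real state commitment (a plain SHA-256 over sorted storage rather than a Merkle tree), and `batch_is_provable` is a made-up helper. It only shows that a single divergent write between the two executions is enough to break the match.

```python
import hashlib
import json


def state_root(storage: dict[str, int]) -> str:
    """Toy commitment: SHA-256 over the sorted storage map.
    (Hypothetical; the real state root is a Merkle tree root.)"""
    encoded = json.dumps(sorted(storage.items())).encode()
    return hashlib.sha256(encoded).hexdigest()


def batch_is_provable(sequencer_storage: dict[str, int],
                      prover_storage: dict[str, int]) -> bool:
    # In this model, a proof of the prover's execution only matches the
    # sequencer's committed result if both executions agree exactly.
    return state_root(sequencer_storage) == state_root(prover_storage)


seq = {"0xabc:balance": 100, "0xdef:balance": 5}
prv = {"0xabc:balance": 100, "0xdef:balance": 5}
print(batch_is_provable(seq, prv))  # True
prv["0xdef:balance"] = 6            # one divergent write
print(batch_is_provable(seq, prv))  # False
```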

EraVM manages data in memory pages. The prover VM always expects a page to remain available once it has been allocated. The sequencer VM, however, erases unneeded pages at the end of a call frame to save RAM. This asymmetry became critical inside CodeOracle, the system contract that de-commits user-supplied bytecode into a new memory page or reuses a page that already holds it.
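The asymmetry between the two page policies can be modelled in a few lines. This is a hypothetical sketch, not the EraVM implementation: `ToyPageStore` and its methods are made-up names, and the sequencer-style policy here frees every page of the frame, whereas the real sequencer frees only pages it considers unneeded.

```python
class ToyPageStore:
    """Hypothetical model of EraVM memory pages; not the real implementation."""

    def __init__(self, erase_on_frame_end: bool):
        self.erase_on_frame_end = erase_on_frame_end
        self.pages: dict[int, bytes] = {}
        self.next_id = 0

    def allocate(self, data: bytes) -> int:
        page_id = self.next_id
        self.next_id += 1
        self.pages[page_id] = data
        return page_id

    def end_frame(self, frame_pages: list[int]) -> None:
        # Sequencer-style policy frees the frame's pages to save RAM;
        # prover-style policy keeps every allocated page available.
        if self.erase_on_frame_end:
            for page_id in frame_pages:
                self.pages.pop(page_id, None)

    def read(self, page_id: int, length: int) -> bytes:
        # A freed page reads back as zeros.
        return self.pages.get(page_id, bytes(length))


sequencer = ToyPageStore(erase_on_frame_end=True)
prover = ToyPageStore(erase_on_frame_end=False)

seq_page = sequencer.allocate(b"\x60\x80")
sequencer.end_frame([seq_page])
prv_page = prover.allocate(b"\x60\x80")
prover.end_frame([prv_page])

print(sequencer.read(seq_page, 2).hex())  # 0000
print(prover.read(prv_page, 2).hex())     # 6080
```

On its own this difference is harmless: it only matters if something still refers to a freed page after the frame ends.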

The bug was triggered when a transaction ran out of gas inside the CodeOracle system contract immediately after bytecode de-commitment. In that situation, the CodeOracle call frame reverts, but the memory page holding the bytecode has already been created. The revert causes the sequencer VM to erase the page immediately, even though CodeOracle had already registered that page as containing valid bytecode.

A later transaction in the same batch reused the now-cleared page and read zeros, while the prover VM, which never erases an allocated page, still saw the correct bytecode. The two VMs therefore diverged, and the batch could not be proven.
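The full sequence (de-commit, out-of-gas revert, reuse by a later transaction) can be reproduced in a toy model. Everything here is hypothetical: `ToyVM`, `decommit`, `code_oracle_call` and `run_batch` are illustrative names, the CodeOracle registration is modelled as a plain dict from bytecode hash to page id, and "running out of gas" is a boolean flag. The point is only that a registration which outlives its freed page makes the two policies disagree.

```python
from typing import Optional


class ToyVM:
    """Hypothetical model of one VM: a page store plus a CodeOracle-style
    cache mapping a bytecode hash to the page that holds the bytecode."""

    def __init__(self, erase_on_revert: bool):
        self.erase_on_revert = erase_on_revert  # True = sequencer-style cleanup
        self.pages: dict[int, bytes] = {}
        self.next_id = 0
        self.decommitted: dict[str, int] = {}  # code hash -> page id

    def decommit(self, code_hash: str, bytecode: bytes) -> tuple[int, bool]:
        """Return (page_id, newly_created)."""
        if code_hash in self.decommitted:
            return self.decommitted[code_hash], False  # reuse existing page
        page_id = self.next_id
        self.next_id += 1
        self.pages[page_id] = bytecode
        self.decommitted[code_hash] = page_id  # page registered as valid code
        return page_id, True

    def code_oracle_call(self, code_hash: str, bytecode: bytes,
                         out_of_gas: bool) -> Optional[bytes]:
        page_id, created = self.decommit(code_hash, bytecode)
        if out_of_gas:
            # The frame reverts right after de-commitment. Sequencer-style
            # cleanup frees the page created in this frame, but the
            # registration in self.decommitted survives.
            if self.erase_on_revert and created:
                self.pages.pop(page_id, None)
            return None
        # A freed page reads back as zeros.
        return self.pages.get(page_id, bytes(len(bytecode)))


def run_batch(vm: ToyVM) -> Optional[bytes]:
    code_hash, bytecode = "0xc0de", b"\x60\x80\x60\x40"
    # Tx 1: runs out of gas inside CodeOracle just after de-commitment.
    vm.code_oracle_call(code_hash, bytecode, out_of_gas=True)
    # Tx 2, same batch: the same code hash hits the registered page.
    return vm.code_oracle_call(code_hash, bytecode, out_of_gas=False)


print(run_batch(ToyVM(erase_on_revert=True)).hex())   # sequencer: 00000000
print(run_batch(ToyVM(erase_on_revert=False)).hex())  # prover:    60806040
```

In this model, either freeing the page together with its registration, or never freeing a registered page, would make the two runs agree again; which approach the actual patch takes is not described here.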

Impact

Because the prover could not reproduce the sequencer’s state root, the affected batch #16529 on the Abstract network, and all subsequent batches up to batch #16788, remained unfinalized. No other batches or networks were affected. Transactions that accessed the problematic memory page were incorrectly marked as failed by the sequencer VM: they paid gas and had their nonces incremented, but their intended state changes did not appear onchain.
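What "marked as failed" meant for the affected users can be sketched with a deliberately simplified account model. `Account` and `apply_tx` are hypothetical, and fee handling is reduced to a flat fee with no refunds; the sketch only shows that the fee and nonce update persist while the transaction's own writes are discarded.

```python
from dataclasses import dataclass, field


@dataclass
class Account:
    """Hypothetical, heavily simplified account state."""
    balance: int
    nonce: int = 0
    storage: dict = field(default_factory=dict)


def apply_tx(acct: Account, fee: int, writes: dict, failed: bool) -> None:
    # The fee is charged and the nonce incremented regardless of outcome.
    acct.balance -= fee
    acct.nonce += 1
    # The transaction's own state changes persist only on success.
    if not failed:
        acct.storage.update(writes)


user = Account(balance=1_000)
apply_tx(user, fee=21, writes={"slot0": 42}, failed=True)
print(user)  # Account(balance=979, nonce=1, storage={})
```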

Finalization on Abstract was halted for roughly two days while the Security Council replayed the batch. During this window, all new transactions were still accepted, ordered, and executed onchain, but they could not be finalized, so withdrawals could not be finalized on L1. User balances remained correct onchain, and no funds were ever at risk. The issue affected 10 transactions and blocked finalization and bridging of assets until the fix was deployed and the backlog of batches was proven.

Mitigation efforts

On Wednesday evening (UTC), May 14, the Matter Labs and Abstract engineering teams were alerted after the prover failed to generate a proof for a batch. The Security Council was informed of the issue and asked to independently verify the batch in question. The Emergency Board, made up of the Security Council, the Guardians, and the ZKsync Foundation, was alerted and asked to be prepared for a potential emergency upgrade.

By around 03:00 UTC on Thursday, May 15, the teams had isolated the root cause and confirmed that the problem was limited to ten transactions that failed inside Abstract batch #16529. The Matter Labs team prepared a node patch to fix the root cause and began updating ZK chains to the patched node version.

To resolve the problem with the unprovable batch, two options were considered:

(1) revert the batch; or
(2) execute an emergency upgrade to approve it.

Because the batch contained no corrupted storage writes, the Security Council chose the second option and manually reviewed and approved it. Once it was approved, the patched sequencer replayed batch #16529, the backlog of batches was finalized, and normal operation resumed.

On Friday, May 16, all mainnet ZK chains were patched and the emergency upgrade was executed.

Scope

Patched versions:

Node
- v28: v28.2.1
- v27: v27.5.7
Prover
- v28: 21.1.0
- v27: 20.3.4
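Operators who want to check a deployment against the patched versions listed above could use something like the following hypothetical helper. It assumes plain `MAJOR.MINOR.PATCH` version strings (with an optional leading `v`), that the prover's v27/v28 labels refer to the protocol version, and that any later release on the same line also contains the fix. Lines not listed are reported as unknown (`None`) rather than guessed.

```python
from typing import Optional

# Minimum patched releases, taken from the list above.
# Node: keyed by node major version. Prover: keyed by protocol version (assumed).
MIN_PATCHED_NODE = {28: (28, 2, 1), 27: (27, 5, 7)}
MIN_PATCHED_PROVER = {27: (20, 3, 4), 28: (21, 1, 0)}


def parse_version(version: str) -> tuple[int, ...]:
    """Parse 'v28.2.1' or '20.3.4' into a tuple of ints."""
    return tuple(int(part) for part in version.lstrip("v").split("."))


def node_is_patched(version: str) -> Optional[bool]:
    parsed = parse_version(version)
    minimum = MIN_PATCHED_NODE.get(parsed[0])
    if minimum is None:
        return None  # release line not covered by the list
    return parsed >= minimum


def prover_is_patched(protocol_version: int, version: str) -> Optional[bool]:
    minimum = MIN_PATCHED_PROVER.get(protocol_version)
    if minimum is None:
        return None  # protocol version not covered by the list
    return parse_version(version) >= minimum


print(node_is_patched("v28.2.1"))       # True
print(node_is_patched("v27.5.6"))       # False
print(prover_is_patched(28, "20.3.4"))  # False
```

Comparing integer tuples rather than raw strings avoids treating `v28.10.0` as older than `v28.2.1`.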

The transaction that reproduced the issue: abscan.org

The node patch: Commit d3ef62f

Closing statement

We appreciate the support of the Elastic Network chains and partners during this incident and are sorry for the disruption it caused. Thank you to the teams, including chain and node operators, who worked rapidly to apply the patches.