Density matrix simulations of the 2D surface code up to surface-97
Quantum Error Correction (QEC) is arguably one of the most important ingredients in unlocking fully fault-tolerant quantum computing. Analysing the behaviour and performance of error correction codes in the presence of different types of noise is of central importance for the development and improvement of QEC protocols. Certain theoretical bounds have been established through analytical derivations, but these often rely on simplified models of noise. In order to investigate the effect of more realistic noise models on code performance, numerical simulations are needed.
In this blog post, we use Fermioniq’s quantum circuit emulator Ava to simulate quantum error correction for 2D surface codes within the density matrix framework. By leveraging Ava’s ability to simulate large noisy quantum circuits with intermediate measurements, we go as large as the surface-97 code. As far as we are aware, this is one of the largest simulations of error correcting codes with generic noise that have been performed.
Simulating quantum error correction
It is possible to simulate certain classes of quantum circuits and noise models efficiently (e.g. those composed of Clifford gates), but these unfortunately do not capture the vast majority of realistic noise models. Instead, full density matrix simulations can be used; however, these are extremely resource-demanding: on a 574 GB RAM device like Nvidia’s GH200, full density matrix simulations are limited to only 18 qubits. Trajectory-based emulations may also be used, but these can only go up to 36 qubits when using statevector simulations (with the same 574 GB of RAM available), and typically come with the additional disadvantage that a large number of trajectories is required for expectation values to converge.
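As a quick back-of-the-envelope check (our own illustration, assuming single-precision complex entries of 8 bytes each), these limits follow directly from the exponential scaling of the two exact methods:

```python
# Rough memory estimates for exact simulation, assuming 8-byte complex64
# entries; double these figures for complex128.
BYTES_PER_ENTRY = 8

def statevector_bytes(num_qubits):
    # A statevector stores 2^n complex amplitudes.
    return 2**num_qubits * BYTES_PER_ENTRY

def density_matrix_bytes(num_qubits):
    # A density matrix stores 2^n x 2^n complex entries.
    return 4**num_qubits * BYTES_PER_ENTRY

print(density_matrix_bytes(18) / 1e9)  # ~550 GB: close to the 574 GB limit
print(statevector_bytes(36) / 1e9)     # ~550 GB: same limit for statevectors
```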
Large quantum circuits that generate a limited amount of entanglement can be efficiently and accurately simulated using tensor networks. Ava uses a tensor network description of the density matrix, which allows it to emulate noisy circuits acting on many more than 18 qubits in the presence of generic noise channels. To demonstrate Ava’s capabilities, in this blog post we focus on amplitude damping noise, a non-Clifford type of noise which is hard to simulate classically.
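To give a flavour of the underlying idea (a toy sketch of our own, not Ava’s internal format), a density matrix can be stored as a matrix product operator: one small tensor per qubit, so that memory grows with the qubit count and the bond dimension rather than exponentially with the number of qubits:

```python
import numpy as np

# Each qubit gets a rank-4 tensor of shape (left_bond, ket, bra, right_bond).
# For a product state all bond dimensions are 1, so memory is linear in n.
def all_zero_mpo(num_qubits):
    site = np.zeros((1, 2, 2, 1), dtype=complex)
    site[0, 0, 0, 0] = 1.0  # |0><0| on this qubit
    return [site.copy() for _ in range(num_qubits)]

def mpo_trace(mpo):
    # Contract the physical (ket/bra) indices site by site.
    env = np.array([[1.0 + 0j]])
    for site in mpo:
        env = env @ np.einsum("aiib->ab", site)
    return env[0, 0].real

rho = all_zero_mpo(97)  # 97 qubits: far beyond any dense density matrix
print(mpo_trace(rho))   # 1.0: a valid, normalised state
```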
Surface codes
Similarly to classical error correction codes (like Hamming codes), QEC codes use redundancy when encoding a single logical qubit into several data qubits. In addition to the data qubits, one also needs ancilla qubits in order to detect errors. The data and ancilla qubits together are referred to as physical qubits. In each error detection round, the data qubits are entangled with the ancilla qubits, which are then measured in order to extract so-called error syndromes without altering the logical state of the code. Finally, a correction is applied based on the obtained syndrome.
One of the most popular and well-studied QEC codes is the surface code. This is a family of error correcting codes parameterised by an integer d, the code distance, which is the minimum number of local operations on data qubits required to change the logical state of the code. A distance-d surface code consists of a d by d grid of data qubits, and d^2 − 1 ancilla qubits placed in between. The physical qubit layouts of surface-17 (d=3) and surface-97 (d=7) can be seen in Figure 1. The ancilla qubits are used for carrying out either X or Z parity checks on groups of neighbouring data qubits. Such parity checks are performed via the quantum circuits in Figure 2.
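As an aside for readers who want to inspect such circuits: the open-source stim package can generate rotated surface-code memory circuits with exactly this layout. (We use stim here purely for illustration; it is restricted to Clifford circuits with Pauli noise, which is precisely the limitation that motivates the density matrix simulations below.)

```python
import stim

# Generate a distance-3 rotated surface-code memory experiment:
# d*d = 9 data qubits plus d*d - 1 = 8 ancillas, i.e. surface-17.
circuit = stim.Circuit.generated(
    "surface_code:rotated_memory_z",
    distance=3,
    rounds=5,
)
print(circuit)  # inspect the parity-check rounds of the code
```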
Idle qubit with amplitude damping noise
In the experiments described in this blog post, we study how well the surface codes can preserve the logical zero state under amplitude damping noise on the data qubits. Amplitude damping is a common source of errors in quantum hardware, whereby the (physical) 1 state of a qubit spontaneously decays to the 0 state.
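Concretely, the channel is described by two Kraus operators, and its effect is easy to verify with a few lines of numpy (a textbook illustration of our own, independent of Ava):

```python
import numpy as np

# Kraus operators of the amplitude damping channel: with probability
# gamma, a qubit in the 1 state decays to the 0 state.
def amplitude_damping_kraus(gamma):
    K0 = np.array([[1.0, 0.0], [0.0, np.sqrt(1.0 - gamma)]])
    K1 = np.array([[0.0, np.sqrt(gamma)], [0.0, 0.0]])
    return [K0, K1]

def apply_channel(rho, kraus_ops):
    return sum(K @ rho @ K.conj().T for K in kraus_ops)

rho = np.array([[0.0, 0.0], [0.0, 1.0]])  # qubit prepared in the 1 state
rho = apply_channel(rho, amplitude_damping_kraus(0.05))
print(np.diag(rho).real)  # [0.05, 0.95]: 5% of the population has decayed
```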
We examine three sizes of surface code (surface-17, surface-49 and surface-97) under varying noise strengths, i.e. for several values of the amplitude damping parameter γ ∈ {0.01, 0.05, 0.10}. For each experiment we perform 5 rounds of error detection. The circuit diagram of a single round can be seen in Figure 3. The round starts with the application of amplitude damping noise channels to all data qubits, introducing the errors that we will later correct for. We then perform Z-parity and X-parity checks for every ancilla qubit, the outcomes of which define the syndrome. Finally, in the post-processing phase, the syndrome is fed to a Minimum Weight Perfect Matching (MWPM) decoder (from the pymatching library), which infers the most likely set of errors that occurred given the syndrome.
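To make the decoding step concrete, here is a minimal pymatching example on a toy check matrix (a distance-5 repetition code of our own choosing, not the matching graph used in our experiments):

```python
import numpy as np
import pymatching

# Toy check matrix: H[i, j] = 1 if parity check i involves data qubit j.
H = np.array([
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 0, 0, 1, 1],
])
matching = pymatching.Matching(H)

error = np.array([0, 0, 1, 0, 0])       # a single bit flip on qubit 2
syndrome = (H @ error) % 2              # the two adjacent checks fire
correction = matching.decode(syndrome)  # most likely error given the syndrome
print(np.array_equal(correction, error))  # True: the decoder recovers it
```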
Because the qubits are all initialised in the 0 state, neither the amplitude damping channels (which act trivially on the 0 state) nor the Z-parity gates and measurements have any effect on the state in the first round. The X-parity gates and measurements, on the other hand, ensure that we prepare one of the degenerate logical zero states. Consequently, it is not until round 2 that we expect to actually see any errors occur.
Given that we are running a density matrix simulation, it is possible to obtain a mixture of different logical states. We define the success rate as the probability of reading out the unchanged logical zero state from the data qubits, after correcting any inferred errors. To compute the success rate, we sample 10,000 bit-strings after every error detection round. (This can easily be done within the simulation without altering the quantum state, since we have access to the tensor network representation of its density matrix.) Based on the syndrome, we then correct any potential errors that occurred by flipping the erroneous bits of the sampled bit-strings. Finally, the logical state of the corrected bit-string is determined by the parity of the sum of all bits in the bit-string.
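In numpy-style pseudocode, the post-processing for a single round then looks roughly as follows (a sketch of our own bookkeeping, not Ava’s API; the toy numbers are made up):

```python
import numpy as np

# Flip the bits the decoder flagged as erroneous, then read off the
# logical state from the parity of the sum of all bits.
def logical_state(bitstring):
    return int(bitstring.sum() % 2)  # 0 = logical zero, 1 = logical one

def success_rate(samples, correction):
    # samples: (num_samples, num_data_qubits) sampled bit-strings;
    # correction: the decoder's inferred error for this round.
    corrected = (samples + correction) % 2
    return float(np.mean([logical_state(s) == 0 for s in corrected]))

samples = np.zeros((4, 9), dtype=int)  # toy samples for 9 data qubits (d=3)
samples[:, 4] = 1                      # the trajectory's error is visible...
samples[3, 0] = 1                      # ...plus one stray flip in one sample
correction = np.zeros(9, dtype=int)
correction[4] = 1                      # decoder inferred the flip on qubit 4
print(success_rate(samples, correction))  # 0.75
```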
When decoding the errors with pymatching, one has the option of either using only the latest round of syndrome measurements, or also including measurement outcomes from earlier rounds. In our experiments, we saw (as expected) that including the history of measurements generally improved the decoding, so we use this decoding strategy for all results presented below.
Ava also supports trajectory-based simulations, and can combine these with density matrix simulations. For this experiment, the noise was simulated via a density matrix simulation, while trajectories over measurement outcomes were sampled by collapsing the state after each X and Z parity measurement. For each combination of parameters (surface code size d and noise strength γ) we simulated 50 trajectories. All simulations were performed on a single NVIDIA GH200 Grace Hopper superchip.
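For intuition, collapsing the state after a measurement works as in the following single-qubit sketch (our own illustration of the general principle, not Ava’s API):

```python
import numpy as np

# Sample a measurement outcome with its Born probability, then project
# and renormalise the density matrix to obtain one trajectory branch.
def measure_and_collapse(rho, projectors, rng):
    probs = [np.trace(P @ rho).real for P in projectors]
    outcome = rng.choice(len(probs), p=probs)
    P = projectors[outcome]
    return outcome, P @ rho @ P / probs[outcome]

rng = np.random.default_rng(seed=1)
plus = np.full((2, 2), 0.5)  # |+><+| as a 2x2 density matrix
P0 = np.diag([1.0, 0.0])     # projector onto measurement outcome 0
P1 = np.diag([0.0, 1.0])     # projector onto measurement outcome 1
outcome, rho = measure_and_collapse(plus, [P0, P1], rng)
print(outcome, np.diag(rho).real)  # e.g. "0 [1. 0.]"
```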
Results
The results from the simulations with γ=0.01 can be seen in Figure 4. The x-axis shows the number of error detection rounds k, and the y-axis the success rate. Each trajectory is drawn as a thin line. Since most trajectories have success rates close to 1 and therefore overlap, we observe a dense region near the top. As one might expect, surface-17 and surface-49 have more trajectories with success rates lower than 1 than surface-97. The stars show the average success rate over all the trajectories.
Because Ava is an approximate emulator, each trajectory has an associated fidelity, which is a measure of how closely the simulated density matrix resembles the exact density matrix of the state. A higher fidelity corresponds to a more accurate simulation, and a fidelity of 1 means that the simulation was exact. The distribution of fidelities for the different trajectories can be seen in the histograms in the lower row of the plot. We find that for each of the surface codes, all trajectories were simulated with fidelity f>0.99.
The same plot for γ=0.05 can be seen in Figure 5. The impact of the noise is more noticeable here, as there are more trajectories with success rates lower than 1. In general, the success rate decreases with the number of noise channels and error detection rounds that are applied. Surprisingly, we notice that the average success rate is actually lower here for surface-97 than for surface-49 at k=4 and k=5. A more careful analysis showed that this can be attributed to a single outlier trajectory, whose success rate dropped to 0.0 for these values of k. The reason for this outlier might be an error in the decoding step, which led to an erroneous bit-string correction. We expect that the lower success rate for the largest surface code is simply a statistical fluctuation that would be mitigated by running more trajectories (recall that we sample only 50 trajectories here, which is relatively few).
We again note that the fidelities are high (only 2 trajectories from surface-97 attain a fidelity lower than 0.99), demonstrating that Ava is more than capable of accurately simulating surface codes of distance up to 7 at noise strength γ=0.05.
Figure 6 shows the results for γ=0.10. Here the effects of noise are even more pronounced: the success rate quickly drops with increasing k for all surface codes. The larger codes are, as expected, better at preserving the logical state, and their success rate drops more gradually with k.
At noise strength γ=0.10 we notice a drop in fidelity for the largest surface code, where 18 trajectories out of 50 have fidelity smaller than 0.99. However, most trajectories still have relatively large fidelities, with only 7 trajectories falling below 0.85. The reason that simulations at higher noise strength have somewhat lower fidelity is that more noise introduces more classical correlations, which make it harder for the tensor network to represent the density matrix.
Conclusion
We have explored the capabilities of Ava by simulating quantum error correction rounds for large surface codes, significantly beyond the sizes that can be simulated by statevector methods. This involved running density matrix simulations with non-Clifford noise while sampling trajectories over the error detection measurement outcomes. Our simulations confirmed the well-established fact that bigger surface codes mitigate the effects of noise better than smaller ones. We also observed that simulation fidelities remained high – above 0.99 for most of the simulated trajectories, except for some in the high-noise regime – demonstrating the capabilities of Ava to accurately simulate large noisy circuits relevant to quantum error correction.
While these simulations provided interesting insights into the workings of the surface code, they represent only a small proof-of-concept benchmark of Ava’s ability to simulate quantum circuits relevant to quantum error correction, and leave plenty of room for extension. For instance, Ava also supports more flexible and sophisticated noise models that can be tailored to more accurately represent actual quantum devices, by modelling crosstalk or using different noise channels for different qubits. Another interesting extension would be to explore different code geometries – e.g. the toric code, a periodic version of the surface code, which is possible to emulate since Ava supports all-to-all qubit connectivity. One could also implement logical gates in terms of operations on physical qubits, and compute effective logical error rates using an accurate physical-qubit noise model.