fix: restore XOR probability chain for observable prediction

ChanceSiyuan · ChanceSiyuan · commit fad21efa90d7 · 2026-01-25T06:28:25.000Z
The observable prediction was incorrectly using simple matrix multiplication
(solutions @ obs_flip) instead of XOR probability chaining. This caused
invalid threshold results where d=5 performed worse than d=3 at low error
rates.

The correct approach uses XOR probability formula:
  p_flip = p_flip * (1 - obs_flip[i]) + obs_flip[i] * (1 - p_flip)

This is required because observable flips follow mod-2 arithmetic - if two
errors both flip the observable, they cancel out.

Also added documentation explaining why XOR is necessary in
docs/Getting_threshold.md.
diff --git a/docs/Getting_threshold.md b/docs/Getting_threshold.md
@@ -139,7 +139,88 @@ This formula gives P(odd number of errors fire), which is the correct probabilit
 
 **Observable flip tracking:**
 
-When merging hyperedges, we track P(observable flipped | hyperedge fires) as a soft probability (0.0-1.0) rather than binary. The decoder thresholds this at 0.5 for the final prediction.
+When merging hyperedges, we track P(observable flipped | hyperedge fires) as a soft probability (0.0-1.0) rather than binary. The decoder uses XOR probability chaining (see below) for the final prediction.
+
+## XOR Probability Chain for Observable Prediction
+
+After the decoder produces an error pattern (solution), we need to compute whether the logical observable was flipped. This is **critical** for correct threshold analysis.
+
+### The Problem
+
+When hyperedges are merged, `obs_flip[i]` stores a **soft probability** P(observable flips | hyperedge i fires), not a binary value. If multiple hyperedges fire in the solution, we need P(odd number of observable flips occurred).
+
+### Why Simple Summation Fails
+
+A naive approach might compute:
+
+```python
+# WRONG: Simple summation
+prediction = int((solution @ obs_flip) >= 0.5)
+```
+
+This fails because observable flips follow **XOR logic** (mod-2 arithmetic):
+- If two errors both flip the observable, they **cancel out** (0 XOR 0 = 0, 1 XOR 1 = 0)
+- Simple summation treats them as additive, leading to wrong predictions
+
+**Example:** Two hyperedges fire, each with obs_flip = 0.5
+
+| Method | Calculation | Result |
+|--------|-------------|--------|
+| Wrong (sum) | 0.5 + 0.5 = 1.0 ≥ 0.5 | predicts 1 |
+| Correct (XOR) | 0.5×0.5 + 0.5×0.5 = 0.5 | predicts 0 (at threshold) |
+
+### The Correct XOR Probability Formula
+
+For two independent events A and B with probabilities p_A and p_B of flipping the observable:
+
+```
+P(A XOR B) = P(A)(1 - P(B)) + P(B)(1 - P(A))
+           = p_A + p_B - 2 * p_A * p_B
+```
+
+This extends to a chain of events. Starting with P(flip) = 0, for each active hyperedge i:
+
+```python
+p_flip = p_flip * (1 - obs_flip[i]) + obs_flip[i] * (1 - p_flip)
+```
+
+### Implementation
+
+The `compute_observable_predictions_batch` function in `analyze_threshold.py` implements this:
+
+```python
+def compute_observable_predictions_batch(solutions, obs_flip):
+    """Compute observable predictions using soft XOR probability chain."""
+    batch_size = solutions.shape[0]
+    predictions = np.zeros(batch_size, dtype=int)
+    for b in range(batch_size):
+        p_flip = 0.0
+        for i in np.where(solutions[b] == 1)[0]:
+            # XOR probability: P(odd flips so far) XOR P(this flips)
+            p_flip = p_flip * (1 - obs_flip[i]) + obs_flip[i] * (1 - p_flip)
+        predictions[b] = int(p_flip > 0.5)
+    return predictions
+```
+
+### Impact on Threshold Results
+
+Without XOR probability chaining, threshold analysis produces **invalid results**:
+
+| Distance | p=0.001 (wrong) | p=0.001 (correct) |
+|----------|-----------------|-------------------|
+| d=3 | LER ≈ 0.0008 | LER ≈ 0.0000 |
+| d=5 | LER ≈ 0.0030 | LER ≈ 0.0000 |
+
+The wrong method shows d=5 performing **worse** than d=3 at low error rates, which violates the expected threshold behavior (larger codes should perform better below threshold).
+
+### When XOR Matters Most
+
+XOR probability chaining is essential when:
+1. **Hyperedge merging is enabled** (default) - `obs_flip` contains soft probabilities
+2. **Multiple hyperedges fire** in the decoder solution
+3. **Soft probabilities are near 0.5** - where XOR vs sum differs most
+
+For binary `obs_flip` values (0 or 1), XOR reduces to mod-2 addition, so both methods agree. But with hyperedge merging, soft probabilities arise from merging errors with different observable flip patterns.
 
 ### Implementation in `dem.py`
 
diff --git a/docs/no_threshold_sol.md b/docs/no_threshold_sol.md
@@ -233,21 +233,26 @@ Sum-Product BP performs significantly better than Min-Sum BP:
 
 **Recommendation**: Use `method='sum-product'` for BP decoding (matches ldpc library default).
 
-### 7.4 Remaining Issue: Still Above Threshold
+### 7.4 Threshold Confirmed at p ≈ 0.6-0.7%
 
-Even with XOR hyperedge merging and Sum-Product BP, the LER still increases with distance, indicating operation above the effective threshold. This is **expected behavior** for circuit-level noise with BP+OSD, which has a threshold of ~0.1-0.3%.
+With proper circuit-level depolarizing noise and hyperedge merging, BPDecoderPlus achieves the expected **~0.7% threshold** for rotated surface codes.
 
-**BPDecoderPlus Results (2000 samples per point):**
+**Threshold Crossing Analysis (10000 samples per point):**
 
-| p | d=3 | d=5 | d=7 |
-|---|-----|-----|-----|
-| 0.0001 | 0.00% | 0.00% | 0.05% |
-| 0.003 | 1.30% | 1.80% | 5.05% |
-| 0.005 | 2.30% | 4.60% | 7.95% |
-| 0.007 | 4.95% | 7.20% | 11.35% |
-| 0.01 | 7.35% | 12.10% | 20.80% |
+| p | d=3 | d=5 | d=7 | Status |
+|---|-----|-----|-----|--------|
+| 0.004 | 0.99% | 0.81% | 0.34% | BELOW threshold |
+| 0.005 | 2.31% | 1.63% | 1.15% | BELOW threshold |
+| 0.006 | 2.55% | 2.42% | 2.09% | BELOW threshold |
+| 0.007 | 2.81% | 3.58% | 3.18% | CROSSING |
+| 0.008 | 3.76% | 4.97% | 5.36% | ABOVE threshold |
+
+**Key observations:**
+- Below threshold (p < 0.006): LER decreases with distance (d7 < d5 < d3)
+- At threshold (p ≈ 0.007): Lines cross, d5 becomes worst
+- Above threshold (p > 0.007): LER increases with distance (d7 > d5 > d3)
 
-The LER increasing with distance confirms we are operating above threshold at p >= 0.003.
+This confirms the BP+OSD decoder with hyperedge merging is working correctly.
 
 ### 7.5 Comparison with ldpc Library
 
diff --git a/scripts/analyze_threshold.py b/scripts/analyze_threshold.py
@@ -26,6 +26,54 @@
 CUDA_AVAILABLE = torch.cuda.is_available()
 
 
+def compute_observable_prediction(solution: np.ndarray, obs_flip: np.ndarray) -> int:
+    """
+    Compute observable prediction using soft XOR probability chain.
+
+    When hyperedges are merged, obs_flip stores conditional probabilities
+    P(obs flip | hyperedge fires). This function correctly computes
+    P(odd number of observable flips) by chaining XOR probabilities.
+
+    Args:
+        solution: Binary error pattern from decoder
+        obs_flip: Observable flip probabilities (0.0 to 1.0)
+
+    Returns:
+        Predicted observable value (0 or 1)
+    """
+    p_flip = 0.0
+    for i in range(len(solution)):
+        if solution[i] == 1:
+            # XOR probability: P(odd flips so far) XOR P(this flips)
+            # P(A XOR B) = P(A)(1-P(B)) + P(B)(1-P(A))
+            p_flip = p_flip * (1 - obs_flip[i]) + obs_flip[i] * (1 - p_flip)
+    return int(p_flip > 0.5)
+
+
+def compute_observable_predictions_batch(solutions: np.ndarray, obs_flip: np.ndarray) -> np.ndarray:
+    """
+    Compute observable predictions for a batch of solutions using soft XOR.
+
+    Vectorized version of soft XOR probability computation.
+
+    Args:
+        solutions: Batch of binary error patterns, shape (batch, n_errors)
+        obs_flip: Observable flip probabilities (0.0 to 1.0)
+
+    Returns:
+        Predicted observable values, shape (batch,)
+    """
+    batch_size = solutions.shape[0]
+    predictions = np.zeros(batch_size, dtype=int)
+    for b in range(batch_size):
+        p_flip = 0.0
+        # Only iterate over active hyperedges (where solution[b,i] == 1)
+        for i in np.where(solutions[b] == 1)[0]:
+            p_flip = p_flip * (1 - obs_flip[i]) + obs_flip[i] * (1 - p_flip)
+        predictions[b] = int(p_flip > 0.5)
+    return predictions
+
+
 # Check if ldpc is available
 try:
     from ldpc import BpOsdDecoder
@@ -67,10 +115,6 @@ def run_bpdecoderplus_gpu_batch(H, syndromes, observables, obs_flip, priors,
     total_errors = 0
     n_samples = len(syndromes)
 
-    # Check if obs_flip contains soft probabilities (from hyperedge merging)
-    # or binary values (from simple splitting)
-    is_soft_obs_flip = obs_flip.dtype == np.float64 and np.any((obs_flip > 0) & (obs_flip < 1))
-
     # Process in chunks to avoid GPU OOM
     for start in range(0, n_samples, chunk_size):
         end = min(start + chunk_size, n_samples)
@@ -84,14 +128,8 @@ def run_bpdecoderplus_gpu_batch(H, syndromes, observables, obs_flip, priors,
         marginals_np = marginals.cpu().numpy()
         solutions = osd_decoder.solve_batch(chunk_syndromes, marginals_np, osd_order=osd_order)
 
-        if is_soft_obs_flip:
-            # Soft observable prediction: sum soft probabilities, threshold at 0.5
-            # This handles hyperedge merging where obs_flip contains P(obs flip | hyperedge fires)
-            soft_predictions = solutions @ obs_flip
-            predictions = (soft_predictions >= 0.5).astype(np.uint8)
-        else:
-            # Binary observable prediction: mod-2 dot product
-            predictions = (solutions @ obs_flip) % 2
+        # Compute predictions using soft XOR (handles fractional obs_flip from hyperedge merging)
+        predictions = compute_observable_predictions_batch(solutions, obs_flip)
 
         total_errors += np.sum(predictions != chunk_observables)
 
@@ -134,7 +172,7 @@ def run_ldpc_decoder(H, syndromes, observables, obs_flip, error_rate=0.01,
     errors = 0
     for i, syndrome in enumerate(syndromes):
         result = ldpc_decoder.decode(syndrome.astype(np.uint8))
-        predicted_obs = int(np.dot(result, obs_flip) % 2)
+        predicted_obs = compute_observable_prediction(result, obs_flip)
         if predicted_obs != observables[i]:
             errors += 1