Fix DC SNAP state target drop caused by FIPS type coercion (#772)

MaxGhenis · claude · web-flow · commit 5fd53eb31db7 · 2026-04-17T08:51:18.000-04:00
`_add_snap_metric_columns` in utils/loss.py assigned
`STATE_ABBR_TO_FIPS["DC"] = 11` (int) just before mapping each
household's state abbreviation to its FIPS code. The subsequent
comparison `state_fips == r.GEO_ID[-2:]` then compared DC's int 11 to
the string "11", which is always False, so DC SNAP state calibration
targets were silently dropped.
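
A minimal sketch of the failing comparison, written in plain Python rather than the pandas code in `loss.py`. The two-state dict, the household list, and the `GEO_ID` value are illustrative assumptions, not project data; only the last two characters of `GEO_ID` matter here.

```python
def count_matches(abbr_to_fips, states, geo_id):
    """Count households whose mapped FIPS equals the GEO_ID suffix."""
    fips = [abbr_to_fips[s] for s in states]
    # geo_id[-2:] is always a str, so an int FIPS can never equal it.
    return sum(f == geo_id[-2:] for f in fips)


states = ["CA", "DC", "DC"]   # hypothetical household states
geo_id = "0400000US11"        # assumed GEO_ID shape for a DC target row

broken = {"CA": "06", "DC": "11"}
broken["DC"] = 11             # the removed line: str replaced by int
fixed = {"CA": "06", "DC": "11"}

broken_matches = count_matches(broken, states, geo_id)
fixed_matches = count_matches(fixed, states, geo_id)
print(broken_matches, fixed_matches)
```

With the int in place no DC household matches the target row; with the string both do. No exception is raised in either case, which is why the drop went unnoticed.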

Worse, `STATE_ABBR_TO_FIPS` is the shared module-level dict from
`storage/calibration_targets/pull_soi_targets.py` — DC is already
defined there as the string "11" — so the write also corrupted the
dict for every downstream consumer (etl_irs_soi.py, pull_soi_targets.py)
after the function ran.
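
The shared-state effect can be sketched as follows. The function names are hypothetical stand-ins: one plays the role of the buggy function, the other a later consumer that reads the same module-level dict.

```python
# Hypothetical stand-in for the module-level dict in pull_soi_targets.py.
STATE_ABBR_TO_FIPS = {"CA": "06", "DC": "11"}


def buggy_snap_columns():
    # Item assignment mutates the single shared dict object;
    # it does not create a local copy.
    STATE_ABBR_TO_FIPS["DC"] = 11


def later_consumer():
    # Any code that reads the dict afterwards sees the mutated value.
    return STATE_ABBR_TO_FIPS["DC"]


before = later_consumer()
buggy_snap_columns()
after = later_consumer()
print(repr(before), repr(after))
```

Because `from module import STATE_ABBR_TO_FIPS` binds the same object in every importing module, the int persists for the rest of the process once the function has run.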

The fix is to delete the line. DC is already present as "11" (str)
in the canonical dict.

Add a regression test that asserts all STATE_ABBR_TO_FIPS entries are
2-character strings and that importing `utils.loss` does not mutate
the dict.
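
As a complement to the import-time check, a call-time check could look like the sketch below. `call_without_mutating`, `clean`, and `buggy` are hypothetical names for illustration and are not part of this change.

```python
import copy


def call_without_mutating(mapping, fn, *args, **kwargs):
    """Run fn and fail if it changed mapping's contents or value types."""
    snapshot = copy.deepcopy(mapping)
    result = fn(*args, **kwargs)
    assert mapping == snapshot, "mapping contents changed"
    # Equality alone would miss e.g. 1 vs 1.0, so compare types as well.
    assert all(
        type(mapping[k]) is type(snapshot[k]) for k in snapshot
    ), "value type changed"
    return result


fips = {"DC": "11"}  # hypothetical stand-in for the shared dict


def clean():
    return fips["DC"]


def buggy():
    fips["DC"] = 11
```

Calling `call_without_mutating(fips, clean)` returns `"11"`, while passing `buggy` raises `AssertionError`, since the snapshot no longer matches.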

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/changelog.d/fix-dc-snap-fips-coercion.fixed.md b/changelog.d/fix-dc-snap-fips-coercion.fixed.md
@@ -0,0 +1 @@
+Fix DC SNAP state calibration target drop caused by int/string FIPS mismatch in utils/loss.py.
diff --git a/policyengine_us_data/utils/loss.py b/policyengine_us_data/utils/loss.py
@@ -1129,7 +1129,6 @@ def _add_snap_metric_columns(
 
     state = sim.calculate("state_code", map_to="person").values
     state = sim.map_result(state, "person", "household", how="value_from_first_person")
-    STATE_ABBR_TO_FIPS["DC"] = 11
     state_fips = pd.Series(state).apply(lambda s: STATE_ABBR_TO_FIPS[s])
 
     for _, r in snap_targets.iterrows():
diff --git a/tests/unit/test_state_fips.py b/tests/unit/test_state_fips.py
@@ -0,0 +1,43 @@
+"""Tests for the shared STATE_ABBR_TO_FIPS dict used across calibration code."""
+
+import pytest
+
+
+def test_dc_fips_is_string():
+    """DC's FIPS in the canonical dict is a string '11' (matches GEO_ID suffix)."""
+    from policyengine_us_data.storage.calibration_targets.pull_soi_targets import (
+        STATE_ABBR_TO_FIPS,
+    )
+
+    assert STATE_ABBR_TO_FIPS["DC"] == "11"
+    assert isinstance(STATE_ABBR_TO_FIPS["DC"], str)
+
+
+def test_all_state_fips_are_strings():
+    """All entries in STATE_ABBR_TO_FIPS are strings — downstream code compares
+    against ``r.GEO_ID[-2:]`` which is a string slice, so any int entry would
+    silently fail comparison and drop that state's targets."""
+    from policyengine_us_data.storage.calibration_targets.pull_soi_targets import (
+        STATE_ABBR_TO_FIPS,
+    )
+
+    for abbr, fips in STATE_ABBR_TO_FIPS.items():
+        assert isinstance(fips, str), f"{abbr} FIPS {fips!r} is not a string"
+        assert len(fips) == 2, f"{abbr} FIPS {fips!r} is not a 2-character string"
+
+
+def test_loss_module_does_not_mutate_state_fips_on_import():
+    """Regression for the DC SNAP calibration drop bug.
+
+    Previously ``_add_snap_metric_columns`` wrote ``STATE_ABBR_TO_FIPS["DC"] = 11``
+    (int) inline, corrupting the shared dict the moment the function ran. The fix
+    removed that line. Even at import time the dict should be untouched.
+    """
+    from policyengine_us_data.storage.calibration_targets.pull_soi_targets import (
+        STATE_ABBR_TO_FIPS,
+    )
+
+    before = dict(STATE_ABBR_TO_FIPS)
+    import policyengine_us_data.utils.loss  # noqa: F401
+
+    assert STATE_ABBR_TO_FIPS == before
