Skip to content

Commit 5fd53eb

Browse files
MaxGhenisclaude
andauthored
Fix DC SNAP state target drop caused by FIPS type coercion (#772)
`_add_snap_metric_columns` in utils/loss.py wrote `STATE_ABBR_TO_FIPS["DC"] = 11` (int) before looking up state FIPS strings. The subsequent comparison `state_fips == r.GEO_ID[-2:]` compares DC's int 11 to the string "11", which is always False, so DC SNAP state calibration targets were silently dropped. Worse, `STATE_ABBR_TO_FIPS` is the shared module-level dict from `storage/calibration_targets/pull_soi_targets.py` — DC is already defined there as the string "11" — so the write also corrupted the dict for every downstream consumer (etl_irs_soi.py, pull_soi_targets.py) after the function ran. The fix is to delete the line. DC is already present as "11" (str) in the canonical dict. Add a regression test that asserts all STATE_ABBR_TO_FIPS entries are 2-character strings and that importing `utils.loss` does not mutate the dict. Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 3268136 commit 5fd53eb

3 files changed

Lines changed: 44 additions & 1 deletion

File tree

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1 @@
1+
Fix DC SNAP state calibration target drop caused by int/string FIPS mismatch in utils/loss.py.

policyengine_us_data/utils/loss.py

Lines changed: 0 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -1129,7 +1129,6 @@ def _add_snap_metric_columns(
11291129

11301130
state = sim.calculate("state_code", map_to="person").values
11311131
state = sim.map_result(state, "person", "household", how="value_from_first_person")
1132-
STATE_ABBR_TO_FIPS["DC"] = 11
11331132
state_fips = pd.Series(state).apply(lambda s: STATE_ABBR_TO_FIPS[s])
11341133

11351134
for _, r in snap_targets.iterrows():

tests/unit/test_state_fips.py

Lines changed: 43 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,43 @@
1+
"""Tests for the shared STATE_ABBR_TO_FIPS dict used across calibration code."""
2+
3+
import pytest
4+
5+
6+
def test_dc_fips_is_string():
7+
"""DC's FIPS in the canonical dict is a string '11' (matches GEO_ID suffix)."""
8+
from policyengine_us_data.storage.calibration_targets.pull_soi_targets import (
9+
STATE_ABBR_TO_FIPS,
10+
)
11+
12+
assert STATE_ABBR_TO_FIPS["DC"] == "11"
13+
assert isinstance(STATE_ABBR_TO_FIPS["DC"], str)
14+
15+
16+
def test_all_state_fips_are_strings():
17+
"""All entries in STATE_ABBR_TO_FIPS are strings — downstream code compares
18+
against ``r.GEO_ID[-2:]`` which is a string slice, so any int entry would
19+
silently fail comparison and drop that state's targets."""
20+
from policyengine_us_data.storage.calibration_targets.pull_soi_targets import (
21+
STATE_ABBR_TO_FIPS,
22+
)
23+
24+
for abbr, fips in STATE_ABBR_TO_FIPS.items():
25+
assert isinstance(fips, str), f"{abbr} FIPS {fips!r} is not a string"
26+
assert len(fips) == 2, f"{abbr} FIPS {fips!r} is not a 2-character string"
27+
28+
29+
def test_loss_module_does_not_mutate_state_fips_on_import():
30+
"""Regression for the DC SNAP calibration drop bug.
31+
32+
Previously ``_add_snap_metric_columns`` wrote ``STATE_ABBR_TO_FIPS["DC"] = 11``
33+
(int) inline, corrupting the shared dict the moment the function ran. The fix
34+
removed that line. Even at import time the dict should be untouched.
35+
"""
36+
from policyengine_us_data.storage.calibration_targets.pull_soi_targets import (
37+
STATE_ABBR_TO_FIPS,
38+
)
39+
40+
before = dict(STATE_ABBR_TO_FIPS)
41+
import policyengine_us_data.utils.loss # noqa: F401
42+
43+
assert STATE_ABBR_TO_FIPS == before

0 commit comments

Comments
 (0)