policyengine-uk-rust — Caretaker Skill

Repo overview

Rust microsimulation engine for the UK tax-benefit system, with a Python wrapper (interfaces/python/policyengine_uk_compiled). Simulates income tax, NI, UC, Child Benefit, and 10+ other programmes. ~0.1ms per household.

Running simulations

import json, os, subprocess

binary = './target/release/policyengine-uk-rust'
# subprocess does not expand '~' (no shell is involved), so expand it explicitly
data_dir = os.path.expanduser('~/.policyengine-uk-data/frs')   # clean FRS CSVs

# Baseline
baseline = json.loads(subprocess.run(
    [binary, '--data', data_dir, '--year', '2023', '--output', 'json'],
    capture_output=True, text=True, check=True   # raise if the binary exits non-zero
).stdout)

# Reform: lower the UC taper rate from the 55% baseline to 50%
reform = json.loads(subprocess.run(
    [binary, '--data', data_dir, '--year', '2023', '--output', 'json',
     '--policy-json', json.dumps({'universal_credit': {'taper_rate': 0.50}})],
    capture_output=True, text=True, check=True
).stdout)

For microdata (per-entity DataFrames): use --output-microdata-stdout and parse the ===PERSONS=== / ===BENUNITS=== / ===HOUSEHOLDS=== sections.
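A minimal sketch of parsing that output, assuming each section starts with a marker line on its own (such as ===PERSONS===) followed by CSV text with a header row. The function name split_microdata is hypothetical and the sample columns are made up for illustration; only the three markers come from this document.

```python
import csv
import io
import re

# Marker lines such as ===PERSONS===; assumed to sit on their own line.
SECTION_RE = re.compile(r"^===([A-Z]+)===$")


def split_microdata(stdout_text):
    """Split microdata stdout into {section_name: list of row dicts}."""
    chunks = {}
    current = None
    for line in stdout_text.splitlines():
        match = SECTION_RE.match(line.strip())
        if match:
            # Start a new section, keyed by the lower-cased marker name.
            current = match.group(1).lower()
            chunks[current] = []
        elif current is not None and line.strip():
            chunks[current].append(line)
    # Parse each section's lines as CSV with a header row.
    return {
        name: list(csv.DictReader(io.StringIO("\n".join(lines))))
        for name, lines in chunks.items()
    }


# Made-up sample; real column names will differ.
sample = """===PERSONS===
person_id,age
1,34
2,7
===BENUNITS===
benunit_id,n_people
1,2
===HOUSEHOLDS===
household_id,weight
1,1500.0
"""
tables = split_microdata(sample)
print({name: len(rows) for name, rows in tables.items()})
```

Each value is a list of dicts with string fields, so it can be handed to pandas.DataFrame(rows) and cast to numeric types if DataFrames are wanted.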

Key CLI flags

| Flag | Purpose |
|---|---|
| --data DIR | Clean CSV base dir (YYYY/persons.csv etc.) |
| --year YYYY | Fiscal year; determines which parameter set to load |
| --policy-json '{...}' | Inline JSON reform overlay |
| --output json | Machine-readable aggregate output |
| --output-microdata-stdout | Per-entity CSVs to stdout |
| --export-params-json | Dump baseline parameters |
| --uprate-to YYYY | With --extract: uprate dataset to target year before writing clean CSVs |

Data

  • FRS clean CSVs live in ~/.policyengine-uk-data/frs/YYYY/ (persons.csv, benunits.csv, households.csv).
  • FRS 2023 is the main microdata year in use.
  • FRS under-reports UC receipt: model gets ~3.85m UC households vs ~6.4m in reality. This is structural — would_claim_uc is set from FRS-reported receipt only. Any cost estimate from this model will be ~60% of an OBR/full-population estimate.

UC simulation — key mechanics

  • UC is only awarded to on_uc_system benunits: those who reported UC in FRS (on_uc=True), or who migrated from legacy benefits (on_legacy=True and migration_seed < migration_rate).
  • Taper applies to net earned income (gross − income tax − NI − pension contribs) above the work allowance, at taper_rate (baseline 55%).
  • Work allowance only applies if the benunit has children or LCWRA. The allowance is higher (£684/mo) without housing costs and lower (£411/mo) with them.
  • Unearned income (savings, private pension, maintenance, property, other) reduces UC pound-for-pound.
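The mechanics above can be sketched in Python as follows. This is an illustration of the rules as described in this section, not the engine's Rust code: the function names are hypothetical, max_award (standard allowance plus any elements) is taken as an input, all amounts are monthly, and the example figures are made up.

```python
def on_uc_system(on_uc, on_legacy, migration_seed, migration_rate):
    """Reported UC in FRS, or a legacy claimant drawn into migration."""
    return on_uc or (on_legacy and migration_seed < migration_rate)


def work_allowance(has_children_or_lcwra, has_housing_costs):
    """Monthly work allowance, using the figures quoted above."""
    if not has_children_or_lcwra:
        return 0.0
    return 411.0 if has_housing_costs else 684.0


def uc_award(max_award, net_earned, unearned,
             has_children_or_lcwra, has_housing_costs, taper_rate=0.55):
    """Monthly award: maximum award less tapered earnings and unearned income."""
    allowance = work_allowance(has_children_or_lcwra, has_housing_costs)
    # Only net earnings above the work allowance are tapered.
    earned_deduction = taper_rate * max(0.0, net_earned - allowance)
    # Unearned income comes off pound-for-pound; the award cannot go negative.
    return max(0.0, max_award - earned_deduction - unearned)


# Illustrative benunit: children, housing costs, made-up amounts (per month).
case = dict(max_award=1000.0, net_earned=1000.0, unearned=50.0,
            has_children_or_lcwra=True, has_housing_costs=True)
baseline = uc_award(**case)
reform = uc_award(**case, taper_rate=0.50)
print(round(baseline, 2), round(reform, 2), round(reform - baseline, 2))
```

For this benunit, £589 of net earnings sit above the work allowance, so cutting the taper from 55% to 50% is worth 0.05 × £589 = £29.45 a month.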

Known model limitations

  • Caseload undercount: FRS-reported UC receipt is ~60% of actual. Policy costings will be proportionally lower than OBR/DWP estimates. For a taper 55%→50% reform, model gives ~£0.87bn (2023 FRS); scaled to real caseload ~£1.45–1.6bn. OBR-style estimates would be ~£2bn+.
  • Migration rates (uc_migration.* params: HB 70%, TC 95%, IS 65%) bring some legacy claimants onto UC, adding ~685k households, but don't close the gap.
  • UC earnings cutout for a single adult with no children and no rent is ~£8,700 gross (low, by design). Higher earners with rent/children can receive UC at much higher incomes.
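The caseload scaling quoted above is simple proportional arithmetic, sketched here with the figures from this document. It assumes the missing claimants look like the modelled ones, which is a simplification; the helper name is hypothetical.

```python
MODEL_CASELOAD_M = 3.85   # UC households in the model (FRS 2023)
ACTUAL_CASELOAD_M = 6.4   # approximate real-world UC households


def scale_to_actual_caseload(model_cost_bn):
    """Scale a model costing up by the ratio of actual to modelled caseload."""
    coverage = MODEL_CASELOAD_M / ACTUAL_CASELOAD_M
    return model_cost_bn / coverage


coverage = MODEL_CASELOAD_M / ACTUAL_CASELOAD_M
scaled = scale_to_actual_caseload(0.87)   # taper 55% -> 50% reform
print(round(coverage, 2), round(scaled, 2))  # 0.6 1.45
```

This reproduces the lower end of the scaled range for the taper reform.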

Investigating a reform

Always run the reform before diagnosing. Workflow:

  1. Run baseline and reform with --output json, check program_breakdown.universal_credit.
  2. If surprising, switch to --output-microdata-stdout and merge benunits + persons to see the earner distribution, caseload, and per-household gains.
  3. Sanity-check a single household manually (use --stdin-data with a constructed payload) and trace through the formula by hand before touching the code.
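Step 1 can be sketched as two small helpers: one builds the argument list for a baseline or reform run, the other compares the UC entries. The helper names are hypothetical, and the sketch assumes program_breakdown.universal_credit is a single numeric total; check the real JSON before relying on that. The dicts below are stand-ins for parsed output, not real results.

```python
import json


def build_args(binary, data_dir, year, policy=None):
    """Argument list for one run; pass a policy dict to make it a reform."""
    args = [binary, "--data", data_dir, "--year", str(year), "--output", "json"]
    if policy is not None:
        args += ["--policy-json", json.dumps(policy)]
    return args


def uc_change(baseline, reform):
    """Reform minus baseline for the UC entry (assumed numeric)."""
    before = baseline["program_breakdown"]["universal_credit"]
    after = reform["program_breakdown"]["universal_credit"]
    return after - before


# Stand-in outputs with made-up numbers, shaped like the assumed JSON.
fake_baseline = {"program_breakdown": {"universal_credit": 100.0}}
fake_reform = {"program_breakdown": {"universal_credit": 104.5}}
print(build_args("./target/release/policyengine-uk-rust", "data/frs", 2023,
                 {"universal_credit": {"taper_rate": 0.50}}))
print(uc_change(fake_baseline, fake_reform))
```

Each argument list can be passed to subprocess.run(..., capture_output=True, text=True) and the stdout parsed with json.loads.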

Multi-dataset support

Five datasets are supported. FRS, LCFS, WAS, and SPI use the two-step flow: --extract to produce clean CSVs, then --data to simulate. EFRS is a composite built from FRS + WAS + LCFS and supports wealth and consumption taxes.

| Dataset | Flag | Year arg | Clean output |
|---|---|---|---|
| FRS | --frs <tab_dir> | FRS survey year | data/frs/YYYY/ |
| LCFS | --lcfs <tab_dir> | LCFS survey year | data/lcfs/YYYY/ |
| WAS | --was <tab_dir> | WAS survey year | data/was/YYYY/ |
| SPI | --spi <tab_dir> | Fiscal start year | data/spi/YYYY/ |
| EFRS | --extract-efrs <out_dir> | FRS survey year | data/efrs/YYYY/ |

EFRS (Enhanced FRS): imputes wealth from WAS and consumption from LCFS onto FRS microdata using random forest models. Required to model wealth taxes (wealth_tax, stamp_duty, CGT). Build with:

./policyengine-uk-rust --extract-efrs data/efrs/2023 \
  --data data/frs --year 2023 \
  --was-dir raw/was/round_7/ \
  --lcfs-dir raw/lcfs/2021/

Pre-built clean EFRS CSVs are on GCS at gs://policyengine-uk-microdata/efrs/YYYY/ and downloaded automatically via ensure_dataset("efrs", year) in the Python wrapper. Rebuild using python scripts/rebuild_all.py --only efrs.

Wealth tax note: capital_gains defaults to zero on all datasets since no UK survey records realised gains. CGT reform modelling requires manually setting capital_gains per person via --stdin-data or a custom dataset.

# Extract
./policyengine-uk-rust --lcfs raw/lcfs_2023/tab/ --year 2023 --extract data/lcfs/2023/
./policyengine-uk-rust --spi  raw/spi_2022/tab/  --year 2022 --extract data/spi/2022/

# Simulate
./policyengine-uk-rust --data data/lcfs --year 2023 --output json
./policyengine-uk-rust --data data/spi  --year 2022 --output json --persons-only

SPI note: Use --persons-only — SPI has no household structure so benefit/decile outputs are meaningless. Output is a JSON array of per-person records with {person_id, weight, income fields, baseline/reform tax}.
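Since the SPI output is a flat list of per-person records, an aggregate has to be computed by weighting each record. A minimal sketch follows, assuming the tax fields are named baseline_tax and reform_tax; those two key names are guesses, so check the actual JSON keys. The records are made up.

```python
def weighted_tax_change(records, baseline_key="baseline_tax", reform_key="reform_tax"):
    """Weighted sum of (reform tax - baseline tax) across per-person records."""
    return sum(
        rec["weight"] * (rec[reform_key] - rec[baseline_key])
        for rec in records
    )


# Made-up records shaped like the description above.
records = [
    {"person_id": 1, "weight": 1000.0, "baseline_tax": 2000.0, "reform_tax": 2100.0},
    {"person_id": 2, "weight": 500.0, "baseline_tax": 0.0, "reform_tax": 0.0},
]
print(weighted_tax_change(records))  # 100000.0
```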

LCFS note: Only 12 top-level COICOP categories are stored (p601–p612) plus petrol/diesel. Product-level codes are not kept. LCFS has ~4,200 households so aggregate totals are small but consumption patterns are correct.

SPI file naming: Newer files use put{yy}{yy+1}uk.tab (e.g. put2223uk.tab for 2022/23); older files use put{YYYY}uk.tab. The loader detects both automatically.
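The two naming patterns can be illustrated with a small Python helper that lists the candidate filenames for a year. This is not the Rust loader's code; the function name is hypothetical, and treating YYYY in the older pattern as the fiscal start year is an assumption.

```python
def spi_candidate_filenames(fiscal_start_year):
    """Candidate SPI filenames for a fiscal year, newer pattern first."""
    yy = fiscal_start_year % 100
    yy_next = (fiscal_start_year + 1) % 100
    return [
        f"put{yy:02d}{yy_next:02d}uk.tab",   # newer: put{yy}{yy+1}uk.tab
        f"put{fiscal_start_year}uk.tab",     # older: put{YYYY}uk.tab (year meaning assumed)
    ]


print(spi_candidate_filenames(2022))  # ['put2223uk.tab', 'put2022uk.tab']
```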

UKDS data: LCFS (SN 9468), WAS (SN 7215), SPI (SN 9422) are all under project ecf0b3c4-29d2-4d8a-931d-0e3773a4ac0b. Download tab zips from UKDS MCP and unzip before extracting.

Versioning and releasing

Versions are managed via pyproject.toml (the source of truth) and towncrier-style changelog fragments in changelog.d/.

After a new version is published to PyPI, trigger a redeploy of the chat app:

gh workflow run redeploy-on-package-update.yml --repo PolicyEngine/policyengine-uk-chat
  • Do not edit CHANGELOG.md or Cargo.toml versions directly — they are updated automatically by CI.
  • To ship a change, drop a fragment file in changelog.d/ with the naming convention <slug>.<type>:

| File suffix | Semver bump |
|---|---|
| .fixed | patch |
| .changed | patch |
| .added | minor |
| .removed | minor |
| .breaking | major |

Example: changelog.d/parse-id-list-delimiters.fixed

The content of the file is the human-readable changelog entry. CI runs .github/bump_version.py to infer the bump from fragment types, update pyproject.toml, then publish-git-tag.sh to tag and release.
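The suffix-to-bump rule can be sketched in Python as below. This illustrates the table above and is not the contents of .github/bump_version.py; the function names are hypothetical, and taking the largest bump when fragments disagree is an assumption.

```python
BUMP_BY_SUFFIX = {
    "fixed": "patch",
    "changed": "patch",
    "added": "minor",
    "removed": "minor",
    "breaking": "major",
}
BUMP_ORDER = ["patch", "minor", "major"]   # smallest to largest


def infer_bump(fragment_names):
    """Largest bump implied by the fragment suffixes, or None if there are none."""
    bumps = [BUMP_BY_SUFFIX[name.rsplit(".", 1)[-1]] for name in fragment_names]
    if not bumps:
        return None
    return max(bumps, key=BUMP_ORDER.index)


def apply_bump(version, bump):
    """Apply a semver bump to a 'MAJOR.MINOR.PATCH' string."""
    major, minor, patch = (int(part) for part in version.split("."))
    if bump == "major":
        return f"{major + 1}.0.0"
    if bump == "minor":
        return f"{major}.{minor + 1}.0"
    return f"{major}.{minor}.{patch + 1}"


fragments = ["parse-id-list-delimiters.fixed", "new-flag.added"]   # second name is made up
bump = infer_bump(fragments)
print(bump, apply_bump("1.4.2", bump))  # minor 1.5.0
```

An unrecognised suffix raises KeyError here, which makes a mistyped fragment name fail loudly.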

Building

cargo build --release

Tests: cargo test

Parameter files

parameters/YYYY_YY.yaml — one file per fiscal year. All UC, IT, NI, benefit cap, etc. parameters. See LEGISLATIVE_REFERENCE.md for statutory citations.
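A small sketch of the naming pattern, assuming YYYY is the fiscal start year and YY is the last two digits of the following year (so 2023 maps to 2023_24). That reading of the pattern is an assumption and the helper name is hypothetical.

```python
def parameter_file(fiscal_start_year):
    """Path of the parameter file for a fiscal year, under the assumed naming."""
    end_yy = (fiscal_start_year + 1) % 100
    return f"parameters/{fiscal_start_year}_{end_yy:02d}.yaml"


print(parameter_file(2023))  # parameters/2023_24.yaml
```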