Ascertainment Model Abstraction #777

@cdc-mitzimorris

This proposal introduces an explicit ascertainment model abstraction for PyRenew to support:

  • fixed ascertainment
  • correlated ascertainment across signals
  • time-varying ascertainment

The goal is to make shared structure in multi-signal models explicit and readable in the builder API, while preserving the simplicity of the current observation process design.

This proposal does not introduce a separate abstraction for independent ascertainment. That case is already naturally supported by the current API.


Background

Currently, ascertainment is specified inside each observation process:

Counts(
    name="hospital",
    ascertainment_rate_rv=DistributionalVariable("ihr", dist.Beta(1, 100)),
    ...
)

This works well when:

  • each signal has its own independent ascertainment rate
  • or a shared RV is manually passed across signals

However, it becomes awkward when ascertainment has structure across signals, such as:

  • correlation between signals (e.g., hospital and ED)
  • shared latent processes
  • time-varying behavior

In these cases, the model structure is implicit and difficult to read.


Design Goals

  1. Keep the builder readable and make model structure explicit - the structure of the model should be obvious from the builder specification
  2. Preserve existing APIs - Counts and related classes should still receive an ascertainment_rate_rv
  3. Support structured ascertainment - fixed, correlated, and time-varying cases should all fit the same conceptual pattern
  4. Respect NumPyro naming/scoping
    • ascertainment RVs must participate cleanly in NumPyro site naming
    • observation classes should not be responsible for re-scoping ascertainment RVs
  5. Avoid unnecessary abstractions

Proposed Abstraction

Introduce a base class:

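from abc import ABC, abstractmethod

class AscertainmentModel(ABC):
    @abstractmethod
    def for_signal(self, signal_name: str) -> RandomVariable:
        """Return the ascertainment-rate RandomVariable for one signal."""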

An ascertainment model produces signal-specific RandomVariables.

Observation processes remain unchanged:

Counts(
    name="hospital",
    ascertainment_rate_rv=ascertainment_model.for_signal("hospital"),
    ...
)

Proposed Classes

FixedAscertainment

Represents fixed ascertainment per signal.

Example:

ascertainment = FixedAscertainment(
    values={
        "hospital": DeterministicVariable("ihr", 0.01),
        "ed": DeterministicVariable("iedr", 0.04),
    }
)

Mathematically:

$$ \alpha_j = \text{fixed value or RV} $$
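
To make the lookup behavior concrete, here is a minimal, dependency-free sketch of how FixedAscertainment could sit on top of the base class. It is an illustration under assumptions, not the proposed implementation: `StubRV` is a hypothetical stand-in for a PyRenew RandomVariable such as DeterministicVariable, and the error message and the `_values` attribute are invented for this example.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class StubRV:
    """Hypothetical stand-in for a PyRenew RandomVariable (name plus fixed value)."""

    name: str
    value: float


class AscertainmentModel(ABC):
    @abstractmethod
    def for_signal(self, signal_name: str):
        """Return the ascertainment-rate RV for one signal."""


class FixedAscertainment(AscertainmentModel):
    def __init__(self, values):
        # Copy so later mutation of the caller's dict cannot change the model.
        self._values = dict(values)

    def for_signal(self, signal_name: str):
        if signal_name not in self._values:
            known = ", ".join(sorted(self._values))
            raise KeyError(f"unknown signal {signal_name!r}; known signals: {known}")
        # Return the stored object itself, so repeated lookups reuse one RV.
        return self._values[signal_name]


ascertainment = FixedAscertainment(
    values={"hospital": StubRV("ihr", 0.01), "ed": StubRV("iedr", 0.04)}
)
hospital_rv = ascertainment.for_signal("hospital")
```

Failing loudly on an unknown signal name is one possible design choice; it catches a mismatch between the `name` given to an observation process and the keys given to the ascertainment model at build time rather than at inference time.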


CorrelatedAscertainment

Represents a joint prior over ascertainment rates.

A typical formulation is:

$$ \begin{pmatrix} \eta_1 \\ \vdots \\ \eta_m \end{pmatrix} \sim \mathcal{N} \left( \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_m \end{pmatrix}, \Sigma \right), \qquad \alpha_j = \mathrm{logit}^{-1}(\eta_j) $$

where $m$ is the number of signals.

Example:

ascertainment = CorrelatedAscertainment(
    name="he_ascertainment",
    signals=("hospital", "ed"),
    loc=jnp.array([...]),
    scale_tril=L,
)
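
The logit-normal formulation above can be checked numerically with plain JAX. The sketch below builds a lower-triangular factor `L` from a correlation and per-signal standard deviations, draws the logit-scale vector, and maps it to rates. All numeric values (`loc`, `sd`, `corr`) are assumptions chosen for illustration; the proposal does not specify them.

```python
import jax
import jax.numpy as jnp

# Hypothetical logit-scale means for ("hospital", "ed"); illustration only.
loc = jnp.array([-4.6, -3.2])
sd = jnp.array([0.5, 0.5])                    # hypothetical marginal SDs
corr = jnp.array([[1.0, 0.6], [0.6, 1.0]])    # hypothetical correlation

# Covariance Sigma and its lower-triangular Cholesky factor (the scale_tril).
cov = corr * jnp.outer(sd, sd)
L = jnp.linalg.cholesky(cov)

# eta = loc + L z with z ~ N(0, I), so each row of eta ~ N(loc, L L^T).
key = jax.random.PRNGKey(0)
z = jax.random.normal(key, (5000, 2))
eta = loc + z @ L.T

# alpha_j = logit^{-1}(eta_j): every rate lies strictly between 0 and 1.
alpha = jax.nn.sigmoid(eta)

# Empirical correlation of the two logit-scale components across draws.
empirical_corr = jnp.corrcoef(eta, rowvar=False)[0, 1]
```

Note that the correlation is imposed on the logit scale; the correlation between the rates themselves is induced by the transform and is generally not identical to the value placed in `corr`.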

TimeVaryingAscertainment

A time-varying formulation is:

$$ \eta_j(t) \sim \text{temporal process}, \qquad \alpha_j(t) = \frac{1}{1 + e^{-\eta_j(t)}} $$

Example:

ascertainment = TimeVaryingAscertainment(
    processes={
        "hospital": AR1(...),
        "ed": AR1(...),
    }
)
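
As a rough illustration of the time-varying formulation, the sketch below simulates one logit-scale AR(1) path with `jax.lax.scan` and maps it to a rate series. The function `ar1_logit_path` and its parameters are hypothetical and written only for this example; they are not PyRenew's AR1 class, whose arguments the example above leaves elided.

```python
import jax
import jax.numpy as jnp


def ar1_logit_path(key, n_steps, mean, autoreg, noise_sd):
    """Simulate eta(t) = mean + autoreg * (eta(t-1) - mean) + noise_sd * eps_t.

    The path starts from a deviation of zero, i.e. at the long-run mean.
    """
    eps = jax.random.normal(key, (n_steps,))

    def step(prev_dev, eps_t):
        # Deviation from the mean follows a zero-mean AR(1) recursion.
        dev = autoreg * prev_dev + noise_sd * eps_t
        return dev, dev

    _, devs = jax.lax.scan(step, jnp.zeros(()), eps)
    return mean + devs


# Hypothetical settings: 28 time steps, slowly drifting around logit^{-1}(-4.6).
eta_hospital = ar1_logit_path(
    jax.random.PRNGKey(1), n_steps=28, mean=-4.6, autoreg=0.9, noise_sd=0.1
)

# alpha_j(t) = 1 / (1 + exp(-eta_j(t)))
alpha_hospital = jax.nn.sigmoid(eta_hospital)
```

Working on the logit scale keeps every alpha_j(t) inside (0, 1) regardless of how far the temporal process wanders.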

Why we need this

The current API already supports fixed shared ascertainment cleanly when the shared ascertainment is deterministic. It supports independent ascertainment cleanly when each observation gets its own RV. But it does not obviously support shared stochastic or correlated ascertainment safely and readably without additional abstraction.


Builder Usage

Recommended pattern:

ascertainment = CorrelatedAscertainment(...)

builder.add_observation(
    Counts(
        name="hospital",
        ascertainment_rate_rv=ascertainment.for_signal("hospital"),
        ...
    )
)

Naming and Scoping

Ascertainment models must:

  • define their own NumPyro sample site naming
  • avoid duplicate sampling
  • ensure consistent reuse

Observation processes do not manage this.


Advantages

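  • explicit structure: shared, correlated, and time-varying ascertainment is visible in the builder specification
  • minimal API changes: observation processes still receive an ascertainment_rate_rv
  • extensible design: fixed, correlated, and time-varying cases all follow the same for_signal pattern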

Open Questions

  1. Should the builder manage ascertainment explicitly?
  2. How minimal should TimeVaryingAscertainment be initially?
  3. Should correlated models be restricted to the logit-normal formulation initially?

Recommendation

Implement:

  • AscertainmentModel
  • FixedAscertainment
  • CorrelatedAscertainment
  • TimeVaryingAscertainment

with explicit usage via:

ascertainment.for_signal(...)
