This proposal introduces an explicit ascertainment model abstraction for PyRenew to support:
- fixed ascertainment
- correlated ascertainment across signals
- time-varying ascertainment
The goal is to make shared structure in multi-signal models explicit and readable in the builder API, while preserving the simplicity of the current observation process design.
This proposal does not introduce a separate abstraction for independent ascertainment. That case is already naturally supported by the current API.
Background
Currently, ascertainment is specified inside each observation process:
```python
Counts(
    name="hospital",
    ascertainment_rate_rv=DistributionalVariable("ihr", dist.Beta(1, 100)),
    ...
)
```
This works well when:
- each signal has its own independent ascertainment rate, or
- a single shared deterministic RV is passed manually to several signals
However, it becomes awkward when ascertainment has structure across signals, such as:
- correlation between signals (e.g., hospital and ED)
- shared latent processes
- time-varying behavior
In these cases, the model structure is implicit and difficult to read.
Design Goals
- Keep the builder readable and make model structure explicit: the structure of the model should be obvious from the builder specification.
- Preserve existing APIs: `Counts` and related observation classes should still receive an `ascertainment_rate_rv`.
- Support structured ascertainment: fixed, correlated, and time-varying cases should all fit the same conceptual pattern.
- Respect NumPyro naming and scoping:
  - ascertainment RVs must participate cleanly in NumPyro site naming
  - observation classes should not be responsible for re-scoping ascertainment RVs
- Avoid unnecessary abstractions.
Proposed Abstraction
Introduce a base class:
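```python
from abc import ABC, abstractmethod
from pyrenew.metaclass import RandomVariable

class AscertainmentModel(ABC):
    @abstractmethod
    def for_signal(self, signal_name: str) -> RandomVariable: ...
```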
An ascertainment model produces signal-specific RandomVariables.
Observation processes remain unchanged:
```python
Counts(
    name="hospital",
    ascertainment_rate_rv=ascertainment_model.for_signal("hospital"),
    ...
)
```
Proposed Classes
FixedAscertainment
Represents fixed ascertainment per signal.
Example:
```python
ascertainment = FixedAscertainment(
    values={
        "hospital": DeterministicVariable("ihr", 0.01),
        "ed": DeterministicVariable("iedr", 0.04),
    }
)
```
Mathematically, for each signal $j$:
$$
\alpha_j = \text{fixed value or RV}
$$
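A minimal sketch of how `FixedAscertainment` could satisfy the `for_signal` contract, assuming it simply stores one RV per signal and returns it unchanged. To stay self-contained it uses a hypothetical `ConstantRV` stand-in instead of PyRenew's `DeterministicVariable` and does not subclass the real `AscertainmentModel`.

```python
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class ConstantRV:
    """Hypothetical stand-in for a deterministic RV such as DeterministicVariable."""

    name: str
    value: float

    def sample(self) -> float:
        return self.value


class FixedAscertainment:
    """Sketch: one pre-built RV per signal, returned as-is by for_signal."""

    def __init__(self, values: Mapping[str, ConstantRV]):
        self.values = dict(values)

    def for_signal(self, signal_name: str) -> ConstantRV:
        try:
            return self.values[signal_name]
        except KeyError:
            # Fail loudly so a misspelled signal name is not silently ignored.
            raise KeyError(
                f"no ascertainment configured for signal {signal_name!r}; "
                f"known signals: {sorted(self.values)}"
            ) from None


ascertainment = FixedAscertainment(
    values={
        "hospital": ConstantRV("ihr", 0.01),
        "ed": ConstantRV("iedr", 0.04),
    }
)
hospital_rate = ascertainment.for_signal("hospital").sample()
```

Because `for_signal` returns the stored object unchanged, a deterministic rate shared by two signals can be expressed by registering the same RV under both keys.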
CorrelatedAscertainment
Represents a joint prior over ascertainment rates.
A typical formulation is:
$$
\begin{pmatrix}
\eta_1 \\
\vdots \\
\eta_m
\end{pmatrix}
\sim
\mathcal{N}
\left(
\begin{pmatrix}
\mu_1 \\
\vdots \\
\mu_m
\end{pmatrix},
\Sigma
\right),
\qquad
\alpha_j = \mathrm{logit}^{-1}(\eta_j)
$$
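where $m$ is the number of signals, $\mu_j$ is the prior mean of signal $j$'s logit-scale ascertainment rate $\eta_j$, and $\Sigma$ is the covariance matrix of $(\eta_1, \dots, \eta_m)$. In the example below, `scale_tril` is its lower Cholesky factor $L$, so $\Sigma = L L^\top$.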
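The formulation above can be checked numerically with a NumPy sketch. It draws $\eta \sim \mathcal{N}(\mu, \Sigma)$ through a lower Cholesky factor $L$ with $\Sigma = L L^\top$ and maps each component through the inverse logit. It uses plain NumPy rather than NumPyro, so it illustrates the math only, not site handling; the function name `sample_correlated_rates` is hypothetical.

```python
import numpy as np


def sample_correlated_rates(loc, scale_tril, rng, num_draws=1):
    """Draw logit-normal ascertainment rates.

    eta = loc + L @ z with z ~ N(0, I), so Cov(eta) = L @ L.T;
    alpha = logit^{-1}(eta) keeps every rate in (0, 1).
    Returns an array of shape (num_draws, m).
    """
    loc = np.asarray(loc, dtype=float)
    scale_tril = np.asarray(scale_tril, dtype=float)
    z = rng.standard_normal((num_draws, loc.shape[0]))
    eta = loc + z @ scale_tril.T  # row-wise version of loc + L @ z
    return 1.0 / (1.0 + np.exp(-eta))


rng = np.random.default_rng(0)
loc = np.array([-4.6, -3.2])  # roughly logit(0.01) and logit(0.04)
# Cov(eta) = L @ L.T = [[0.25, 0.15], [0.15, 0.25]]: logit-scale correlation 0.6.
L = np.array([[0.5, 0.0],
              [0.3, 0.4]])
draws = sample_correlated_rates(loc, L, rng, num_draws=20_000)
```

Working on the logit scale lets the correlation structure be an ordinary multivariate normal while still guaranteeing valid rates.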
Example:
```python
ascertainment = CorrelatedAscertainment(
    name="he_ascertainment",
    signals=("hospital", "ed"),
    loc=jnp.array([...]),
    scale_tril=L,
)
```
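The example leaves `L` unspecified. One common way to build it, assuming the prior is elicited as per-signal logit-scale standard deviations plus a correlation matrix, is to form $\Sigma = D R D$ with $D = \mathrm{diag}(\sigma)$ and take its lower Cholesky factor. This NumPy helper is a hypothetical sketch, not part of the proposed API.

```python
import numpy as np


def build_scale_tril(sd, corr):
    """Lower Cholesky factor of Sigma = diag(sd) @ corr @ diag(sd)."""
    sd = np.asarray(sd, dtype=float)
    corr = np.asarray(corr, dtype=float)
    cov = np.diag(sd) @ corr @ np.diag(sd)
    # np.linalg.cholesky returns the lower-triangular factor and raises
    # LinAlgError if cov is not positive definite.
    return np.linalg.cholesky(cov)


# Hospital and ED: logit-scale SDs of 0.5 each, correlation 0.6.
L = build_scale_tril(sd=[0.5, 0.5], corr=[[1.0, 0.6], [0.6, 1.0]])
```

Separating scales from correlations keeps the two prior choices independently interpretable. In NumPyro the same factor can be passed as `dist.MultivariateNormal(loc, scale_tril=L)`.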
TimeVaryingAscertainment
A time-varying formulation is:
$$
\eta_j(t) \sim \text{temporal process}, \qquad
\alpha_j(t) = \frac{1}{1 + e^{-\eta_j(t)}}
$$
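As one concrete choice of temporal process, the sketch below assumes a stationary AR(1) on the logit scale, $\eta_j(t) = \mu_j + \phi\,(\eta_j(t-1) - \mu_j) + \sigma \varepsilon_t$ with $\varepsilon_t \sim \mathcal{N}(0, 1)$, and maps it through the inverse logit. The parameterization and the function name are assumptions for illustration; the proposal does not specify the arguments of `AR1(...)`, and this NumPy version does not register NumPyro sites.

```python
import numpy as np


def sample_ar1_rates(mu, phi, sigma, n_timepoints, rng):
    """Simulate alpha(t) = logit^{-1}(eta(t)) with a stationary AR(1) eta(t)."""
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie in (-1, 1) for a stationary AR(1)")
    eta = np.empty(n_timepoints)
    # Start from the stationary distribution N(mu, sigma^2 / (1 - phi^2)).
    eta[0] = mu + rng.standard_normal() * sigma / np.sqrt(1.0 - phi**2)
    for t in range(1, n_timepoints):
        eta[t] = mu + phi * (eta[t - 1] - mu) + sigma * rng.standard_normal()
    return 1.0 / (1.0 + np.exp(-eta))


rng = np.random.default_rng(1)
hospital_rates = sample_ar1_rates(mu=-4.6, phi=0.9, sigma=0.1, n_timepoints=100, rng=rng)
```

Each signal gets its own process here, matching the per-signal `processes` mapping in the example; correlating the innovations across signals would combine this with the correlated formulation.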
Example:
```python
ascertainment = TimeVaryingAscertainment(
    processes={
        "hospital": AR1(...),
        "ed": AR1(...),
    }
)
```
Why we need this
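The current API already handles two cases cleanly: shared ascertainment that is deterministic, and independent ascertainment where each observation gets its own RV. It does not obviously support shared stochastic or correlated ascertainment safely and readably without an additional abstraction. Passing one stochastic RV to several observation processes either samples the same site twice, which violates NumPyro's requirement that sample site names be unique, or, if each observation scopes the site under its own name, produces independent draws instead of one shared rate.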
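The safety problem can be seen with a toy model of site bookkeeping that mirrors NumPyro's rule that every sample site in one model evaluation must have a unique name. `ToyTrace` and the `"signal/name"` scoping scheme are hypothetical stand-ins, not NumPyro or PyRenew code.

```python
import random


class ToyTrace:
    """Records sample sites; mimics NumPyro's unique-site-name requirement."""

    def __init__(self, seed):
        self.sites = {}
        self.rng = random.Random(seed)

    def sample(self, name):
        if name in self.sites:
            raise ValueError(f"duplicate sample site {name!r}")
        self.sites[name] = self.rng.random()
        return self.sites[name]


# Naive sharing: both observations sample the same site name.
trace = ToyTrace(seed=0)
trace.sample("shared_rate")  # hospital observation
try:
    trace.sample("shared_rate")  # ED observation
    collided = False
except ValueError:
    collided = True

# Per-observation scoping: no collision, but the "shared" rate is drawn twice.
trace = ToyTrace(seed=0)
hospital_rate = trace.sample("hospital/shared_rate")
ed_rate = trace.sample("ed/shared_rate")
```

Neither outcome is what a shared stochastic rate intends, which is why the proposal moves naming and sampling into the ascertainment model.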
Builder Usage
Recommended pattern:
```python
ascertainment = CorrelatedAscertainment(...)

builder.add_observation(
    Counts(
        name="hospital",
        ascertainment_rate_rv=ascertainment.for_signal("hospital"),
        ...
    )
)
```
Naming and Scoping
Ascertainment models must:
- define their own NumPyro sample site names
- sample each shared latent quantity at most once per model evaluation
- return the same sampled values to every signal that shares them
Observation processes do not manage this.
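One way to meet these requirements, sketched with a toy trace (hypothetical, not NumPyro's API): the ascertainment model chooses its own site name, samples the joint draw the first time any signal asks for it in a given evaluation, and lets every later request read the recorded value. Because the cache lives on the per-evaluation trace, each new evaluation gets a fresh draw. `SharedAscertainment`, `ToyTrace`, and returning plain callables instead of PyRenew `RandomVariable`s are all simplifying assumptions.

```python
import random


class ToyTrace:
    """Per-evaluation record of sample sites; site names must be unique."""

    def __init__(self, seed):
        self.sites = {}
        self.rng = random.Random(seed)

    def sample(self, name, draw):
        if name in self.sites:
            raise ValueError(f"duplicate sample site {name!r}")
        self.sites[name] = draw(self.rng)
        return self.sites[name]


class SharedAscertainment:
    """Hypothetical model that owns its site name and samples it once per trace."""

    def __init__(self, name, signals):
        self.name = name
        self.signals = tuple(signals)

    def _joint(self, trace):
        site = f"{self.name}_rates"
        if site not in trace.sites:  # first signal to ask triggers the draw
            # rng.random() is a placeholder for a real joint (e.g. logit-normal) draw.
            trace.sample(site, lambda rng: [rng.random() for _ in self.signals])
        return trace.sites[site]  # later signals reuse the same draw

    def for_signal(self, signal_name):
        index = self.signals.index(signal_name)  # ValueError if unknown
        return lambda trace: self._joint(trace)[index]


ascertainment = SharedAscertainment("he_ascertainment", ("hospital", "ed"))
hospital_rv = ascertainment.for_signal("hospital")
ed_rv = ascertainment.for_signal("ed")

trace = ToyTrace(seed=0)  # one model evaluation
rates = (hospital_rv(trace), ed_rv(trace), hospital_rv(trace))
```

How to implement the equivalent bookkeeping inside NumPyro (for example in the builder, or through an effect handler) is not settled by this sketch and relates to the open question below about whether the builder should manage ascertainment explicitly.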
Advantages
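- explicit structure: shared ascertainment is declared once and is visible in the builder
- minimal API changes: observation classes still receive an `ascertainment_rate_rv`
- scalable design: new ascertainment structures are added as `AscertainmentModel` subclasses without changing observation classes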
Open Questions
- Should the builder manage ascertainment models explicitly?
- How minimal should `TimeVaryingAscertainment` be initially?
- Should correlated models initially be restricted to the logit-normal formulation?
Recommendation
Implement:
- AscertainmentModel
- FixedAscertainment
- CorrelatedAscertainment
- TimeVaryingAscertainment
with explicit usage via `ascertainment.for_signal(...)`.