Ascertainment Model Abstraction


This proposal introduces an explicit **ascertainment model abstraction** for PyRenew to support:

- fixed ascertainment
- correlated ascertainment across signals
- time-varying ascertainment

The goal is to make **shared structure in multi-signal models explicit and readable** in the builder API, while preserving the simplicity of the current observation process design.

This proposal does **not** introduce a separate abstraction for independent ascertainment. That case is already naturally supported by the current API.

---

## Background

Currently, ascertainment is specified inside each observation process:

```
Counts(
    name="hospital",
    ascertainment_rate_rv=DistributionalVariable("ihr", dist.Beta(1, 100)),
    ...
)
```

This works well when:

- each signal has its own independent ascertainment rate
- or a shared RV is manually passed across signals

However, it becomes awkward when ascertainment has **structure across signals**, such as:

- correlation between signals (e.g., hospital and ED)
- shared latent processes
- time-varying behavior

In these cases, the model structure is implicit and difficult to read.

---

## Design Goals

1. Keep the builder readable and make model structure explicit: the structure of the model should be obvious from the builder specification.
2. Preserve existing APIs: `Counts` and related classes should still receive an `ascertainment_rate_rv`.
3. Support structured ascertainment: fixed, correlated, and time-varying cases should all fit the same conceptual pattern.
4. Respect NumPyro naming/scoping
    - ascertainment RVs must participate cleanly in NumPyro site naming
    - observation classes should not be responsible for re-scoping ascertainment RVs
5. Avoid unnecessary abstractions: independent ascertainment gets no new class because the current API already supports it.

---

## Proposed Abstraction

Introduce a base class:

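```
from abc import ABC, abstractmethod

class AscertainmentModel(ABC):
    @abstractmethod
    def for_signal(self, signal_name: str) -> RandomVariable:
        """Return the ascertainment RandomVariable for one signal."""
```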

An ascertainment model produces **signal-specific RandomVariables**.

Observation processes remain unchanged:

```
Counts(
    name="hospital",
    ascertainment_rate_rv=ascertainment_model.for_signal("hospital"),
    ...
)
```

---

## Proposed Classes

### FixedAscertainment

Represents fixed ascertainment per signal.

Example:

```
ascertainment = FixedAscertainment(
    values={
        "hospital": DeterministicVariable("ihr", 0.01),
        "ed": DeterministicVariable("iedr", 0.04),
    }
)
```

Mathematically:

$$
\alpha_j = \text{fixed value or RV for signal } j
$$
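
One way `FixedAscertainment` could implement `for_signal` is a plain lookup of the RV registered for each signal. The sketch below assumes that behavior. `StubVariable` is a hypothetical stand-in for PyRenew's `DeterministicVariable` so the example runs without PyRenew installed, and the range check and error messages are illustrative assumptions rather than part of the proposal.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass(frozen=True)
class StubVariable:
    """Hypothetical stand-in for a PyRenew RandomVariable."""

    name: str
    value: float

    def __call__(self) -> float:
        # A deterministic variable returns its stored value when "sampled".
        return self.value


class AscertainmentModel(ABC):
    @abstractmethod
    def for_signal(self, signal_name: str) -> StubVariable:
        """Return the ascertainment RV for one signal."""


class FixedAscertainment(AscertainmentModel):
    def __init__(self, values: dict[str, StubVariable]) -> None:
        for signal, rv in values.items():
            # Ascertainment rates are probabilities, so reject values outside [0, 1].
            if not 0.0 <= rv.value <= 1.0:
                raise ValueError(f"rate for {signal!r} must lie in [0, 1]")
        self._values = dict(values)

    def for_signal(self, signal_name: str) -> StubVariable:
        try:
            return self._values[signal_name]
        except KeyError:
            raise KeyError(f"no ascertainment defined for {signal_name!r}") from None


ascertainment = FixedAscertainment(
    values={
        "hospital": StubVariable("ihr", 0.01),
        "ed": StubVariable("iedr", 0.04),
    }
)
hospital_rate = ascertainment.for_signal("hospital")()
```

Failing loudly on an unknown signal name is a design choice of this sketch: it surfaces typos in the builder specification early instead of silently falling back to a default rate.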


---

### CorrelatedAscertainment

Represents a joint prior over ascertainment rates.

A typical formulation is:

$$
\begin{pmatrix}
\eta_1 \\
\vdots \\
\eta_m
\end{pmatrix}
\sim
\mathcal{N}
\left(
\begin{pmatrix}
\mu_1 \\
\vdots \\
\mu_m
\end{pmatrix},
\Sigma
\right),
\qquad
\alpha_j = \mathrm{logit}^{-1}(\eta_j)
$$

where $m$ is the number of signals.
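
To make the formulation concrete, the following dependency-free sketch draws $\eta = \mu + L z$ with $z \sim \mathcal{N}(0, I)$ and $\Sigma = L L^\top$, then maps each component through the inverse logit. The function names and the numeric values are illustrative assumptions. An actual implementation would more likely sample through NumPyro, for example with `numpyro.distributions.MultivariateNormal(loc, scale_tril=...)`, rather than the standard library.

```python
import math
import random


def inv_logit(x: float) -> float:
    """Map a real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def sample_correlated_rates(
    loc: list[float],
    scale_tril: list[list[float]],
    rng: random.Random,
) -> list[float]:
    """Draw eta = loc + L z with z ~ N(0, I), then return alpha = inv_logit(eta)."""
    m = len(loc)
    z = [rng.gauss(0.0, 1.0) for _ in range(m)]
    # L is lower triangular, so row j only uses z[0..j].
    eta = [
        loc[j] + sum(scale_tril[j][k] * z[k] for k in range(j + 1))
        for j in range(m)
    ]
    return [inv_logit(e) for e in eta]


# Illustrative values only: prior medians of 1% and 4%, a logit-scale
# standard deviation of 0.5 for both signals, and correlation 0.8.
signals = ("hospital", "ed")
loc = [math.log(0.01 / 0.99), math.log(0.04 / 0.96)]
sd, rho = 0.5, 0.8
# Cholesky factor of [[sd^2, rho*sd^2], [rho*sd^2, sd^2]].
scale_tril = [
    [sd, 0.0],
    [rho * sd, sd * math.sqrt(1.0 - rho**2)],
]

rates = dict(zip(signals, sample_correlated_rates(loc, scale_tril, random.Random(0))))
```

Because both rates come from a single joint draw, they cannot be sampled independently inside each observation process, which is what motivates a shared ascertainment object.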

Example:

```
ascertainment = CorrelatedAscertainment(
    name="he_ascertainment",
    signals=("hospital", "ed"),
    loc=jnp.array([...]),
    scale_tril=L,
)
```




---

### TimeVaryingAscertainment

A time-varying formulation is:

$$
\eta_j(t) \sim \text{temporal process}, \qquad
\alpha_j(t) = \mathrm{logit}^{-1}(\eta_j(t))
$$
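
As an illustration of one possible temporal process, the sketch below simulates a stationary AR(1) path on the logit scale and transforms it to a rate at each time point. The mean-reverting parameterization, the parameter names, and the numeric values are assumptions made for this example. The proposal itself only requires that $\eta_j(t)$ follow some temporal process.

```python
import math
import random


def inv_logit(x: float) -> float:
    """Map a real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def ar1_logit_path(
    mean: float,
    autoreg: float,
    noise_sd: float,
    n_timepoints: int,
    rng: random.Random,
) -> list[float]:
    """Simulate eta(t) = mean + autoreg * (eta(t-1) - mean) + noise_sd * eps(t).

    Requires |autoreg| < 1 so that the process is stationary.
    """
    # Start from the stationary distribution so the path needs no burn-in.
    stationary_sd = noise_sd / math.sqrt(1.0 - autoreg**2)
    eta = [mean + stationary_sd * rng.gauss(0.0, 1.0)]
    for _ in range(n_timepoints - 1):
        eta.append(mean + autoreg * (eta[-1] - mean) + noise_sd * rng.gauss(0.0, 1.0))
    return eta


rng = random.Random(0)
eta_hospital = ar1_logit_path(
    mean=math.log(0.01 / 0.99),  # long-run rate of about 1%
    autoreg=0.9,
    noise_sd=0.05,
    n_timepoints=30,
    rng=rng,
)
# alpha(t) stays inside (0, 1) at every time point by construction.
alpha_hospital = [inv_logit(e) for e in eta_hospital]
```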

Example:

```
ascertainment = TimeVaryingAscertainment(
    processes={
        "hospital": AR1(...),
        "ed": AR1(...),
    }
)
```

---

## Why we need this

The current API already supports fixed shared ascertainment cleanly when the shared ascertainment is deterministic. It supports independent ascertainment cleanly when each observation gets its own RV. But it does not obviously support shared stochastic or correlated ascertainment safely and readably without additional abstraction.

---




## Builder Usage

Recommended pattern:

```
ascertainment = CorrelatedAscertainment(...)

builder.add_observation(
    Counts(
        name="hospital",
        ascertainment_rate_rv=ascertainment.for_signal("hospital"),
        ...
    )
)
```

---

## Naming and Scoping

Ascertainment models must:

- define their own NumPyro sample site naming
- avoid duplicate sampling
- ensure consistent reuse

Observation processes do not manage this.
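
The requirements above can be sketched without NumPyro as follows. The joint vector is drawn at most once per model evaluation, and each signal reads its own component under a name prefixed by the model's name. `JointDraw`, `SignalView`, and the explicit `reset` call are all hypothetical. How caching would interact with NumPyro tracing is an open implementation detail that this sketch does not address.

```python
import random
from typing import Callable, Optional


class JointDraw:
    """Draws a joint vector at most once between calls to `reset`."""

    def __init__(self, site_name: str, sampler: Callable[[], list[float]]) -> None:
        self.site_name = site_name
        self._sampler = sampler
        self._cached: Optional[list[float]] = None
        self.n_draws = 0  # bookkeeping for this example only

    def get(self) -> list[float]:
        if self._cached is None:
            self._cached = self._sampler()
            self.n_draws += 1
        return self._cached

    def reset(self) -> None:
        # Would be called at the start of each model evaluation.
        self._cached = None


class SignalView:
    """Per-signal RV that reads one component of the shared joint draw."""

    def __init__(self, joint: JointDraw, index: int, signal_name: str) -> None:
        self._joint = joint
        self._index = index
        # Prefix with the model's site name so names cannot collide across models.
        self.name = f"{joint.site_name}_{signal_name}"

    def __call__(self) -> float:
        return self._joint.get()[self._index]


rng = random.Random(0)
joint = JointDraw("he_ascertainment", lambda: [rng.random(), rng.random()])
hospital = SignalView(joint, 0, "hospital")
ed = SignalView(joint, 1, "ed")

first = (hospital(), ed())   # one joint draw serves both signals
repeat = (hospital(), ed())  # reuses the cached draw instead of resampling
```

In this arrangement the observation classes only ever call the RV they were given, so the responsibility for naming and reuse stays inside the ascertainment model.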

---

## Advantages

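- explicit structure: shared or correlated ascertainment is visible in the builder specification rather than implied by which RVs are passed where
- minimal API changes: observation processes still receive an `ascertainment_rate_rv`
- scalable design: new ascertainment structures can be added as `AscertainmentModel` subclasses without changing observation classes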

---

## Open Questions

1. Should the builder manage ascertainment models explicitly?
2. How minimal should `TimeVaryingAscertainment` be initially?
3. Should correlated models be restricted to the logit-normal formulation initially?


---

## Recommendation

Implement:

- `AscertainmentModel`
- `FixedAscertainment`
- `CorrelatedAscertainment`
- `TimeVaryingAscertainment`

with explicit usage via:

```
ascertainment.for_signal(...)
```

