Skip to content

add default deadline interceptor for gRPC clients#551

Open
k4leung4 wants to merge 2 commits intochainguard-dev:mainfrom
k4leung4:grpc-waitforready-timeout
Open

add default deadline interceptor for gRPC clients#551
k4leung4 wants to merge 2 commits intochainguard-dev:mainfrom
k4leung4:grpc-waitforready-timeout

Conversation

@k4leung4
Copy link
Copy Markdown
Contributor

Summary

Add a unary client interceptor that sets a default 30s deadline on gRPC
calls that don't already have one. This prevents WaitForReady(true) from
blocking indefinitely when a downstream is unavailable.

Motivation

Addresses CUS-238 /
CUS-172 /
chainguard-dev/internal-dev#18988.

WaitForReady(true) is set globally on all gRPC clients. When a downstream
is unavailable, RPCs block indefinitely waiting for the connection to become
ready, tying up goroutines until Cloud Run's per-instance concurrency limit
(50) is exhausted. During INC-44, this caused API instances to accumulate
blocked goroutines faster than Cloud Run could scale new instances.

Changes

  • New env var GRPC_CLIENT_DEFAULT_TIMEOUT (default 30s)
  • New defaultDeadlineInterceptor placed first in the unary interceptor
    chain — if the caller's context has no deadline, one is applied
  • Only affects unary calls (streams are long-lived and shouldn't have a
    blanket timeout)

Behavior

Scenario Result
Caller has no deadline 30s deadline applied
Caller sets own deadline (e.g., 60s) Caller's deadline preserved
GRPC_CLIENT_DEFAULT_TIMEOUT=0 Interceptor is a no-op
GRPC_CLIENT_DEFAULT_TIMEOUT=10s 10s deadline applied

Risk

Services with intentionally long RPCs (e.g., apkoaas builds, image
assembly) should set an explicit context deadline or override the env var.
The 30s default is conservative — the API server's request timeout is 60
minutes for the build use case, but the API server always sets a context
deadline from the incoming request, so the interceptor won't override it.

Tests

  • TestDefaultDeadlineInterceptor_AppliesTimeout — slow server + short
    timeout → DeadlineExceeded
  • TestDefaultDeadlineInterceptor_RespectsExistingDeadline — caller's
    longer deadline is preserved, call succeeds
  • TestDefaultDeadlineInterceptor_DisabledWithZero — timeout=0 disables
    the interceptor, call succeeds

Test plan

  • go test ./pkg/options/ -v — 5/5 pass
  • golangci-lint run ./pkg/options/ — 0 issues
  • CI passes
  • Deploy to staging, verify no unexpected DeadlineExceeded errors

🤖 Generated with Claude Code

Signed-off-by: Kenny Leung <kleung@chainguard.dev>
@k4leung4 k4leung4 marked this pull request as ready for review March 16, 2026 19:58
@k4leung4 k4leung4 requested review from cmdpdx and tcnghia March 16, 2026 19:58
- Fix RespectsExistingDeadline: server delay (200ms) now exceeds the
  interceptor timeout (100ms) so the test actually proves the caller's
  deadline governs. Previously the 50ms delay would succeed either way.
- Extract dialWithDeadlineInterceptor helper to deduplicate server/client
  setup across all 3 tests (~50 lines removed).
- Use t.Cleanup instead of defer for server/conn lifecycle.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copy link
Copy Markdown

@eslerm eslerm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖: Good fix — defaultDeadlineInterceptor is well-placed (outermost, so deadline envelopes all retries) and the guard conditions are clean (!ok + timeout > 0). Tests are solid.

Comment thread pkg/options/options.go
grpc.WithStatsHandler(trace.PreserveTraceParentHandler),
grpc.WithChainUnaryInterceptor(clientid.UnaryClientInterceptor(), state().clientMetrics.UnaryClientInterceptor(), grpc_retry.UnaryClientInterceptor(retryOpts...)),
grpc.WithChainUnaryInterceptor(defaultDeadlineInterceptor(state().env.GrpcClientDefaultTimeout), clientid.UnaryClientInterceptor(), state().clientMetrics.UnaryClientInterceptor(), grpc_retry.UnaryClientInterceptor(retryOpts...)),
grpc.WithChainStreamInterceptor(clientid.StreamClientInterceptor(), state().clientMetrics.StreamClientInterceptor(), grpc_retry.StreamClientInterceptor(retryOpts...)),
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖: WaitForReady(true) applies to both unary and streaming calls, but the deadline interceptor is only in the unary chain. A streaming RPC without a caller-provided deadline is still exposed to the same indefinite-blocking scenario the PR describes. Worth tracking as a follow-up — or a comment here noting streaming is intentionally out of scope.

Comment thread pkg/options/options.go
}
}

// defaultDeadlineInterceptor returns a unary client interceptor that sets a
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤖: Worth noting in the godoc that the 30s deadline is the total budget shared across all retry attempts (since this interceptor is outermost and grpc_retry is innermost). An operator tuning both GRPC_CLIENT_DEFAULT_TIMEOUT and GRPC_CLIENT_MAX_RETRY might not realize retries don't each get a fresh window.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants