feat: optional InferenceService annotation to skip model-ready nodeSelector (Karpenter) by ossianpe · Pull Request #1 · vsco/ome

ossianpe · 2026-04-01T00:51:42Z

Problem

Engine/decoder pods get a required nodeSelector of the form models.ome.io/clusterbasemodel.<name>=Ready. The model-agent applies that label only after weights are on disk. Autoscalers such as Karpenter may not provision nodes when the pod requires labels that no NodePool advertises yet. On cold GPU pools this causes a scheduling deadlock: no node is provisioned, so nothing can download the weights and apply the label the pod is waiting for.
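To make the deadlock concrete, here is a minimal sketch of the matching rule a required nodeSelector implies. It is not code from this PR: the helper names `modelReadyLabelKey` and `nodeSatisfies` and the model name `example-model` are hypothetical, and only the label pattern comes from the description above.

```go
package main

import "fmt"

// modelReadyLabelKey builds the label key from the pattern quoted in the
// PR description. The function name is hypothetical.
func modelReadyLabelKey(baseModelName string) string {
	return "models.ome.io/clusterbasemodel." + baseModelName
}

// nodeSatisfies reports whether a node's labels contain every key/value
// pair the pod's nodeSelector requires (exact match, as with nodeSelector).
func nodeSatisfies(nodeLabels, nodeSelector map[string]string) bool {
	for key, want := range nodeSelector {
		got, ok := nodeLabels[key]
		if !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	// "example-model" is a placeholder ClusterBaseModel name.
	selector := map[string]string{modelReadyLabelKey("example-model"): "Ready"}

	// A freshly provisioned node has no model-ready label yet.
	coldNode := map[string]string{"node.kubernetes.io/instance-type": "g5.xlarge"}

	// The same node after the label has been applied.
	warmNode := map[string]string{
		"node.kubernetes.io/instance-type": "g5.xlarge",
		modelReadyLabelKey("example-model"): "Ready",
	}

	fmt.Println(nodeSatisfies(coldNode, selector)) // false
	fmt.Println(nodeSatisfies(warmNode, selector)) // true
}
```

Because a new node can only ever look like `coldNode`, an autoscaler that simulates scheduling against its node templates has no reason to create one for this pod.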

Additionally, a GitHub Actions workflow has been configured to build the ome-manager image. Results for one build run: https://github.com/vsco/ome/actions/runs/23865820885

Solution

New annotation on InferenceService: ome.io/skip-model-ready-node-selector: "true" (default behavior unchanged when absent).
When set, omit the model-ready nodeSelector; accelerator/runtime merged selectors still apply.
BenchmarkJob: if the referenced InferenceService has the annotation, the benchmark pod skips the same selector for consistency.
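The behavior listed above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: the function names `isSkipModelReadyNodeSelector` and `buildNodeSelector` are hypothetical stand-ins, and treating only the exact string "true" as enabling the skip is an assumption about how the annotation value is parsed.

```go
package main

import "fmt"

// Annotation key taken from the PR description.
const skipModelReadyNodeSelectorAnnotation = "ome.io/skip-model-ready-node-selector"

// isSkipModelReadyNodeSelector returns true only when the annotation is
// present with the value "true" (assumed parsing rule). Indexing a nil
// map is safe in Go and yields the empty string.
func isSkipModelReadyNodeSelector(annotations map[string]string) bool {
	return annotations[skipModelReadyNodeSelectorAnnotation] == "true"
}

// buildNodeSelector copies the merged accelerator/runtime selectors and
// adds the model-ready selector unless the annotation asks to skip it.
func buildNodeSelector(annotations, merged map[string]string, baseModelName string) map[string]string {
	out := make(map[string]string, len(merged)+1)
	for k, v := range merged {
		out[k] = v
	}
	if !isSkipModelReadyNodeSelector(annotations) {
		// Default path: require a node where the model is already Ready.
		out["models.ome.io/clusterbasemodel."+baseModelName] = "Ready"
	}
	return out
}

func main() {
	// Placeholder selector and model name, for illustration only.
	merged := map[string]string{"example.com/accelerator": "gpu"}

	withSkip := map[string]string{skipModelReadyNodeSelectorAnnotation: "true"}

	fmt.Println(len(buildNodeSelector(nil, merged, "example-model")))      // 2
	fmt.Println(len(buildNodeSelector(withSkip, merged, "example-model"))) // 1
}
```

Copying the merged selectors before the gate keeps the accelerator/runtime constraints in place on both paths, which matches the description that only the model-ready entry is omitted.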

Implementation

constants.SkipModelReadyNodeSelectorAnnotationKey
IsSkipModelReadyNodeSelector() + gate in UpdatePodSpecNodeSelector.
Unit tests, plus documentation in charts/ome-resources/README.md (Autoscaling section).

Makefile

Optional BASE_IMAGE=ubuntu:24.04 for linux/amd64 builds on Apple Silicon (OL10 + QEMU x86-64-v3 issue).

Note: An earlier PR was mistakenly opened against sgl-project/ome; close that one if you only want review on this fork.

Made with Cursor


JDavis10213

Can we use a version in semver format, so that we can also push up the chart with the changes I requested in the shared-infra PR? I will note them here just to be safe. :)

I would like the helm chart updated so that it fixes the configmap to provide the types mapping, and so that it pulls the chart from our GHCR instead of from upstream.

I would also like @calebwilliams-vsco to review this from a code perspective, to see whether it aligns with how the flow was meant to work in the code, or just to get some more eyes on this.


calebwilliams-vsco · 2026-04-06T16:37:42Z

+			"inferenceService", inferenceService.Namespace+"/"+inferenceService.Name,
+			"benchmarkJob", benchmarkJob.Name,
+			"baseModel", baseModelMeta.Name)
+	} else {


Can we have a comment here explaining what happens when the annotation is not set and the model-ready node selector is not skipped?

calebwilliams-vsco

lgtm!


ossianpe added 9 commits March 31, 2026 17:04

add base image var

0927acb

add note about ome.io/skip-model-ready-node-selector

9aebc10

add SkipModelReadyNodeSelectorAnnotationKey

96969f8

add conditional for skipping model label when ome.io/skip-model-ready-node-selector set

1796ae1

update log to print nodeSelector message

ef3d479

impliment TestIsSkipModelReadyNodeSelector

108d82d

add test for SkipModelReadyNodeSelectorAnnotationKey

0d20172

impliment TestIsSkipModelReadyNodeSelector

b2903cf

ci(vsco): add fork-only workflow to publish ome-manager to ghcr.io/vsco

70f0fb0

Self-contained workflow (workflow_dispatch). Delete this file before PR to sgl-project/ome. Does not modify upstream dev-images/release workflows. Made-with: Cursor

github-actions bot added the documentation, controller, inferenceservice, benchmark, helm, ci, and tests labels Apr 1, 2026

ossianpe added 2 commits April 1, 2026 10:05

update to use vsco ghcr for now

294d67f

update tag to v0.1.4-vsco2

6b70d93

JDavis10213 reviewed Apr 2, 2026


ossianpe added 3 commits April 2, 2026 11:22

update to use vsco standardized versioning scheme

e00f40c

update to use vsco standardized versioning scheme; define instanceTypeMap

77b2b2e

migrate instanceTypeMap to generate from charts/ome-resources/values.yaml

322dc6d

github-actions bot added the crd label Apr 2, 2026

JDavis10213 added 5 commits April 2, 2026 19:08

Merge branch 'main' of github.com:/vsco/ome into add_annotation_for_autoscaling

a4c18fa

Revert the ome chart version updates as this is handled in the release.

8657ce6

Remove for the time being.

e5fc84f

Update the chart to pull images from vsco instead of the other

a4b84b7

Missed an comment that didn't need to be there anymore

d1e4900

calebwilliams-vsco reviewed Apr 6, 2026


calebwilliams-vsco approved these changes Apr 6, 2026


ossianpe added 3 commits April 6, 2026 10:46

add descriptino for case with adding annotation to skip label for autoscaling

5b663dc

Merge branch 'add_annotation_for_autoscaling' of https://github.com/vsco/ome into add_annotation_for_autoscaling

3b71cff

add ingressclassname variable for overriding

285e059

ossianpe wants to merge 22 commits into main from add_annotation_for_autoscaling
