Description
Checks
- I've already read https://docs.github.com/en/actions/hosting-your-own-runners/managing-self-hosted-runners-with-actions-runner-controller/troubleshooting-actions-runner-controller-errors and I'm sure my issue is not covered in the troubleshooting guide.
- I am using charts that are officially provided
Controller Version
0.10.1
Deployment Method
Helm
Checks
- This isn't a question or user support case (For Q&A and community support, go to Discussions).
- I've read the Changelog before submitting this issue and I'm sure it's not due to any recently-introduced backward-incompatible changes
To Reproduce
This is not reproducible for every run, and other jobs with the same runs-on configuration are picked up correctly.
Describe the bug
Summary
We’re seeing an intermittent issue where some GitHub Actions jobs remain in the UI with the status:
Waiting for a runner to pick up this job…
while our ARC runner scale set never scales up (DesiredReplicas stays at 0) and no EphemeralRunners are created in the corresponding namespace.
Most jobs in the same workflow run are handled correctly by ARC and execute on runners from the same scale set. Only a small subset of jobs (in our case two bench jobs) get stuck in this state.
Because this is not reproducible for every run, and other jobs with the same runs-on configuration are picked up correctly, this does not look like a static misconfiguration (labels, runner group, permissions, etc.). It looks more like ARC never receives the job assignment event for those specific jobs.
What happened
For workflow run:
Workflow run URL: https://github.com/web-infra-dev/rspack/actions/runs/23532375491/job/68501023764
Two jobs named bench show the status "Waiting for a runner to pick up this job…" for a long time and never start.
At the same time, on the ARC side:
# No ephemeral runners in the scale-set namespace
kubectl get ephemeralrunners -n runners-web-infra-dev-ubuntu-22-04-large
# output: No resources found in the namespace.
# EphemeralRunnerSet shows DesiredReplicas=0
kubectl get ephemeralrunnersets -n runners-web-infra-dev-ubuntu-22-04-large
NAME DESIREDREPLICAS CURRENTREPLICAS PENDING RUNNERS RUNNING RUNNERS FINISHED RUNNERS DELETING RUNNERS
rspack-ubuntu-22.04-large-cdtk8 0 0 0
Other jobs in the same workflow run (e.g. size-limit, reusable-build-*, test) are handled correctly:
Listener logs show "Updating job info for the runner" and "Updating ephemeral runner with merge patch"
Ephemeral runners are created
Jobs start and complete as expected
For the problematic bench jobs:
The GitHub Actions UI shows them as waiting for a runner
The ARC listener logs never mention these jobs: no log line contains the bench job name, either on its own or together with the workflow run ID
The corresponding EphemeralRunnerSet never increases DesiredReplicas above 0
Because this only happens for some jobs, and only occasionally, we believe:
runs-on labels are correct (same as other jobs that do run)
Runner group permissions and Selected repositories are correct
ARC is generally able to scale runners and process jobs from this repository/scale set
Relevant listener logs (for the same workflow run)
From the listener pod:
kubectl logs -n arc-systems rspack-ubuntu-22.04-large-75697c4c-listener | grep -i 23532375491
Example excerpts (for other jobs in the same run):
2026-03-25T08:53:55Z INFO listener-app.worker.kubernetesworker Updating job info for the runner {"runnerName": "rspack-ubuntu-22.04-large-cdtk8-runner-ggghx", "ownerName": "web-infra-dev", "repoName": "rspack", "workflowRef": "web-infra-dev/rspack/.github/workflows/reusable-build-codspeed.yml@refs/pull/13446/merge", "workflowRunId": 23532375491, "jobDisplayName": "Test Linux / Codspeed-build / Codspeed-build-simulation", "requestId": 0}
2026-03-25T08:53:55Z INFO listener-app.worker.kubernetesworker Updating ephemeral runner with merge patch {"json": "{\"status\":{\"jobDisplayName\":\"Test Linux / Codspeed-build / Codspeed-build-simulation\",\"jobRepositoryName\":\"web-infra-dev/rspack\",\"jobWorkflowRef\":\"web-infra-dev/rspack/.github/workflows/reusable-build-codspeed.yml@refs/pull/13446/merge\",\"workflowRunId\":23532375491}}"}
...
2026-03-25T09:03:43Z INFO listener-app.worker.kubernetesworker Updating job info for the runner {"runnerName": "rspack-ubuntu-22.04-large-cdtk8-runner-pd6qv", "ownerName": "web-infra-dev", "repoName": "rspack", "workflowRef": "web-infra-dev/rspack/.github/workflows/reusable-build-test.yml@refs/pull/13446/merge", "workflowRunId": 23532375491, "jobDisplayName": "Test WASM / test / E2E Testing", "requestId": 0}
2026-03-25T09:03:43Z INFO listener-app.worker.kubernetesworker Updating ephemeral runner with merge patch {"json": "{\"status\":{\"jobDisplayName\":\"Test WASM / test / E2E Testing\",\"jobRepositoryName\":\"web-infra-dev/rspack\",\"jobWorkflowRef\":\"web-infra-dev/rspack/.github/workflows/reusable-build-test.yml@refs/pull/13446/merge\",\"workflowRunId\":23532375491}}"}
For the two bench jobs in the same run, there is no corresponding log line at all.
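To make the gap explicit, the job names the listener logged for a run can be diffed against the job names the run is expected to contain. The following is a minimal bash sketch, assuming the listener keeps the `"workflowRunId": N, "jobDisplayName": "..."` layout seen in the excerpts above; `jobs_seen_by_listener`, `missing_jobs` and the expected-jobs file are hypothetical and not part of ARC:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of ARC) for checking which jobs of a workflow
# run ever reached the scale set listener.

# jobs_seen_by_listener RUN_ID
#   Reads listener log text on stdin and prints, sorted and de-duplicated, the
#   jobDisplayName of every "Updating job info for the runner" line whose
#   workflowRunId equals RUN_ID. Assumes the key layout of the excerpts above.
jobs_seen_by_listener() {
  local run_id="$1"
  grep -F 'Updating job info for the runner' |
    grep -F "\"workflowRunId\": ${run_id}," |
    sed -n 's/.*"jobDisplayName": "\([^"]*\)".*/\1/p' |
    sort -u
}

# missing_jobs RUN_ID EXPECTED_FILE
#   EXPECTED_FILE lists one job display name per line. Reads listener log text
#   on stdin and prints every expected name that never appears as a
#   jobDisplayName for RUN_ID.
missing_jobs() {
  local run_id="$1" expected="$2" seen job
  seen=$(jobs_seen_by_listener "$run_id")
  sort -u "$expected" | while IFS= read -r job; do
    [ -n "$job" ] || continue  # skip blank lines in the expected list
    # -x: whole-line match, -F: fixed string (job names contain "/" and spaces)
    printf '%s\n' "$seen" | grep -qxF -- "$job" || printf '%s\n' "$job"
  done
}
```

For example, piping the listener log into `missing_jobs 23532375491 expected-jobs.txt` would list the bench jobs if `expected-jobs.txt` contains them. The names must match `jobDisplayName` exactly, including any reusable-workflow prefix such as `Test WASM / test /`. The trailing comma in the run-ID pattern keeps a run ID from matching a longer one that shares its prefix.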
Additional observations / thoughts
The issue is intermittent and only affects a subset of jobs in a run
Because other jobs with the same runs-on and repository settings are picked up correctly, we suspect this is not a static misconfiguration (labels, runner group, permissions, etc.)
It feels like either:
The workflow_job/job-available event for these jobs is never sent to the scale set listener; or
ARC somehow drops/ignores those specific events, so it never bumps DesiredReplicas
Describe the expected behavior
For every job in the workflow run (including the two bench jobs), GitHub should send a workflow_job / job-available event to the scale set listener
ARC should:
Increase DesiredReplicas on the corresponding EphemeralRunnerSet
Create an ephemeral runner pod
The job should move from Waiting for a runner… to In progress and eventually complete (or fail for a more obvious reason)
Additional Context
githubConfigUrl: <xxx>
githubConfigSecret: <xxx>
template:
  spec:
    nodeSelector:
      kubernetes.io/os: linux
    tolerations:
    containers:
      - name: runner
        image: <xxx>
        command: ["/home/runner/run.sh"]
        resources:
          requests:
            memory: 40Gi
            cpu: 10000m
Controller Logs
https://gist.github.com/deanjingshui/676914d6c9be6b380f9b5a68ab49ae0b
Runner Pod Logs
In this case there are no runner pods to inspect. For the two bench jobs that are stuck in Waiting for a runner to pick up this job…, ARC never scales the EphemeralRunnerSet above DesiredReplicas=0, so no EphemeralRunner objects and no runner pods are created in the scale-set namespace (runners-web-infra-dev-ubuntu-22-04-large).
Because of that, there are no runner pod logs or kubectl describe output available for the affected jobs.
For reference, here is the log from a runner pod that successfully executed a job (different workflow, but same ARC setup):
kubectl logs -n runners-lynx-family-ubuntu-22-04-large \
lynx-ubuntu-22.04-large-kjnf8-runner-fwr8h
√ Connected to GitHub
Current runner version: '2.333.0'
2026-03-25 14:56:07Z: Listening for Jobs
2026-03-25 14:56:11Z: Running job: static-check
2026-03-25 14:58:28Z: Job static-check completed with result: Succeeded
√ Removed .credentials
√ Removed .runner
Runner listener exit with 0 return code, stopping the service, no retry needed.
Exiting runner...