
infra: update aggregation mode deployment #2278

Merged
JuArce merged 3 commits into staging from update_agg_mode_deploy on Jun 23, 2026

Conversation

@JuArce JuArce commented Mar 13, 2026

Collaborator

infra: run aggregation mode on a daily-scheduled GPU server

Description

Run the aggregation mode on a Paperspace GPU server that is only powered on once a day, instead of keeping it running 24/7.

Type of change

  • Infra

Checklist

  • “Hotfix” to testnet, everything else to staging
  • Linked to GitHub Issue
  • This change depends on code or research by an external entity
    • Acknowledgements were updated to give credit
  • Unit tests added
  • This change requires new documentation.
    • Documentation has been added/updated.
  • This change is an Optimization
    • Benchmarks added/run
  • Has a known issue
  • If your PR changes the Operator compatibility (Ex: Upgrade prover versions)
    • This PR adds operator compatibility for both versions and does not change crates/docs/examples
    • This PR updates batcher and docs/examples to the newer version. This requires that the operators are already updated to be compatible

@JuArce JuArce self-assigned this Mar 13, 2026
@JuArce JuArce marked this pull request as ready for review June 23, 2026 20:41
@JuArce JuArce marked this pull request as draft June 23, 2026 20:41
@JuArce JuArce changed the title infra: update agg mode deployment infra: run aggregation mode on a daily-scheduled GPU server Jun 23, 2026
@JuArce JuArce marked this pull request as ready for review June 23, 2026 20:54
@github-actions

Codex Code Review

Findings

  1. High Security: .github/workflows/aggregation_mode.yml stores PAPERSPACE_API_KEY as a GitHub Actions variable. This is a cloud API credential and should be a repository/environment secret: ${{ secrets.PAPERSPACE_API_KEY }}. Keep only PAPERSPACE_MACHINE_ID as a plain variable.

  2. Medium Bug: infra/aggregation_mode/aggregation_mode.sh clones the repo into aligned_layer/, but the following commands still run from the parent directory. As a result, paths like aggregation_mode/proof_aggregator and ./infra/aggregation_mode/run.sh at lines 127 and 131 will not exist on a fresh setup. Fix by cloning into . or cd aligned_layer before building/copying.

  3. Medium Bug: infra/aggregation_mode/run.sh does not stop on aggregation failures. If SP1 or Risc0 exits non-zero, the script still prints “finished” and reaches sudo shutdown -h now, likely making the daily run look successful while skipping aggregation. Add failure handling, for example set -euo pipefail plus a safe docker stop guard.
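A minimal sketch of the failure handling suggested in finding 3, assuming the aggregation runs can be wrapped as plain commands. The function names `run_aggregations` and `safe_docker_stop`, the command strings passed in, and the `DOCKER_BIN` override are all hypothetical and not taken from the actual run.sh; only `set -euo pipefail`, `docker ps -q --filter` and `docker stop` are standard.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Run each command string in turn and stop at the first failure.
# The arguments are hypothetical stand-ins for the SP1 and Risc0 aggregation runs.
run_aggregations() {
    local cmd
    for cmd in "$@"; do
        echo "running: $cmd"
        if ! bash -c "$cmd"; then
            echo "aggregation failed: $cmd" >&2
            return 1
        fi
    done
    # Only reached when every command exited with status 0.
    echo "finished"
}

# Stop a container only when it is actually running, so cleanup never aborts the script.
# DOCKER_BIN is an assumed override that lets the guard be tried without Docker installed.
safe_docker_stop() {
    local name="$1"
    local docker_bin="${DOCKER_BIN:-docker}"
    local running
    running="$("$docker_bin" ps -q --filter "name=${name}" 2>/dev/null || true)"
    if [ -n "$running" ]; then
        "$docker_bin" stop "$name" >/dev/null || true
    fi
}
```

With this shape, "finished" is printed only when every run succeeds, and the caller can decide whether to still power the machine off after a failure while reporting a non-zero exit status.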

No significant performance issues found in this PR diff.
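For finding 2, one possible shape of the fix is to enter the checkout before any relative path is used. This is a sketch only: `enter_checkout` and its two parameters are illustrative names, and the repository URL is deliberately left as an argument rather than assumed.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Clone the repository if it is not already present, then change into it so that
# later relative paths (build directories, helper scripts) resolve inside the checkout.
# Both parameters are illustrative; the real script may hard-code them.
enter_checkout() {
    local repo_url="$1"
    local repo_dir="$2"
    if [ ! -d "${repo_dir}/.git" ]; then
        git clone "$repo_url" "$repo_dir"
    fi
    cd "$repo_dir"
}
```

After `enter_checkout "<repo-url>" aligned_layer`, the working directory is the checkout itself, so paths that are relative to the repository root should exist on a fresh setup.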

@MauroToscano MauroToscano enabled auto-merge June 23, 2026 20:57
Comment thread .github/workflows/aggregation_mode.yml Outdated
Comment thread infra/aggregation_mode/aggregation_mode.sh
Comment thread infra/aggregation_mode/run.sh
Comment thread infra/aggregation_mode/aggregation_mode.sh
@JuArce JuArce changed the base branch from staging to testnet June 23, 2026 21:14
@JuArce JuArce changed the base branch from testnet to staging June 23, 2026 21:14
@JuArce JuArce changed the title infra: run aggregation mode on a daily-scheduled GPU server infra: update aggregation mode deployment Jun 23, 2026
@JuArce JuArce disabled auto-merge June 23, 2026 21:31
@JuArce JuArce merged commit 2b65191 into staging Jun 23, 2026
1 check failed
@JuArce JuArce deleted the update_agg_mode_deploy branch June 23, 2026 21:32

3 participants