Skip to content

Re-raise infrastructure errors during output collection#467

Open
ksuderman wants to merge 2 commits into
galaxyproject:masterfrom
ksuderman:reraise-infra-errors
Open

Re-raise infrastructure errors during output collection#467
ksuderman wants to merge 2 commits into
galaxyproject:masterfrom
ksuderman:reraise-infra-errors

Conversation

@ksuderman

Copy link
Copy Markdown
Contributor

Part of #465.
Split out of #466

What this does

_collect_output downgraded all collection exceptions to warnings for
output_workdir outputs (via _allow_collect_failure). That is correct for a missing
output, but it also swallowed genuine infrastructure failures — a disk-full or
out-of-memory during collection would surface as a "successful" job with empty outputs.

  • Re-raise ImportError, MemoryError, SystemError, and infrastructure OSErrors
    (disk full, I/O errors) so they fail the job.
  • Two cases are deliberately kept recoverable and flow through the normal
    _allow_collect_failure handling:
    • FileNotFoundError — a missing output file is expected (e.g. a from_work_dir
      output a tool legitimately did not produce, which Galaxy represents as an empty
      dataset).
    • HTTP 403 — Galaxy authoritatively refused the upload (dataset purged mid-job);
      checked before the OSError test because requests.HTTPError is itself an
      OSError subclass.

Branch is based on current master.

Surface ImportError, MemoryError, SystemError and infrastructure OSErrors
(e.g. disk full, I/O errors) during output collection instead of silently
downgrading them to warnings, so they fail the job rather than producing a
job that 'succeeds' with empty outputs.

Two exceptions are deliberately NOT treated as infrastructure failures and
keep flowing through the normal _allow_collect_failure handling:

  * FileNotFoundError — a missing output file is expected and recoverable
    (e.g. a from_work_dir output the tool legitimately did not produce, which
    Galaxy represents as an empty dataset).
  * HTTP 403 — Galaxy authoritatively refused the upload (dataset purged mid
    job); checked before the OSError test because requests.HTTPError is itself
    an OSError subclass.
Comment thread pulsar/client/staging/down.py Outdated
getattr(action, "url", None) or getattr(action, "path", action),
)
return False
if isinstance(e, OSError) and not isinstance(e, FileNotFoundError):

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you move this logic into allow_collect_failure? It looks like there is an abstraction esbalished for this. You'll have to add the exception to the method interface. I think that is fine.

@ksuderman ksuderman Jul 3, 2026

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this what you had in mind?

@ksuderman ksuderman requested a review from jmchilton July 3, 2026 20:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants