[Bug]: silent failure when checking the worker node: /etc/redhat-release access can be restricted #274

@aldbr

Search before creating an issue

  • I have searched existing issues and confirmed this is not a duplicate

Bug Description

On some worker nodes (seen on an HPC cluster), the pilot fails during the CheckWorkerNode step when attempting to read /etc/redhat-release.

The file exists but is not readable due to site-specific restrictions (permissions or security policies). Opening it raises a PermissionError and makes the pilot exit early, even though the file's content is only used for logging.

Steps to Reproduce

  1. Run the pilot on a worker node where /etc/redhat-release exists but is not readable by the pilot user.
  2. The CheckWorkerNode command attempts to open /etc/redhat-release.
  3. The pilot fails with a PermissionError during this step.
Uname      = Linux nukwa-...
Host Name  = nukwa-01...
Host FQDN  = nukwa-01...
Traceback (most recent call last):
  File "/home/sec-constraints/wn.py", line 12, in <module>
    with open(fileName, "r") as f:
PermissionError: [Errno 13] Permission denied: '/etc/redhat-release'

Expected Behavior

The pilot should not fail if OS release files are unreadable.

OS identification is informational and should be best-effort:

  • If the file cannot be read, the pilot should continue running.
  • OS details should be logged when available, skipped otherwise.
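A minimal sketch of that best-effort behavior, kept close to the existing logic. The helper names (`read_text_or_none`, `log_release_files`) and the logger name are hypothetical, not existing pilot code. Note that `os.path.exists` only tells us the path can be stat'ed, not that it can be read, so the sketch drops that check and relies on exception handling instead.

```python
import logging

log = logging.getLogger("pilot")  # hypothetical logger; the pilot uses self.log


def read_text_or_none(path):
    """Return the stripped content of path, or None if it cannot be read."""
    try:
        with open(path, "r") as f:
            return f.read().strip()
    except (IOError, OSError) as exc:
        # Covers missing files, permission errors, directories, etc.
        # IOError is listed for Python 2, where it is distinct from OSError.
        log.debug("Skipping %s: %s", path, exc)
        return None


def log_release_files(paths=("/etc/redhat-release", "/etc/lsb-release")):
    """Log each readable release file and return the list of paths logged."""
    logged = []
    for path in paths:
        content = read_text_or_none(path)
        if content is not None:
            log.info("%s:\n%s", path, content)
            logged.append(path)
    return logged
```

With this shape, an unreadable /etc/redhat-release only produces a debug message and the pilot carries on.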

Actual Behavior

The pilot exits during CheckWorkerNode when attempting to read /etc/redhat-release, even though the file is optional and not required for execution.

Environment

No response

Relevant Log Output

Additional Context

fileName = "/etc/redhat-release"
if os.path.exists(fileName):
    with open(fileName, "r") as f:
        self.log.info("RedHat Release = %s" % f.read().strip())
fileName = "/etc/lsb-release"
if os.path.isfile(fileName):
    with open(fileName, "r") as f:
        self.log.info("Linux release:\n%s" % f.read().strip())

/etc/redhat-release is apparently a legacy, distribution-specific file and may be restricted or absent on some systems.

From what I understand, a more robust and standardized alternative is /etc/os-release: https://www.freedesktop.org/software/systemd/man/latest/os-release.html

It looks like it is supported by all modern (and even less modern) Linux distributions.

Based on example 4 (Python 3.10) or example 5 (still python2 support???) from that page, I think we should:

  • Prefer reading /etc/os-release (best-effort, with exception handling)
  • Then /usr/lib/os-release
  • Never fail the pilot if OS release information cannot be read
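The proposal above could be sketched roughly as follows. This is an illustration under assumptions, not the pilot's implementation: the function names, the logger name, and the simplified quote handling (surrounding quotes are stripped, shell escapes are not interpreted) are all hypothetical choices. It avoids `platform.freedesktop_os_release()` so that it does not require Python 3.10.

```python
import logging

log = logging.getLogger("pilot")  # hypothetical logger; the pilot uses self.log

# Lookup order suggested in this issue
OS_RELEASE_PATHS = ("/etc/os-release", "/usr/lib/os-release")


def read_os_release(paths=OS_RELEASE_PATHS):
    """Return the fields of the first readable os-release file, or {}."""
    for path in paths:
        try:
            with open(path, "r") as f:
                lines = f.read().splitlines()
        except (IOError, OSError, ValueError) as exc:
            # Missing, unreadable, or undecodable file: try the next candidate
            log.debug("Cannot read %s: %s", path, exc)
            continue
        info = {}
        for line in lines:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # Simplified unquoting: drop one pair of matching quotes
            if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
                value = value[1:-1]
            info[key] = value
        return info
    return {}


def log_os_release():
    """Log the OS name if available; never raise on unreadable files."""
    info = read_os_release()
    if info:
        log.info("OS release = %s", info.get("PRETTY_NAME", info.get("NAME", "unknown")))
    else:
        log.info("OS release information not available")
```

Returning an empty dict rather than raising keeps the caller simple: the CheckWorkerNode step can log whatever is present and move on.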
