{2025.06}[foss/2024a] LAMMPS 22Jul2025 with CUDA #1461

Open
laraPPr wants to merge 2 commits into EESSI:main from laraPPr:LAMMPS_GPU

@laraPPr
Collaborator

@laraPPr laraPPr commented Apr 7, 2026

No description provided.

@laraPPr
Collaborator Author

laraPPr commented Apr 7, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/amd/zen3,accel=nvidia/cc80

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 7, 2026

New job on instance eessi-bot-vsc-ugent for repository eessi.io-2025.06-software
Building on: amd-zen3 and accelerator nvidia/cc80
Building for: x86_64/amd/zen3 and accelerator nvidia/cc80
Job dir: /scratch/gent/vo/002/gvo00211/SHARED/jobs/2026.04/pr_1461/15679185

date job status comment
Apr 07 10:43:46 UTC 2026 submitted job id 15679185 awaits release by job manager
Apr 07 10:45:02 UTC 2026 released job awaits launch by Slurm scheduler
Apr 07 11:37:06 UTC 2026 finished
🤷 UNKNOWN (click triangle for detailed information)
  • Job results file _bot_job15679185.result does not exist in job directory, or parsing it failed.
  • No artefacts were found/reported.
Apr 07 11:37:06 UTC 2026 test result
🤷 UNKNOWN (click triangle for detailed information)
  • Job test file _bot_job15679185.test does not exist in job directory, or parsing it failed.

@laraPPr
Collaborator Author

laraPPr commented Apr 7, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Apr 7, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: amd-zen4 and accelerator nvidia/cc90
Building for: x86_64/amd/zen4 and accelerator nvidia/cc90
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.04/pr_1461/21613219

date job status comment
Apr 07 11:22:58 UTC 2026 submitted job id 21613219 will be eligible to start in about 20 seconds
Apr 07 11:23:07 UTC 2026 received job awaits launch by Slurm scheduler
Apr 07 11:23:21 UTC 2026 running job 21613219 is running
Apr 07 11:25:05 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-21613219.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-accel-nvidia-cc90-17755610530.tar.zst
size: 0 MiB (22 bytes)
entries: 0
modules under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
no other files in tarball
Apr 07 11:25:05 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /526cd259 @BotBuildTests:gpu_h100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /416eaee1 @BotBuildTests:gpu_h100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (3/4) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /73a202f1 @BotBuildTests:gpu_h100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (4/4) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /7f04eb2b @BotBuildTests:gpu_h100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ PASSED ] Ran 0/4 test case(s) from 4 check(s) (0 failure(s), 4 skipped, 0 aborted)
Details
✅ job output file slurm-21613219.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@laraPPr
Collaborator Author

laraPPr commented Apr 7, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-surf for:arch=x86_64/amd/zen4,accel=nvidia/cc90

@eessi-bot-surf

eessi-bot-surf bot commented Apr 7, 2026

New job on instance eessi-bot-surf for repository eessi.io-2025.06-software
Building on: amd-zen4 and accelerator nvidia/cc90
Building for: x86_64/amd/zen4 and accelerator nvidia/cc90
Job dir: /projects/eessibot/eessi-bot-surf/jobs/2026.04/pr_1461/21613350

date job status comment
Apr 07 11:32:35 UTC 2026 submitted job id 21613350 will be eligible to start in about 20 seconds
Apr 07 11:32:42 UTC 2026 received job awaits launch by Slurm scheduler
Apr 07 11:33:05 UTC 2026 running job 21613350 is running
Apr 07 11:35:38 UTC 2026 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-21613350.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-accel-nvidia-cc90-17755616890.tar.zst
size: 0 MiB (22 bytes)
entries: 0
modules under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/amd/zen4/accel/nvidia/cc90
no other files in tarball
Apr 07 11:35:38 UTC 2026 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ SKIP ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /526cd259 @BotBuildTests:gpu_h100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node %device_type=gpu /416eaee1 @BotBuildTests:gpu_h100+default [Skipping GPU test : only 1 GPU available for this test case]
[ SKIP ] (3/4) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /73a202f1 @BotBuildTests:gpu_h100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ SKIP ] (4/4) EESSI_OSU_pt2pt_GPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2024a-CUDA-12.6.0 %scale=1_4_node /7f04eb2b @BotBuildTests:gpu_h100+default [Skipping test : 1 GPU(s) available for this test case, need exactly 2]
[ PASSED ] Ran 0/4 test case(s) from 4 check(s) (0 failure(s), 4 skipped, 0 aborted)
Details
✅ job output file slurm-21613350.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@laraPPr
Collaborator Author

laraPPr commented Apr 7, 2026

@casparvl why is cuda compute capabilities set like this? LAMMPS does not like it.

cuda-compute-capabilities                (E) = 9.0a

@laraPPr
Collaborator Author

laraPPr commented Apr 7, 2026

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-vsc-ugent for:arch=x86_64/amd/zen3,accel=nvidia/cc80

@gpu-bot-ugent

gpu-bot-ugent bot commented Apr 7, 2026

New job on instance eessi-bot-vsc-ugent for repository eessi.io-2025.06-software
Building on: amd-zen3 and accelerator nvidia/cc80
Building for: x86_64/amd/zen3 and accelerator nvidia/cc80
Job dir: /scratch/gent/vo/002/gvo00211/SHARED/jobs/2026.04/pr_1461/15679189

date job status comment
Apr 07 11:49:23 UTC 2026 submitted job id 15679189 awaits release by job manager
Apr 07 11:51:12 UTC 2026 released job awaits launch by Slurm scheduler
Apr 07 12:01:16 UTC 2026 running job 15679189 is running
Apr 07 14:42:00 UTC 2026 finished
😁 SUCCESS (click triangle for details)
Details
✅ job output file slurm-15679189.out
✅ no message matching FATAL:
✅ no message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
✅ found message(s) matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen3-accel-nvidia-cc80-17755728850.tar.zst
size: 241 MiB (252803906 bytes)
entries: 5042
modules under 2025.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/modules/all
LAMMPS/22Jul2025-foss-2024a-kokkos-CUDA-12.6.0.lua
software under 2025.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/software
LAMMPS/22Jul2025-foss-2024a-kokkos-CUDA-12.6.0
reprod directories under 2025.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80/reprod
LAMMPS/22Jul2025-foss-2024a-kokkos-CUDA-12.6.0/20260407_144108UTC
other under 2025.06/software/linux/x86_64/amd/zen3/accel/nvidia/cc80
no other files in tarball
Apr 07 14:42:00 UTC 2026 test result
😢 FAILURE (click triangle for details)
Reason
EESSI test suite was not run, test step itself failed to execute.
Details
✅ job output file slurm-15679189.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@casparvl
Collaborator

casparvl commented Apr 7, 2026

> @casparvl why is cuda compute capabilities set like this? LAMMPS does not like it.

Because that's the target for which we want CUDA code to be compiled :D If a particular package doesn't support the suffixes in the targets, we should make sure they get stripped. We could do this in an EESSI hook, but it would be better to do it upstream in EasyBuild.

Both a and f suffixes are valid in --cuda-compute-capabilities at the framework level (see the regex pattern against which the option is checked at https://github.com/easybuilders/easybuild-framework/blob/83d94433ad38b4b02a443beb244b88a09edb2748/easybuild/tools/options.py#L1004 ). I'm sure there are easyblocks out there, like LAMMPS, that don't handle this case well. But the easyblock should just be improved to make a sensible decision. In the end, we would like nvcc to be called with a -arch=sm_90a argument (or equivalent; there are a few different options that lead to the same result). But of course, build systems sometimes abstract this away, and then we need to get the build system to do something sensible.

Note that the a suffix potentially allows for better optimization, because you're essentially telling the compiler: this code only needs to run on 9.0 devices (whereas -arch=sm_90 would have to produce code that runs on any 9.X device, thus potentially allowing for less optimization).
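
To make the suffix handling above concrete, here is a minimal Python sketch. It assumes a simplified pattern (major.minor with an optional a or f suffix) rather than the actual framework regex linked above, and the names CC_PATTERN, split_cuda_cc, nvcc_arch_flag and strip_suffix are hypothetical, not part of EasyBuild.

```python
import re

# Simplified pattern, assumed for illustration only: "major.minor" with an
# optional 'a' or 'f' suffix. The authoritative regex is the one in the
# EasyBuild framework's options.py.
CC_PATTERN = re.compile(r"^(\d+)\.(\d+)([af]?)$")


def split_cuda_cc(cuda_cc):
    """Split a compute capability such as '9.0a' into (9, 0, 'a')."""
    match = CC_PATTERN.match(cuda_cc)
    if match is None:
        raise ValueError(f"invalid CUDA compute capability: {cuda_cc!r}")
    major, minor, suffix = match.groups()
    return int(major), int(minor), suffix


def nvcc_arch_flag(cuda_cc):
    """Build the nvcc -arch flag, keeping the suffix (hypothetical helper)."""
    major, minor, suffix = split_cuda_cc(cuda_cc)
    return f"-arch=sm_{major}{minor}{suffix}"


def strip_suffix(cuda_cc):
    """Drop the suffix for build systems that do not understand it."""
    major, minor, _ = split_cuda_cc(cuda_cc)
    return f"{major}.{minor}"


if __name__ == "__main__":
    print(nvcc_arch_flag("9.0a"))
    print(strip_suffix("9.0a"))
```

An easyblock whose build system accepts the suffix could pass it through as in nvcc_arch_flag, while one that does not could fall back to the stripped value.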

@laraPPr
Collaborator Author

laraPPr commented Apr 7, 2026

That is because 9.0a is not in this mapping: https://github.com/easybuilders/easybuild-easyblocks/blob/ad5538e0d532f06ecdc801794e390db49aa5c350/easybuild/easyblocks/l/lammps.py#L158-L177. From your explanation, I see adding 9.0a to it as an option, if building with -arch=sm_90a is supported. I also prefer doing all this upstream in EasyBuild.

@laraPPr
Collaborator Author

laraPPr commented Apr 7, 2026

We do not build LAMMPS with -arch but with Kokkos (-DKokkos_ARCH_NATIVE=yes -DKokkos_ARCH_{GPU_MAPPING}=yes). So I see two options:

1. Add the following mapping to the easyblock:
   '9.0a': 'HOPPER90', # NVIDIA Hopper generation CC 9.0
   It maps to HOPPER90 since 90a is not known in LAMMPS (see https://github.com/lammps/lammps/blob/c7ae612a9497437412cb787b78769570f48653dd/lib/kokkos/core/src/impl/Kokkos_NvidiaGpuArchitectures.hpp#L50 ) or in Kokkos (see https://github.com/kokkos/kokkos/blob/04eff0f546860eeda1c52c4206df9fafc337f6f7/core/src/impl/Kokkos_NvidiaGpuArchitectures.hpp#L31 ).
2. Overwrite cuda_cc in the hook.

I prefer option 1.
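
For reference, a rough Python sketch of the two options. The table below is a small illustrative excerpt, not the real easyblock mapping, and KOKKOS_GPU_ARCH, kokkos_arch_flag and strip_cc_suffix are hypothetical names. Only the -DKokkos_ARCH_{GPU_MAPPING}=yes flag form and the '9.0a': 'HOPPER90' entry come from this thread.

```python
# Illustrative excerpt only; the authoritative table lives in the LAMMPS
# easyblock linked earlier in this thread.
KOKKOS_GPU_ARCH = {
    "8.0": "AMPERE80",
    "9.0": "HOPPER90",
}


def kokkos_arch_flag(cuda_cc, mapping):
    """Return the Kokkos CMake flag for a compute capability (hypothetical helper)."""
    try:
        arch = mapping[cuda_cc]
    except KeyError:
        raise ValueError(
            f"no Kokkos architecture known for CUDA compute capability {cuda_cc!r}"
        ) from None
    return f"-DKokkos_ARCH_{arch}=yes"


# Option 1: extend the mapping so that '9.0a' resolves to HOPPER90.
OPTION_1_MAPPING = {**KOKKOS_GPU_ARCH, "9.0a": "HOPPER90"}


# Option 2: leave the mapping alone and strip the suffix before the lookup,
# which is roughly what overwriting cuda_cc in a hook would amount to.
def strip_cc_suffix(cuda_cc):
    return cuda_cc.rstrip("af")


if __name__ == "__main__":
    print(kokkos_arch_flag("9.0a", OPTION_1_MAPPING))
    print(kokkos_arch_flag(strip_cc_suffix("9.0a"), KOKKOS_GPU_ARCH))
```

With the unmodified table, looking up 9.0a fails, which mirrors the build failure on the cc90 target. Both options end up selecting the same Kokkos architecture.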

Labels

2025.06-software.eessi.io 2025.06 version of software.eessi.io accel:nvidia


2 participants