---
title: "Intro to high-performance computing (HPC)"
aliases:
- /introhpc
resources: files/introHPC.zip
---
**January 29<sup>th</sup>, 10:00am–12:00pm Pacific Time**
<!-- {{<cor>}}January 30<sup>th</sup> (Part 1), February 6<sup>th</sup> (Part 2) and February 13<sup>th</sup> (Part 3){{</cor>}}\ -->
<!-- {{<cgr>}}All days 10:00am - noon Pacific Time{{</cgr>}} -->
<!-- --- -->
**Abstract**: This course is an introduction to High-Performance Computing (HPC) on the Alliance clusters. We
will start with the cluster hardware overview, then talk about some basic tools and the software environment
on our clusters. Next we'll give a quick tour of various parallel programming frameworks such as OpenMP, MPI,
Python Dask, newer parallel languages such as Chapel and Julia, and we'll try to compile some serial,
shared-memory and distributed-memory codes using makefiles. We'll then proceed to working with the Slurm
scheduler, submitting and benchmarking our previously compiled codes. We will learn about batch and
interactive cluster usage, best practices for submitting a large number of jobs, estimating your job's
resource requirements, and managing file permissions in shared cluster filesystems. There will be many demos
and hands-on exercises on our training cluster.
**Instructor**: Alex Razoumov (SFU)
**Prerequisites:** Working knowledge of the Linux Bash shell. We will provide guest accounts to one of our
Linux systems.
**Software**: All attendees will need a remote secure shell (SSH) client installed on their computer in order
to participate in the course exercises. On Mac and Linux computers SSH is usually pre-installed (try typing
`ssh` in a terminal to make sure it is there). Many versions of Windows also provide an OpenSSH client by
default -- try opening PowerShell and typing `ssh` to see if it is available. If not, we recommend
installing [the free Home Edition of MobaXterm](https://mobaxterm.mobatek.net/download.html).
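One quick way to check for an SSH client before the course is a sketch like the one below; it works in bash or zsh, and in recent PowerShell with OpenSSH installed (the actual hostname and guest account details are provided during the course):

```shell
# Check whether an OpenSSH client is on the PATH.
if command -v ssh >/dev/null 2>&1; then
    echo "SSH client found: $(command -v ssh)"
    # Connecting then looks like: ssh username@hostname
else
    echo "No SSH client found -- install OpenSSH or MobaXterm"
fi
```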
**Materials**: Please download a [ZIP file](https://nextcloud.computecanada.ca/index.php/s/DyG3CrCHRLeqPML/download) with all slides
(single PDF combining all chapters) and sample codes. A copy of this file is also available on the training
cluster.
<!-- tried briefly https://folio.vastcloud.org/files/introHPC.zip -->
<!-- ## Part 1 -->
<!-- Click on each line to expand: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 1: cluster filesystems -->
<!-- Let's log in to the training cluster. Try to access `/home`, `/scratch`, `/project` on the training -->
<!-- cluster. Note that these only emulate the real production filesystems and have no speed benefits on the -->
<!-- training cluster. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 2: edit a remote file -->
<!-- Edit a remote file in `nano` or `vi` or `emacs`. Use `cat` or `more` to view its content in the terminal. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 3: gcc compiler -->
<!-- Load the default GNU compiler with `module` command. Which version is it? Try to understand what the module does: run -->
<!-- `module show` on it, `echo $PATH`, `which gcc`. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 4: Intel compiler -->
<!-- Load the default Intel compiler. Which version is it? Does it work on the training cluster? -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 5: third compiler? -->
<!-- Can you spot the third compiler family when you do `module avail`? -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 6: scipy-stack -->
<!-- What other modules does `scipy-stack/2022a` load? -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 7: python3 -->
<!-- How many versions of python3 do we have? What about python2? -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 8: research software -->
<!-- Think of a software package that you use. Check if it is installed on the cluster, and share your findings. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 9: file transfer -->
<!-- Transfer a file to/from the cluster (we did this already in bash class) using either command line or GUI. Type -->
<!-- "done" into the chat when done. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 10: why HPC? -->
<!-- Can you explain (in 1-2 sentences) how HPC can help us solve problems? Why is a desktop/workstation not sufficient? -->
<!-- Maybe you can give an example from your field? -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 11: tmux -->
<!-- Try left+right or upper+lower split panes in `tmux`. Edit a file in one and run bash commands in the -->
<!-- other. Try disconnecting temporarily and then reconnecting to the same session. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 12: compiling -->
<!-- In `introHPC/codes`, compile `{pi,sharedPi,distributedPi}.c` files. Try running a short serial code on the -->
<!-- login node (not longer than a few seconds: modify the number of terms in the summation). -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 13: make -->
<!-- Write a makefile to replace these compilations commands with `make {serial,openmp,mpi}`. -->
<!-- Add target `all`. -->
<!-- Add target `clean`. Try implementing `clean` for *all* executable files in the current -->
<!-- directory, no matter what they are called. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 14: Julia -->
<!-- Julia parallelism was not mentioned in the videos. Let's quickly talk about it (slide 29). -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 14b: parallelization -->
<!-- Suggest a computational problem to parallelize. Which of the parallel tools mentioned in the videos would you -->
<!-- use, and why? -->
<!-- If you are not sure about the right tool, suggest a problem, and we can brainstorm the approach together. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 15: Python and R -->
<!-- If you use Python or R in your work, try running a Python or R script in the terminal. -->
<!-- If this script depends on packages, try installing them in your own directory with `virtualenv`. Probably, -->
<!-- only a few of you should do this on the training cluster at the same time. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 16: other -->
<!-- Any remaining questions? Type your question into the chat, ask via audio (unmute), or raise your hand in Zoom. -->
<!-- ::: -->
<!-- {{< solution >}} -->
<!-- ```sh -->
<!-- function countfiles() { -->
<!-- if [ $# -eq 0 ]; then -->
<!-- echo "No arguments given. Usage: countfiles dir1 dir2 ..." -->
<!-- return 1 -->
<!-- fi -->
<!-- for dir in "$@"; do -->
<!-- echo "in $dir we found $(find "$dir" -type f | wc -l) files" -->
<!-- done -->
<!-- } -->
<!-- ``` -->
<!-- {{< /solution >}} -->
<!-- ## Part 2 -->
<!-- Click on each line to expand: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 17: serial job -->
<!-- Submit a serial job that runs `hostname` command. -->
<!-- Try playing with `sq`, `squeue`, `scancel` commands. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 18: serial job (cont.) -->
<!-- Submit a serial job based on `pi.c`. -->
<!-- Try `sstat` on a currently running job. Try `seff` and `sacct` on a completed job. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 19: optimization timing -->
<!-- Using a serial job, time optimized (`-O2`) vs. unoptimized code. Type your findings into the chat. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 20: Python vs. C timing -->
<!-- Using a serial job, time `pi.c` vs. `pi.py` for the same number of terms, which cannot be too large or too -->
<!-- small -- why? -->
<!-- Python pros -- can you speed up `pi.py`? -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 21: array job -->
<!-- Submit an array job for different values of `n` (number of terms) with `pi.c`. How can you have a different -->
<!-- executable for each job inside the array? -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 22: OpenMP job -->
<!-- Submit a shared-memory job based on `sharedPi.c`. Did you get any speedup? Type your answer into the chat. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 23: MPI job -->
<!-- Submit an MPI job based on `distributedPi.c`. -->
<!-- Try scaling 1 → 2 → 4 → 8 cores. Did you get any speedup? Type your answer into the chat. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 24: serial interactive job -->
<!-- Test the serial code inside an interactive job. Please quit the job when done, as we have very few compute -->
<!-- cores on the training cluster. -->
<!-- Note: we have seen the training cluster become unstable when using too many interactive resources. Strictly -->
<!-- speaking, this should not happen, however there is a small chance it might. We do have a backup. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 25: shared-memory interactive job -->
<!-- Test the shared-memory code inside an interactive job. Please quit when done, as we have very few compute -->
<!-- cores on the training cluster. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 26: MPI interactive job -->
<!-- Test the MPI code inside an interactive job. Please quit when done, as we have very few compute cores on the -->
<!-- training cluster. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 27: debugging and optimization -->
<!-- Let's talk about debugging, profiling and code optimization. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 28: permissions and file sharing -->
<!-- Let's talk about file permissions and file sharing. -->
<!-- Share a file in your `~/projects` directory (make it readable) with all other users in `def-sponsor00` group. -->
<!-- ::: -->
<!-- ::: {.callout-note collapse="true"} -->
<!-- ## 29: other -->
<!-- Are there questions on any of the topics that we covered today? You can type your question into the chat, ask -->
<!-- via audio (unmute), or raise your hand in Zoom. -->
<!-- ::: -->
## Videos: introduction
These videos (recorded in 2020) cover the same material we study in the course; you can watch them at
your own pace.
- [Introduction](https://www.youtube.com/watch?v=dVMNSp98yRA) (3 min)
- [Cluster hardware overview](https://www.youtube.com/watch?v=pLy3m9Nq4rM) (17 min)
- [Basic tools on HPC clusters](https://www.youtube.com/watch?v=9StaWaE4KRw) (18 min)
- [File transfer](https://www.youtube.com/watch?v=SjANgOLA4lc) (10 min)
- [Programming languages and tools](https://www.youtube.com/watch?v=dhV0Jg8VLoU) (16 min)
**Updates**:
1. Since April 1st, 2022, your instructors in this course have been based at Simon Fraser University.
1. Some of the slides and links in the video have changed -- please make sure to download
the [latest version of the slides](http://bit.ly/introhpc2) (ZIP file).
1. Compute Canada has been replaced by the Digital Research Alliance of Canada (the Alliance). All Compute
Canada hardware and services are now provided to researchers by the Alliance and its regional
partners. However, you will still see many references to Compute Canada in
[our documentation](https://docs.alliancecan.ca) and support system.
1. New systems were added (e.g. Narval at Calcul Québec), and some older systems were replaced (Cedar → Fir,
Béluga → Rorqual, Graham → Nibi, Niagara → Trillium).
## Videos: overview of parallel programming frameworks
Here we give a brief overview of various parallel programming tools. Our goal is not to learn how to
use these tools, but rather to tell you at a high level what each tool does, so that you understand the
difference between shared- and distributed-memory parallel programming models and know which tools you can use
for each. Later, in the scheduler session, you will use this knowledge to submit parallel jobs to the queue.
Feel free to skip some of these videos if you are not interested in parallel programming.
- [OpenMP](https://www.youtube.com/watch?v=hrN8hYYI-GA) (3 min)
- [MPI (message passing interface)](https://www.youtube.com/watch?v=0jTuecDVPYI) (9 min)
- [Chapel parallel programming language](https://www.youtube.com/watch?v=ptR9Wa-Saek) (7 min)
- [Python Dask](https://www.youtube.com/watch?v=-kYclNmUuX0) (6 min)
- [Make build automation tool](https://www.youtube.com/watch?v=m_60GzGJn6E) (9 min)
- [Other essential tools](https://www.youtube.com/watch?v=Ncwmx80zlGE) (5 min)
- [Python and R on clusters](https://www.youtube.com/watch?v=hqdvNMAaegI) (6 min)
## Videos: Slurm job scheduler
- [Slurm intro](https://www.youtube.com/watch?v=Qd39UkdajwQ) (8 min)
- [Job billing with core equivalents](https://www.youtube.com/watch?v=GjI8Fmzo20A) (2 min)
- [Submitting serial jobs](https://www.youtube.com/watch?v=sv5lUnoBV30) (12 min)
- [Submitting shared-memory jobs](https://www.youtube.com/watch?v=rIxTP8d8PaM) (9 min)
- [Submitting MPI jobs](https://www.youtube.com/watch?v=7RWpRtCCPz8) (8 min)
- [Slurm jobs and memory](https://www.youtube.com/watch?v=zaYUIjsuKoU) (8 min)
- [Hybrid and GPU jobs](https://www.youtube.com/watch?v=-1g2WM9kG88) (5 min)
- [Interactive jobs](https://www.youtube.com/watch?v=Ye7IrSxaN2k) (8 min)
- [Getting information and other Slurm commands](https://www.youtube.com/watch?v=I_U5u9F-_no) (6 min)
- [Best computing / storage practices and summary](https://www.youtube.com/watch?v=G4dcMri-gDM) (9 min)
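As a preview of what the serial-job videos cover, a minimal Slurm batch script might look like the sketch below. The time limit and memory values are placeholders (use the values given in class); the `def-sponsor00` account is the training-cluster group used in the course exercises, and `./pi` is the serial code compiled earlier:

```shell
# Write a minimal serial job script; the #SBATCH lines are scheduler
# directives read by Slurm, not executed by bash.
cat > pi-serial.sh <<'EOF'
#!/bin/bash
#SBATCH --time=00:05:00          # walltime limit (hh:mm:ss)
#SBATCH --mem-per-cpu=1000M      # memory per core
#SBATCH --account=def-sponsor00  # training-cluster account (placeholder)
./pi                             # previously compiled serial code
EOF
# On the cluster you would submit it with:
#   sbatch pi-serial.sh
# and monitor it with `sq` or `squeue -u $USER`.
```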
<!-- An interactive job will give you a bash shell on one the nodes that was allocated to your job. There you -->
<!-- can start a test run, debug your code, start a VNC/ParaView/VisIt/etc server and connect to it from a -->
<!-- client on your computer, etc. Note that interactive jobs typically have a short maximum runtime, usually -->
<!-- 3 hours. -->
<!-- One of the main takeaways from this course is to learn how to transition between `sbatch` and `salloc` -->
<!-- commands. You may debug your workflow with `salloc`, transition to production jobs with `sbatch`, and -->
<!-- then find that you need to use `salloc` again to debug problems and to analyze your large datasets. -->