HPCT
High Performance Computing Tools
Objectives
The learning objectives of this course unit are:
- Integrate cluster hardware consisting of several computing and storage nodes into a supercomputer
- Configure the system services that enable efficient management of the cluster’s hardware and software
- Install software and make it available to users
- Compile end-user applications and run them on various nodes
- Analyse system and application performance
- Formulate security policies and apply tools that harden the system, such as firewalls and intrusion detection systems
- Describe and document system configuration
Program
The course covers several important aspects of high-performance computing tools, including:
- Introduction to Linux: administration basics, file system, SSH, scripts, environment variables
- High-performance infrastructure architectures, in particular interconnects and network topologies
- Job management with tools such as Slurm
- Administrative task automation tools such as Ansible
- Containers in high-performance infrastructures, with tools such as Singularity/Apptainer
- Software build and installation tools such as EasyBuild and Spack
- Cluster management tools such as Warewulf and IPMI
- Security in high-performance infrastructures: firewalls, certificates and PKI, security policies
- Tools for monitoring and diagnosing the system, such as Kibana, InfluxDB and Grafana
Bibliography
- High-Performance Computing: Modern Systems and Practices. Thomas Sterling, Matthew Anderson, Maciej Brodowicz. 2017. Morgan Kaufmann
- Modern System Administration: Managing Reliable and Sustainable Systems. Jennifer Davis. 2022. O’Reilly Media, Inc.
- Ansible: Up and Running. Bas Meijer, Lorin Hochstein, René Moser. 2022. O’Reilly Media, Inc.