High Performance Computing Tools

Objectives

The learning objectives of this course unit are:

  • Integrate cluster hardware consisting of several computing and storage nodes into a supercomputer
  • Configure the system services that enable efficient management of the cluster’s hardware and software
  • Install software and make it available to users
  • Compile end-user applications and run them on various nodes
  • Analyse system and application performance
  • Formulate security policies and apply tools that harden the system, such as firewalls and intrusion detection
  • Describe and document the system configuration

Program

The course covers several important aspects of high-performance computing tools, including:

  • Introduction to Linux: administration basics, file system, SSH, scripts, environment variables
  • High-performance infrastructure architectures, in particular interconnects and network topologies
  • Task management tools such as Slurm
  • Administrative task automation tools such as Ansible
  • Containers in high-performance infrastructures such as Singularity/Apptainer
  • Software build tools such as EasyBuild and Spack
  • Cluster management tools such as Warewulf and IPMI
  • Security in high-performance infrastructures: firewalls, certificates and PKI, security policies
  • System monitoring and diagnostic tools such as Kibana, InfluxDB and Grafana

Bibliography

  • High-Performance Computing: Modern Systems and Practices. Thomas Sterling, Matthew Anderson, Maciej Brodowicz. 2017. Morgan Kaufmann
  • Modern System Administration: Managing Reliable and Sustainable Systems. Jennifer Davis. 2022. O’Reilly Media, Inc.
  • Ansible: Up and Running. Bas Meijer, Lorin Hochstein, René Moser. 2022. O’Reilly Media, Inc.
