Orchestration of Distributed Advanced Computing

Objectives

  • Understand the organization and fundamental concepts of distributed advanced computing pipelines (e.g., storage, pre-processing, distributed processing).
  • Understand concepts of scalability and reliability of distributed architectures.
  • Acquire knowledge of current technologies (e.g., Apache Spark, TensorFlow, Horovod, Ray) that facilitate the distributed processing of large amounts of data, and of real case studies in which they are used.
  • Acquire knowledge about efficient data formats, optimizations, and advanced configurations of technologies for advanced computing.
  • Perform the orchestration (i.e., configuration and deployment) of advanced computing technologies in a distributed infrastructure, considering performance, scalability, and reliability properties.
  • Implement the monitoring and evaluation of these technologies.

Program

  • Overview and concepts of distributed advanced computing pipelines (e.g., advanced statistical analysis, machine learning, deep learning) in HPC environments.
  • Distributed data, computing, and pipeline parallelization architectures (e.g., MapReduce, parameter server, ring all-reduce) and systems (e.g., Apache Spark, TensorFlow, Horovod, Ray).
  • Data formats (e.g., VCF, TFRecord) and data processing tools (e.g., DALI) in distributed advanced computing pipelines.
  • Optimization and orchestration techniques for advanced computing distributed systems in HPC environments.
  • Case studies demonstrating the practical applicability of advanced computing pipelines in areas such as health, mobility, and finance.
  • Monitoring and evaluation of distributed systems for advanced computing.
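
As an illustration of one of the architectures listed above, ring all-reduce can be sketched as a small in-memory simulation. This is a minimal sketch under stated assumptions, not how any particular system implements it: the function name `ring_all_reduce` is hypothetical, the workers are plain Python lists rather than processes on separate nodes, and the buffer length is assumed to be divisible by the number of workers. Systems such as Horovod perform the same two phases over a network.

```python
def ring_all_reduce(buffers):
    """Simulate a sum ring all-reduce over equally sized per-worker buffers.

    Hypothetical helper for illustration only: each inner list stands in
    for the local gradient buffer of one worker arranged in a ring.
    """
    n = len(buffers)
    length = len(buffers[0])
    if any(len(b) != length for b in buffers) or length % n != 0:
        raise ValueError("buffers must have equal length divisible by worker count")
    size = length // n
    data = [list(b) for b in buffers]  # copy so the inputs are left untouched

    def chunk(idx):
        # Slice covering the idx-th of the n equally sized chunks.
        return slice(idx * size, (idx + 1) * size)

    # Phase 1 (scatter-reduce): after n - 1 steps, worker r holds the
    # fully summed chunk (r + 1) % n.
    for step in range(n - 1):
        # Collect all outgoing messages first so sends happen "simultaneously".
        msgs = [((r - step) % n, data[r][chunk((r - step) % n)]) for r in range(n)]
        for r, (c, payload) in enumerate(msgs):
            dst = (r + 1) % n
            data[dst][chunk(c)] = [a + b for a, b in zip(data[dst][chunk(c)], payload)]

    # Phase 2 (all-gather): circulate the finished chunks around the ring.
    for step in range(n - 1):
        msgs = [((r + 1 - step) % n, data[r][chunk((r + 1 - step) % n)]) for r in range(n)]
        for r, (c, payload) in enumerate(msgs):
            dst = (r + 1) % n
            data[dst][chunk(c)] = payload

    return data


if __name__ == "__main__":
    workers = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
    for rank, result in enumerate(ring_all_reduce(workers)):
        print(rank, result)
```

Each worker sends and receives only one chunk per step, so the data moved per worker stays roughly constant as workers are added. That property is the usual argument for this pattern over a central parameter server.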

Bibliography

  • Dhabaleswar K. Panda, Xiaoyi Lu, Dipti Shankar. High-Performance Big Data Computing, MIT Press, 2022
  • Guanhua Wang. Distributed Machine Learning with Python, Packt Publishing, 2022
  • Hannes Hapke, Catherine Nelson. Building Machine Learning Pipelines, O’Reilly, 2020
  • Yuan Tang. Distributed Machine Learning Patterns, Manning, 2023
  • Max Pumperla, Edward Oakes, Richard Liaw. Learning Ray, O’Reilly, 2023
