Understand the organization and fundamental concepts of distributed pipelines for advanced computing (e.g., storage, pre-processing, distributed processing).
Understand the concepts of scalability and reliability in distributed architectures.
Acquire knowledge about current technologies (e.g., Apache Spark, TensorFlow, Horovod, Ray) used to facilitate the distributed processing of large amounts of data, and about real case studies where these technologies are used.
Acquire knowledge about efficient data formats, optimizations, and advanced configurations of technologies for advanced computing.
Perform the orchestration (i.e., configuration and deployment) of advanced computing technologies in a distributed infrastructure, considering performance, scalability, and reliability properties.
Implement the monitoring and evaluation of these technologies.
Program
Overview and concepts of distributed pipelines for advanced computing (e.g., advanced statistical analysis, machine learning, deep learning) in HPC environments.
Distributed data, computing, and pipeline parallelization architectures (e.g., MapReduce, parameter server, ring all-reduce) and systems (e.g., Apache Spark, TensorFlow, Horovod, Ray); a minimal MapReduce sketch follows this list.
Data formats (e.g., VCF, TFRecord) and data processing tools (e.g., DALI) in distributed pipelines for advanced computing; a TFRecord sketch follows this list.
Optimization and orchestration techniques for distributed systems for advanced computing in HPC environments; a configuration sketch follows this list.
Case studies demonstrating the real-world applicability of advanced computing pipelines in areas such as health, mobility, and finance.
Monitoring and evaluation of distributed systems for advanced computing; a monitoring sketch follows this list.
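To make the MapReduce architecture and the Apache Spark system named above concrete, the following is a minimal sketch of a word-count aggregation, assuming a local PySpark installation; the input file name "events.txt" is a hypothetical placeholder, not part of the course material.

```python
from pyspark.sql import SparkSession

# Minimal MapReduce-style aggregation (word count) on Apache Spark in local
# mode; the input file "events.txt" is a hypothetical placeholder.
spark = (
    SparkSession.builder
    .master("local[*]")
    .appName("mapreduce-sketch")
    .getOrCreate()
)

lines = spark.sparkContext.textFile("events.txt")
counts = (
    lines.flatMap(lambda line: line.split())  # map: tokenize each line into words
         .map(lambda word: (word, 1))         # map: emit (word, 1) pairs
         .reduceByKey(lambda a, b: a + b)     # reduce: sum the counts per word
)

for word, count in counts.take(10):           # bring a small sample to the driver
    print(word, count)

spark.stop()
```

The same map/reduce structure scales from local mode to a cluster deployment by changing only the master URL and resource configuration.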
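For the data formats item, a minimal sketch of writing and reading TFRecord files with TensorFlow is shown below; the file name and the feature layout ("x" with four floats, "y" with one integer label) are illustrative assumptions.

```python
import tensorflow as tf

# Sketch of writing and reading the TFRecord format; the file name and the
# feature layout are illustrative assumptions.
path = "sample.tfrecord"

with tf.io.TFRecordWriter(path) as writer:
    for i in range(5):
        example = tf.train.Example(features=tf.train.Features(feature={
            "x": tf.train.Feature(
                float_list=tf.train.FloatList(value=[float(i + k) for k in range(4)])),
            "y": tf.train.Feature(
                int64_list=tf.train.Int64List(value=[i % 2])),
        }))
        writer.write(example.SerializeToString())  # one serialized record per example

feature_spec = {
    "x": tf.io.FixedLenFeature([4], tf.float32),
    "y": tf.io.FixedLenFeature([1], tf.int64),
}
dataset = tf.data.TFRecordDataset(path).map(
    lambda record: tf.io.parse_single_example(record, feature_spec)
)
for parsed in dataset.take(2):
    print(parsed["x"].numpy(), parsed["y"].numpy())
```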
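For the optimization and orchestration item, the sketch below shows how a Spark session can be configured with explicit resource settings; the memory, core, and partition values are placeholders that would be tuned for a specific cluster, not recommended defaults.

```python
from pyspark.sql import SparkSession

# Illustrative Spark resource configuration; the values are placeholders to be
# tuned per cluster and workload.
spark = (
    SparkSession.builder
    .appName("configured-pipeline")
    .config("spark.executor.memory", "8g")          # memory per executor
    .config("spark.executor.cores", "4")            # cores per executor
    .config("spark.sql.shuffle.partitions", "200")  # shuffle parallelism
    .config("spark.serializer",
            "org.apache.spark.serializer.KryoSerializer")  # faster object serialization
    .getOrCreate()
)

print(spark.sparkContext.getConf().get("spark.executor.memory"))
spark.stop()
```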
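For the monitoring item, a minimal node-level monitoring loop is sketched below using the psutil library (an assumption; the course does not prescribe a specific tool). In a real deployment such samples would be exported to a monitoring system rather than printed.

```python
import psutil

# Minimal node-level monitoring loop; psutil is an illustrative choice, and in
# practice these metrics would be exported to a time-series/monitoring system.
def sample_node_metrics(interval_s: float = 1.0, samples: int = 5) -> None:
    for _ in range(samples):
        cpu = psutil.cpu_percent(interval=interval_s)  # CPU utilization over the interval
        mem = psutil.virtual_memory().percent          # memory utilization (percent)
        net = psutil.net_io_counters()                 # cumulative network I/O counters
        print(f"cpu={cpu:.1f}% mem={mem:.1f}% "
              f"sent={net.bytes_sent}B recv={net.bytes_recv}B")

if __name__ == "__main__":
    sample_node_metrics()
```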
Bibliography
Dhabaleswar K. Panda, Xiaoyi Lu, Dipti Shankar. High-Performance Big Data Computing, MIT Press, 2022
Guanhua Wang. Distributed Machine Learning with Python, Packt Publishing, 2022
Hannes Hapke, Catherine Nelson. Building Machine Learning Pipelines, O’Reilly, 2020