Checking date: 20/03/2024


Course: 2024/2025

High Performance Computing for Data Science
(19377)
Master in Statistics for Data Science (Plan: 386 - Estudio: 345)
EPI


Coordinating teacher: UCAR MARQUES, IÑAKI

Department assigned to the subject: Statistics Department

Type: Electives
ECTS Credits: 3.0 ECTS

Course:
Semester:




Requirements (Subjects that are assumed to be known)
Programming in R
Advanced Programming
Objectives
- Knowledge of the general principles of high-performance computing.
- Knowledge of automation and scripting strategies and tools.
- Ability to select appropriate tools for program measurement and profiling.
- Ability to connect R code with external C/C++ libraries through Rcpp.
- Ability to parallelize R code and compiled code.
- Ability to produce reproducible execution environments.
- Ability to send workflows to the cloud.
Skills and learning outcomes
Description of contents: programme
1. The road to High-Performance Computing
  1.1. HPC overview
  1.2. Tools for automation and scripting
  1.3. Tools for measuring and profiling
2. Making your code run faster: On the shoulders of giants
  2.1. Motivation and main concepts
  2.2. Interfacing C/C++ external libraries via Rcpp
  2.3. Efficient use of linear algebra engines
  2.4. Interfacing other languages and libraries
  2.5. Use cases in Statistics
3. Running multiple things at once: Parallel programming
  3.1. Motivation and main concepts
  3.2. Low-level parallelism: OpenMP, RcppParallel
  3.3. High-level parallelism: the future package
  3.4. Use cases in Statistics
4. Using more resources: Working in the cloud and beyond
  4.1. Motivation and main concepts
  4.2. Containerization: reproducible execution environments
  4.3. Scaling R in the cloud with googleComputeEngineR
  4.4. Use cases in Statistics

The programme is subject to modification depending on course development and/or the academic calendar.
Learning activities and methodology
* Training activities
  - AF1: Theoretical lesson.
  - AF2: Practical lesson.
  - AF5: Tutorials.
  - AF6: Group work.
  - AF7: Individual work.
  - AF8: On-site evaluation tests.
* Teaching methodologies
  - MD1: Class lectures by the professor with the support of computer and audiovisual media, in which the main concepts of the subject are developed and the bibliography is provided to complement the students' learning.
  - MD3: Resolution of practical cases, problems, etc. posed by the teacher individually or in groups.
  - MD5: Preparation of papers and reports individually or in groups.
Assessment System
  • % end-of-term examination: 0
  • % of continuous assessment (assignments, laboratory, practicals...): 100
Calendar of Continuous assessment
Basic Bibliography
  • Chambers, J. M. Software for Data Analysis: Programming with R. Springer. 2009
  • Chambers, J. M. Extending R. Chapman and Hall/CRC. 2017
Additional Bibliography
  • Chapple, S., Troup, E., Forster, T., and Sloan, T. Mastering Parallel Programming with R. Packt Publishing. 2016
  • Eddelbuettel, D. Seamless R and C++ Integration with Rcpp. Springer. 2013
  • McCallum, Q. E. and Weston, S. Parallel R. O'Reilly Media. 2012

The course syllabus may change due to academic events or other reasons.