Checking date: 19/04/2023

Course: 2023/2024

Data Intensive Computing
Master in Machine Learning for Health (Plan: 480 - Study: 359)

Coordinating teacher: MOLINA BULLA, HAROLD YESID

Department assigned to the subject: Signal and Communications Theory Department

Type: Elective
ECTS Credits: 6.0 ECTS


Requirements (Subjects that are assumed to be known)
Basic programming skills in Python
Objectives
The main objective of this course is to train data science analysts, for either research or industry, in the latest techniques applied in machine learning. We will learn how to make the most of the computing resources at our disposal, from our own computers to cloud resources for GPU programming and Big Data processing: skills in high demand across many sectors. The basic concepts of parallel programming will be explained in three settings:
1- using the resources of ordinary computers;
2- hybrid programming: using the resources of an ordinary computer together with specific hardware such as graphics processing units (GPUs);
3- distributed and cloud programming for large-scale cases such as Big Data.
Skills and learning outcomes
Description of contents: programme
Parallel Programming
  • Multiprocessor/multicore programming.
  • Parallel programming with shared memory (and the risks it poses to data consistency).
  • Parallel programming with shared memory using semaphores and locks.
Hybrid Programming
  • Use of GPUs and how they differ from ordinary computer processors.
  • How GPUs are programmed: when and how they can be used.
  • Advanced GPU programming techniques and GPU resource management.
  • How to use GPUs in machine learning.
Distributed Programming
  • What distributed computing is and how we can take advantage of it.
  • Use of distributed programming platforms for Big Data and machine learning.
  • Use cases of Apache Spark for machine learning.
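To illustrate what the "dangers for data" in shared-memory programming look like, and how locks and semaphores address them, here is a minimal sketch in Python (the language listed in the requirements) using only the standard threading module. The names SharedCounter and run are hypothetical, chosen for this illustration; the course's own exercises may use other tools, such as multiprocessing. A lock makes each read-modify-write increment atomic, and a bounded semaphore caps how many workers run the critical work at once.

```python
import threading


class SharedCounter:
    """Hypothetical counter shared by several threads."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Read-modify-write: without the lock, two threads could read the
        # same old value and one of the updates would be lost (a race condition).
        with self._lock:
            self.value += 1


def run(workers=4, increments=10_000, max_concurrent=2):
    """Start `workers` threads; return (final count, peak concurrent workers)."""
    counter = SharedCounter()
    # The semaphore admits at most `max_concurrent` workers at a time.
    gate = threading.BoundedSemaphore(max_concurrent)
    stats_lock = threading.Lock()  # protects the bookkeeping variables below
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        with gate:
            with stats_lock:
                active += 1
                peak = max(peak, active)
            for _ in range(increments):
                counter.increment()
            with stats_lock:
                active -= 1

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value, peak


if __name__ == "__main__":
    print(run())
```

With the default arguments the final count is always 4 × 10,000 = 40,000 and the reported peak never exceeds 2. In CPython, threads share a global interpreter lock, so this sketch demonstrates correctness rather than speedup; CPU-bound parallelism in Python is usually obtained with processes instead.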
Learning activities and methodology
This is a practice-oriented course. The basic theoretical concepts of data-intensive computing are taught as the foundation for guided practical sessions, in which use cases of each technology are implemented with the aim of training advanced researchers in data science.
Assessment System
  • % end-of-term examination: 60
  • % continuous assessment (assignments, laboratory, practicals...): 40
Calendar of Continuous assessment
Basic Bibliography
  • Benjamin Bengfort, Jenny Kim. Interactive Spark using PySpark. O'Reilly Media, 2016.
  • Holden Karau, Rachel Warren. High Performance Spark. O'Reilly Media, 2017.
  • Ian Gorton, Deborah K. Gracio. Data-Intensive Computing: Architectures, Algorithms, and Applications. Cambridge University Press, New York, 2012.

The course syllabus may change due to academic events or other reasons.