Checking date: 08/07/2020


Course: 2020/2021

High-performance computing for big data in companies
(17231)
Master in Big Data Analytics (Plan: 352 - Study: 322)
EPI


Coordinating teacher: GARCIA BLAS, FRANCISCO JAVIER

Department assigned to the subject: Computer Science and Engineering Department

Type: Compulsory
ECTS Credits: 3.0 ECTS

Course:
Semester:




Objectives
Basic Skills
  • Knowledge and understanding that provide a basis or opportunity for originality in developing and/or applying ideas, often in a research context.
  • Students should be able to apply their acquired knowledge and problem-solving abilities in new or unfamiliar environments within broader (or multidisciplinary) contexts related to their field of study.
  • Students should possess the learning skills that allow them to continue studying in a largely self-directed or autonomous way.
General Competencies
  • Apply the theoretical underpinnings of techniques for high-performance processing of large volumes of data as a basis for developing and adapting such techniques to specific problems.
  • Identify different techniques and paradigms for processing large amounts of data, and differentiate them according to their theoretical and practical features.
  • Apply teamwork skills and the ability to work with others independently.
Specific Skills
  • Apply basic knowledge of big data programming techniques, using advanced technologies and methods for handling large volumes of data.
  • Identify opportunities in which data processing techniques can improve the activities of enterprises and organizations.
  • Provide basic and fundamental knowledge of big data processing frameworks.
  • Identify and select suitable frameworks and software tools for handling large amounts of data.
  • Make efficient use of distributed platforms for high-performance data processing.
Learning Results
  • Manage the basics of big data processing frameworks.
  • Ability to use high-performance architectures and technologies for large volumes of data.
  • Knowledge of design techniques and application development for high-performance big data computing.
  • Skills to analyze and model the most appropriate framework for each problem, adapting to the specifications of individual cases.
Description of contents: programme
  1. Introduction to Big Data Processing
  2. MapReduce Paradigm
  3. Storage Systems for Big Data Environments
     • HDFS as a distributed file system
     • Commands for managing files in HDFS
  4. Frameworks for Data-Intensive Computing
     • Introduction to Apache Hadoop
     • Functional Programming in Scala
     • Apache Spark
     • Accessing and processing large volumes of data
     • Streaming data processing
  5. Management of Computational Resources
     • Introduction to Apache YARN
     • Deploying applications in corporate Big Data environments
     • Tools for monitoring Big Data applications
Learning activities and methodology
Learning activities:
  • Lectures
  • Hands-on lab projects
  • Personal student work
Teaching methodology:
  • In-person lectures given in class, supported by multimedia and computer resources, to develop the main concepts of the course. Reading materials will be provided to complement students' knowledge.
  • Reading of recommended texts (papers, technical journals, manuals and reports) to extend students' knowledge of the subject topics.
  • Solving practical assignments, problems, etc. proposed in class (individually or in groups).
Assessment System
  • End-of-term examination: 50%
  • Continuous assessment (assignments, laboratory, practicals...): 50%

Basic Bibliography
  • Holden Karau, Andy Konwinski, Patrick Wendell & Matei Zaharia. Learning Spark. O'Reilly. 2015
  • Martin Odersky, Lex Spoon & Bill Venners. Programming in Scala. Artima.

The course syllabus may change due to academic events or other reasons.


More information: https://www.arcos.inf.uc3m.es/fjblas