Checking date: 24/04/2024


Course: 2024/2025

Information Systems
(15397)
Dual Bachelor Data Science and Engineering - Telecommunication Technologies Engineering (Plan: 456 - Study: 371)


Coordinating teacher: DIAZ SANCHEZ, DANIEL

Department assigned to the subject: Telematic Engineering Department

Type: Elective
ECTS Credits: 6.0 ECTS

Course:
Semester:




Requirements (Subjects that are assumed to be known)
  • System Architectures (be familiar with the labs)
  • Telematic Applications (knowledge of the upper protocol layers, specifically HTTP)
Objectives
The goal of this course is to enable the student to acquire knowledge of the fundamental concepts of modern distributed systems and to use the modern distributed systems behind popular services used by millions of users all over the world, such as Dropbox, Spotify, YouTube or the Google search engine. The philosophy of this course is to enable practical learning of distributed computing through the interfaces of widely known and widely used services. In order to achieve this goal, the student should acquire the following knowledge and capabilities.
Knowledge
To achieve this goal the student must acquire the following knowledge:
- To know the structure of modern distributed systems, their internal characteristics and their interfaces with other systems, such as mobile and desktop applications, among others.
- To know the basic communication mechanisms of distributed systems, using protocols in current use, such as HTTP (REST interfaces) or Web Services.
- To know modern load balancing systems, from the Enterprise Service Bus to the ones used by MapReduce (currently used by Google, Amazon and others).
- To know real-world scenarios for web applications with a high number of visits (Amazon, newspapers), and cloud architectures for application development and/or storage.
- To know the advantages and disadvantages of these systems and when to develop a specific application in a distributed way (resource scaling, data center economics), and to know the kinds of data sets that need distributed processing.
Specific capabilities
The specific capabilities that will be acquired by the student upon successful completion of the course will be:
- To use basic communication mechanisms in distributed systems.
- To know centralised and distributed applications through demonstrative labs.
- To model and deploy a distributed system and to use existing applications and clouds.
General capabilities
The general capabilities acquired by the students will be:
- Ability to use and apply knowledge of telecommunications technologies and engineering. The students will work on this capability in the lab sessions and in the exercise sessions.
- Ability to use the techniques and modern engineering tools necessary for professional practice.
- Ability to communicate information effectively in speech, presentations and writing, in English and Spanish, through the development of the activities proposed in the subject (exercises, projects about new technology trends, etc.).
- Recognition of the need for lifelong learning and the ability to obtain and apply the required information through the use of technical literature related to the subject field, in Spanish and in English.
- Knowledge of contemporary issues and trends in the field of study.
Skills and learning outcomes
Description of contents: programme
Summary
The subject is divided roughly equally between theory and laboratory work, with three content blocks:
1) Cloud: analyzes the architectures that have led to the cloud and the current models. AWS and Google Cloud are used in the lab.
2) Communications and the Cloud: covers data representation, the use of REST, queues and other interaction protocols, as well as data persistence and extraction (dataset construction). AWS, Google Cloud, Spotify and other real systems are used in the lab.
3) Big Data: analyzes the needs, the type of hardware required, distributed storage systems, data center sizing and architecture. HDFS and Hadoop are used in the lab to understand the Location Aware and MapReduce principles, and Spark to introduce the use of multiple MapReduce rounds, as well as streaming and high-level programming supported by a cluster.
Cloud Computing Program
- Introduction to distributed computing: evolution of computing, legacy systems and the evolution to distributed systems, distributed systems and Cloud Computing, models of computing distribution, what is cloud computing?, architectures regarding deployment (IaaS, PaaS, SaaS), security and location aspects in the Cloud, challenges and opportunities, use cases. LABS: AWS EC2, Google Cloud Compute Engine, Google App Engine, AWS Lambda.
- Communications and the Cloud: legacy protocols, current protocols (synchronous, asynchronous, messaging, queues), data representation, data capture and extraction, architectural and network considerations, challenges and opportunities, use cases. LABS: introduction to REST with Flask (Google Cloud), programmatic data extraction via API (Twitter, Spotify), forced data extraction via scraping, MQTT and other IoT/M2M protocols.
- Big Data: Big Data system architecture, analysis of processing/storage hardware characteristics, anatomy of a data center, modern distributed storage systems, modern batch processing systems, modern stream processing systems, highly distributed systems, challenges and opportunities, use cases. LABS: HDFS/Hadoop, Big Data Spark batch, Big Data Spark stream, algorithms in Spark, Spark and notebooks (PySpark).
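The MapReduce principle covered in the Big Data block (and in the HDFS/Hadoop and Spark labs) can be illustrated with a minimal, single-process Python sketch of the classic word-count example. It is not the Hadoop or Spark API: the function names `map_phase`, `shuffle_phase`, `reduce_phase` and `word_count` are hypothetical, and the distribution of work across cluster nodes is only simulated by running each phase sequentially.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all intermediate values by key, as a MapReduce
    framework does between the map and reduce stages."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # In a cluster each line (or input split) would go to a different mapper;
    # here all mapper outputs are simply chained in one process.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(intermediate)
    # In a cluster each key would be assigned to a reducer; here they run sequentially.
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the cloud scales", "the data center hosts the cloud"]
    print(word_count(docs))
```

In Hadoop or Spark the same map and reduce logic would run on many nodes, with the framework performing the shuffle over the network and, following the Location Aware principle, trying to schedule map tasks on the nodes that already store the HDFS blocks they read.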
Learning activities and methodology
The activities used to underpin the competences and skills in the course are:
- During the first part of the course, students should complete a set of guided labs in which they modify code provided by the teaching staff, in order to gradually acquire the skills needed to pass the course. These guided labs are closely related to a real case study, presented at the beginning of the course, for learning modern cloud systems such as those provided by Google or Amazon.
- During a six-week period, students work in pairs to complete a project entailing the design and implementation of a Cloud application (or an application that accesses a Cloud) that uses one or more of the technologies and paradigms covered during the course (PO: a).
- In several activities throughout the course, students are asked to search for auxiliary documents to support the information studied in a topic. In their final report, they must acknowledge the information sources they used (PO: i).
- Use of the following tools: virtual machines, IDEs and version control, in multiple laboratory sessions (PO: k).
- Exercises covering the topics of the course (PO: a).
During these activities the teaching staff reviews the students' work in class, supervises the lab sessions, answers questions in the course forum, holds office hours and calls plenary office hours on demand.
Assessment System
  • % end-of-term examination: 0
  • % continuous assessment (assignments, laboratory, practicals...): 100

Calendar of Continuous assessment


Extraordinary call: regulations
Basic Bibliography
  • Tom White. Hadoop: The Definitive Guide. O'Reilly. 2009
  • George F. Coulouris. Distributed Systems: Concepts and Design. Addison-Wesley. 2005
Electronic Resources *
(*) Access to some electronic resources may be restricted to members of the university community and require validation through Campus Global. If you try to connect from outside the University, you will need to set up a VPN.


The course syllabus may change due to academic events or other reasons.


More information: https://gitlab.gast.it.uc3m.es/distributed-computing-assignements