Checking date: 28/04/2025 12:54:05


Course: 2025/2026

Information Systems
(15397)
Dual Bachelor Data Science and Engineering - Telecommunication Technologies Engineering (Study Plan 2020) (Plan: 456 - Estudio: 371)


Coordinating teacher: DIAZ SANCHEZ, DANIEL

Department assigned to the subject: Telematic Engineering Department

Type: Electives
ECTS Credits: 6.0 ECTS

Course:
Semester:




Requirements (Subjects that are assumed to be known)
System Architectures (be familiar with the labs). Telematic Applications (have knowledge of the upper protocol layers, specifically HTTP).
Objectives
The goal of this course is for the student to understand how modern distributed computing systems work and how to use them. Such systems underpin many of the popular services consumed by millions of users worldwide, such as Dropbox, Spotify, YouTube, and Google Search. The philosophy of the subject is to facilitate practical learning of distributed computing through the real interfaces of well-known, current services. To achieve this objective, the student must acquire the following knowledge and skills.

Knowledge
The knowledge gained by taking this course includes:
1. Understanding the structure of a modern distributed communication system, its internal characteristics, and its interfaces with mobile applications, desktop systems, and other systems, as well as the evolution of computing that has led to this scenario.
2. Understanding the basic communication mechanisms in distributed systems that allow computational loads to be transferred between remote systems over the network, such as REST interfaces or Web Services.
3. Understanding the fundamentals of the modern load-distribution systems used by major providers such as Google and Amazon.
4. Understanding real scenarios that enable scalability, and cloud architectures for application development and/or storage.
5. A technical introduction to systems for processing large volumes of data (big data): their advantages and disadvantages, when a specific application should be implemented in a distributed way (resource scaling and data-center economy), and real data sets that require distributed processing.

Specific skills
The specific skills the student will acquire upon completing the course include:
1. Using basic communication mechanisms in distributed systems.
2. Understanding centralized and distributed network applications through demonstrative lab practices.
3. Modeling and deploying a distributed system, making use of existing applications and clouds.

General skills
As for the general skills that the student will acquire:
1. The ability to apply knowledge of telecommunications and engineering technologies, worked on particularly through laboratory practices and the resolution of exercises in theory classes.
2. The ability to use the engineering techniques and tools necessary for professional practice.
3. The ability to communicate effectively orally, in writing, and graphically, in both Spanish and English, throughout the activities proposed in the subject (exercises, papers on new technologies, etc.).
4. Recognition of the need for continuous learning, and the ability to obtain and apply the required information by accessing technical literature in the subject area in both Spanish and English.
5. Knowledge of new technologies and trends in the field of study.
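The idea of transferring computational load between remote systems (knowledge item 2) can be sketched in a few lines. This is an illustrative example only, not course material: the operation names and JSON fields are hypothetical, and it shows just the serialization pattern that REST-style interfaces rely on, with the "remote" side simulated by a local function.

```python
import json

def build_task_request(operation, operands):
    """Client side: serialize a computation request as JSON,
    the usual wire format for REST interfaces."""
    return json.dumps({"operation": operation, "operands": operands})

def handle_task_request(body):
    """What a remote worker might do: parse the request,
    perform the computation, and return a JSON response."""
    task = json.loads(body)
    if task["operation"] == "sum":
        result = sum(task["operands"])
    else:
        raise ValueError("unsupported operation")
    return json.dumps({"result": result})

request_body = build_task_request("sum", [1, 2, 3, 4])
response_body = handle_task_request(request_body)
print(json.loads(response_body)["result"])  # 10
```

In a real deployment the request body would travel over HTTP to a remote endpoint instead of being handled in-process.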
Description of contents: programme
Summary
The subject is divided roughly equally between theory and laboratory work, with three content blocks:
1) Cloud: analyzes the architectures that have led to the cloud, and current cloud models. AWS and Google Cloud are used in the lab.
2) Communications and the Cloud: covers data representation; the use of REST, queues, and other protocols for interaction; and data persistence and extraction (dataset construction). AWS, Google Cloud, Spotify, and other real systems are used in the lab.
3) Big Data: analyzes the needs, the type of hardware required, distributed storage systems, and data-center sizing and architecture. HDFS and Hadoop are used in the lab to understand location-awareness and MapReduce principles, and Spark to introduce the use of multiple MapReduce rounds, as well as streaming and high-level programming backed by a cluster.

Cloud Computing Programme
- Introduction to distributed computing: evolution of computing; legacy systems and their evolution to distributed systems; distributed systems and cloud computing; models of computing distribution; what is cloud computing?; deployment architectures (IaaS, PaaS, SaaS); security and location aspects in the cloud; challenges and opportunities; use cases. LABS: AWS EC2, Google Cloud Compute Engine, Google App Engine, AWS Lambda.
- Communications and the Cloud: legacy protocols; current protocols (synchronous, asynchronous, messaging, queues); data representation; data capture and extraction; architectural and network considerations; challenges and opportunities; use cases. LABS: introduction to REST with Flask (Google Cloud), programmatic data extraction via API (Twitter, Spotify), forced data extraction via scraping, MQTT and other IoT/M2M protocols.
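The REST pattern introduced in the Communications labs can be sketched without any framework. The labs use Flask on Google Cloud; the following standard-library version is only an assumed illustration of the same request/response pattern, and the `/status` route and its JSON fields are hypothetical examples, not part of the course material.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

class StatusHandler(BaseHTTPRequestHandler):
    """Serve a single illustrative REST resource, GET /status,
    returning a JSON representation of the service state."""

    def do_GET(self):
        if self.path == "/status":
            body = json.dumps({"service": "demo", "status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # keep the demo quiet

def start_server():
    """Bind to an ephemeral port and serve in a background thread;
    return the server object and the port it listens on."""
    server = HTTPServer(("127.0.0.1", 0), StatusHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server, server.server_address[1]
```

A client would then issue `GET http://127.0.0.1:<port>/status`; in Flask the same route would be a view function decorated with `@app.route("/status")`.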
- Big Data: Big Data system architecture, analysis of processing/storage hardware characteristics, anatomy of a data center, modern distributed storage systems, modern batch processing systems, modern stream processing systems, highly distributed systems, challenges and opportunities, use cases. LABS: HDFS/Hadoop, Big Data Spark batch, Big Data Spark stream, algorithms in Spark, Spark and notebooks (PySpark).
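The MapReduce principle underlying the HDFS/Hadoop and Spark labs can be made concrete with a classic word count, written here in plain Python (an illustrative sketch, not lab code) so the map, shuffle, and reduce phases that the frameworks normally hide are explicit:

```python
from collections import defaultdict

def map_phase(lines):
    """Mapper: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def shuffle(pairs):
    """Shuffle: group emitted values by key, as the framework does
    between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reducer: aggregate the values of each key (here, by summing)."""
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big clusters", "big results"]
counts = reduce_phase(shuffle(map_phase(lines)))
print(counts["big"])  # 3
```

In Spark the same computation becomes a short chain such as `rdd.flatMap(str.split).map(lambda w: (w, 1)).reduceByKey(add)`; chaining several such rounds is exactly what the Spark labs introduce.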
Learning activities and methodology
The activities used to underpin the competences and skills of the course are:
- During the first part of the course, students complete a set of guided labs in which they modify code provided by the teaching staff, gradually acquiring the skills needed to pass the course. These guided labs are closely related to a real case study, presented at the beginning of the course, to learn modern cloud systems such as those provided by Google or Amazon.
- During a six-week period, students are divided into pairs and are expected to complete a project entailing the design and implementation of a cloud application (or an application that accesses a cloud) using one or more of the technologies and paradigms covered during the course (PO: a).
- In several activities throughout the course, students are asked to search for auxiliary documents to support the information studied in a topic. In their final report, they must acknowledge the information sources they used (PO: i).
- Use of the following tools: virtual machines, IDEs, and version control in multiple laboratory sessions (PO: k).
- Exercises covering the topics of the course (PO: a).
During these activities the teaching staff reviews student work in class, supervises the lab sessions, answers questions in the course forum, maintains office hours, and calls plenary office hours on demand.
Assessment System
  • % end-of-term examination/test: 0
  • % of continuous assessment (assignments, laboratory, practicals...): 100

Calendar of Continuous assessment


Extraordinary call: regulations
Basic Bibliography
  • Tom White. Hadoop: The Definitive Guide. O'Reilly, 2009.
  • George F. Coulouris. Distributed Systems: Concepts and Design. Addison-Wesley, 2005.
Electronic Resources *
(*) Access to some electronic resources may be restricted to members of the university community and require validation through Campus Global. If you try to connect from outside the University, you will need to set up a VPN.


The course syllabus may change due to academic events or other reasons.


More information: https://gitlab.gast.it.uc3m.es/distributed-computing-assignements