Checking date: 22/05/2024


Course: 2024/2025

Big Data
(16751)
Master in Financial Sector Technologies: FinTech (Plan: 461 - Estudio: 313)
EPI


Coordinating teacher: CALLE GOMEZ, FRANCISCO JAVIER

Department assigned to the subject: Computer Science and Engineering Department

Type: Compulsory
ECTS Credits: 6.0 ECTS

Course:
Semester:




Requirements (Subjects that are assumed to be known)
- Structured Databases
- Algebraic Data Languages
- SQL
- OLAP Databases and Data Warehouse
- Programming skills (basic knowledge of JavaScript is desirable)
Objectives
- Understand the concept and all dimensions of Big Data technology.
- Explore the social, business, and technological contexts underlying the emergence and expansion of this technology.
- Comprehend the Information Life Cycle and the processes that sustain it.
- Analyze the needs prompted by information: acquisition, transformation, storage, and exploitation.
- Study the related technologies and the components of a Big Data system (front-end and back-end systems).
- Become familiar with the characteristics and use of various tools supporting Big Data.
-- Differentiate between structured mass storage and NoSQL Management Systems.
-- Get introduced to NoSQL management through a Document-Oriented System (MongoDB). Learn how to manipulate data with this tool and study practical techniques for replication and distribution of data collections to implement massively parallel systems.
-- Get introduced to other types of NoSQL management: column-oriented systems (Cassandra) and graph-oriented systems (Neo4J).
-- Get introduced to the Hadoop suite of tools.
Skills and learning outcomes
Description of contents: programme
Block I: Theoretical Foundation.
------------------------------------
Item 1: Introduction: Social and technological framework
- Role of Information in the IT society
- Need for and types of Data Systems
- Characterization of the Big Data concept
- Implementation of Big Data
- Legal and ethical aspects
Item 2: Storage and NoSQL Technologies
- Storage technologies: structures and processes
- Transactional DB vs. Analytical DB
- Architectures. Distributed Systems and CAP.
- Distributed operability: MapReduce paradigm
- Classification of NoSQL systems
Item 3: Integration, Transformation and Cleaning
- Integration of sources
- Transformation and Cleaning
- Google Refine
- SPARQL
Block II: Tools Supporting Big Data: Main commercial tools for Storage, Report, and Visualization
------------------------------------
Item 4: Back-End for Big Data I: MongoDB
- Basic operation in MongoDB
- Aggregation in MongoDB. Pipeline and Map-Reduce.
- Replication and Distribution in MongoDB
Item 5: Back-End for Big Data II: Neo4J
- Introduction to linked Data: Graphs
- Graph-based DB models. Languages.
- Property Graph DB: Neo4J
Item 6: Back-End for Big Data III: Cassandra
- Cassandra's Basics
- Design on Cassandra
Item 7: Back-End for Big Data IV: Hadoop
- The Hadoop ecosystem and its installation
- SandBox
- Hadoop functionality
- Map-Reduce in Hadoop
Learning activities and methodology
Learning activities:
AF1: Theoretical classes: presentations accompanied by digital supporting materials.
AF3: Theoretical-practical classes: theoretical classes combined with the resolution of practical exercises.
AF4: Laboratory practices: practical work carried out in laboratories specific to each subject.
AF5: Tutorials: face-to-face and/or remote tutorials (videoconference).
AF2: E-learning activities: tutorials, recommended reading, documentation.
AF7: Individual student work: individual activities that complement the other activities (both face-to-face and remote), as well as exam preparation.
Teaching methodologies:
MD1: Lectures supported by computer and audiovisual media, in which the main concepts of the subject are developed and the bibliography is provided to complement the students' learning.
MD2: Critical reading of texts recommended by the professor of the subject (press articles, reports, manuals and/or academic articles), either for later discussion in class or to expand and consolidate knowledge of the subject.
MD3: Resolution of practical cases, problems, etc. posed by the teacher.
MD4: Presentation and discussion in class, moderated by the teacher, of topics related to the content of the subject, as well as practical cases.
MD5: Preparation of work and reports, individually or in groups.
MD6: Specific e-learning activities related to the blended (semi-face-to-face) nature of the degree: self-correction activities, participation in forums, and any other online teaching mechanism.
Assessment System
  • % end-of-term examination 60
  • % of continuous assessment (assignments, laboratory, practicals...) 40

Calendar of Continuous assessment


Basic Bibliography
  • Apache™ Hadoop®. http://hadoop.apache.org/. Apache™ Hadoop®. 2016
  • MongoDB. http://www.mongodb.org. MongoDB. 2016
Electronic Resources *
(*) Access to some electronic resources may be restricted to members of the university community and require validation through Campus Global. If you connect from outside the University, you will need to set up a VPN.


The course syllabus may change due to academic events or other reasons.