Course sheet

Academic year: 2025/2026

Audio and Visual Analytics

(18469)

Bachelor in Telecommunication Technologies Engineering (Study Plan 2019) (Plan: 445 - Estudio: 252)

Coordinating teacher: PELAEZ MORENO, CARMEN

Department assigned to the subject: Signal and Communications Theory Department

Type: Electives

ECTS Credits: 6.0 ECTS

Year: 4th

Semester:

Objectives

Audiovisual data is of vital importance to the entertainment industry (digital media, television, radio, podcasts, video games, music, etc.), where its combination with telecommunications has radically changed our lives, especially in the way in which we interact with such data (intelligent assistants such as Siri, Alexa, Google Assistant, etc.). These data are also increasingly relevant in areas such as medicine, where the growing sophistication of sensing devices, including wearables such as smart watches and virtual and augmented reality glasses, generates more and more types of data with immense potential for transforming society and creating new markets. The key technology for transforming this large amount of audiovisual data into useful information and knowledge is artificial intelligence or machine learning, including neural networks and deep learning.

The objective of this subject is therefore to provide students with theoretical and methodological knowledge of algorithms and methods for the analysis of audiovisual information, including retrieval and indexing of multimedia information for navigation and search, user profiling, opinion mining and positioning, personalization of recommendations, etc. The subject also takes an eminently practical approach, providing the tools to put theoretical knowledge into practice in the laboratory, so that students are able to develop an audiovisual data analysis project based on machine learning. Ultimately, this will allow students to make connections with the myriad applications of these methods and with the business products and services they support (for example, various Google services and platforms such as Twitter, Instagram, TikTok, Spotify, Netflix, YouTube, Twitch, Shazam, and more).

1. TRANSVERSAL/GENERIC COMPETENCES:
1.1. Personal work capacity.
1.2. Capacity for analysis and synthesis.
1.3. Ability to apply theoretical concepts in practical cases.
1.4. Skills related to group work and collaboration with other colleagues.
1.5. Skills related to making oral and written presentations.

2. SPECIFIC OBJECTIVES:
2.1. To understand the fundamentals of audiovisual data analysis and its applications.
2.2. To understand the basic methods of representation and description of speech, audio, image and video.
2.3. To understand the methods and technologies used for classification, detection, indexing, retrieval, filtering, personalization or recognition of voice, audio, image or video.
2.4. To be able to design and implement the above methods and technologies in practical problems of automatic voice, audio, image and video analysis.

Learning Outcomes

CB1: Students have demonstrated possession and understanding of knowledge in an area of study that builds on the foundation of general secondary education, and is usually at a level that, while relying on advanced textbooks, also includes some aspects that involve knowledge from the cutting edge of their field of study.
CB2: Students are able to apply their knowledge to their work or vocation in a professional manner and possess the competences usually demonstrated through the development and defence of arguments and problem solving within their field of study.
CG3: Knowledge of basic and technological subject areas which enable acquisition of new methods and technologies, as well as endowing the technical engineer with the versatility necessary to adapt to any new situation.
ETEGITT3: Ability to analyze, codify, process and transmit multimedia information using analog and digital signal processing techniques.
RA1: Knowledge and understanding of the general fundamentals of engineering, scientific and mathematical principles, as well as those of their branch or specialty, including some knowledge at the forefront of their field.
RA3: Design. Graduates will have the ability to make engineering designs according to their level of knowledge and understanding, working as a team. Design encompasses devices, processes, methods and objects, and specifications that are broader than strictly technical, including social awareness, health and safety, environmental and commercial considerations.
RA5: Applications. Graduates will have the ability to apply their knowledge and understanding to solve problems, conduct research, and design engineering devices or processes. These skills include knowledge, use and limitations of materials, computer models, process engineering, equipment, practical work, technical literature and information sources. They must be aware of all the implications of engineering practice: ethical, environmental, commercial and industrial.

Description of contents: programme

The modern information overload problem, caused by the availability of enormous amounts of information through the internet, makes it necessary to design systems that allow us to find the information we are looking for and to filter or personalize it according to our needs. For this, it is essential to be able to automatically index not only textual content but also audio, voice, image or video, using methods based on the content itself or on collaborative labeling such as that which takes place on social networks.

Topic 0. Overview of audiovisual data analysis.
Topic 1. Audiovisual data descriptors.
Topic 2. Methods for the analysis of audiovisual data.
Topic 3. Audiovisual data retrieval and filtering systems.
Topic 4. Applications.

Learning activities and methodology

Several types of learning activities are proposed: theoretical and practical lessons, lab assignments and a final project. Several methodologies will be adopted: theoretical lessons and problem-based learning (with different levels of supervision and guidance). The following learning activities and methodologies are employed: combined master and lab classes, flipped classes and a final project.

COMBINED MASTER AND LAB CLASSES (3 ECTS)
Master classes provide an overview of the main theoretical and mathematical concepts of the representation and processing of audiovisual data, along with the analytic tools employed for indexing and accessing audiovisual content and for its profiling and automatic recommendation. In these classes, lab examples will be introduced as part of the theoretical explanations: subject to lab availability, all sessions will take place in the lab so that practical examples can be woven into the explanations and make the class more dynamic. This also helps to even out differences in background, since the subject can be taken from any of the degrees in the Telecommunications family. Moreover, every unit will begin with a debate on its technological implications. For this purpose, flipped classroom methodologies will be employed. In particular, students will be given selected videos in advance to motivate the debate, together with a list of questions (sometimes controversial) that the instructor will not answer categorically, in order to encourage discussion. In this way, we expect to spark the students' curiosity about the material that will subsequently be explained.

COLLABORATIVE LEARNING (1 ECTS)
Students will build a practical example of the collaborative content labeling methods learned in the theory classes, to establish connections between applications and commercial services that benefit from the analysis of audiovisual data, as well as the milestones and failures of the technologies studied. For this, collaborative concept mapping tools will be used.

FINAL PROJECT (2 ECTS)
Students will work on a project in which they will program a complete modular system for one of the tools explained in class. They will be given guidelines and preparatory sessions based on problem-based learning.

Assessment System

End-of-term examination/test: 0%
Continuous assessment (assignments, laboratory, practicals...): 100%

Calendar of Continuous assessment

Lab assignments and questionnaires (40%)
Final Project (60%)

Extraordinary call: regulations

Basic Bibliography

C. D. Manning, P. Raghavan and H. Schütze. Introduction to Information Retrieval. Cambridge University Press. 2008
Rafael C. González and Richard E. Woods. Digital Image Processing. Fourth Edition, Pearson. 2018

Additional Bibliography

Ben Gold, Nelson Morgan and Dan Ellis. Speech and Audio Signal Processing: Processing and Perception of Speech and Music. Wiley. 2011
Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed.). Prentice Hall. 2018
Li Deng and Yang Liu (Editors). Deep Learning in Natural Language Processing. Springer. 2018
Ricardo Baeza-Yates, Berthier Ribeiro-Neto. Modern Information Retrieval: the concepts and technology behind search. 2nd Edition, Pearson. 2011
S. Theodoridis and K. Koutroumbas. Pattern Recognition. 4th ed., Academic Press. 2008
Wilhelm Burger and Mark J. Burge. Principles of Digital Image Processing: Fundamental Techniques. Springer-Verlag. 2009
Dong Yu and Li Deng. Automatic Speech Recognition. Springer. 2015

The course syllabus may change due to academic events or other reasons.