Course sheet

Academic year: 2025/2026

Speech, audio, image, and video processing applications

(15936)

Master in Telecommunications Engineering (Plan: 171 - Estudio: 227)

EPI

Coordinating teacher: GALLARDO ANTOLIN, ASCENSION

Department assigned to the subject: Signal and Communications Theory Department

Type: Elective

ECTS Credits: 3.0 ECTS

Year: 2nd

Semester: 1st

Objectives

As in other Master's elective courses, the student will acquire greater specialization in different areas of telecommunications technologies. In particular, this course covers the following signal processing skills:

- Mathematical basis of signal processing.
- General knowledge of potential speech/audio/image/video processing applications.
- Basic subsystems of speech/audio/image/video processing applications.
- Use of speech/audio/image/video processing software.
- Handling of basic processing tools.
- Solving speech/audio/image/video processing problems using several basic tools.

Learning Outcomes

Link to document

Description of contents: programme

The goal of this subject is to give the student an introduction to recent signal processing techniques applied to speech, audio, image and video. To that end, a project-based learning approach is followed. The emphasis is on lab exercises, so that students can be assessed on their work on a mini project.

1.- Course presentation
2.- Introduction to Deep Learning
    2.1. Neural Networks
    2.2. Deep Neural Networks (DNNs)
3.- Fundamentals and Techniques of Image Processing
    3.1. Digital Representation of the Image. Color spaces.
    3.2. Point Operations. Filtering.
    3.3. Convolutional Neural Networks (CNNs)
4.- Fundamentals and Techniques of Speech and Audio Processing
    4.1. Digital Representation of Speech and Audio Signals. Spectrogram.
    4.2. DNNs for Speech and Audio Processing.
    4.3. Recurrent Neural Networks (RNNs)
5.- Interpretable and Sustainable Artificial Intelligence for Audiovisual Data
6.- Case Studies and Applications of Speech, Audio, Image and Video Processing

Learning activities and methodology

Two teaching activities are proposed: theoretical classes with examples and lab exercises.

THEORETICAL CLASSES WITH EXAMPLES (2 ECTS)
Theoretical classes will be given on the blackboard, with slides, or by any other means suited to illustrating the lecture concepts. In these classes the explanations are complemented with examples (AF1, MD1), and the student acquires the basic concepts of the course. These classes require initiative and personal and group involvement from the students, who will have to develop some concepts by themselves (AF3, MD3).

LABORATORY EXERCISES (1 ECTS)
Selected basic concepts learnt during the course are applied in the lab. Students should participate actively in implementing the exercises. There are two types of lab exercises:
- Guided lab exercises: getting used to speech, audio, image and video processing using Python (AF2, AF4, MD4).
- Final Project: an image or speech/audio processing problem to be solved in groups (AF6, AF7, MD2, MD5).

Assessment System

% end-of-term examination/test: 0
% of continuous assessment (assignments, laboratory, practicals...): 100

Calendar of Continuous assessment

SE2.- Submission and oral presentation of individual or group work carried out during the course.
SE5.- Midterm exam.

In the ordinary call, the final grade of the subject will be based on the following submissions:

- 3 tests about guided lab exercises (30%)
- Final Project (50%)
- Midterm multiple choice exam (20%)

In the extraordinary call, the final course grade is based on the multiple choice exam and the submission of the lab exercises and the final project.

Basic Bibliography

Aurélien Géron. Hands-On Machine Learning with Scikit-Learn and TensorFlow. O'Reilly Media. 2017
Francois Chollet. Deep Learning with Python. Manning Publications. 2017
Ian Goodfellow, Yoshua Bengio, Aaron Courville. Deep Learning. MIT Press. 2016
Pradeepta Mishra. PyTorch Recipes: A Problem-Solution Approach. Apress. 2019

Additional Bibliography

Ben Gold, Nelson Morgan, Dan Ellis. Speech and Audio Signal Processing: Processing and Perception of Speech and Music. Wiley. 2011
Christopher M. Bishop. Pattern Recognition and Machine Learning. Springer. 2006
Forsyth & Ponce. Computer Vision: A Modern Approach. Pearson. 2012
Gonzalez and Woods. Digital Image Processing (4th Edition). Pearson. 2018
Wilhelm Burger and Mark J. Burge. Principles of Digital Image Processing: Core Techniques. Springer-Verlag. 2009

The course syllabus may change due to academic events or other reasons.