Last updated: 05/05/2025 22:59:57


Academic Year: 2025/2026

Audio processing, Video processing and Computer vision
(16508)
Dual Bachelor Data Science and Engineering - Telecommunication Technologies Engineering (Study Plan 2020) (Plan: 456 - Study: 371)


Coordinator: GONZALEZ DIAZ, IVAN

Department assigned to the subject: Signal and Communications Theory Department

Type: Compulsory
Credits: 6.0 ECTS

Year:
Semester:




Requirements (Subjects that are assumed to be known)
Neural Networks, Signals and Systems, Machine Learning I and II
Objectives
Students must achieve the following objectives:
1) Know image, audio, speech and video signals, as well as their main parameters and the digitization process.
2) Know the most important techniques of image, video and audio processing, as well as the main tasks in computer vision and audio analysis.
3) Apply machine learning and deep learning techniques studied in previous subjects to the analysis of audiovisual content: images, video, audio, speech.
4) Develop intelligent applications that involve the automatic analysis of audiovisual content.
Description of contents: programme
The course is divided into two main blocks based on the signal modalities: image and video on the one hand, and speech and audio on the other. In both cases, the signals and their main characteristics are presented first, including some notions of the visual and auditory systems. Next, the most common processing techniques for each type of signal are studied, illustrating their use in selected applications. Finally, the most modern approaches are introduced, based on the application of deep learning (e.g. CNNs and RNNs), which currently constitute the state of the art.

The course program is organized as follows:

Block 1: Processing of visual signals: image and video
============================================
Topic 1: Introduction to digital video and images
Topic 2: Fundamentals of image and video processing
Topic 3: Image Representation: low-level descriptors
Topic 4: Image Segmentation
Topic 5: Convolutional Neural Networks (CNNs) for image classification
Topic 6: Other applications of CNNs in visual analysis: object detection, semantic segmentation, image generation, style transfer

Block 2: Processing of speech and audio signals
=======================================
Topic 7: Fundamentals of digital audio and speech: generation, perception and digitization
Topic 8: Time-localized analysis for speech and audio signals
Topic 9: Low-level speech and audio descriptors
Topic 10: Neural Networks for Sequential Data Analysis: Temporal CNNs, Recurrent Neural Networks, Transformers, State-Space Models (SSMs). Applications of these models to audio/speech signals.
Learning activities and methodology
Two teaching activities are proposed: lectures and lab sessions.

LECTURES
The lecture sessions will be supported by slides or by any other means of illustrating the concepts explained, and the explanations will be complemented with examples. In these sessions the student will acquire the basic concepts of the course. These classes require initiative and personal and group involvement from the students (there will be concepts that the students themselves should develop).

LABORATORY SESSIONS
This course has a strong practical component, and students will attend laboratory sessions very often. In them, the concepts explained during the lectures will be put into practice using the Python programming language and software libraries for image analysis and computer vision (scikit-image, PIL, OpenCV), audio analysis (scikit-sound), machine learning (scikit-learn) and deep learning (PyTorch). Machines equipped with high-performance GPUs are available in the laboratory, but students can also use free distributed computing systems such as Google Colab.
Assessment System
  • Percentage weight of the final exam: 40
  • Percentage weight of the rest of the assessment: 60

Calendar of Continuous assessment


Extraordinary call: regulations

The course programme may be subject to changes due to duly justified force majeure or to academic events announced in advance.