The course is divided into two main blocks based on signal modality: image and video on the one hand, and speech and audio on the other.
In both blocks, the signals and their main characteristics are presented first, including basic notions of the human visual and auditory systems. Next, the most common processing techniques for each type of signal are studied and illustrated through selected applications. Finally, the most recent approaches are introduced; these are based on deep learning (e.g. CNNs and RNNs) and currently constitute the state of the art.
The course program is organized as follows:
Block 1: Processing of visual signals: image and video
============================================
Topic 1: Introduction to digital video and images
Topic 2: Fundamentals of image and video processing
Topic 3: Image Representation: low-level descriptors
Topic 4: Image Segmentation
Topic 5: Convolutional Neural Networks (CNNs) for image classification
Topic 6: Other applications of CNNs in visual analysis: object detection, semantic segmentation, image generation, style transfer
Block 2: Processing of speech and audio signals
=======================================
Topic 7: Fundamentals of digital audio and speech: generation, perception and digitization
Topic 8: Time-localized analysis of speech and audio signals
Topic 9: Low-level speech and audio descriptors
Topic 10: Neural Networks for sequential data analysis: temporal CNNs, Recurrent Neural Networks, Transformers, applications to audio and speech