The goal of this subject is to introduce the student to signal processing techniques and their applications to speech, audio, image and video.
To that end, the emphasis is placed on lab exercises, so that the student can be assessed on her work on a mini project.
The subject is divided into two main blocks: first, image and video processing and, second, speech and audio processing.
In both blocks, the signals and their characteristics are presented first, including certain notions of the visual and auditory systems. Next, the fundamental processing techniques for each type of signal are presented, and their use is illustrated in selected applications. Then, convolutional neural networks are introduced, and several applications are described in both areas (image & video and speech & audio).
PROGRAMME
Fundamentals of Image and Video Processing
A First Approach to Image Classification
Convolutional Neural Networks (CNNs)
- Brief Review of Neural Networks (NNs) and Deep Neural Networks (DNNs)
- Fundamentals and Building Blocks
- Applications in Computer Vision
Recurrent Neural Networks (RNNs)
- Fundamentals
- Applications in Computer Vision
Fundamentals of Speech and Audio Processing
Overview of Speech and Audio Technologies
Deep Learning-based Speech and Audio Technologies