Checking date: 24/04/2024

Course: 2024/2025

Audio processing, Video processing and Computer vision
Bachelor in Data Science and Engineering (Plan: 392 - Study: 350)

Coordinating teacher: GONZALEZ DIAZ, IVAN

Department assigned to the subject: Signal and Communications Theory Department

Type: Compulsory
ECTS Credits: 6.0 ECTS


Requirements (Subjects that are assumed to be known)
Neural Networks; Signals and Systems; Machine Learning I and II
Students must achieve the following objectives:
1) Know image, audio, speech and video signals, as well as their main parameters and the digitization process.
2) Know the most important techniques of image, video and audio processing, as well as the main tasks in computer vision and audio analysis.
3) Apply machine learning and deep learning techniques studied in previous subjects to the analysis of audiovisual content.
4) Develop intelligent applications that involve the automatic analysis of audiovisual content.
Skills and learning outcomes
Description of contents: programme
The course is divided into two main blocks based on the signal modalities: image and video on the one hand, and speech and audio on the other. In both cases, the signals and their main characteristics are presented first, including some notions of the human visual and auditory systems. Next, the most common processing techniques for each type of signal are studied, illustrating their use in selected applications. Finally, the most modern approaches are introduced, based on deep learning (e.g. CNNs and RNNs), which currently constitute the state of the art.

The course program is organized as follows:

Block 1: Processing of visual signals: image and video
Topic 1: Introduction to digital video and images
Topic 2: Fundamentals of image and video processing
Topic 3: Image Representation: low-level descriptors
Topic 4: Image Segmentation
Topic 5: Convolutional Neural Networks (CNNs) for image classification
Topic 6: Other applications of CNNs in visual analysis: object detection, semantic segmentation, image generation, style transfer

Block 2: Processing of speech and audio signals
Topic 7: Fundamentals of digital audio and speech: generation, perception and digitization
Topic 8: Time-localized analysis of speech and audio signals
Topic 9: Low-level speech and audio descriptors
Topic 10: Neural Networks for sequential data analysis: temporal CNNs, Recurrent Neural Networks, Transformers, applications to audio and speech
Learning activities and methodology
Two teaching activities are proposed: lectures and laboratory sessions.

LECTURES
The lectures will be supported by slides or other means of illustrating the concepts explained, and the explanations will be complemented with examples. In these sessions students will acquire the basic concepts of the course. These classes require initiative and personal and group involvement from the students (some concepts will be developed by the students themselves).

LABORATORY SESSIONS
This course has a strong practical component, and students will attend laboratory sessions frequently. In them, the concepts explained during the lectures will be put into practice using the Python programming language and software libraries for image analysis and computer vision (scikit-image, PIL, OpenCV), audio analysis (scikit-sound), machine learning (scikit-learn) and deep learning (PyTorch). Machines equipped with high-performance GPUs are available in the laboratory, but students can also use free cloud computing services such as Google Colab.
Assessment System
  • % end-of-term examination: 40
  • % of continuous assessment (assignments, laboratory, practicals...): 60

Calendar of Continuous assessment

Extraordinary call: regulations
Basic Bibliography
  • Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep Learning. The MIT Press. 2016
  • Ken C. Pohlmann. Principles of Digital Audio (5th Edition). McGraw-Hill/TAB Electronics. 2005
  • N. Morgan and B. Gold. Speech and Audio Signal Processing: Processing and Perception of Speech and Music. John Wiley & Sons, Inc. New York, NY, USA. 1999
  • Rafael C. González and Richard E. Woods. Digital Image Processing (4th Edition). Pearson. 2018
Additional Bibliography
  • D. O'Shaughnessy. Automatic speech recognition: History, methods and challenges. Pattern Recognition, 41 (10) pp. 2965-2979. 2008
  • David A. Forsyth and Jean Ponce. Computer Vision: A Modern Approach (2nd Edition). Pearson. 2012
  • X. Huang, A. Acero, H.W. Hon. Spoken Language Processing: A Guide to Theory, Algorithms and System Development. Prentice Hall. 2001
  • Wilhelm Burger and Mark J. Burge. Principles of Digital Image Processing: Fundamental Techniques. Springer-Verlag. 2009
  • Wilhelm Burger and Mark J. Burge. Principles of Digital Image Processing: Core Techniques. Springer-Verlag. 2009

The course syllabus may change due to academic events or other reasons.