Course Sheet

Academic year: 2023/2024

Audio processing, Video processing and Computer vision

(16508)

Dual Bachelor in Data Science and Engineering - Telecommunication Technologies Engineering (Study Plan 2020) (Plan: 456 - Study: 371)

Coordinating teacher: GONZALEZ DIAZ, IVAN

Department assigned to the subject: Signal and Communications Theory Department

Type: Compulsory

ECTS Credits: 6.0

Year: 5th

Semester: 1st

Requirements (Subjects that are assumed to be known)

Neural Networks

Learning Outcomes

Link to document

Description of contents: programme

The goal of this subject is to provide the student with an introduction to signal processing techniques applied to speech, audio, image and video. To that end, the emphasis is placed on lab exercises, so that students can be assessed on their work on a mini-project. The subject is divided into two main blocks: first, image and video processing and, second, speech and audio processing. In both blocks, the signals and their characteristics are presented first, including some notions of the visual and auditory systems. Next, the fundamental processing techniques for these specific signals are presented, and their use is illustrated in selected applications. Then, convolutional neural networks are introduced, and several applications are described in both areas (image and video, and speech and audio).

PROGRAMME
Fundamentals of Image and Video Processing
A First Approach to Image Classification
Convolutional Neural Networks (CNNs)
  - Brief Review of Neural Networks (NNs) and Deep Neural Networks (DNNs)
  - Fundamentals and Building Blocks
  - Applications in Computer Vision
Recurrent Neural Networks (RNNs)
  - Fundamentals
  - Applications in Computer Vision
Fundamentals of Speech and Audio Processing
Overview of Speech and Audio Technologies
Deep Learning-based Speech and Audio Technologies

Learning activities and methodology

AF1: THEORETICAL-PRACTICAL CLASSES. These classes present the knowledge that students should acquire. Students will receive the class notes and will have basic reference texts to help them follow the classes and carry out the subsequent work. Students will solve exercises and practical problems, and workshops and evaluation tests will be held to acquire the necessary skills.
AF3: INDIVIDUAL OR GROUP WORK OF THE STUDENT.
AF8: WORKSHOPS AND LABORATORIES.
AF9: FINAL EXAM. The knowledge, skills and abilities acquired throughout the course are assessed globally.
MD1: CLASS THEORY. Lectures by the teacher, supported by computer and audiovisual media, in which the main concepts of the subject are developed and materials and bibliography are provided to complement the students' learning.
MD2: PRACTICES. Resolution of practical cases, problems, etc. set by the teacher, to be solved individually or in groups.
MD3: TUTORIALS. Individual assistance (individual tutorials) or group assistance (collective tutorials) given by the teacher to students.
MD6: LABORATORY PRACTICES. Applied/experimental teaching in laboratories under the supervision of a tutor.

Assessment System

End-of-term examination/test: 60%
Continuous assessment (assignments, laboratory, practicals, etc.): 40%

SE1: FINAL EXAM. The knowledge, skills and abilities acquired throughout the course will be assessed globally.
SE2: CONTINUOUS ASSESSMENT. Assignments, presentations, debates, exercises, practicals and workshop activities carried out throughout the course will be evaluated.

Basic Bibliography

Ken C. Pohlmann. Principles of Digital Audio (5th Edition). McGraw-Hill/TAB Electronics. 2005
B. Gold and N. Morgan. Speech and Audio Signal Processing: Processing and Perception of Speech and Music. John Wiley & Sons, Inc. New York, NY, USA. 1999
Rafael C. González and Richard E. Woods. Digital Image Processing (4th Edition). Pearson. 2018

Additional Bibliography

D. O'Shaughnessy. Automatic speech recognition: History, methods and challenges. Pattern Recognition, 41 (10) pp. 2965-2979. 2008
David A. Forsyth and Jean Ponce. Computer Vision: A Modern Approach (2nd Edition). Pearson. 2012
Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning. MIT Press. 2016
X. Huang, A. Acero and H.W. Hon. Spoken Language Processing: A Guide to Theory, Algorithms and System Development. Prentice Hall. 2001
Wilhelm Burger and Mark J. Burge. Principles of Digital Image Processing: Fundamental Techniques. Springer-Verlag. 2009
Wilhelm Burger and Mark J. Burge. Principles of Digital Image Processing: Core Techniques. Springer-Verlag. 2009

The course syllabus may change due to academic events or other reasons.