The information overload caused by the enormous amount of information available through the internet makes it necessary to design systems that help us find the information we are looking for and filter or personalize it according to our needs.
The objective of this course is to introduce the basic techniques of voice, audio, image and video processing, with a strong practical orientation supported by a project-based learning methodology. In particular, the methods needed to carry out two projects will be presented both in the classroom and in the laboratory: one project in the field of image processing and the other in the field of audio processing, namely:
- Image: face recognition, construction of panoramic images, vehicle detection, etc.
- Audio: clustering or classification of audio by genre, classification of electrocardiograms, classification of emotions, etc.
Both projects will be presented on the Kaggle platform as "challenges", so that students can compete with each other.
The course closes with an introductory lesson on neural networks and their applications in speech, audio, image and video processing, a topic continued in two optional courses in the second term:
- Deep learning for image analysis
- Natural Language Processing
Program of the course
1. Introduction to audio & visual analytics
2. Audiovisual data: digital representation
3. Digital image and video processing
3.1. Point-to-point operations and filters
3.2. Image segmentation and morphological processing
3.3. Image representation for classification and detection
3.4. Integrating project (e.g. face recognition, construction of panoramic images, vehicle detection, etc.)
4. Voice and audio processing
4.1. Speech production and audio perception
4.2. Localized time analysis
4.3. How does Shazam work?
4.4. Speech and audio representation for classification, detection and recognition
4.5. Integrating project (e.g. audio clustering or classification, electrocardiogram classification, emotion classification, etc.)
5. Introduction to Neural Networks for voice, audio, image and video analysis