Checking date: 20/05/2022


Course: 2022/2023

Machine learning applications
(16503)
Bachelor in Data Science and Engineering (Plan: 392 - Estudio: 350)


Coordinating teacher: ARENAS GARCIA, JERONIMO

Department assigned to the subject: Signal and Communications Theory Department

Type: Compulsory
ECTS Credits: 6.0 ECTS

Course:
Semester:




Requirements (Subjects that are assumed to be known)
It is recommended to have completed the first-year subjects on mathematical foundations (Calculus I and II, Linear Algebra, Probability and Data Analysis), the subjects related to programming and algorithms (Programming, and Data Structures and Algorithms), as well as the subject Statistical Learning. Students are also advised to have already taken the Machine Learning (I and II) courses.
Objectives
- Design a data model suitable for an analysis task.
- Correctly and efficiently choose and use one or more data analysis methods including statistical or algorithmic techniques.
- Evaluate the results of the analysis and propose modifications to the analysis process.
- Know how to design and apply unsupervised inference methods for models with latent variables.
- Know how to design and apply data adaptation and curation techniques.
- Know how to design and apply natural language processing methods.
- Know how to design and apply recommendation systems.
Skills and learning outcomes
Description of contents: programme
This course is divided into 3 thematic blocks. The first concerns the problem of adapting and cleaning a database, a critical preprocessing step that is addressed prior to any machine learning application. The next two blocks address two industry-relevant applications where machine learning techniques have achieved great success. Understanding how the different machine learning techniques have to be adapted to solve specific problems of interest to industry and society will provide students with a practical and general vision of applied Machine Learning. The course ends with a bonus track in which two visualization tools are presented to the students, who will use them for the final project assignment.
PART I: DATA CURATION AND CLEANING TECHNIQUES
1. Problem introduction. Data representation and visualization.
2. Organization and integration of databases from different sources.
3. Feature extraction and selection. Multivariate analysis and mutual information methods.
4. Data cleaning: data characterization, detection and imputation of corrupt data. Outlier detection.
PART II: NATURAL LANGUAGE PROCESSING
5. Text processing pipelines. Vector representation of texts.
6. Topic modeling: Latent Semantic Indexing, Latent Dirichlet Allocation.
7. Vector representation of text and models for automatic translation using neural networks.
PART III: RECOMMENDATION SYSTEMS
8. Content-based recommendation systems.
9. Collaborative filtering recommendation systems. ALS and Prod2Vec.
BONUS TRACK: ADVANCED DATA VISUALIZATION TOOLS
- Visualization of graph data with Gephi
- Business Intelligence tools
Learning activities and methodology
AF1: THEORETICAL-PRACTICAL CLASSES. These classes present the knowledge that students should acquire. Students will receive the class notes and will have basic reference texts to help them follow the classes and carry out the subsequent work. Students will solve exercises and practical problems, and workshops and evaluation tests will be held so that they acquire the necessary skills.
AF2: Updated to allegation
AF3: INDIVIDUAL OR GROUP WORK OF THE STUDENT.
AF9: FINAL EXAM. The knowledge, skills and abilities acquired throughout the course will be assessed globally.
MD1: CLASS THEORY. Lectures given by the teacher in class, supported by computer and audiovisual media, in which the main concepts of the subject are developed and the materials and bibliography are provided to complement the students' learning.
MD2: PRACTICES. Resolution of practical cases, problems, etc. posed by the teacher, individually or in groups.
MD3: TUTORIALS. Individual or group (collective) tutorials in which the teacher assists the students.
Assessment System
  • % end-of-term examination: 30
  • % of continuous assessment (assignments, laboratory, practicals...): 70
Calendar of Continuous assessment
Basic Bibliography
  • Data Visualization with Python for Beginners: Visualize Your Data using Pandas, Matplotlib and Seaborn. AI Publishing LLC. 2020
  • C.C. Aggarwal. Recommender Systems: The Textbook. Springer. 2016
  • D. Jurafsky, J.H. Martin. Speech and Language Processing (2nd ed.). Prentice Hall. 2008
  • J. Eisenstein. Introduction to Natural Language Processing. MIT Press. 2019
  • J. Han, M. Kamber. Data Mining: Concepts and Techniques (3rd ed.). Morgan Kaufmann. 2011
  • S. Bird, E. Klein, E. Loper. Natural Language Processing with Python. O'Reilly Media. 2009
Electronic Resources *
Additional Bibliography
  • C. Manning, H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press. 1999
  • K. Murphy. Machine Learning: A Probabilistic Perspective. The MIT Press. 2012
  • M. W. Berry. Survey of Text Mining: Clustering, Classification, and Retrieval. Springer. 2004
(*) Access to some electronic resources may be restricted to members of the university community and require validation through Campus Global. If you try to connect from outside the University, you will need to set up a VPN.


The course syllabus may change due to academic events or other reasons.