Checking date: 27/04/2024


Course: 2024/2025

Natural Language Processing
(19291)
Master in Machine Learning for Health (Plan: 480 - Estudio: 359)
EPI


Coordinating teacher: ARENAS GARCIA, JERONIMO

Department assigned to the subject: Signal and Communications Theory Department

Type: Electives
ECTS Credits: 3.0 ECTS

Course:
Semester:




Requirements (Subjects that are assumed to be known)
  • It is recommended to have passed the Machine Learning subject.
  • The Deep Learning subject also provides competences of interest, although it is not essential to have taken it. During the first sessions of the course, the necessary concepts for neural-based word and document embeddings will be reviewed.
Objectives
  • Familiarize students with some commonly used methods for natural language processing, both for preprocessing unstructured text and for building models based on machine learning.
  • Learn various approaches for calculating semantic similarity between documents, and their use to build and analyze semantic graphs.
  • Present some tools for the interactive visualization of machine learning and natural language processing models, based on graphs and interactive dashboards.
  • Familiarize students with some relevant applications of natural language processing.
  • Encourage maturity in the knowledge of these technologies, and the autonomy to deepen the concepts explained in class, by working on a final group project.
Skills and learning outcomes
Description of contents: programme
1. Introduction to Natural Language Processing
2. Word and document vector representation
   2.1. Text homogenization and cleaning
   2.2. spaCy and Spark NLP
   2.3. One-hot encoding
   2.4. Word Embeddings. Word2Vec. GloVe
   2.5. Other Embedding representations
3. Transformers
   3.1. Introduction to Transformers. Hugging Face
   3.2. Text Classification: Sentiment Analysis
   3.3. Other applications
        • Zero-shot classification
        • Text Generation
        • Neural Machine Translation
        • Question Answering
4. Topic Modeling
   4.1. Latent Dirichlet Allocation
   4.2. Neural Topic Modeling
5. Semantic Graph Analysis
   5.1. Semantic Similarity Metrics
   5.2. Semantic Graphs
   5.3. Graph Analysis
   5.4. Graph Visualization
   5.5. Semantic Information Retrieval
Learning activities and methodology
The following learning activities and methodologies are employed:
- Combined master and lab classes (AF3 & AF4 / MD1, MD2 & MD3): Master classes provide an overview of the main theoretical and mathematical concepts of natural language processing, along with the analytic tools. Lab examples will be introduced as part of the theoretical expositions: subject to lab availability, all sessions will take place in the lab, so that practical examples can be interwoven with the explanations and make the class more dynamic. This also helps to address differences in students' backgrounds.
- Final Project (AF6 / MD5): Students will work on a project in which they will program a complete, modular system based on one of the tools explained in class. Students will be given guidelines and preparatory sessions that follow a problem-based learning approach.
Teachers are available for office hours 2 hours per week.
Assessment System
  • % end-of-term examination: 0
  • % of continuous assessment (assignments, laboratory, practicals...): 100

Calendar of Continuous assessment


Basic Bibliography
  • Aston Zhang, Zachary C. Lipton, Mu Li, Alexander J. Smola. Dive into Deep Learning. https://d2l.ai. 2020
  • Christopher D. Manning, Hinrich Schütze. Foundations of Statistical Natural Language Processing. MIT Press. 1999
  • Dan Jurafsky and James H. Martin. Speech and Language Processing. Prentice Hall. 2018
  • Denis Rothman. Transformers for Natural Language Processing: Build, train, and fine-tune deep neural network architectures for NLP with P... Packt. 2022 (2nd Edition)
  • Li Deng (Editor), Yang Liu (Editor). Deep Learning in Natural Language Processing. Springer. 2018

The course syllabus may change due to academic events or other reasons.