 Checking date: 10/05/2022

Course: 2022/2023

Predictive Modeling (16494)
Study: Dual Bachelor in Data Science and Engineering and Telecommunication Technologies Engineering (371)

Coordinating teacher: GARCIA PORTUGUES, EDUARDO

Department assigned to the subject: Department of Statistics

Type: Compulsory
ECTS Credits: 6.0 ECTS

Course:
Semester:

Requirements (Subjects that are assumed to be known)
• Calculus I and II
• Linear Algebra
• Programming
• Probability and Data Analysis
• Introduction to Statistical Modeling
• Statistical Learning
Objectives
* General competences
  - CG1: Adequate knowledge and skills to analyse and synthesise basic problems related to engineering and data science, solve them and communicate them efficiently.
  - CG4: Ability to solve technological, computational, mathematical and statistical problems that may arise in engineering and data science.
  - CG5: Ability to solve mathematically formulated problems applied to different subjects, using numerical algorithms and computational techniques.
  - CG6: Synthesise the conclusions obtained from the analyses carried out and present them clearly and convincingly, both in writing and orally.
* Transversal competences
  - CT1: Ability to communicate knowledge orally and in writing to both specialised and non-specialised audiences.
* Specific competences
  - CE1: Ability to solve mathematical problems that may arise in engineering and data science. Ability to apply knowledge of algebra, geometry, differential and integral calculus, numerical methods, numerical algorithms, statistics and optimisation.
  - CE2: Properly identify problems of a predictive nature corresponding to given objectives and data, and use the basic results of regression analysis as the foundation of prediction methods.
  - CE5: Understand and handle fundamental concepts of probability and statistics, and be able to represent and manipulate data to extract meaningful information from them.
  - CE7: Understand the basic concepts of programming and be able to write programs aimed at data analysis.
Skills and learning outcomes
Description of contents: programme
This course is designed to give a panoramic view of several tools available for predictive modeling, at an introductory-intermediate level. This view covers in depth the main concepts of linear models and gives an overview of their extensions. The focus is placed on providing the main insights into the statistical/mathematical foundations of the models and on showing the effective implementation of the methods in the statistical software R.

1. Introduction
   1.1. Course overview
   1.2. Review of probability
   1.3. Random vectors
   1.4. Review of statistical inference
   1.5. What is predictive modeling?
2. Simple linear regression
   2.1. Model formulation and estimation
   2.2. Assumptions of the model
   2.3. Inference for model parameters
   2.4. Prediction
   2.5. ANOVA and model fit
3. Multiple linear regression
   3.1. Model formulation and estimation
   3.2. Assumptions of the model
   3.3. Inference for model parameters
   3.4. ANOVA and model fit
   3.5. Model selection
   3.6. Handling nonlinear relationships
   3.7. Use of qualitative predictors
   3.8. Model diagnostics and multicollinearity
4. Linear regression extensions
   4.1. Review of principal component analysis
   4.2. Principal components regression
   4.3. Partial least squares regression
   4.4. Regularized linear models
   4.5. Ridge and lasso regression
5. Logistic regression
   5.1. Model formulation and interpretation
   5.2. Maximum likelihood estimation
   5.3. Inference for model parameters
   5.4. Model selection and multicollinearity
   5.5. Regularized logistic models

The programme may be subject to minor modifications depending on the course development and/or the academic calendar.
Learning activities and methodology
The lessons consist primarily of theoretical expositions of the statistical methods covered in the course, complemented with illustrative examples. The laboratories are designed for exercises and case studies that elaborate on the practical use of these methods. The methods are implemented in the statistical language R.
Assessment System
• % end-of-term examination: 60
• % of continuous assessment (assignments, laboratory, practicals...): 40
Calendar of Continuous assessment
Basic Bibliography
• James, G., Witten, D., Hastie, T., and Tibshirani, R. An Introduction to Statistical Learning. Springer-Verlag. 2013.
• Papoulis, A. and Pillai, S. U. Probability, Random Variables, and Stochastic Processes. McGraw-Hill. 2002.