Course: 2022/2023

Predictive Modeling

(16494)

Requirements (Subjects that are assumed to be known)

Calculus I and II
Linear Algebra
Programming
Probability and Data Analysis
Introduction to Statistical Modeling
Statistical Learning

Skills and learning outcomes

* General competences
- CG1: Adequate knowledge and skills to analyse and synthesise basic problems related to engineering and data science, solve them and communicate them efficiently.
- CG4: Ability to solve technological, computational, mathematical and statistical problems that may arise in engineering and data science.
- CG5: Ability to solve mathematically formulated problems applied to different subjects, using numerical algorithms and computational techniques.
- CG6: Synthesise the conclusions obtained from the analyses carried out and present them clearly and convincingly, both in writing and orally.
* Transversal competences
- CT1: Ability to communicate knowledge orally and in writing to both specialised and non-specialised audiences.
* Specific competences
- CE1: Ability to solve mathematical problems that may arise in engineering and data science. Ability to apply knowledge of: algebra; geometry; differential and integral calculus; numerical methods; numerical algorithms; statistics; and optimisation.
- CE2: Properly identify problems of a predictive nature corresponding to certain objectives and data, and use the basic results of regression analysis as the foundation of prediction methods.
- CE5: Understand and handle fundamental concepts of probability and statistics and be able to represent and manipulate data to extract meaningful information from them.
- CE7: Understand the basic concepts of programming and be able to write programs aimed at data analysis.

Description of contents: programme

This course is designed to give a panoramic view of several tools available for predictive modeling, at an introductory-intermediate level. It covers the main concepts of linear models in depth and gives an overview of their extensions. The focus is on providing the main insights into the statistical and mathematical foundations of the models and on showing how to implement the methods effectively with the statistical software R.
1. Introduction
1.1. Course overview
1.2. Review of probability
1.3. Random vectors
1.4. Review of statistical inference
1.5. What is predictive modeling?
2. Simple linear regression
2.1. Model formulation and estimation
2.2. Assumptions of the model
2.3. Inference for model parameters
2.4. Prediction
2.5. ANOVA and model fit
3. Multiple linear regression
3.1. Model formulation and estimation
3.2. Assumptions of the model
3.3. Inference for model parameters
3.4. ANOVA and model fit
3.5. Model selection
3.6. Handling nonlinear relationships
3.7. Use of qualitative predictors
3.8. Model diagnostics and multicollinearity
4. Linear regression extensions
4.1. Review of principal component analysis
4.2. Principal components regression
4.3. Partial least squares regression
4.4. Regularized linear models
4.5. Ridge and lasso regression
5. Logistic regression
5.1. Model formulation and interpretation
5.2. Maximum likelihood estimation
5.3. Inference for model parameters
5.4. Model selection and multicollinearity
5.5. Regularized logistic models
The program is subject to minor modifications depending on the development of the course and/or the academic calendar.

Learning activities and methodology

The lessons consist primarily of theoretical expositions of the statistical methods covered in the course, complemented with illustrative examples. The laboratories are devoted to exercises and case studies that elaborate on the practical usage of the methods seen in the lessons. The methods are implemented in the statistical language R.

Assessment System

- End-of-term examination: 60%
- Continuous assessment (assignments, laboratory, practicals...): 40%

Basic Bibliography

- James, G., Witten, D., Hastie, T., and Tibshirani, R. An Introduction to Statistical Learning. Springer-Verlag. 2013
- Papoulis, A. and Pillai, S. U. Probability, Random Variables, and Stochastic Processes. McGraw-Hill. 2002

Additional Bibliography

- Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning. Springer. 2013
- Kuhn, M. and Johnson, K. Applied Predictive Modeling. Springer. 2013
- Panaretos, V. M. Statistics for Mathematicians. Springer. 2016
- Peña, D. Regresión y Diseño de Experimentos. Alianza Editorial. 2002
- Seber, G. A. F. Linear Regression Analysis. John Wiley & Sons. 1977
- Wood, S. N. Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC. 2006

The course syllabus may change due to academic events or other reasons.

**More information:** https://www.uc3m.es/ss/Satellite/Grado/en/Detalle/Estudio_C/1371241688824/1371212987094/Bachelor_s_Degree_in_Data_Science_and_Engineering