Checking date: 19/05/2022

Course: 2022/2023

Predictive Modeling
Study: Bachelor in Data Science and Engineering (350)

Coordinating teacher: GARCIA PORTUGUES, EDUARDO

Department assigned to the subject: Department of Statistics

Type: Compulsory
ECTS Credits: 6.0 ECTS


Requirements (Subjects that are assumed to be known)
  • Calculus I and II
  • Linear Algebra
  • Programming
  • Probability and Data Analysis
  • Introduction to Statistical Modeling
  • Statistical Learning
Skills and learning outcomes
Description of contents: programme
This course is designed to give a panoramic view of several tools available for predictive modeling, at an introductory-intermediate level. This view covers the main concepts of linear models in depth and gives an overview of their extensions. The focus is placed on providing the main insights into the statistical/mathematical foundations of the models and on showing the effective implementation of the methods through the statistical software R.

1. Introduction
  1.1. Course overview
  1.2. Review of probability
  1.3. Random vectors
  1.4. Review of statistical inference
  1.5. What is predictive modeling?
2. Simple linear regression
  2.1. Model formulation and estimation
  2.2. Assumptions of the model
  2.3. Inference for model parameters
  2.4. Prediction
  2.5. ANOVA and model fit
3. Multiple linear regression
  3.1. Model formulation and estimation
  3.2. Assumptions of the model
  3.3. Inference for model parameters
  3.4. ANOVA and model fit
  3.5. Model selection
  3.6. Handling nonlinear relationships
  3.7. Use of qualitative predictors
  3.8. Model diagnostics and multicollinearity
4. Linear regression extensions
  4.1. Review of principal component analysis
  4.2. Principal components regression
  4.3. Partial least squares regression
  4.4. Regularized linear models
  4.5. Ridge and lasso regression
5. Logistic regression
  5.1. Model formulation and interpretation
  5.2. Maximum likelihood estimation
  5.3. Inference for model parameters
  5.4. Model selection and multicollinearity
  5.5. Regularized logistic models

The programme is subject to minor modifications depending on the development of the course and/or the academic calendar.
Learning activities and methodology
The lessons primarily consist of theoretical expositions of the statistical methods of the course, complemented with illustrative examples. The laboratories are designed for carrying out exercises and case studies that elaborate on the practical use of the methods covered. The methods are implemented in the statistical language R.

* Training activities
THEORETICAL-PRACTICAL CLASSES. Sessions devoted to the knowledge and concepts that students must acquire. Students receive course notes and basic reference texts that help them follow the classes and carry out follow-up work. Students solve exercises on practical problems and participate in workshops. Subjects with 6 ECTS have 44 hours as a general rule, with 100% classroom instruction (except for subjects without an exam, which have 48 hours).
TUTORING SESSIONS. Individual attention (individual tutoring) or group attention (group tutoring) given to students by a teacher. Subjects with 6 credits have 4 hours of tutoring/100% on-site attendance.
STUDENT INDIVIDUAL WORK OR GROUP WORK. Subjects with 6 credits have 98 hours/0% on-site.
WORKSHOPS AND LABORATORY SESSIONS. Subjects with 3 credits have 3 hours/100% on-site instruction. Subjects with 6 credits have 6 hours/100% on-site instruction.
FINAL EXAM. Global assessment of the knowledge, skills, and capacities acquired throughout the course. It entails 4 hours/100% on-site.

* Teaching methodology
THEORY CLASS. Classroom presentations by the teacher, with IT and audiovisual support, in which the main concepts of the subject are developed and material and bibliography are provided to complement student learning.
PRACTICAL CLASS. Resolution of practical cases and problems posed by the teacher, carried out individually or in groups.
TUTORING SESSIONS. Individual attention (individual tutoring sessions) or group attention (group tutoring sessions) given to students by the teacher as tutor. Subjects with 6 credits have 4 hours of tutoring/100% on-site.
LABORATORIES. Applied/experimental learning/teaching in workshops and laboratories under the tutor's supervision.
Assessment System
  • End-of-term examination: 60%
  • Continuous assessment (assignments, laboratory, practicals...): 40%
Calendar of Continuous assessment
Basic Bibliography
  • James, G., Witten, D., Hastie, T., and Tibshirani, R. An Introduction to Statistical Learning. Springer-Verlag. 2013
  • Papoulis, A. and Pillai, S. U. Probability, Random Variables, and Stochastic Processes. McGraw-Hill. 2002
Additional Bibliography
  • Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning. Springer. 2013
  • Kuhn, M. and Johnson, K. Applied Predictive Modeling. Springer. 2013
  • Panaretos, V. M. Statistics for Mathematicians. Springer. 2016
  • Peña, D. Regresión y Diseño de Experimentos. Alianza Editorial. 2002
  • Seber, G. A. F. Linear Regression Analysis. John Wiley & Sons. 1977
  • Wood, S. N. Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC. 2006

The course syllabus may change due to academic events or other reasons.

More information: