Checking date: 21/04/2025 16:37:08


Course: 2025/2026

Predictive Modeling
(16494)
Bachelor in Data Science and Engineering (Plan: 566 - Estudio: 350)


Coordinating teacher: GARCIA PORTUGUES, EDUARDO

Department assigned to the subject: Statistics Department

Type: Compulsory
ECTS Credits: 6.0 ECTS

Course:
Semester:




Requirements (Subjects that are assumed to be known)
  • Calculus I and II
  • Linear Algebra
  • Programming
  • Probability and Data Analysis
  • Introduction to Statistical Modeling
  • Statistical Learning
Objectives
* General competences
  - CG1: Adequate knowledge and skills to analyse and synthesise basic problems related to engineering and data science, solve them and communicate them efficiently.
  - CG4: Ability to solve technological, computational, mathematical and statistical problems that may arise in engineering and data science.
  - CG5: Ability to solve mathematically formulated problems applied to different subjects, using numerical algorithms and computational techniques.
  - CG6: Synthesise the conclusions obtained from the analyses carried out and present them clearly and convincingly, both in writing and orally.
* Transversal competences
  - CT1: Ability to communicate knowledge orally and in writing to both specialised and non-specialised audiences.
* Specific competences
  - CE1: Ability to solve mathematical problems that may arise in engineering and data science. Ability to apply knowledge of: algebra; geometry; differential and integral calculus; numerical methods; numerical algorithms; statistics and optimisation.
  - CE2: Properly identify problems of a predictive nature corresponding to certain objectives and data, and use the basic results of regression analysis as the foundation of prediction methods.
  - CE5: Understand and handle fundamental concepts of probability and statistics and be able to represent and manipulate data to extract meaningful information from them.
  - CE7: Understand the basic concepts of programming and be able to write programs aimed at data analysis.
Learning Outcomes
K3: To know fundamental contents in their area of study starting from the basis of general secondary education and reaching a level proper of advanced textbooks, including also some aspects of the forefront of their field of study.
K4: Knowledge of basic scientific and technical subjects that qualify for the learning of new methods and technologies, as well as providing a great versatility to adapt to new situations, in the field of data storage, management and processing.
K5: Ability to understand and relate fundamental concepts of probability and statistics and be able to represent and manipulate data to extract meaningful information from them.
K6: Acquire the fundamentals of Bayesian Statistics and learn the different techniques of intensive computing to implement Bayesian inference and prediction, applying them to data analysis, uncertainty modeling, and decision-making in real-world problems in Data Science and Engineering.
S1: To plan and organize team work making the right decisions based on available information and gathering data in digital environments.
S3: Ability to solve technological, computer, mathematical and statistical problems that may arise in data engineering and science, applying knowledge of mathematics, probability and statistics, programming, databases, and languages, grammars and automata.
S4: Ability to solve mathematically formulated problems applied to various subjects, using numerical algorithms and computational techniques, and applying knowledge of: algebra; geometry; differential and integral calculus; numerical methods; numerical algorithms; statistics and optimization.
S16: Ability to synthesize the conclusions obtained from the analyses carried out and present them clearly and convincingly both in writing and orally to both specialized and non-specialized audiences.
C5: Be able to analyze and synthesize basic problems related to engineering and data science, elaborate, defend and efficiently communicate solutions individually and professionally, applying the knowledge, skills, tools and strategies acquired or developed in their area of study.
Description of contents: programme
This course gives a panoramic view of several tools available for predictive modeling, at an introductory-intermediate level. It covers the main concepts of linear models in depth and gives an overview of their extensions. The focus is on the main insights into the statistical/mathematical foundations of the models and on the effective implementation of the methods with the statistical software R.

1. Introduction
   1.1. Course overview
   1.2. Review on probability
   1.3. Random vectors and conditional expectation
   1.4. Review on statistical inference
   1.5. What is predictive modeling?
2. Simple linear regression
   2.1. Model formulation and estimation
   2.2. Assumptions of the model
   2.3. Inference for model parameters
   2.4. Prediction
   2.5. ANOVA and model fit
3. Multiple linear regression
   3.1. Model formulation and estimation
   3.2. Assumptions of the models
   3.3. Inference for model parameters
   3.4. ANOVA and model fit
   3.5. Model selection
   3.6. Handling nonlinear relationships
   3.7. Use of qualitative predictors
   3.8. Model diagnostics and multicollinearity
4. Linear regression extensions
   4.1. Review on principal component analysis
   4.2. Principal components regression
   4.3. Partial least squares regression
   4.4. Regularized linear models
   4.5. Ridge and lasso regression
5. Logistic regression
   5.1. Model formulation and interpretation
   5.2. Maximum likelihood estimation
   5.3. Inference for model parameters
   5.4. Model selection and multicollinearity
   5.5. Regularized logistic models

The program is subject to minor modifications depending on the course development and/or the academic calendar.
Learning activities and methodology
The lessons consist primarily of theoretical expositions of the statistical methods of the course, complemented with illustrative examples. The laboratories are devoted to exercises and case studies that elaborate on the practical use of the methods covered in the lessons. The methods are implemented in the statistical language R.
Assessment System
  • % end-of-term examination/test: 60
  • % of continuous assessment (assignments, laboratory, practicals...): 40

Calendar of Continuous assessment


Extraordinary call: regulations
Basic Bibliography
  • James, G., Witten, D., Hastie, T., and Tibshirani, R. An Introduction to Statistical Learning. Springer-Verlag. 2013
  • Papoulis, A. and Pillai, S. U. Probability, Random Variables, and Stochastic Processes. McGraw-Hill. 2002
Electronic Resources *
Additional Bibliography
  • Hastie, T., Tibshirani, R., and Friedman, J. The Elements of Statistical Learning. Springer. 2013
  • Kuhn, M. and Johnson, K. Applied Predictive Modeling. Springer. 2013
  • Panaretos, V. M. Statistics for Mathematicians. Springer. 2016
  • Peña, D. Regresión y Diseño de Experimentos. Alianza Editorial. 2002
  • Seber, G. A. F. Linear Regression Analysis. John Wiley & Sons. 1977
  • Wood, S. N. Generalized Additive Models: An Introduction with R. Chapman & Hall/CRC. 2006
(*) Access to some electronic resources may be restricted to members of the university community and require validation through Campus Global. If you connect from outside the University, you will need to set up a VPN.


The course syllabus may change due to academic events or other reasons.


More information: https://www.uc3m.es/ss/Satellite/Grado/en/Detalle/Estudio_C/1371241688824/1371212987094/Bachelor_s_Degree_in_Data_Science_and_Engineering