Checking date: 18/05/2022

Course: 2022/2023

Text Mining
Study: M. Computational Social Science (375)

Coordinating teacher: UCAR MARQUES, IÑAKI

Department assigned to the subject: Statistics Department

Type: Compulsory
ECTS Credits: 3.0 ECTS


Requirements (Subjects that are assumed to be known)
Data Programming (19138)
Skills and learning outcomes
Core Competences:
- Having and understanding the knowledge that provides a basis or opportunity to be original in the development and/or application of ideas, often in a research context.
- Students know how to apply their acquired knowledge and problem-solving skills in new or unfamiliar settings within broader (or multidisciplinary) contexts related to their field of study.
- Students are able to integrate knowledge and to face the complexity of making judgments based on information that, being incomplete or limited, includes reflections on the social and ethical responsibilities linked to the application of their knowledge and judgments.
- Students know how to communicate their conclusions and the knowledge and ultimate reasons behind them to specialised and non-specialised audiences in a clear and unambiguous way.
- Students have the learning skills that will enable them to continue studying in a way that will be largely self-directed or autonomous.
General Competences:
- Ability to understand and analyze the main global social theories and how they are changing with the application of computational tools.
- Ability to identify, define and formulate social science problems and solve them using computational techniques. This includes the ability to assess all the factors involved, not only technical but also legal.
- Ability to compile and analyze existing knowledge in the different areas of computational social sciences, and to propose possible solutions to the problems raised.
- Ability to apply theoretical and methodological knowledge of computational social sciences to the analysis and resolution of specific cases and empirical problems.
- Ability to address issues raised under new or unfamiliar environments, within the context of computational social sciences.
Specific Competences:
- Ability to use computational tools specific to the computational social sciences at an advanced level.
Learning Outcomes:
- Knowledge of text mining structures and procedures.
- Ability to use basic methods for extracting information from textual data.
- Ability to apply processing techniques to prepare documents for statistical modeling.
- Ability to evaluate and use basic predictive models of textual information.
Description of contents: programme
1. Theoretical introduction to Natural Language Processing
   1.1. Brief history of computational linguistics and main developments
   1.2. What is Natural Language Processing and its role in Artificial Intelligence
   1.3. Structure of a basic NLP pipeline
   1.4. Most common tasks and applications in the industry
   1.5. Current importance in the digital society, main initiatives
2. Practical introduction to automatic language analysis with R
   2.1. Source text import, dataset design and creation of data structures
   2.2. Text cleaning, removal of stopwords and symbols, missing values and duplicates
   2.3. Splitting and tokenization processes
   2.4. Basic analysis: word count, n-gram extraction, frequency tables
   2.5. Intermediate analysis: distinctiveness analysis, tf-idf, bag of words
3. Introduction to sentiment analysis
   3.1. What is automatic sentiment analysis in a text: opinion, emotion and intention of the speaker
   3.2. Real-world cases of sentiment analysis in the industry and limitations
   3.3. Practical training on automatic sentiment analysis: use of lexicons and dictionaries, automatic sentiment mapping, segmentation, word clouds
   3.4. Creation of sentiment analysis graphs and reports
4. Introduction to topic modeling
   4.1. What is topic modeling, main uses in the industry
   4.2. Classifying text into categories: supervised and unsupervised methods
   4.3. Practical training in topic modeling: word and topic association, natural group identification and characterization, common terms and overlapping
   4.4. Creation of topic modeling graphs and reports for identification of representative ideas
5. Language models
   5.1. What are pre-trained language models and their impact on NLP and Machine Learning development
   5.2. Uses and implications in the industry and current status, main initiatives
   5.3. Practical training on the use and evaluation of basic predictive models with text data
Learning activities and methodology
Training Activities:
- Theoretical-practical classes
- Tutorials
- Individual student work
- Partial and final examinations
Teaching Methods:
- Lectures given by the professor in class with computer and audiovisual support, in which the main concepts of the subject are developed and a bibliography is provided to complement the students' learning.
- Resolution of practical cases, problems, etc. raised by the professor, either individually or in groups.
- Presentation and discussion in class, under the moderation of the professor, of topics related to the content of the subject, as well as practical case studies.
- Development of pieces of work and reports, individually or in groups.
Assessment System
  • End-of-term examination: 40%
  • Continuous assessment (assignments, laboratory, practicals...): 60%
Basic Bibliography
  • Gabe Ignatow and Rada F. Mihalcea. An Introduction to Text Mining: Research Design, Data Collection, and Analysis. SAGE Publications. 2017
  • Silge, J., & Robinson, D. Text Mining with R: A Tidy Approach. O'Reilly Media. 2017
Electronic Resources *
Additional Bibliography
  • Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed.). Stanford University. 2021
  • Dan Jurafsky and James H. Martin. Speech and Language Processing (3rd ed.). Pearson, Prentice Hall. 2021
  • Kumar, A., & Paul, A. Mastering Text Mining with R: Master text-taming techniques and build effective text-processing applications with R. Packt Publishing Limited. 2016
  • Marchette, D. J. Text Data Mining Using R. Chapman & Hall/CRC. 2018
  • Ted Kwartler. Text Mining in Practice with R. Wiley. 2017
(*) Access to some electronic resources may be restricted to members of the university community and require validation through Campus Global. If you try to connect from outside the University, you will need to set up a VPN.

The course syllabus may change due to academic events or other reasons.