Checking date: 24/04/2025 20:10:02


Course: 2025/2026

Digital Processing of Language
(20331)
Bachelor in Digital Humanities (Plan: 559 - Estudio: 213)


Coordinating teacher: SERRANO MARIN, MARINA

Department assigned to the subject: Humanities: Philosophy, Language, Literature Theory Department

Type: Compulsory
ECTS Credits: 6.0 ECTS

Course:
Semester:




Requirements (Subjects that are assumed to be known)
The course is taught in Spanish, so an adequate level of knowledge of this language is required, as well as fluency and correctness in oral and written expression.
Objectives
1. Understand the theoretical foundations of digital language analysis, including linguistic models and computational techniques.
2. Learn to apply tools and methodologies for the processing and analysis of linguistic data in different digital formats.
3. Develop skills to design and execute digital language processing projects, such as corpus analysis, information extraction and language modeling.
4. Apply artificial intelligence to different levels of language analysis.
5. Foster the ability to critically evaluate techniques and results in the field of natural language processing.
6. Promote the integration of interdisciplinary knowledge, combining linguistics, computer science and digital humanities.
Learning Outcomes
K7: Know the quantitative and qualitative methodologies, tools and techniques of analysis applicable to the Digital Humanities.
S4: Effectively manage databases and other digital resources for research, analysis, and dissemination of content in the field of Humanities.
S5: Use tools based on large language models or generative AI to work effectively with large volumes of text.
S8: Optimize computer programs and applications for data processing for the different uses and objectives common in the Digital Humanities.
S9: Update their own knowledge in the field of Digital Humanities in accordance with the latest conceptions and technologies available.
S11: Present the results of their study or work in a clear and precise way, using the appropriate visualization methods, in the field of Digital Humanities.
C2: Locate, filter and interpret data in the different fields of Digital Humanities, in order to make judgments, make decisions and apply solutions in specific contexts and situations.
C3: Effectively manage written and oral communication, both in the traditional academic field and on digital platforms, networks and electronic media.
Description of contents: programme
1. Introduction to Natural Language Processing (NLP)
   1.1. Natural Language Processing (NLP) models
   1.2. Applications of Natural Language Processing (NLP)
2. Text processing
   2.1. Tokenization
   2.2. Lemmatization
3. Semantic Web and opinion mining
   3.1. Introduction to sentiment, affect and connotation analysis
   3.2. SEO (Search Engine Optimization) and LSI (Latent Semantic Indexing)
4. Information retrieval and data mining
   4.1. Introduction to information extraction and text classification
   4.2. Models for information extraction and text classification
5. Natural language generation (NLG)
   5.1. Models and approaches to natural language generation (NLG)
   5.2. Existing tools and corpora
Learning activities and methodology
The lectures will review the main theoretical issues of the course. The practical classes will include the following types of activities:
- Exercises on digital language processing using free software.
- Group work on the different theoretical aspects of the course, and presentation of that work.
- Individual exercises on the application of artificial intelligence to linguistic problems.
Outside the classroom, in addition to attending tutorials, students are expected to complete the readings recommended in the lectures and to prepare the group work and individual exercises to be completed and presented in the practical classes.
Assessment System
  • % end-of-term examination/test: 50
  • % of continuous assessment (assignments, laboratory, practicals...): 50

Calendar of Continuous assessment


Extraordinary call: regulations
Basic Bibliography
  • ANA GARCÍA SERRANO, ANTONIO MENTA GARUZ. La inteligencia artificial en las Humanidades Digitales: dos experiencias con corpus digitales. Revista de Humanidades Digitales (UNED). 2022
  • BING LIU. Sentiment analysis: Mining opinions, sentiments and emotions. Cambridge University Press. 2020
  • GIOVANNI PARODI, PASCUAL CANTOS, CHAD HOWE (eds.). Lingüística de corpus en español. Routledge. 2022
  • LADY MARIUXI SANGACHA-TAPIA, RICARDO JAVIER CELI, IVÁN LEONEL ACOSTA-GUZMÁN, ELEANOR ALEXANDRA VARELA-TAPIA. Inteligencia Artificial Aplicada a Procesamiento de Lenguaje Natural (NLP) con Python y Machine Learning. AEA. 2024
  • PANG-NING TAN, MICHAEL STEINBACH, VIPIN KUMAR, ANUJ KARPATNE. Introduction to data mining. Pearson. 2019
Electronic Resources *
Additional Bibliography
  • ANDRÉS TORRES RIVERA, ROSA ESTOPÀ, JUAN MANUEL TORRES MORENO. Detección de neologismos semánticos: una aproximación estadística y de aprendizaje automático que combina corpus generales y especializados. Ediciones de la Universidad de Murcia. 2020
  • JAVIER SÁNCHEZ JIMÉNEZ. Towards Automated Complexity Grading : A Python-based Natural Language Processing Application for Textual Analysis of Japanese. Bellaterra: Universitat Autònoma de Barcelona. 2024
  • JOHN ROBERTO RODRÍGUEZ, MARIA SALAMÓ LLORENTE, MARIA ANTÒNIA MARTÍ ANTONÍN. Clasificación automática del registro lingüístico en textos del español: un análisis contrastivo. Linguamática (Universidade do Minho & Universidade de Vigo). 2013
  • JOSÉ LUIS PEMBERTY TAMAYO, JORGE MAURICIO MOLINA MEJÍA, VÍCTOR JULIÁN VALLEJO ZAPATA. UnderRL Tagger: un etiquetador gramatical para lenguas infrasoportadas tecnológicamente y lenguas minoritarias. Forma y Función (Universidad Nacional de Colombia). 2023
  • MYKOLA SALYUK KULINICH. Búsqueda de respuestas utilizando redes neuronales. Universitat Politècnica de València. 2022
  • NEREA FRANCÉS PÉREZ. Métodos de aprendizaje automático y modelos de lenguaje masivos para la detección de estereotipos en comentarios de texto en español. Universitat Politècnica de València. 2024
  • PABLO RAMIRES HERNÁNDEZ, DAVID VALLE CRUZ. Los asistentes virtuales basados en Inteligencia Artificial. RECIBE (Universidad de Guadalajara). 2022
  • TANJA SCHULTZ, KATRIN KIRCHHOFF. Multilingual Speech Processing. Chantilly: Elsevier Science & Technology. 2006
Electronic Resources *
(*) Access to some electronic resources may be restricted to members of the university community and require validation through Campus Global. If you try to connect from outside the University, you will need to set up a VPN.


The course syllabus may change due to academic events or other reasons.