Checking date: 15/07/2023


Course: 2023/2024

Data Harvesting
(19145)
Master in Computational Social Science (Plan: 472 - Estudio: 375)
EPC


Coordinating teacher: GENOVA FUSTER, GONZALO

Department assigned to the subject: Computer Science and Engineering Department

Type: Compulsory
ECTS Credits: 3.0 ECTS

Course:
Semester:




Requirements (Subjects that are assumed to be known)
Data Programming (19138)
Objectives
- Knowledge of the general principles of API design and operation, as well as the most common information exchange formats.
- Ability to identify and access online APIs to download social observational data.
- Ability to compile structured databases from unstructured sources.
Skills and learning outcomes
Description of contents: programme
1. An introduction to Web Scraping
   - What is Web Scraping?
   - Types of Web Scraping
   - Data formats: XML and HTML
   - Practical access to XML and HTML
   - Automation for Web Scraping programs
   - Selenium and JavaScript based scraping
   - Ethical issues with Web Scraping
   - Practical exercises
2. Data APIs
   - What is an API
   - Fundamentals of API communication
   - An introduction to the JSON format
   - Create your own API (and share it)
   - REST architecture
   - APIs as a way to share and obtain data (any kind)
   - Automation of API requests
   - Talking with Databases
   - Authentication and ethical access to APIs
   - Practical exercises
3. Automation of Data Acquisition
   - Why do we need automation?
   - Accessing servers
   - Technologies for automating programs
   - Automating cron jobs
   - Logging tasks
   - Practical exercises
Learning activities and methodology
Training Activities:
- Theoretical-practical classes
- Tutorials
- Group work
- Individual student work
- Partial and final examinations
Teaching Methods:
- Presentations in the professor's lecture room with computer and audiovisual support, in which the main concepts of the subject are developed and a bibliography is provided to complement the students' learning.
- Critical reading of texts recommended by the subject professor (press articles, reports, manuals and/or academic articles), either for later discussion in class or to expand and consolidate knowledge of the subject.
- Resolution of practical cases, problems, etc. raised by the professor, either individually or in groups.
- Presentation and discussion in class, under the moderation of the professor, of topics related to the content of the subject, as well as practical case studies.
- Development of pieces of work and reports, individually or in groups.
Assessment System
  • % end-of-term examination: 20
  • % of continuous assessment (assignments, laboratory, practicals...): 80

Basic Bibliography
  • Barberá, P. & Steinert-Threlkeld, Z. How to use social media data for political science research. In: The SAGE handbook of research methods in political science and international relations (Vol. 2, pp. 404-423). SAGE Publications Ltd, https://dx.doi.org/10.4135/9781526486387. 2020
  • Freelon, D. Computational research in the post-API age. Political Communication, 35(4), 665-668. 2018
  • Nyhuis, D. Web data collection: potentials and challenges. In: The SAGE handbook of research methods in political science and international relations (Vol. 2, pp. 387-403). SAGE Publications Ltd, https://dx.doi.org/10.4135/9781526486387. 2020
  • Perriam, J., Birkbak, A., & Freeman, A. Digital methods in a post-API environment. International Journal of Social Research Methodology, 23(3), 277-290. 2020
Electronic Resources *
Additional Bibliography
  • Aydin, O. R Web Scraping Quick Start Guide: Techniques and tools to crawl and scrape data from websites. 2018
  • Munzert, S., Rubba, C., Meißner, P., & Nyhuis, D. Automated data collection with R: A practical guide to web scraping and text mining. John Wiley & Sons. 2014
(*) Access to some electronic resources may be restricted to members of the university community and require validation through Campus Global. If you try to connect from outside the University, you will need to set up a VPN.


The course syllabus may change due to academic events or other reasons.