Summary
The subject is divided roughly equally between theory and laboratory work, organized into three content blocks:
1) Cloud: This block analyzes the architectures that led to the cloud and the current cloud models. AWS and Google Cloud are used in the lab.
2) Communications and the Cloud: This block covers data representation, the use of REST, queues, and other interaction protocols, as well as data persistence and extraction (dataset construction). AWS, Google Cloud, Spotify, and other real systems are used in the lab.
3) Big Data: This block analyzes the needs, the type of hardware required, distributed storage systems, and data center sizing and architecture. HDFS and Hadoop are used in the lab to understand location-aware scheduling and the MapReduce principle; Spark introduces the use of multiple MapReduce rounds, as well as streaming and high-level programming on a cluster.
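To make the MapReduce principle concrete before meeting Hadoop, the classic word count can be sketched in plain Python; the three functions below mirror the map, shuffle, and reduce phases a framework runs across the cluster (function and variable names are illustrative, not part of any course material):

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield (word.lower(), 1)

def shuffle_phase(pairs):
    # Shuffle: group all values by key, as the framework does between nodes.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: aggregate the values collected for each key.
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs big clusters", "data moves to the code"]
counts = reduce_phase(shuffle_phase(map_phase(docs)))
```

The same shape scales because map and reduce are independent per key: Hadoop and Spark distribute exactly these phases over HDFS blocks, moving the computation to where the data is stored.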
Cloud Computing Program
- Introduction to distributed computing: Evolution of computing, legacy systems and their evolution to distributed systems, distributed systems and Cloud Computing, models of computing distribution, what cloud computing is, service models (IaaS, PaaS, SaaS), security and data-location aspects in the Cloud, challenges and opportunities, use cases. LABS: AWS EC2, Google Cloud Compute Engine, Google App Engine, AWS Lambda.
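A serverless function like those deployed in the AWS Lambda lab reduces to a single entry point the platform invokes per event; a minimal sketch (the "name" field in the event is a hypothetical example, not a Lambda requirement):

```python
import json

def lambda_handler(event, context):
    # AWS Lambda calls this entry point with the triggering event (a dict)
    # and a runtime context object.
    name = event.get("name", "world")  # "name" is an illustrative field
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}"}),
    }

# Local invocation for testing; the context object is unused here.
response = lambda_handler({"name": "cloud"}, None)
```

The appeal of the model is that provisioning, scaling, and billing happen per invocation: the lab never manages a server, only this function.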
- Communications and the Cloud: Legacy protocols, current protocols (synchronous, asynchronous, messaging, queues), data representation, data capture and extraction, architectural and network considerations, challenges and opportunities, use cases. LABS: Introduction to REST with Flask (Google Cloud), programmatic data extraction via APIs (Twitter, Spotify), data extraction via web scraping, MQTT and other IoT/M2M protocols.
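The starting point of the REST lab can be sketched as a single Flask endpoint returning a JSON representation of a resource; the route and payload are illustrative choices, and the `flask` package must be installed:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/api/status")
def status():
    # A REST resource: a GET returns its state as a JSON representation.
    return jsonify({"service": "demo", "ok": True})

# In the lab this would be deployed on Google Cloud; locally it would run
# with app.run(). Flask's test client exercises the endpoint in-process,
# without binding a real port.
client = app.test_client()
resp = client.get("/api/status")
```

The same request/response pattern is what the API-extraction labs consume from the other side: a client issues HTTP GETs and parses the JSON body into a dataset.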
- Big Data: Big Data system architecture, analysis of processing/storage hardware characteristics, anatomy of a data center, modern distributed storage systems, modern batch processing systems, modern stream processing systems, highly distributed systems, challenges and opportunities, use cases. LABS: HDFS/Hadoop, Spark batch processing, Spark stream processing, algorithms in Spark, Spark and notebooks (PySpark).
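The batch/stream distinction in the Spark labs can be sketched without a cluster: a stream job cannot wait for all the data, so it maintains state per window and emits a result each time a window closes. The pure-Python generator below (all names illustrative) mimics that behavior over a finite list standing in for an unbounded stream:

```python
from collections import Counter

def windowed_counts(stream, window_size):
    # Consume a sequence in fixed-size windows, emitting a per-window
    # aggregate as each window closes, the way a stream job would.
    window = []
    for record in stream:
        window.append(record)
        if len(window) == window_size:
            yield Counter(window)
            window = []
    if window:  # flush the final partial window
        yield Counter(window)

events = ["click", "view", "click", "view", "view", "click"]
results = list(windowed_counts(events, window_size=3))
```

A batch job would instead compute one `Counter` over the whole dataset at the end; Spark Structured Streaming applies this same windowed-aggregation idea distributed over a cluster.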