Tuesday, October 13, 2015

UIUC Cloud Concept notes #2 Cloud Computing Applications

week 1

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:
  • Know what economic advantages are driving the adoption of Cloud Computing
  • Know and explain how virtualization, cloud computing, and big data interact to create exciting business opportunities
  • Explain how the web underlies access to cloud services and remote provisioning
  • Explain how services are created
  • Explain how services can control other services
  • Know the principal architectural components of a cloud and their organization
  • Explain how services and orchestration play a role in each layer of a production cloud: IaaS, PaaS, and SaaS
  • Compare IaaS, PaaS, and SaaS
  • Know what services the major cloud companies provide and how they provide them
  • Intuitively explain how a cloud computer cluster can support a scalable web service

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.
  • Cloud computing
  • Distributed services
  • Cloud services
  • Web middleware
  • Software defined architecture
  • Virtualization
  • Remote procedure execution
  • Load balancer
  • Simple Object Access Protocol (SOAP)
  • Representational state transfer (REST)
  • Protocol buffers

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.
  • What drives Cloud adoption?
  • Why do Clouds make economic sense?
  • How do Clouds support the Big Data revolution?
  • Who benefits from Clouds?
  • What is the software architecture upon which Cloud Services are built?

Readings & Resources

week 2

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:
  • Know the motivations for a distributed programming framework other than traditional MPI-style programming
  • Realize how COTS (Commercial Off the Shelf) components and clusters impact the design of the new breed of programming frameworks
  • Learn how to “think” in MapReduce terms, reinforced by presentation of four different algorithms implemented in MapReduce style
  • Understand how parallelism is achieved in MapReduce, and how combiners allow optimization
  • Be familiar with the YARN framework and why it is needed, including the HDFS storage system
  • Explain how the different components of YARN work with each other
  • Know how the Hadoop YARN framework is used in a real-world scenario through guest interviews
  • Understand how PIG and HIVE allow easier access to data stored in HDFS
  • Learn the fundamental design principles of some distributed data storage systems, including HDFS and Ceph
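As a study aid for the MapReduce and combiner objectives above, here is a minimal single-process sketch of word count in MapReduce style. It is not Hadoop code: the function names (map_phase, combine, shuffle, reduce_phase, word_count) are made up for illustration, and each inner list of lines stands in for one input split handled by one mapper. The second return value counts how many pairs would cross the shuffle, which is the traffic a combiner is meant to reduce.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-sum counts on the mapper side, before the shuffle."""
    partial = defaultdict(int)
    for key, value in pairs:
        partial[key] += value
    return list(partial.items())


def shuffle(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reducer: sum the (possibly pre-combined) counts for one word."""
    return key, sum(values)


def word_count(splits, use_combiner=True):
    """Run the pipeline; return (counts, number of pairs sent to the shuffle)."""
    mapper_outputs = []
    for split in splits:  # each split is processed independently (the parallel part)
        pairs = list(chain.from_iterable(map_phase(line) for line in split))
        if use_combiner:
            pairs = combine(pairs)
        mapper_outputs.append(pairs)
    shuffled = list(chain.from_iterable(mapper_outputs))
    groups = shuffle(shuffled)
    counts = dict(reduce_phase(k, v) for k, v in groups.items())
    return counts, len(shuffled)


if __name__ == "__main__":
    splits = [["the cat the dog"], ["the end"]]
    print(word_count(splits, use_combiner=True))
    print(word_count(splits, use_combiner=False))
```

The combiner is safe here because addition is associative and commutative, so summing partial counts early gives the same final result as summing everything in the reducer.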

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.
  • MapReduce
  • Hadoop
  • PIG
  • HDFS
  • Tez

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.
  • Why do we need distributed programming frameworks for tackling Big Data and Cloud problems?
  • What are the benefits and drawbacks of Hadoop and MapReduce?
  • What are the benefits and drawbacks of higher level frameworks such as PIG and Hive?
  • What should the architecture of a Distributed File System (DFS) look like? What components are involved? How can a DFS operate in the face of failing system components?

Readings & Resources

week 3

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:
  • Know the motivations for distributed stream processing
  • Learn how to organize a real-time Big Data problem in a "Stream Processing" framework
  • Understand how parallelism is achieved in stream processing
  • Be familiar with Storm Framework and why it's needed
  • Explain how different components of Storm work with one another
  • Learn about the Storm threading model
  • Explain how a simple stream processing problem can be coded as a Storm application

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.
  • Stream Processing
  • Horizontal Scaling
  • Storm
  • Spout
  • Bolt
  • Nimbus
  • Protocol Buffers
  • Acknowledgements

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.
  • Why do we need stream processing frameworks for high-velocity big data problems?
  • What are the benefits and drawbacks of Storm?
  • How do components in Storm interact?
  • What is the architecture of the Storm framework?
  • How does Storm scale horizontally?

week 4

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:
  • Know why reaching consistency can be slow in distributed systems
  • Understand the implications of Eric Brewer’s CAP Theorem on Clouds
  • Know what components in a cloud are impacted by consistency issues
  • Learn what solutions and techniques are used in Cloud Computing to address consistency issues
  • Understand the interaction of availability, network partition, and consistency
  • Explain informally how the Paxos protocol provides eventual consistency and the limitations of Paxos
  • Explain the popularity of Zookeeper
  • Understand the benefits and limitations of a distributed column-oriented data store
  • Describe the interaction between the HBase Master and the HRegion Servers
  • Explain how iterative data flow and interactivity motivate SPARK

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.
  • Consistency
  • Availability
  • Partition tolerance
  • Eventual Consistency
  • t-Connected Consistency
  • BASE
  • ACID
  • Paxos
  • Quorum
  • HBase
  • Column-oriented data store
  • SPARK
  • Pregel
  • Hive
  • Scala
  • Acyclic data flow
  • Resilient distributed datasets

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.
  • Why do we need consistency in distributed computing?
  • What are the benefits and limitations of Paxos?
  • What are the advantages and disadvantages of a distributed column store?
  • What is the architecture of the SPARK framework?

week 5

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:
  • Know the motivations for machine learning and graph processing
  • Understand why conventional machine learning and graph processing tools are not sufficient for large data sets
  • Understand why cloud computing provides a platform for large scale machine learning and graph processing
  • Learn how to write graph processing algorithms for graphs whose adjacency lists are too large for the memory of one machine
  • Learn and explain how to use a basic machine learning tool
  • Understand how to use a K-Means algorithm for clustering data sets
  • Describe a classification problem and Naïve Bayes Classifier
  • Understand concepts of Frequent Pattern Mining
  • Learn about the Graph Processing and Machine Learning libraries of Apache Spark
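For the K-Means objective above, a small pure-Python sketch of the standard assign-then-update loop (Lloyd's algorithm) may help fix the idea. This is an illustration on one machine, not the course's tool or the Spark MLlib API; the names k_means, squared_distance, and mean, and the default parameters, are assumptions chosen for this example.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def k_means(points, k, iterations=20, seed=0):
    """Return (centroids, clusters) after at most `iterations` rounds."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # pick k distinct input points to start
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        new_centroids = [mean(c) if c else centroids[i] for i, c in enumerate(clusters)]
        if new_centroids == centroids:
            break  # assignments are stable, so the loop has converged
        centroids = new_centroids
    return centroids, clusters


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    centroids, clusters = k_means(data, k=2)
    print(centroids)
    print(clusters)
```

In a cluster setting the assignment step is the part that parallelizes naturally, since each point can be assigned independently once the current centroids are broadcast.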

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.
  • Graph Processing
  • Index-free adjacency
  • Relations
  • Ranking
  • Fusion/Diffusion
  • Machine Learning
  • Large training data set
  • Regression
  • Clustering
  • Classification
  • Recommendation
  • Anomaly detection
  • Pattern recognition

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.
  • How does a graph processing problem get solved by focusing the solution on a vertex?
  • Which algorithms are best described as graph processing algorithms?
  • What are the basic machine learning algorithms?
  • How does a cloud-based machine learning tool help simplify using machine learning to solve big data problems?
  • What are the benefits of using a machine learning / graph specific framework, compared to writing one's algorithm directly on Hadoop?