all things distributed: October 2015

week 1

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Know what economic advantages are driving the adoption of Cloud Computing
Know and explain how virtualization, cloud computing, and big data interact to create exciting business opportunities
Explain how the web underlies access to cloud services and remote provisioning
Explain how services are created
Explain how services can control other services
Know the principal architectural components of a cloud and their organization
Explain how services and orchestration play a role in each layer of a production cloud: IaaS, PaaS, and SaaS
Compare IaaS, PaaS, and SaaS
Know what services major Cloud companies provide and how they provide them
Intuitively explain how a cloud computer cluster can support a scalable web service

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

Cloud computing
Distributed services
Cloud services
Web middleware
Software defined architecture
Virtualization
Remote procedure execution
Load balancer
Simple Object Access Protocol (SOAP)
Representational State Transfer (REST)
Protocol buffers

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

What drives Cloud adoption?
Why do Clouds make economic sense?
How do Clouds support the Big Data revolution?
Who benefits from Clouds?
What is the software architecture upon which Cloud Services are built?

Readings & Resources

Cloudonomics: A Rigorous Approach to Cloud Benefit Quantification
OpenStack Training Video: Deploying OpenStack with Fuel
AWS Resources (viewing these is highly recommended before taking the AWS quiz)

week 2

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Know the motivations for a distributed programming framework other than the traditional MPI style
Realize how COTS (Commercial Off-the-Shelf) components and clusters impact the design of the new breed of programming frameworks
Learn how to “think” in MapReduce terms, reinforced by presentation of four different algorithms implemented in MapReduce style
Understand how parallelism is achieved in MapReduce, and how combiners allow optimization
Be familiar with the YARN framework, and why it is needed, including HDFS storage system
Explain how the different components of YARN work with each other
Know how the Hadoop YARN framework is used in a real-world scenario through guest interviews
Understand how PIG and Hive allow easier access to data stored in HDFS
Learn the fundamental design principles of some distributed data storage systems, including HDFS and Ceph
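
To make the MapReduce and combiner objectives above concrete, here is a minimal single-process sketch of a word count in MapReduce style. It is not Hadoop code: the function names (`map_phase`, `combine`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions chosen for illustration. It only shows how a combiner can shrink the number of pairs that cross the shuffle step.

```python
from collections import defaultdict


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def combine(pairs):
    """Combiner: pre-aggregate one mapper's output locally, before the shuffle."""
    local = defaultdict(int)
    for word, count in pairs:
        local[word] += count
    return list(local.items())


def shuffle(all_pairs):
    """Shuffle: group values by key, as a framework would between map and reduce."""
    groups = defaultdict(list)
    for word, count in all_pairs:
        groups[word].append(count)
    return groups


def reduce_phase(word, counts):
    """Reducer: sum the partial counts for one word."""
    return word, sum(counts)


def word_count(splits, use_combiner=True):
    """Run the pipeline over input splits (each split is a list of lines).

    Returns the final counts and the number of pairs sent through the shuffle.
    """
    shuffled_input = []
    for split in splits:  # each split stands in for one independent mapper
        pairs = [pair for line in split for pair in map_phase(line)]
        if use_combiner:
            pairs = combine(pairs)
        shuffled_input.extend(pairs)
    grouped = shuffle(shuffled_input)
    result = dict(reduce_phase(word, counts) for word, counts in grouped.items())
    return result, len(shuffled_input)


# Hypothetical sample data: two splits, as if stored on two machines.
splits = [["the cat sat", "the cat ran"], ["the dog sat"]]
counts, shuffled = word_count(splits)
```

With this sample input, the combiner sends 7 pairs through the shuffle instead of 9, and the final counts are the same either way. Each split could be mapped on a different machine, which is where the parallelism comes from.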

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

MapReduce
Hadoop
PIG
HDFS
Tez

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

Why do we need distributed programming frameworks for tackling Big Data and Cloud problems?
What are the benefits and drawbacks of Hadoop and MapReduce?
What are the benefits and drawbacks of higher level frameworks such as PIG and Hive?
What should the architecture of a Distributed File System (DFS) look like? What components are involved? How can a DFS operate in the face of failing system components?

Readings & Resources

week 3

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Know the motivations for distributed stream processing
Learn how to organize a real-time Big Data problem in a "Stream Processing" framework
Understand how parallelism is achieved in stream processing
Be familiar with the Storm framework and why it's needed
Explain how different components of Storm work with one another
Learn about the Storm threading model
Explain how a simple stream processing problem can be coded as a Storm application
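
As a rough illustration of how a simple stream processing problem breaks down into a spout and bolts, here is a plain-Python sketch of a streaming word count. It does not use the Storm API: `sentence_spout`, `split_bolt`, `CountBolt`, `fields_grouping`, and `run_topology` are hypothetical names, the input sentences are made up, and the whole thing runs in one thread. It only mirrors the roles of the components and the idea of routing tuples by key to parallel tasks.

```python
import zlib
from collections import Counter


def sentence_spout():
    """Spout: the source of the stream; yields one tuple at a time."""
    for sentence in ["storm processes streams", "streams never end", "storm scales"]:
        yield sentence


def split_bolt(sentence):
    """Bolt: turns one incoming tuple into zero or more outgoing tuples."""
    return sentence.split()


class CountBolt:
    """One task of a counting bolt; keeps running counts for the words routed to it."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, word):
        self.counts[word] += 1


def fields_grouping(word, num_tasks):
    """Route equal words to the same task so each word is counted in one place."""
    return zlib.crc32(word.encode("utf-8")) % num_tasks


def run_topology(num_count_tasks=2):
    """Wire spout -> split bolt -> counting tasks and drain the (finite) stream."""
    tasks = [CountBolt() for _ in range(num_count_tasks)]
    for sentence in sentence_spout():
        for word in split_bolt(sentence):
            tasks[fields_grouping(word, num_count_tasks)].execute(word)
    return tasks


tasks = run_topology()
# Merge the per-task counts only to inspect the overall result.
total = sum((task.counts for task in tasks), Counter())
```

Adding more counting tasks spreads the keys across more workers without changing the result, which is the intuition behind horizontal scaling in this style of framework.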

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

Stream Processing
Horizontal Scaling
Storm
Spout
Bolt
Nimbus
Protocol Buffers
Acknowledgements

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

Why do we need stream processing frameworks for high-velocity big data problems?
What are the benefits and drawbacks of Storm?
How do components in Storm interact?
What is the architecture of the Storm framework?
How does Storm scale horizontally?

week 4

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Know why reaching consistency can be slow in distributed systems
Understand the implications of Eric Brewer’s CAP Theorem on Clouds
Know what components in a cloud are impacted by consistency issues
Learn what solutions and techniques are used in Cloud Computing to address consistency issues
Understand the interaction of availability, network partition, and consistency
Explain informally how the Paxos protocol provides eventual consistency and the limitations of Paxos
Explain the popularity of Zookeeper
Understand the benefits and limitations of a distributed column-oriented data store
Describe the interaction between the HBase Master and the HRegion Servers
Explain how iterative data flow and interactivity motivate SPARK

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

Consistency
Availability
Partition tolerance
Eventual Consistency
t-Connected Consistency
BASE
ACID
Paxos
Quorum
HBASE
Column-oriented data store
SPARK
Pregel
Hive
Scala
Acyclic data flow
Resilient distributed datasets

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

Why do we need consistency in distributed computing?
What are the benefits and limitations of Paxos?
What are the advantages and disadvantages of a distributed column store?
What is the architecture of the SPARK framework?

week 5

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Know the motivations for machine learning and graph processing
Understand why conventional machine learning and graph processing tools are not sufficient for large data sets
Understand why cloud computing provides a platform for large scale machine learning and graph processing
Learn how to write graph processing algorithms that have adjacency lists too large for the memory of one machine
Learn and explain how to use a basic machine learning tool
Understand how to use a K-Means algorithm for clustering data sets
Describe a classification problem and Naïve Bayes Classifier
Understand concepts of Frequent Pattern Mining
Learn about the Graph Processing and Machine Learning libraries of Apache Spark
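
For the K-Means objective above, the following is a minimal single-machine sketch of the standard assign-then-update loop (Lloyd's algorithm). The function name `kmeans`, the sample points, and the naive choice of the first k points as starting centroids are all assumptions made for illustration; a library such as Spark's MLlib would distribute this work and use a better initialization.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=10):
    """Cluster points into k groups; returns (centroids, clusters)."""
    # Naive initialization (assumption): use the first k points as centroids.
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: squared_distance(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members.
        for i, members in enumerate(clusters):
            if members:  # keep the old centroid if a cluster is empty
                centroids[i] = tuple(sum(axis) / len(members) for axis in zip(*members))
    return centroids, clusters


# Hypothetical sample data: two visually obvious groups.
points = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (8.0, 9.0), (2.0, 1.0), (9.0, 8.0)]
centroids, clusters = kmeans(points, k=2)
```

The assignment step is independent per point, so it is the part that parallelizes naturally across a cluster; only the per-cluster sums and counts need to be combined to update the centroids.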

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

Graph Processing
Index-free adjacency
Relations
Ranking
Fusion/Diffusion
Machine Learning
Large training data set
Regression
Clustering
Classification
Recommendation
Anomaly detection
Pattern recognition

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

How does a graph processing problem get solved by focusing the solution on a vertex?
What algorithms are best described as a graph processing algorithm?
What are the basic machine learning algorithms?
How does a cloud-based machine learning tool help simplify using machine learning to solve big data problems?
What are the benefits of using a machine learning / graph specific framework, compared to writing one's algorithm directly on Hadoop?

Tuesday, October 13, 2015

UIUC Cloud Concept notes #2 Cloud Computing Applications
