all things distributed

Thursday, October 11, 2018

System design － educative 笔记

Tiny URL

key实现的方法

encoding
提前把key都生成

Data partition:

Range based
Hash based

Caching, and pruning

------------------

Designing Pastebin, 非常类似tinyUrl

-------------------------

Designing Instagram

Photo storage, object storage

News feed, pulling / pushing model

--------------------

Designing Dropbox

Divide file into chunks, transmit less data during synchronization

local client: long polling to the server to check changes

Updates are frequent, use queue between server and client

only transmitting the diff between files

先传data，再更新metadata

2种dedup的方法

metadata partition的方法

--------------
Designing Facebook Messenger

request/response workflow要看看

每个client都long pulling的话，一共得开多少个socket

保证message的顺序：server端的timestamp ＋ client本地加sequence

用wild column data， hbase。为什么要用这个？

用户的online／offline status的处理，client pull partial users

Push notification when a user is off

------------------
Designing Twitter

Data sharding - 用timestamp和tweetid的组合, 如何解决？

Cache last 3 days of tweets for one user

---------------
Designing Youtube or Netflix

thumbnail 的读写

processing queue for uploading and encoding videos

Tuesday, October 13, 2015

UIUC Cloud Concept notes #2 Cloud Computing Applications

week 1

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Know what economic advantages are driving the adoption of Cloud Computing
Know and explain how virtualization, cloud computing, and big data interact to create exciting business opportunities
Explain how the web underlies access to cloud services and remove provisioning
Explain how services are created
Explain how services can control other services
Know the principal architectural components of a cloud and their organization
Explain how services and orchestration play a role in each layer of a production cloud: IaaS, PaaS, and SaaS
Compare Iaas, Paas, and SaaS
Know what services that major Cloud companies provide and how they provide them
Intuitively explain how a cloud computer cluster can support a scalable web service

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

Cloud computing
Distributed services
Cloud services
Web middleware
Software defined architecture
Virtualization
Remote procedure execution
Load balancer
Service object access protocol (SOAP)
Representational state transfer (ReST)
Protocol buffers

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

What drives Cloud adoption?
Why Clouds make economic sense?
How Clouds support the Big Data revolution?
Who benefits from Clouds?
What is the software architecture upon which Cloud Services are built

Readings & Resources

Cloudonomics: A Rigorous Approach to Cloud Benefit Quantification
OpenStack Training Video: Deploying OpenStack with Fuel
AWS Resources (They are highly recommended to view before taking the AWS quiz)

week 2

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Know the motivations for a distributed programming framework other than the traditional style MPI
Realize how COTS (Commercial Off the Shelf) components and clusters impact the design of the new breed of programming frameworks
Learn how to “think” in MapReduce terms, reinforced by presentation of four different algorithms implemented in MapReduce style
Understand how parallelism is achieved in MapReduce, and how combiners allow optimization
Be familiar with the YARN framework, and why it is needed, including HDFS storage system
How different components of YARN work with each other
Know how the Hadoop YARN framework is used in a real world scenario through guest interviews.
Understand How PIG and HIVE allow easier access to data stored in HDFS
Learn the fundamental design principles of some distributed data storage systems, including HDFS and Ceph

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

MapReduce
Hadoop
PIG
HDFS
Tez

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

Why do we need distributed programming frameworks for tackling Big Data and Cloud problems?
What are the benefits and drawbacks of Hadoop and MapReduce?
What are the benefits and drawbacks of higher level frameworks such as PIG and Hive?
How should the architecture of a Distributed File System (DFS) look like? What components are involved? How can a DFS operate in face of failing system components?

Readings & Resources

week 3

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Know the motivations for distributed stream processing
Learn how to organize a real-time Big Data problem in a "Stream Processing" framework
Understand how parallelism is achieved in stream processing
Be familiar with Storm Framework and why it's needed
Explain how different components of Storm work with one another
Learn about the Storm threading model
Explain how a simple stream processing problem can be coded as a Storm application

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

Stream Processing
Horizontal Scaling
Storm
Spout
Bolt
Nimbus
Protocol Buffers
Acknolwedgements

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

Why do we need stream processing frameworks for high-velocity big data problems?
What are the benefits and drawbacks of Storm?
How do components in Storm interact?
What is the architecture of the Storm framework?
How does Storm scale horizontally?

week 4

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Know why reaching consistency can be slow in distributed systems
Understand the implications of Eric Brewer’s CAP Theorem on Clouds
Know what components in a cloud are impacted by consistency issues
Learn what solutions and techniques are used in Cloud Computing to address consistency issues
Understand the interaction of availability, network partition, and consistency
Explain informally how the Paxos protocol provides eventual consistency and the limitations of Paxos
Explain the popularity of Zookeeper
Understand the benefits and limitations of a distributed column-oriented data store
Describe the interaction between the HBase Master and the HRegion Servers
Explain how iterative data flow and interactivity motivate SPARK

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

Consistency
Availability
Partition tolerance
Eventual Consistency
t-Connected Consistency
BASE
ACID
Paxos
Quorum
HBASE
Column-oriented data store
SPARK
Pregel
Hive
Scala
Acyclic data flow
Resilient distributed datasets

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

Why do we need consistency in distributed computing?
What are the benefits and limitations of Paxos?
What are the advantages and disadvantages of a distributed column store?
What is the architecture of the SPARK framework?

week 5

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Know the motivations for machine learning and graph processing
Understand why conventional machine learning and graph processing tools are not sufficient for large data sets
Understand why cloud computing provides a platform for large scale machine learning and graph processing
Learn how to write graph processing algorithms that have adjacency lists too large for the memory of one machine
Learn and explain how to use a basic machine learning tool
Understand how to use a K-Means algorithm for clustering data sets
Describe a classification problem and Naïve Bayes Classifier
Understand concepts of Frequent Pattern Mining
Learn about the Graph Processing and Machine Learning libraries of Apache Spark

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

Graph Processing
Index-free adjacency
Relations
Ranking
Fusion/Diffusion
Machine Learning
Large training data set
Regression
Clustering
Classification
Recommendation
Anomaly detection
Pattern recognition

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

How does a graph processing problem get solved by focusing the solution on a vertex?
What algorithms are best described as a graph processing algorithm?
What are the basic machine learning algorithms?
How does a cloud-based machine learning tool help simplify using machine learning to solve big data problems?
What are the benefits of using a machine learning / graph specific framework, compared to writing one's algorithm directly on Hadoop?

Friday, April 3, 2015

To do

how to design schema?! for no-sql

Sunday, March 22, 2015

UIUC Cloud Concept notes

Part 1:

is cluster a peer to peer to peer system? Or do we use peer to peer system in a cluster?

---------------------------------
gossip-based membership

------------------------------------
Lamport timestamp
It follows some simple rules:

A process increments its counter before each event in that process;
When a process sends a message, it includes its counter value with the message;
On receiving a message, the receiver process sets its counter to be greater than the maximum of its own value and the received value before it considers the message received.

------------------------------
Cassandra
http://www.datastax.com/documentation/articles/cassandra/cassandrathenandnow.html
virtual node in Cassandra
http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2

column ~= cell, column family ~= table

consistent hashing
2 challenges:
First, the random position assignment of each node on the ring leads to non-uniform data and load distribution. Second, the basic algorithm is oblivious to the heterogeneity in the performance of nodes
solutions:
One is for nodes to get assigned to multiple positions in the circle (like in Dynamo), and the second is to analyze load information on the ring and have lightly loaded nodes move on the ring to alleviate heavily loaded nodes as described in

-----------------------------
CAP theorem
http://en.wikipedia.org/wiki/CAP_theorem
------------------------------------------------------------
Chandy–Lamport algorithm

-----------------------------------------------------
Virtual synchrony is an interprocess message passing (sometimes called ordered, reliable multicast) technology. Virtual synchrony systems allow programs running in a network to organize themselves into process groups, and to send messages to groups (as opposed to sending them to specific processes). Each message is delivered to all the group members, in the identical order, and this is true even when two messages are transmitted simultaneously by different senders. Application design and implementation is greatly simplified by this property: every group member sees the same events (group membership changes and incoming messages) in the same order.

http://en.wikipedia.org/wiki/Virtual_synchrony

--------------------------------------------------------------------
Bloom filter
http://stackoverflow.com/questions/4282375/what-is-the-advantage-to-using-bloom-filters
So an example usage pattern might be:
You've got a lot of data, on disk -- you decide on what error bound you want (e.g. 1%), that prescribes the value of m. Then the optimal k is determined (from the formula given in the article). You populate your filter from this disk-bound data once.
Now you have the filter in RAM. When you need to process some element, you query your filter to see if it stands a chance of existing in your data set. If it doesn't, no extra work is done. No disk reads, etc. (Which you would have to do if it were a hash or tree, etc).
Otherwise, if the filter says "Yes, it's in there", there's a 1% chance that it's wrong, so you do the necessary work to find out. 99% of the time, it really will be there, so the work was not for naught.

week1:

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Know the key differences between cloud computing and previous generations of distributed systems.
Design MapReduce programs for a variety of problems.
Know how Hadoop schedules jobs.

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

Clouds
MapReduce paradigm
Hadoop YARN

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

Why is cloud computing popular today?
What is different in cloud computing compared to previous generations of distributed systems?
How does one program in MapReduce?
How does the MapReduce system schedule jobs?
Readings and Resources
No required reading for this week. But you could look at the Apache Hadoop documentation.

week2:

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Analyze various gossip/epidemic protocols.
Design and analyze various distributed membership protocols.
Know what grid computing is.

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you interact with the lectures. These topics will help you better understand the content in this module.

Failure detectors
Membership protocols
Gossip/epidemic protocols
Grid computing

Guiding Questions

Develop your answers to the following guiding questions while completing the activities throughout the week.

Why are gossip and epidemic protocols fast and reliable?
What is the most efficient way for cloud computing systems to detect failures of servers?
How is grid computing related to cloud computing?

week3:

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Know how Napster, Gnutella, FastTrack, and BitTorrent work.
Know and analyze how distributed hash tables work (Chord, Pastry, and Kelips).

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you interact with the lectures. These topics will help you better understand the content in this module.

Peer-to-peer systems
Industrial P2P systems: Napster, Gnutella, FastTrack, BitTorrent
Distributed hash tables: Chord, Pastry, Kelips

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

What is the difference between how Napster clients and Gnutella clients search for files?
What is the difference between Gnutella and FastTrack?
What is BitTorrent’s tit for tat mechanism?
What is consistent hashing?
Why are DHTs efficient in searching?
How does Chord route queries?
How does Pastry route queries?
How does Kelips route queries?
What is churn in P2P systems?
How does Chord maintain correct neighbors in spite of failures and churn?

Readings and Resources

No readings are required for this week. But you can look at the following documentation:

week 4:

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Know why key-value/NoSQL are gaining popularity
Know the design of Apache Cassandra
Know the design of Apache HBase
Use various time synchronization algorithms
Apply Lamport and vector timestamps to order events in a distributed system

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

Key-value and NoSQL stores
Cassandra system
CAP theorem
Consistency-availability tradeoff and spectrum
Eventual consistency
HBase system
ACID vs. BASE
Time synchronization algorithms in asynchronous systems: Cristian's, NTP, and Berkeley algorithms
Lamport causality and timestamps
Vector timestamps

Guiding Questions

Develop your answers to the following guiding questions while completing the assignments throughout the week.

Why are key-value/NoSQL systems popular today?
How does Cassandra make writes fast?
How does Cassandra handle failures?
What is the CAP theorem?
What is eventual consistency?
What is a quorum?
What are the different consistency levels in Cassandra?
How do snitches work in Cassandra?
Why is time synchronization hard in asynchronous systems?
How can you reduce the error while synchronizing time across two machines over a network?
How does HBase ensure consistency?
What is Lamport causality?
Can you assign Lamport timestamps to a run?
Can you assign vector timestamps to a run?

Readings and Resources

There are no readings required for this week. But you can look at the following documentation:

week 5

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Design an algorithm to calculate a distributed snapshot.
Assign FIFO/Causal/Total ordering to multicast messages.
Design a reliable multicast protocol.
Know the working of the industry-standard protocol called Paxos.
Know why consensus is hard to solve.

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

Global Snapshots
Multicast Ordering
Multicast Reliability
Paxos
Impossibility of Consensus

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

What is the difference between a safety property and a liveness property?
How does the Chandy-Lamport algorithm work?
How do you assign FIFO/Causal timestamps to multicasts in a distributed system?
How does Paxos use quorums to ensure safety?
Why is consensus impossible to solve in asynchronous systems?

Readings & Resources

No readings required for this week. But you can look at the following documentation:

Part 2:

two-phase locking occurs inside of a transaction
release the lock when transaction reaches the commit point

week1:

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Design Leader Election algorithms including Ring algorithm and Bully algorithm.
Design Mutual Exclusion algorithms including Ricart-Agrawala’s algorithm and Maekawa’s algorithm.
Know the design of Leader Election used in industry systems: Google’s Chubby system and Apache Zookeeper.
Know how industry systems like Google’s Chubby support mutual exclusion.

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

Google Chubby Leader Election
Apache Zookeeper Leader Election
Ring Mutual Exclusion
Ricart-Agrawala’s Mutual Exclusion
Maekawa Mutual Exclusion

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

What are the safety and liveness conditions for Leader Election?
Why is Leader Election hard?
How long does the Ring Election algorithm take to complete?
How long does the Bully Election algorithm take to complete?
How does Google Chubby use quorums for election?
What are the safety and liveness conditions for Mutual Exclusion?
How do semaphores work?
How long does the Ring Mutual Exclusion algorithm take to complete?
How long does the Ricart-Agrawala’s algorithm take to complete?
Why is Maekawa’s algorithm “optimal”?
How does Google Chubby use quorums for mutual exclusion?

Readings & Resources

(Optional, fun video) Amazon Web Services (AWS) Training Videos
(Optional, fun video) Amazon Web Services (AWS) Training Videos (China host)

week2:

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Know how Remote Procedure Calls (RPCs) work.
Check a run of transactions for correctness (serial equivalence).
Design systems that use optimistic or pessimistic approaches to ensure correctness in spite of many concurrent clients.
Detect and avoid deadlocks.
Calculate nines availability for a replicated system.
Know how to ensure correctness (consistency) in spite of multiple servers.

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

LPCs vs RPCs
Marshaling
Serial Equivalence
Pessimistic Concurrency Control
Optimistic Concurrency Control
Deadlocks and their detection/avoidance/prevention
ACID Properties
Nines Availability
Active and Passive Replication
Two-phase commit

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

Does an RPC always cross machine boundaries?
Why is marshaling needed at all?
What are conflicting operations and how can you use them to detect serial equivalence among transactions?
Is locking a form of pessimistic or optimistic concurrency control?
Does Google docs use pessimistic or optimistic concurrency control?
What is one way to prevent deadlocks among transactions?
What does “three nines availability” really mean?
Why is Two-phase commit preferable over One-phase commit?

week3:

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Apply classical scheduling algorithms.
Apply popular Hadoop scheduling algorithms.
Know the internals of Apache Storm, a stream processing engine.
Know how enormous graphs can be processed in clouds.

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

Classical Scheduling algorithms, including FIFO, Shortest Task First, and Round Robin
Popular Hadoop schedulers including Capacity Scheduler and Fair Scheduler
Internals of Apache Storm, a stream processing engine
Internals of distributed graph processing engines, e.g., Google’s Pregel

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

Why is shortest-task-first optimal?
What is the difference between the Capacity and Fair schedulers in Hadoop?
What is a Storm topology?
What is Gather-Apply-Scatter paradigm in distributed graph processing?

Readings and Resources

There aren't any readings required for this week, but you can look at the following documentation:

Apache Storm

week4:

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Know the internals of Distributed File Systems like NFS and AFS.
Know the internals of Distributed Shared Memory systems.
Know what’s inside a sensor mote and why networks of them are needed.

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

Distributed File Systems: Why they’re different from single-node file systems
Internals of NFS
Internals of AFS
Distributed Shared Memory: How processes can share memory pages while communicating via messages
Invalidate protocols in Distributed Shared Memory systems
Sensor networks: Why they’ve emerged, what’s inside them, where they’re used, and what are the challenges

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

Why are Distributed File Systems stateless?
How does NFS provide transparency?
Why is whole file caching a reasonable approach in AFS?
When is invalidate preferable over update in Distributed Shared memory systems?
Why can’t embedded operating systems be used in sensor motes?
What is the disadvantage of using a spanning tree in sensor network, for aggregation?

Readings and Resources

There are no readings required for this week, but you can look at the following documentation:

TinyOS

week5:

Goals and Objectives

After you actively engage in the learning experiences in this module, you should be able to:

Know and design security policies.
Know, apply, and design security mechanisms.
Know the common properties of various types of networks.
See case studies of multiple real datacenter outages, and learn lessons from them.

Key Phrases/Concepts

Keep your eyes open for the following key terms or phrases as you complete the readings and interact with the lectures. These topics will help you better understand the content in this module.

Security policies vs. mechanisms
Security mechanisms: encryption, authentication, authorization
Common characteristics among the various types of networks that surround us
Why do datacenters suffer outages and what can we do about it?

Guiding Questions

Develop your answers to the following guiding questions while completing the readings and working on assignments throughout the week.

What is the difference between confidentiality, integrity, and availability?
What is the difference between authentication, authorization, and auditing?
What is a replay attack?
What is the difference between public-private key cryptography and shared key cryptography?
What is a small world network?
What is the difference between a small network and a power law network?
What is the most common cause of datacenter outages?
What are three techniques to mitigate datacenter outages?

Readings and Resources

No readings are required for this week. But you can look at the following documentations:

Saturday, July 19, 2014

about event-driven

stack ripping???
----
whether to use an async design depends on the time that a method takes to return a value.

it is up to either service or client to use an async model.

async method usually comes with Future call.

Thursday, July 10, 2014

Notes: Distributed System -- MapReduce, GFS, BigTable

Map Reduce
Highlights
Fault tolerance:
worker failure

Completed map tasks are re-executed on a failure because their output is stored on the local disk(s) of the failed machine and is therefore inaccessible. Completed reduce tasks do not need to be re-executed since their output is stored in a global le system.

Page 5

The MapReduce master takes the location information of the input les into account and attempts to schedule a map task on a machine that contains a replica of the corresponding input data. Failing that, it attempts to schedule a map task near a replica of that task's input data (e.g., on a worker machine that is on the same network switch as the machine containing the data).

Page 6, Refinements 4.1-4.4

--------------------------------
GFS
3.4, The snapshot operation makes a copy of a file or a directory tree (the “source”) almost instantaneously, while minimizing any interruptions of ongoing mutations.
Work flow:

Master receives snapshot request

Master revokes any outstanding lease

Master makes a duplicate of metadata for source files of directory

That's it!

Whenever replica receives a request of write to Chunk C(in source files or directory), it will make a new copy of Chunk C in local disc, then starts working in new copy

Unlike many traditional file systems, GFS does not have a per-directory data structure that lists all the files in that directory. Nor does it support aliases for the same file or directory (i.e, hard or symbolic links in Unix terms). GFS logically represents its namespace as a lookup table mapping full pathnames to metadata

master --> prime chunkserver --> secondary chunckserver(replicas) --> chunks

4.5 use version number to detect stale replicas

----------------------------------------
BigTable

The master is responsible for assigning tablets to tablet servers, detecting the addition and expiration of tablet servers, balancing tablet-server load, and garbage collection of files in GFS. In addition, it handles schema changes such as table and column family creations. It does not know the location of a tablet, metadata has that info

5.2, bigtable uses chubby to detect tablet server failures

ensure there is only one active master
store the bootstrap location of BigTable data
discover tablet servers
store BigTable schema information
store access control lists

**
reference: http://blog.csdn.net/opennaive/article/details/7532589
http://jimbojw.com/wiki/index.php?title=Understanding_Hbase_and_BigTable
http://read.seas.harvard.edu/cs261/2011/bigtable.html

**Because each row may have any number of different columns, there's no built-in way to query for a list of all columns in all rows
** In most cases, applications will simply ask for a given cell's data, without specifying a timestamp. In that common case, HBase/BigTable will return the most recent version (the one with the highest timestamp) since it stores these in reverse chronological order

**data replication is taken cared by GFS?