Use Case
Post a tweet
Get one’s own timeline
Follow/unfollow
Like
Assumption
DAU: 150M
Read queries per user per day: 60
Ave Qps: 150M * 60 / 86400 ≈ 100k
Peak: 3 * 100k = 300k
Read Qps: 300k
Read to Write Ratio: 10
Write Qps: 30k
Bandwidth: Write Qps * content size = 30k * 200B = 6MB/s (not a bottleneck)
Ave followers: 20
news feed: top 50
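The arithmetic above can be checked with a few lines of Python. This is only a back-of-envelope sketch: the variable names are made up for illustration, the peak factor of 3 and the 200-byte content size are taken from the assumptions above, and the read Qps is taken to be the peak figure, which is what a write Qps of 30k at a read-to-write ratio of 10 implies.

```python
# Back-of-envelope capacity estimate from the assumptions above.
DAU = 150_000_000              # daily active users
READS_PER_USER_PER_DAY = 60
SECONDS_PER_DAY = 86_400
PEAK_FACTOR = 3                # peak is taken as 3x the average
READ_TO_WRITE_RATIO = 10
TWEET_SIZE_BYTES = 200         # assumed content size per write

avg_qps = DAU * READS_PER_USER_PER_DAY / SECONDS_PER_DAY
peak_qps = PEAK_FACTOR * avg_qps                  # used as the read Qps
write_qps = peak_qps / READ_TO_WRITE_RATIO
write_bandwidth_mb_s = write_qps * TWEET_SIZE_BYTES / 1_000_000

print(f"average Qps:     {avg_qps:,.0f}")
print(f"peak (read) Qps: {peak_qps:,.0f}")
print(f"write Qps:       {write_qps:,.0f}")
print(f"write bandwidth: {write_bandwidth_mb_s:.2f} MB/s")
```

The exact values come out slightly above the rounded figures in the notes (about 104k, 312k, 31k, and 6.25 MB/s), which does not change the conclusion that bandwidth is not the bottleneck.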
High-Level Architecture
Web tier -> Application tier:
- User service (login/register, MySQL)
- Tweet service (post a tweet, one’s timeline, news feed, Cassandra)
- Media service (upload photo/video, S3)
- Friendship service (follow/unfollow, NoSQL)
Data Schema
User:
int user_id
String user_name
String password
boolean isStar
Tweet:
int owner_id
<long timestamp, int tweet_id> (Column key)
String content
Friendship:
int from_user_id
int to_user_id
NewFeed:
int user_id (Row key)
<long timestamp, int tweet_id> (Column key)
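One way to picture the schema is as plain Python records plus a toy wide-row table. This is a sketch under assumptions, not the storage layer itself: `WideRowTable`, `column_key`, and `newest` are hypothetical names, and the table only mimics the "row key -> sorted column keys" layout in memory.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

ColumnKey = Tuple[int, int]  # (timestamp, tweet_id), as in the schema above


@dataclass
class User:
    user_id: int
    user_name: str
    password: str
    is_star: bool = False


@dataclass
class Tweet:
    owner_id: int
    timestamp: int
    tweet_id: int
    content: str

    @property
    def column_key(self) -> ColumnKey:
        return (self.timestamp, self.tweet_id)


@dataclass(frozen=True)
class Friendship:
    from_user_id: int
    to_user_id: int


class WideRowTable:
    """Toy stand-in for a wide-row store: row key -> {column key -> tweet}."""

    def __init__(self) -> None:
        self.rows: Dict[int, Dict[ColumnKey, Tweet]] = {}

    def insert(self, row_key: int, tweet: Tweet) -> None:
        self.rows.setdefault(row_key, {})[tweet.column_key] = tweet

    def newest(self, row_key: int, limit: int = 50) -> List[Tweet]:
        # Sorting by (timestamp, tweet_id) descending yields newest first.
        row = self.rows.get(row_key, {})
        return [row[key] for key in sorted(row, reverse=True)[:limit]]
```

Both the Tweet table (row key: the owner) and the NewFeed table (row key: the reader) can be modeled with the same `WideRowTable`, since they share the column key.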
Business Logic
TweetService
- TweetService.getNewFeed(request)
    return NewFeedDB.getNewFeed(request.user_id)
- TweetService.postTweet(request)
    tweet = TweetDB.insertTweet(request.user_id, request.tweetcontent)
    FanoutAsyncProvider.notify(tweet)
    return success
- FanoutAsyncConsumer.consume(tweet)
    friend_id_list = FriendDB.getFriendList(tweet.owner_id)
    for each friend_id in friend_id_list:
        NewFeedDB.insertTweet(friend_id, tweet)
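The push (fanout-on-write) flow above can be sketched in memory as follows. Everything here is illustrative: `PushModel` and its method names are hypothetical, a `deque` stands in for the async message queue, and a counter doubles as both tweet id and logical timestamp.

```python
from collections import defaultdict, deque


class PushModel:
    def __init__(self):
        self.followers = defaultdict(set)   # owner_id -> follower ids (FriendDB)
        self.tweets = {}                    # tweet_id -> (timestamp, owner_id, content)
        self.news_feed = defaultdict(list)  # user_id -> [(timestamp, tweet_id)]
        self.fanout_queue = deque()         # stands in for the async queue
        self._counter = 0                   # tweet id and logical timestamp

    def follow(self, from_id, to_id):
        self.followers[to_id].add(from_id)

    def post_tweet(self, user_id, content):
        # Write the tweet, enqueue the fanout job, and return immediately.
        self._counter += 1
        tweet_id = timestamp = self._counter
        self.tweets[tweet_id] = (timestamp, user_id, content)
        self.fanout_queue.append(tweet_id)
        return tweet_id

    def run_fanout_worker(self):
        # Consumer side: copy each queued tweet into every follower's feed.
        while self.fanout_queue:
            tweet_id = self.fanout_queue.popleft()
            timestamp, owner_id, _ = self.tweets[tweet_id]
            for follower_id in self.followers[owner_id]:
                self.news_feed[follower_id].append((timestamp, tweet_id))

    def get_news_feed(self, user_id, limit=50):
        # A read is a single lookup of the precomputed feed, newest first.
        newest = sorted(self.news_feed[user_id], reverse=True)[:limit]
        return [self.tweets[tweet_id][2] for _, tweet_id in newest]
```

The point of the split is that `post_tweet` stays cheap for the author, while the cost proportional to the follower count is paid later by the worker.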
FriendService
- FriendService.follow(request)
    FriendDB.insert(request.from_id, request.to_id)
    FollowAsyncProvider.notify(request.from_id, request.to_id)
- FollowAsyncConsumer.consume(from_id, to_id)
    timeline = TweetDB.getTimeLine(to_id)
    for each tweet in timeline:
        NewFeedDB.insertTweet(from_id, tweet)
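A minimal in-memory sketch of the follow flow: record the edge, then backfill the followee's timeline into the follower's news feed. The names are hypothetical, the backfill runs inline rather than through a queue, and `unfollow` is an assumption added for symmetry with the use cases (the notes do not spell it out).

```python
from collections import defaultdict

friend_db = set()                  # (from_id, to_id) edges
tweet_db = defaultdict(list)       # owner_id -> [(timestamp, tweet_id)]
news_feed_db = defaultdict(set)    # user_id -> {(timestamp, tweet_id)}


def follow(from_id, to_id):
    friend_db.add((from_id, to_id))
    # In the design this step would be handed to an async consumer.
    backfill(from_id, to_id)


def backfill(from_id, to_id):
    # Copy the followee's timeline into the follower's feed.
    # Using a set makes a repeated follow idempotent.
    for entry in tweet_db[to_id]:
        news_feed_db[from_id].add(entry)


def unfollow(from_id, to_id):
    # Assumed inverse: drop the edge and remove the followee's tweets.
    friend_db.discard((from_id, to_id))
    news_feed_db[from_id] -= set(tweet_db[to_id])
```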
Scale
Step 1: Optimize by handling special cases and adding more features
Follow-up 1: when a movie star posts a tweet, the fanout takes multiple hours
Solution 1 (recommended): add more push servers to parallelize the fanout
Solution 2 (combine with the pull model, cache-aside):
- TweetService.getNewFeed(request)
    tweet_id_list = NewFeedDB.getNewFeedList(request.user_id)
    for each friend_id in FriendDB.getFriendList(request.user_id):
        if UserService.isStar(friend_id):
            timeline = TweetDB.getOnesTimeline(friend_id)
            tweet_id_list.merge(timeline)
    return tweet_id_list
- TweetService.postTweet(request)
    tweet = TweetDB.insertTweet(request.user_id, request.tweetcontent)
    if not UserService.isStar(request.user_id):
        FanoutAsyncProvider.notify(tweet)
    return success
- FanoutAsyncConsumer.consume(tweet)
    friend_id_list = FriendDB.getFriendList(tweet.owner_id)
    for each friend_id in friend_id_list:
        NewFeedDB.insertTweet(friend_id, tweet)
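The combined push/pull model can be sketched like this: ordinary users' tweets are pushed into follower feeds at write time, while stars' timelines are pulled and merged at read time. `HybridFeed` and its fields are hypothetical names, the fanout is done inline for brevity, and a counter serves as both tweet id and timestamp.

```python
import heapq
import itertools
from collections import defaultdict


class HybridFeed:
    def __init__(self, star_ids):
        self.star_ids = set(star_ids)       # configured by hand, per the notes
        self.following = defaultdict(set)   # user -> ids the user follows
        self.followers = defaultdict(set)   # user -> ids following the user
        self.timeline = defaultdict(list)   # owner -> [(ts, tweet_id)], ascending
        self.pushed = defaultdict(list)     # reader -> [(ts, tweet_id)], ascending
        self.clock = 0

    def follow(self, from_id, to_id):
        self.following[from_id].add(to_id)
        self.followers[to_id].add(from_id)

    def post_tweet(self, user_id):
        self.clock += 1
        entry = (self.clock, self.clock)    # (timestamp, tweet_id)
        self.timeline[user_id].append(entry)
        if user_id not in self.star_ids:    # stars skip the fanout entirely
            for follower_id in self.followers[user_id]:
                self.pushed[follower_id].append(entry)
        return entry[1]

    def get_news_feed(self, user_id, limit=50):
        # Start from the pushed feed, then pull each followed star's timeline.
        sources = [reversed(self.pushed[user_id])]
        for friend_id in self.following[user_id]:
            if friend_id in self.star_ids:
                sources.append(reversed(self.timeline[friend_id]))
        # Every source is newest-first, so a k-way merge keeps that order.
        merged = heapq.merge(*sources, reverse=True)
        return [tweet_id for _, tweet_id in itertools.islice(merged, limit)]
```

The read path now costs one extra timeline lookup per followed star, which is the trade made to avoid fanning a star's tweet out to every follower.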
Follow-up 2: the majority of a movie star’s followers unfollow the movie star
Solution 1: star status is not determined by follower count; it is configured by hand
Solution 2: the combined pull model handles this case eventually
Follow-up 3: likes. Denormalize the counters into the Tweet table
Tweet:
int owner_id
<long timestamp, int tweet_id> (Column key)
String content
int likenum
int retweetnum
int commentnum
Like:
int id
int user_id
int tweet_id
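A small sketch of what the denormalization buys: the Like table stays the source of truth for who liked what, while `likenum` on the tweet is a counter kept in step so that reads do not have to count rows. `LikeStore` and its methods are hypothetical, and the sketch ignores the retweet and comment counters, which would follow the same pattern.

```python
class LikeStore:
    def __init__(self):
        self.likes = set()     # (user_id, tweet_id) pairs: the Like table
        self.like_num = {}     # tweet_id -> denormalized counter on Tweet

    def add_tweet(self, tweet_id):
        self.like_num[tweet_id] = 0

    def like(self, user_id, tweet_id):
        if (user_id, tweet_id) in self.likes:
            return False                   # already liked: counter unchanged
        self.likes.add((user_id, tweet_id))
        self.like_num[tweet_id] += 1       # keep the counter in step
        return True

    def unlike(self, user_id, tweet_id):
        if (user_id, tweet_id) not in self.likes:
            return False
        self.likes.remove((user_id, tweet_id))
        self.like_num[tweet_id] -= 1
        return True

    def count_from_like_table(self, tweet_id):
        # The slow path the counter avoids: scan the Like table.
        return sum(1 for _, liked_id in self.likes if liked_id == tweet_id)
```

In a real store the two writes in `like` would not be atomic by default, so the counter may need periodic reconciliation against the Like table.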
Step 2: Maintenance
Robustness: replicas/sharding, master/slave
Scalability: x-scaling (horizontal scaling), cache, Facebook’s lease-get for the thundering herd phenomenon (distributed mutex, or “never time out, and passively update the expired value”)
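To illustrate the thundering herd problem and the mutex-style fix, here is an in-process sketch: when many requests miss the cache for the same key at once, only one of them loads the value and the rest wait for it. This is not Facebook's memcache lease protocol, only a single-process analogue of the "distributed mutex" option; `SingleFlightCache`, `demo`, and the key `feed:42` are hypothetical.

```python
import threading
import time


class SingleFlightCache:
    def __init__(self, loader):
        self._loader = loader            # the expensive backend read
        self._values = {}
        self._key_locks = {}
        self._guard = threading.Lock()   # protects the lock table

    def _lock_for(self, key):
        with self._guard:
            return self._key_locks.setdefault(key, threading.Lock())

    def get(self, key):
        if key in self._values:          # fast path: cache hit
            return self._values[key]
        with self._lock_for(key):        # one loader per key at a time
            if key not in self._values:  # re-check after acquiring the lock
                self._values[key] = self._loader(key)
            return self._values[key]


def demo(num_threads=8):
    calls = []

    def slow_loader(key):
        time.sleep(0.05)                 # simulate a slow database read
        calls.append(key)
        return key.upper()

    cache = SingleFlightCache(slow_loader)
    results = []
    threads = [
        threading.Thread(target=lambda: results.append(cache.get("feed:42")))
        for _ in range(num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return calls, results
```

Calling `demo()` starts eight concurrent readers on a cold key; the loader runs once and every reader gets the same value. Without the per-key lock, each reader would hit the backend.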