News Feed

Use Case

Post a tweet
Get one’s own timeline
Follow/unfollow
Like

Assumption

DAU: 150M
Read queries per user per day: 60
Average read QPS: 150M * 60 / 86400 ≈ 100k
Peak read QPS: 3 * 100k = 300k
Read QPS (peak figure, used below): 300k
Read-to-write ratio: 10
Write QPS: 300k / 10 = 30k
Write bandwidth: write QPS * content size = 30k * 200 B = 6 MB/s (not a bottleneck)
Average followers per user: 20
News feed size: top 50 tweets
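
The estimate above can be re-run as a few lines of arithmetic. This is only a sketch: the inputs are the figures from the assumptions, the variable names are made up for illustration, and the results are unrounded, so they come out slightly above the rounded numbers in the list.

```python
# Back-of-envelope capacity estimate using the assumptions listed above.
SECONDS_PER_DAY = 86_400

dau = 150_000_000             # daily active users
reads_per_user_per_day = 60
peak_factor = 3               # peak traffic taken as 3x the average
read_write_ratio = 10
tweet_size_bytes = 200

avg_read_qps = dau * reads_per_user_per_day / SECONDS_PER_DAY
peak_read_qps = peak_factor * avg_read_qps
write_qps = peak_read_qps / read_write_ratio
write_bandwidth = write_qps * tweet_size_bytes   # bytes per second

print(f"average read QPS: {avg_read_qps:,.0f}")
print(f"peak read QPS:    {peak_read_qps:,.0f}")
print(f"write QPS:        {write_qps:,.0f}")
print(f"write bandwidth:  {write_bandwidth / 1e6:.2f} MB/s")
```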

High Level Architecture

Web tier -> Application tier:
    User Service (login/register; MySQL)
    Tweet Service (post a tweet, one’s timeline, news feed; Cassandra)
    Media Service (upload photo/video; S3)
    Friendship Service (follow/unfollow; NoSQL)

Data Schema

User:
    int user_id
    String user_name
    String password
    boolean isStar

Tweet:
    int owner_id
    <long timestamp, int tweet_id> (Column key)
    String content

Friendship:
    int from_user_id
    int to_user_id

NewFeed:
    int user_id (Row key)
    <long timestamp, int tweet_id> (Column key)

Business Logic

TweetService

TweetService.getNewFeed(request)
    return NewFeedDB.getNewFeed(request.user_id)

TweetService.postTweet(request)
    tweet = TweetDB.insertTweet(request.user_id, request.tweetcontent)
    FanoutAsyncProvider.notify(tweet)
    return success

FanoutAsyncConsumer.consume(tweet)
    // the "friend list" here is the list of followers of the tweet owner
    friend_id_list = FriendDB.getFriendList(tweet.owner_id)
    for each friend_id in friend_id_list:
        NewFeedDB.insertTweet(friend_id, tweet)
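
The push model above can be tried out in memory. This is a minimal sketch, not the real services: the dictionaries stand in for TweetDB, FriendDB and NewFeedDB, a `deque` stands in for the async queue, and all function names are hypothetical.

```python
from collections import defaultdict, deque

FEED_LIMIT = 50                     # news feed keeps the top 50 tweets

tweet_db = {}                       # tweet_id -> (owner_id, timestamp, content)
followers = defaultdict(set)        # owner_id -> ids of users who follow the owner
news_feed_db = defaultdict(list)    # user_id -> [(timestamp, tweet_id)], newest first
fanout_queue = deque()              # stand-in for the async message queue


def post_tweet(user_id, content, timestamp):
    """Store the tweet and enqueue it for fanout; returns immediately."""
    tweet_id = len(tweet_db) + 1
    tweet_db[tweet_id] = (user_id, timestamp, content)
    fanout_queue.append(tweet_id)
    return tweet_id


def run_fanout_consumer():
    """Drain the queue and push each tweet into every follower's feed."""
    while fanout_queue:
        tweet_id = fanout_queue.popleft()
        owner_id, timestamp, _ = tweet_db[tweet_id]
        for follower_id in followers[owner_id]:
            feed = news_feed_db[follower_id]
            feed.append((timestamp, tweet_id))
            feed.sort(reverse=True)     # newest first
            del feed[FEED_LIMIT:]       # keep only the top entries


def get_news_feed(user_id):
    """Read path: one lookup, no per-followee queries."""
    return [tweet_db[tweet_id][2] for _, tweet_id in news_feed_db[user_id]]


followers[1].update({2, 3})             # users 2 and 3 follow user 1
post_tweet(1, "hello", timestamp=100)
post_tweet(1, "world", timestamp=200)
run_fanout_consumer()
print(get_news_feed(2))                 # ['world', 'hello']
```

The write path does the expensive work once per follower, so the read path stays a single lookup. That trade-off fits the read-heavy ratio in the assumptions.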

FriendService

FriendService.follow(request)
    FriendDB.insert(request.from_id, request.to_id)
    FollowAsyncProvider.notify(request.from_id, request.to_id)

FollowAsyncConsumer.consume(from_id, to_id)
    timeline = TweetDB.getTimeLine(to_id)
    for each tweet in timeline:
        NewFeedDB.insertTweet(from_id, tweet)
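
A sketch of the follow path with the backfill step, again in memory. The names are hypothetical. The `unfollow` cleanup is an assumption: the notes list unfollow as a use case but do not spell out its logic. Storing `owner_id` in each feed entry is also an assumption, made so that unfollow can filter by author.

```python
import heapq
from collections import defaultdict

FEED_LIMIT = 50

following = defaultdict(set)    # from_id -> ids the user follows
timelines = defaultdict(list)   # owner_id -> [(timestamp, tweet_id)]
news_feed = defaultdict(list)   # user_id -> [(timestamp, tweet_id, owner_id)], newest first


def follow(from_id, to_id):
    """Record the edge, then backfill the followee's tweets into the follower's feed."""
    following[from_id].add(to_id)
    backfill = [(ts, tweet_id, to_id) for ts, tweet_id in timelines[to_id]]
    # set() drops duplicates so that following twice is harmless
    merged = set(news_feed[from_id]) | set(backfill)
    news_feed[from_id] = heapq.nlargest(FEED_LIMIT, merged)


def unfollow(from_id, to_id):
    """Assumed behavior: drop the edge and remove the followee's tweets from the feed."""
    following[from_id].discard(to_id)
    news_feed[from_id] = [entry for entry in news_feed[from_id] if entry[2] != to_id]


timelines[7] = [(10, 1), (20, 2)]
follow(1, 7)
print(news_feed[1])   # [(20, 2, 7), (10, 1, 7)]
unfollow(1, 7)
print(news_feed[1])   # []
```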

Scale

Step 1: Optimize by handling special cases and adding more features

Follow up 1: when a movie star posts a tweet, the fanout takes multiple hours
Solution 1 (recommended): add more push servers to parallelize the fanout
Solution 2 (combine with the pull model, cache-aside):

TweetService.getNewFeed(request)
    tweet_id_list = NewFeedDB.getNewFeedList(request.user_id)
    for each friend_id in FriendDB.getFriendList(request.user_id):
        if UserService.isStar(friend_id):
            timeline = TweetDB.getOnesTimeline(friend_id)
            tweet_id_list.merge(timeline)
    return tweet_id_list

TweetService.postTweet(request)
    tweet = TweetDB.insertTweet(request.user_id, request.tweetcontent)
    if not UserService.isStar(request.user_id):
        FanoutAsyncProvider.notify(tweet)
    return success

FanoutAsyncConsumer.consume(tweet)
    friend_id_list = FriendDB.getFriendList(tweet.owner_id)
    for each friend_id in friend_id_list:
        NewFeedDB.insertTweet(friend_id, tweet)
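
The merge step on the read path can be sketched as a k-way merge of lists that are already sorted newest first. The function name and the `(timestamp, tweet_id)` tuple layout are assumptions for illustration. `heapq.merge` is lazy, so only the first `limit` entries are ever produced.

```python
import heapq
from itertools import islice

FEED_LIMIT = 50


def merge_feeds(pushed_feed, star_timelines, limit=FEED_LIMIT):
    """Merge newest-first lists of (timestamp, tweet_id) and keep the newest `limit`.

    pushed_feed:    entries already fanned out to this user (push model)
    star_timelines: one newest-first list per followed star (pull model)
    """
    merged = heapq.merge(pushed_feed, *star_timelines, reverse=True)
    return list(islice(merged, limit))


pushed = [(300, 3), (100, 1)]
star_a = [(250, 9), (50, 8)]
star_b = [(400, 7)]
print(merge_feeds(pushed, [star_a, star_b], limit=4))
# [(400, 7), (300, 3), (250, 9), (100, 1)]
```

Each read then costs one feed lookup plus one timeline lookup per followed star, which is why the star timelines are good candidates for the cache.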

Follow up 2: the majority of a movie star’s followers unfollow the movie star
Solution 1: movie-star status is not determined by follower count; it is configured by hand
Solution 2: the combined pull model handles this case eventually

Follow up 3: likes. Denormalize the like, retweet and comment counts into the Tweet table

Tweet:
    int owner_id
    <long timestamp, int tweet_id> (Column key)
    String content
    int likenum
    int retweetnum
    int commentnum    

Like:
    int id
    int user_id
    int tweet_id
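
A sketch of how the Like table and the denormalized counter stay in step. The set and dict are in-memory stand-ins for the two tables, and the function names are hypothetical. In a real store the two writes would need to be atomic, or the counter would need periodic reconciliation against the Like rows.

```python
likes = set()      # Like rows, stored as (user_id, tweet_id)
like_num = {}      # denormalized Tweet.likenum, keyed by tweet_id


def like(user_id, tweet_id):
    """Insert the Like row and bump the counter; a repeated like is a no-op."""
    if (user_id, tweet_id) in likes:
        return False
    likes.add((user_id, tweet_id))
    like_num[tweet_id] = like_num.get(tweet_id, 0) + 1
    return True


def unlike(user_id, tweet_id):
    """Remove the Like row and decrement the counter if the row existed."""
    if (user_id, tweet_id) not in likes:
        return False
    likes.remove((user_id, tweet_id))
    like_num[tweet_id] -= 1
    return True


like(1, 42)
like(2, 42)
like(1, 42)            # duplicate, ignored
print(like_num[42])    # 2
```

Reading the count is then a single column read on the tweet, with no need to count Like rows on every feed render.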

Step 2: Maintenance

Robustness: replicas/sharding, master/slave

Scalability: horizontal (x-axis) scaling, cache, Facebook’s lease-get for the thundering herd problem (a distributed mutex, or “never time out, with expired values updated passively”)
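
The lease idea can be sketched with a toy cache: on a miss, only the first caller receives a lease token and goes to the database; everyone else is told to wait and retry, so a hot key does not trigger a stampede. The class and method names are hypothetical. Lease expiry and the retry loop are left out.

```python
import itertools


class LeaseCache:
    """Toy in-process cache that hands out one lease per missing key."""

    def __init__(self):
        self._data = {}
        self._leases = {}                    # key -> outstanding lease token
        self._tokens = itertools.count(1)

    def get(self, key):
        """Return ("hit", value), ("lease", token) or ("wait", None)."""
        if key in self._data:
            return "hit", self._data[key]
        if key in self._leases:
            return "wait", None              # someone else is already refilling
        token = next(self._tokens)
        self._leases[key] = token
        return "lease", token

    def set_with_lease(self, key, value, token):
        """Store the value only if the token is still the valid lease for the key."""
        if self._leases.get(key) != token:
            return False                     # lease was invalidated or never issued
        del self._leases[key]
        self._data[key] = value
        return True

    def delete(self, key):
        """Invalidate the value and any outstanding lease, e.g. after a DB write."""
        self._data.pop(key, None)
        self._leases.pop(key, None)


cache = LeaseCache()
print(cache.get("feed:1"))                   # ('lease', 1)
print(cache.get("feed:1"))                   # ('wait', None)
cache.set_with_lease("feed:1", ["t2", "t1"], 1)
print(cache.get("feed:1"))                   # ('hit', ['t2', 't1'])
```

Because `delete` also drops the outstanding lease, a slow reader holding an old token cannot overwrite the cache with stale data after an invalidation.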
