1. Level 1.0
Database Schema:
a. User
| UserID | Name | Age | 
| 1 | Jason | 25 | 
| 2 | Michael | 26 | 
b. Friendship
| FriendshipID | SourceID | TargetID | 
| 1 | 1 | 2 | 
| 2 | 2 | 1 | 
c. News
| NewsID | AuthorID | Content | Timestamp | 
| 1 | 2 | "Hello" | 2015-01-10 | 
| 2 | 1 | "Hi" | 2015-01-10 | 
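Why is this bad?
With 100+ friends, every feed request costs two queries:
a. 1 query --> get the friends list
b. 1 query --> get the friends' news:
SELECT * FROM news
WHERE timestamp > xxx
AND authorID IN (friendsList)
LIMIT 1000
The IN clause is slow when the friends list is large.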
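The two-query approach can be sketched against the schema above. This is a minimal illustration using an in-memory SQLite database; the function names `build_demo_db` and `naive_feed` are made up for this example, and the `since` argument stands in for the `xxx` placeholder in the query.

```python
import sqlite3


def build_demo_db():
    """Create the three tables from the schema above with the sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE user (UserID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER);
        CREATE TABLE friendship (FriendshipID INTEGER PRIMARY KEY,
                                 SourceID INTEGER, TargetID INTEGER);
        CREATE TABLE news (NewsID INTEGER PRIMARY KEY, AuthorID INTEGER,
                           Content TEXT, Timestamp TEXT);
        INSERT INTO user VALUES (1, 'Jason', 25), (2, 'Michael', 26);
        INSERT INTO friendship VALUES (1, 1, 2), (2, 2, 1);
        INSERT INTO news VALUES (1, 2, 'Hello', '2015-01-10'),
                                (2, 1, 'Hi', '2015-01-10');
    """)
    return conn


def naive_feed(conn, user_id, since, limit=1000):
    """Level 1.0: one query for the friends list, one query with IN (...)."""
    # Query 1: fetch the friends list.
    friends = [row[0] for row in conn.execute(
        "SELECT TargetID FROM friendship WHERE SourceID = ?", (user_id,))]
    if not friends:
        return []
    # Query 2: the IN list grows with the number of friends.
    placeholders = ",".join("?" * len(friends))
    sql = (
        "SELECT NewsID, AuthorID, Content, Timestamp FROM news "
        f"WHERE Timestamp > ? AND AuthorID IN ({placeholders}) "
        "ORDER BY Timestamp DESC LIMIT ?"
    )
    return conn.execute(sql, (since, *friends, limit)).fetchall()
```

Calling `naive_feed(build_demo_db(), 1, "2015-01-01")` returns only Michael's post, since Michael is Jason's single friend in the sample data.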
2. Level 2.0
a. Pull: Get news from each friend and merge it together. The news feed is generated when the user requests it.
b. Push: The news feed is generated when the news is posted. Another table stores each user's news feed, which may cause duplicate news.
Push disadvantage: news delay.
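The difference between the two models can be sketched with in-memory structures. This is only an illustration, not the original article's implementation: the class names `PullFeed` and `PushFeed` are hypothetical, posts are assumed to arrive in timestamp order, and a real system would use database tables instead of dictionaries.

```python
import heapq
from collections import defaultdict
from itertools import islice


class PullFeed:
    """Pull: store posts per author, build the feed at read time."""

    def __init__(self, friends):
        self.friends = friends            # user_id -> list of friend ids
        self.posts = defaultdict(list)    # author_id -> [(timestamp, content)]

    def post(self, author_id, timestamp, content):
        # Cheap write: one append, regardless of how many friends exist.
        self.posts[author_id].append((timestamp, content))

    def get_feed(self, user_id, limit=10):
        # Expensive read: merge every friend's posts, newest first.
        per_friend = [reversed(self.posts[f])
                      for f in self.friends.get(user_id, [])]
        return list(islice(heapq.merge(*per_friend, reverse=True), limit))


class PushFeed:
    """Push: copy each post into every follower's feed at write time."""

    def __init__(self, followers):
        self.followers = followers        # author_id -> list of follower ids
        self.inbox = defaultdict(list)    # user_id -> [(timestamp, content)]

    def post(self, author_id, timestamp, content):
        # Expensive write: one copy per follower (the duplicate storage).
        for follower in self.followers.get(author_id, []):
            self.inbox[follower].append((timestamp, content))

    def get_feed(self, user_id, limit=10):
        # Cheap read: the feed is already materialized.
        return sorted(self.inbox[user_id], reverse=True)[:limit]
```

Both classes return the same feed for the same posts; they differ in where the cost lands, on the read path for pull and on the write path for push.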
3. Level 3.0
a. Popular star (Justin Bieber)
Followers: 13M+
An async push may take over 30 minutes (13M+ insertions; the delay is too long).
b. Push + Pull
For popular stars, don't push news.
For every news feed request, merge the non-popular users' news (push) with the popular users' news (pull).
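A minimal sketch of this hybrid, under stated assumptions: the class name `HybridFeed` and the cutoff `POPULAR_THRESHOLD` are hypothetical (the notes do not say how a popular star is identified), and the threshold is set tiny so the example is easy to follow.

```python
from collections import defaultdict

# Hypothetical cutoff; a real system would use a far larger number.
POPULAR_THRESHOLD = 3


class HybridFeed:
    def __init__(self, followers):
        self.followers = followers          # author -> set of follower ids
        self.following = defaultdict(set)   # user -> set of authors followed
        for author, fans in followers.items():
            for fan in fans:
                self.following[fan].add(author)
        self.outbox = defaultdict(list)     # author -> own posts (pull source)
        self.inbox = defaultdict(list)      # user -> pushed posts

    def is_popular(self, author):
        return len(self.followers.get(author, ())) >= POPULAR_THRESHOLD

    def post(self, author, timestamp, content):
        item = (timestamp, author, content)
        self.outbox[author].append(item)
        # Popular stars are never pushed, which avoids millions of insertions.
        if not self.is_popular(author):
            for fan in self.followers.get(author, ()):
                self.inbox[fan].append(item)

    def get_feed(self, user, limit=10):
        # Pushed part: news from non-popular users, already in the inbox.
        items = list(self.inbox[user])
        # Pulled part: news from popular stars, fetched at request time.
        for author in self.following[user]:
            if self.is_popular(author):
                items.extend(self.outbox[author])
        return sorted(items, reverse=True)[:limit]
```

With this split, a post by a popular star costs one write, and each reader pays a small extra pull for the few stars they follow.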
4. Level 4.0
Push disadvantages:
a. Not real-time (delay)
b. Storage (duplicated news)
c. Edits (pushed copies are hard to update)
Go back to Pull:
a. Cache each user's latest (14 days) news
b. Broadcast the request to multiple servers (sharded by userID)
c. Merge & sort the news feed (time, forward frequency, friends' forwards, dedup, sort)
d. Cache the news feed for this user with a timestamp (user_login)
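Steps a to d can be sketched as a scatter-gather read. This is a simplified illustration with several assumptions: `Shard`, `PullService`, and `NUM_SHARDS` are hypothetical names, shards are plain in-process objects instead of servers, and ranking uses time only (the forward-frequency signals are omitted).

```python
from concurrent.futures import ThreadPoolExecutor

NUM_SHARDS = 4                      # hypothetical shard count
RECENT_WINDOW = 14 * 24 * 3600      # 14 days, in seconds


class Shard:
    """Step a: holds recent news for the authors assigned to this shard."""

    def __init__(self):
        self.posts = {}  # author_id -> [(timestamp, news_id, content)]

    def add(self, author_id, timestamp, news_id, content):
        self.posts.setdefault(author_id, []).append(
            (timestamp, news_id, content))

    def recent(self, author_ids, since):
        return [p for a in author_ids
                for p in self.posts.get(a, []) if p[0] >= since]


class PullService:
    def __init__(self):
        self.shards = [Shard() for _ in range(NUM_SHARDS)]
        self.feed_cache = {}  # user_id -> (cached_at, feed)

    def post(self, author_id, timestamp, news_id, content):
        shard = self.shards[author_id % NUM_SHARDS]  # shard by userID
        shard.add(author_id, timestamp, news_id, content)

    def get_feed(self, user_id, friend_ids, now, limit=50):
        since = now - RECENT_WINDOW
        # Step b: group friends by shard and query the shards in parallel.
        by_shard = {}
        for friend in friend_ids:
            by_shard.setdefault(friend % NUM_SHARDS, []).append(friend)
        with ThreadPoolExecutor(max_workers=NUM_SHARDS) as pool:
            futures = [pool.submit(self.shards[i].recent, ids, since)
                       for i, ids in by_shard.items()]
            results = [f.result() for f in futures]
        # Step c: merge, dedup by news id, sort newest first.
        seen, merged = set(), []
        for posts in results:
            for post in posts:
                if post[1] not in seen:
                    seen.add(post[1])
                    merged.append(post)
        merged.sort(reverse=True)
        feed = merged[:limit]
        # Step d: cache the feed for this user along with a timestamp.
        self.feed_cache[user_id] = (now, feed)
        return feed
```

Passing `now` explicitly keeps the sketch deterministic; a real service would likely read the clock itself and decide from the cached timestamp whether the stored feed is still fresh.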
Original article: http://www.cnblogs.com/litao-tech/p/4214864.html