Web Crawler

Use Case

Assumptions

  1. Crawl about 1.6M web pages per second.
    • 1 trillion pages
    • each page re-crawled once a week
  2. About 10 PB of storage.
    • 10 KB per page
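
The two headline figures follow from the bullets under them. A minimal back-of-envelope sketch, assuming decimal units (1 KB = 1,000 bytes, 1 PB = 10^15 bytes) and that only one copy of each page is stored; the variable names are illustrative:

```python
# Back-of-envelope check of the assumptions above.
SECONDS_PER_WEEK = 7 * 24 * 60 * 60   # 604,800 seconds

total_pages = 10**12                  # 1 trillion pages
page_size_bytes = 10 * 10**3          # 10 KB per page (decimal units assumed)

# Every page is fetched once per week, so the sustained crawl rate is:
pages_per_second = total_pages / SECONDS_PER_WEEK

# Storing one copy of every page (no compression, no replication assumed):
storage_bytes = total_pages * page_size_bytes
storage_pb = storage_bytes / 10**15

print(f"crawl rate: {pages_per_second / 1e6:.2f}M pages/s")
print(f"storage:    {storage_pb:.0f} PB")
```

The rate comes out at roughly 1.65M pages per second, which the outline rounds to 1.6M, and the storage at 10 PB.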

Single Machine

  • network bandwidth bottleneck
  • context-switch overhead
  • thread and port limits
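
A rough sketch of why one machine cannot keep up, reusing the crawl rate and page size from the assumptions. The 1-second average fetch time is a hypothetical value chosen only for illustration, and the port figure is just the size of the 16-bit port space; real limits depend on the operating system and network setup.

```python
# Rough single-machine feasibility check (illustrative numbers only).
pages_per_second = 10**12 / (7 * 24 * 60 * 60)   # ~1.65M pages/s from the assumptions
page_size_bytes = 10_000                          # 10 KB per page

# Inbound bandwidth needed to download pages at that rate.
bandwidth_gbit_per_s = pages_per_second * page_size_bytes * 8 / 1e9

# Port numbers are 16 bits, so one IP address has at most 65,535 usable ports.
MAX_PORTS_PER_IP = 2**16 - 1

# ASSUMPTION: each fetch takes about 1 second end to end (hypothetical value).
avg_fetch_seconds = 1.0
# Little's law: in-flight requests = arrival rate * time in system.
concurrent_fetches = pages_per_second * avg_fetch_seconds

print(f"bandwidth needed:   {bandwidth_gbit_per_s:.0f} Gbit/s")
print(f"concurrent fetches: {concurrent_fetches:,.0f}")
print(f"ports per IP:       {MAX_PORTS_PER_IP:,}")
```

Under these assumptions the crawler needs on the order of 130 Gbit/s of inbound bandwidth and well over a million fetches in flight at once. Running that many threads on one host would also spend much of the CPU on context switches, which is why the work has to be spread across many machines.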

High-Level Architecture

Data Schema

Business Logic

Scale