Design YouTube
YouTube stats (2020): 2B MAU, 5B videos watched/day, 50M creators, $15.1B ad revenue, 37% of mobile internet traffic, 80 languages.

Step 1 - Understand the problem and establish design scope
Requirements:
- Core features: upload videos and watch videos
- Clients: mobile apps, web browsers, smart TV
- 5 million DAU, avg 30 min/day
- International users supported
- Accepts most video resolutions and formats
- Encryption required
- Max video size: 1 GB
- Leverage existing cloud infrastructure (CDN, blob storage)
Back of the envelope estimation:
- 5M DAU × 5 videos/day watched
- 10% of users upload 1 video/day
- Avg video size: 300 MB
- Daily storage: 5M × 10% × 300 MB = 150 TB
- CDN cost (US, $0.02/GB): 5M × 5 videos × 0.3 GB × $0.02/GB = $150,000/day
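
The estimates above can be re-derived with a few lines of arithmetic. This is a minimal sketch that uses only the numbers listed in the requirements; the constant and function names are illustrative, and decimal units (1 TB = 1,000,000 MB) are assumed.

```python
# Inputs taken from the estimation bullets above.
DAU = 5_000_000
VIDEOS_WATCHED_PER_USER = 5
UPLOADER_PERCENT = 10          # 10% of users upload one video per day
AVG_VIDEO_SIZE_MB = 300
CDN_COST_PER_GB_USD = 0.02     # US price assumed in the notes


def daily_storage_tb() -> float:
    """New storage needed per day, in TB (decimal units)."""
    uploads_per_day = DAU * UPLOADER_PERCENT // 100
    return uploads_per_day * AVG_VIDEO_SIZE_MB / 1_000_000


def daily_cdn_cost_usd() -> float:
    """Daily CDN bill if every watched video is streamed in full from the CDN."""
    gb_streamed = DAU * VIDEOS_WATCHED_PER_USER * AVG_VIDEO_SIZE_MB / 1_000
    return gb_streamed * CDN_COST_PER_GB_USD


print(f"storage: {daily_storage_tb():.0f} TB/day")
print(f"CDN cost: ${daily_cdn_cost_usd():,.0f}/day")
```

Changing any single input (for example the average video size) shows quickly which assumption dominates the cost.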

Step 2 - Propose high-level design and get buy-in
Leverage cloud services: CDN and blob storage (building from scratch too complex; even Netflix uses AWS, Facebook uses Akamai).

Three components: Client → CDN (video streaming) + API servers (everything else).
Video uploading flow

Components:
- Load balancer: Distributes requests to API servers
- API servers: All non-streaming requests
- Metadata DB: Video metadata (sharded, replicated)
- Metadata cache: Video metadata + user objects
- Original storage: Blob storage for source videos
- Transcoding servers: Convert videos to different formats/bitrates
- Transcoded storage: Blob storage for processed videos
- CDN: Caches and serves videos
- Completion queue + Completion handler: Message queue tracking transcoding completion; workers update metadata cache/DB
Two parallel processes:
Flow A — Upload actual video:

1. Video uploaded to original storage
2. Transcoding servers fetch the video from original storage and transcode it
3a. Transcoded videos → transcoded storage → CDN
3b. Completion events → completion queue → completion handler updates metadata DB/cache
4. API servers notify client
Flow B — Update metadata: Client sends metadata (filename, size, format) in parallel; API servers update metadata cache + DB.
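
To make the interaction between the two flows concrete, here is a minimal in-memory sketch. The dictionaries stand in for the metadata DB and cache, `queue.Queue` stands in for the completion queue, and all function and field names (`update_metadata`, `on_transcoding_done`, `status`, `renditions`) are hypothetical, not part of the original design.

```python
import queue

metadata_db = {}                    # stand-in for the metadata DB
metadata_cache = {}                 # stand-in for the metadata cache
completion_queue = queue.Queue()    # stand-in for the completion queue


def update_metadata(video_id, fields):
    """Flow B: write metadata to the DB and refresh the cached copy."""
    record = metadata_db.setdefault(video_id, {"status": "uploading"})
    record.update(fields)
    metadata_cache[video_id] = dict(record)


def on_transcoding_done(video_id, renditions):
    """Flow A, step 3b: a transcoding server publishes a completion event."""
    completion_queue.put({"video_id": video_id, "renditions": renditions})


def run_completion_handler():
    """Drain the queue and mark finished videos as ready. Returns events handled."""
    handled = 0
    while True:
        try:
            event = completion_queue.get_nowait()
        except queue.Empty:
            return handled
        update_metadata(
            event["video_id"],
            {"status": "ready", "renditions": event["renditions"]},
        )
        handled += 1
```

Because both flows write through the same `update_metadata` path, it does not matter whether the client's metadata or the completion event arrives first.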

Video streaming flow
Streaming protocol options: MPEG-DASH, Apple HLS, Microsoft Smooth Streaming, Adobe HDS. Videos streamed directly from CDN edge server closest to user → minimal latency.

Step 3 - Design deep dive
Video transcoding
Why transcode:
- Raw video huge: 1hr HD at 60fps → hundreds of GB
- Device/browser compatibility requires multiple formats
- Adaptive bitrate: high-res for high bandwidth, low-res for low bandwidth
- Network conditions change → auto/manual quality switching
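
Adaptive bitrate selection can be sketched as picking the highest rendition whose bitrate fits the measured bandwidth. The rendition ladder, the bitrate numbers, and the `safety_factor` below are illustrative assumptions, not values from the source.

```python
# Hypothetical rendition ladder: (label, required bandwidth in kbit/s), lowest first.
RENDITIONS = [
    ("240p", 400),
    ("360p", 800),
    ("480p", 1500),
    ("720p", 3000),
    ("1080p", 6000),
]


def pick_rendition(measured_kbps: float, safety_factor: float = 0.8) -> str:
    """Return the highest rendition that fits within a fraction of the bandwidth.

    Falls back to the lowest rendition when even that one does not fit.
    """
    budget = measured_kbps * safety_factor
    chosen = RENDITIONS[0][0]
    for label, required_kbps in RENDITIONS:
        if required_kbps <= budget:
            chosen = label
    return chosen
```

A player would call `pick_rendition` again whenever its bandwidth estimate changes, which gives the automatic quality switching described above.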
Video format anatomy:
- Container: Wraps video, audio, and metadata; identified by file extension (.avi, .mov, .mp4)
- Codecs: Compression algorithms — H.264, VP9, HEVC
Directed Acyclic Graph (DAG) model
Transcoding is computationally expensive. DAG defines tasks in stages for sequential/parallel execution (inspired by Facebook's SVE).

Tasks:
- Inspection: Quality check, malformed detection
- Video encodings: Different resolutions, codecs, bitrates

- Thumbnail: User-uploaded or auto-generated
- Watermark: Identifying image overlay
Video transcoding architecture

Six components:
Preprocessor:

- Video splitting: Split into GOP (Group of Pictures) chunks — each a few seconds, independently playable
- Server-side splitting for old clients that can't split videos themselves
- DAG generation: Built from configuration files written by client programmers

- Cache data: Store GOPs and metadata in temporary storage for retry on failure
DAG scheduler:

Splits DAG into task stages, puts them in resource manager's task queue.

Example: Stage 1 — video, audio, metadata split. Stage 2 — video encoding + thumbnail from video; audio encoding from audio.
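
The staging step in the example can be sketched with Python's standard `graphlib` module (Python 3.9+). The task names mirror the example above; representing the DAG as a dictionary and the helper `split_into_stages` are assumptions for illustration, not the scheduler's actual interface.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
DAG = {
    "video": set(),
    "audio": set(),
    "metadata": set(),
    "video_encoding": {"video"},
    "thumbnail": {"video"},
    "audio_encoding": {"audio"},
}


def split_into_stages(dag):
    """Group tasks into stages; tasks within a stage can run in parallel."""
    sorter = TopologicalSorter(dag)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        stages.append(ready)
        sorter.done(*ready)
    return stages


for number, tasks in enumerate(split_into_stages(DAG), start=1):
    print(f"Stage {number}: {', '.join(tasks)}")
```

Each returned stage would then be placed in the resource manager's task queue, with the next stage released only after the previous one completes.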
Resource manager:

Three queues: Task queue (priority), Worker queue (worker utilization), Running queue (active task/worker bindings). Task scheduler picks optimal task+worker, instructs execution.
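
A minimal sketch of the three queues follows, assuming that a lower priority number means a more urgent task and that the "optimal" worker is simply the least utilized one. The class and method names are hypothetical; a real scheduler would use richer criteria.

```python
import heapq
from itertools import count


class ResourceManager:
    def __init__(self):
        self._tasks = []     # task queue: min-heap of (priority, sequence, task)
        self._workers = []   # worker queue: min-heap of (utilization, worker)
        self.running = {}    # running queue: task -> worker currently executing it
        self._seq = count()  # tie-breaker so equal priorities stay first-in, first-out

    def submit(self, task, priority):
        heapq.heappush(self._tasks, (priority, next(self._seq), task))

    def register_worker(self, worker, utilization):
        heapq.heappush(self._workers, (utilization, worker))

    def schedule_next(self):
        """Bind the most urgent task to the least utilized idle worker."""
        if not self._tasks or not self._workers:
            return None
        _, _, task = heapq.heappop(self._tasks)
        _, worker = heapq.heappop(self._workers)
        self.running[task] = worker
        return task, worker

    def finish(self, task, utilization):
        """Remove a finished task and return its worker to the worker queue."""
        worker = self.running.pop(task)
        self.register_worker(worker, utilization)
```

Keeping the running bindings in one place is what lets the manager reschedule a task on a new worker if the original worker fails.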

Task workers:

Run DAG-defined tasks. Different workers for different task types.

Temporary storage:

Multiple storage types: metadata in memory (small, frequent access); video/audio in blob storage. Freed after processing completes.
Encoded video:

Final output, e.g., funny_720p.mp4.
System optimizations
Speed — Parallelize video uploading: Split video into GOP chunks client-side. Enables fast resumable uploads.
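
As a rough sketch of parallel, resumable uploading: the code below splits a payload into fixed-size chunks and uploads only the chunks the server does not yet have. Real clients would split on GOP boundaries, which requires parsing the video; byte-size chunks are used here purely as a stand-in. The `upload_one` callback and the `already_uploaded` set are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(data: bytes, chunk_size: int):
    """Split a payload into fixed-size chunks (a stand-in for GOP alignment)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def upload_chunks(chunks, upload_one, already_uploaded=frozenset(), max_workers=4):
    """Upload missing chunks in parallel and return the indexes that were sent.

    `upload_one(index, chunk)` performs the actual transfer. Passing the set of
    indexes the server already holds makes an interrupted upload resumable.
    """
    pending = [i for i in range(len(chunks)) if i not in already_uploaded]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # list() forces completion and re-raises any upload error
        list(pool.map(lambda i: upload_one(i, chunks[i]), pending))
    return pending
```

After a dropped connection, the client would ask the server which chunk indexes it holds and call `upload_chunks` again with that set.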

Speed — Upload centers close to users: Multiple upload centers globally via CDN.

Speed — Parallelism everywhere: Introduce message queues between processing stages to decouple modules. Encoding module doesn't wait for download module output — processes events from queue in parallel.
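
The decoupling can be illustrated with in-process queues and threads. This is a toy sketch: `queue.Queue` stands in for a real message queue, the stage functions only tag strings, and the names `stage`, `run_pipeline`, and `STOP` are assumptions made for the example.

```python
import queue
import threading

STOP = object()  # sentinel that tells a stage no more events are coming


def stage(worker_fn, inbox, outbox):
    """Start a thread that consumes events from inbox and publishes to outbox."""
    def run():
        while True:
            item = inbox.get()
            if item is STOP:
                outbox.put(STOP)  # propagate shutdown to the next stage
                return
            outbox.put(worker_fn(item))

    thread = threading.Thread(target=run)
    thread.start()
    return thread


def run_pipeline(chunk_ids):
    downloaded, encoded, done = queue.Queue(), queue.Queue(), queue.Queue()
    threads = [
        stage(lambda c: f"{c}:encoded", downloaded, encoded),   # encoding module
        stage(lambda c: f"{c}:uploaded", encoded, done),        # upload module
    ]
    for chunk_id in chunk_ids:
        downloaded.put(chunk_id)  # download module publishes each chunk as it arrives
    downloaded.put(STOP)
    for thread in threads:
        thread.join()

    results = []
    while True:
        item = done.get()
        if item is STOP:
            return results
        results.append(item)
```

The encoding stage starts on the first chunk as soon as it is queued, rather than waiting for the whole download to finish.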

Safety — Pre-signed upload URL:

1. Client requests pre-signed URL from API server
2. API server returns URL
3. Client uploads directly to that URL

Only authorized users can upload to the correct location. (Amazon S3: "pre-signed URL"; Azure: "Shared Access Signature")
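
The idea behind a pre-signed URL can be sketched with an HMAC over the upload target and an expiry time. This is an illustration of the concept only, not the signing scheme used by Amazon S3 or Azure; the secret, the query parameter names, and both functions are hypothetical.

```python
import hashlib
import hmac
import time
from urllib.parse import parse_qs, urlencode, urlparse

SECRET_KEY = b"demo-secret"  # hypothetical; known only to the API and storage servers


def _sign(object_key, user_id, expires):
    message = f"PUT\n{object_key}\n{user_id}\n{expires}".encode()
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def make_presigned_url(base_url, object_key, user_id, ttl_seconds=900, now=None):
    """API server side: issue a URL valid for one object, one user, and a limited time."""
    issued_at = now if now is not None else time.time()
    expires = int(issued_at) + ttl_seconds
    query = urlencode({
        "user": user_id,
        "expires": expires,
        "sig": _sign(object_key, user_id, expires),
    })
    return f"{base_url}/{object_key}?{query}"


def verify_presigned_url(url, now=None):
    """Storage side: accept the upload only if the signature matches and has not expired."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        user_id = params["user"][0]
        expires = int(params["expires"][0])
        signature = params["sig"][0]
    except (KeyError, ValueError):
        return False
    current = now if now is not None else time.time()
    if current > expires:
        return False
    expected = _sign(parsed.path.lstrip("/"), user_id, expires)
    return hmac.compare_digest(expected, signature)
```

Because the object key is part of the signed message, a client cannot reuse the URL to write to a different location.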
Safety — Protect videos:
- DRM: Apple FairPlay, Google Widevine, Microsoft PlayReady
- AES encryption: Decrypt on playback; only authorized users
- Visual watermarking: Company logo/name overlay
Cost-saving:
YouTube videos follow long-tail distribution — few popular, many rarely viewed.
- Only serve popular videos from CDN; rest from high-capacity video servers

- Fewer encoded versions for less popular content; encode short videos on-demand
- Don't distribute region-specific popular videos globally
- Build own CDN + partner with ISPs (Netflix model)
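
The first two cost-saving rules can be expressed as a small policy function. The thresholds (`popular_views`, `short_seconds`) and the reduced rendition list are invented for illustration; real cut-offs would come from measured access patterns.

```python
FULL_LADDER = ["240p", "360p", "480p", "720p", "1080p"]
REDUCED_LADDER = ["360p", "720p"]  # hypothetical smaller set for unpopular videos


def plan_video(daily_views, duration_seconds, popular_views=10_000, short_seconds=60):
    """Decide where a video is served from and which renditions to pre-encode."""
    if daily_views >= popular_views:
        # Popular head of the distribution: serve from CDN with every rendition.
        return {"serve_from": "cdn", "pre_encode": FULL_LADDER}
    if duration_seconds <= short_seconds:
        # Unpopular and short: cheap enough to encode on demand.
        return {"serve_from": "video_server", "pre_encode": []}
    # Long-tail video: serve from video servers with fewer encoded versions.
    return {"serve_from": "video_server", "pre_encode": REDUCED_LADDER}
```

The policy trades some first-view latency on rarely watched videos for lower CDN and storage costs.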
Error handling
| Component | Error handling |
|---|---|
| Upload | Retry a few times |
| Split video | Old clients: server-side splitting |
| Transcoding | Retry |
| Preprocessor | Regenerate DAG |
| DAG scheduler | Reschedule task |
| Resource manager queue | Use replica |
| Task worker | Retry on new worker |
| API server | Stateless → redirect to other server |
| Metadata cache | Multi-replica; replace dead node |
| Metadata DB master | Promote slave to master |
| Metadata DB slave | Use other slave + bring up replacement |
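
Several rows in the table come down to "retry a few times". A minimal sketch of such a helper is shown below; the exponential backoff between attempts is an added assumption, since the notes only say to retry, and the function name and parameters are illustrative.

```python
import time


def retry(operation, attempts=3, base_delay=0.1, sleep=time.sleep):
    """Call operation() until it succeeds or the attempts are used up.

    Waits base_delay, then 2x, 4x, ... between attempts. The last error is
    re-raised so the caller can report a non-recoverable failure.
    """
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))
```

For the task worker row, the `operation` passed in would pick a different worker on each call, so a retry does not land on the same failed machine.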
Step 4 - Wrap up
Additional talking points:
- Scale API tier: Stateless → easy horizontal scaling
- Scale database: Replication and sharding
- Live streaming: Stricter (lower) latency requirements, lower parallelism needs, different error handling (no slow retries)
- Video takedowns: Copyright violations, pornography — system detection on upload + user flagging