#streaming
# CDN (Netflix Open Connect example) 2021
notes from videotech streaming conference 2021
## Goals
- optimize content delivery to users
- increase outgoing bandwidth of the system
- increase system reliability
- decrease load on the backbone network
delivery chain:
1. origin
2. mid-tier cache
3. points of presence
4. end users
## work with 1 CDN provider
pros:
- easy to set up
- works out of the box
- it just works
cons:
- single point of failure
- hard to tell how well it performs; QoE factors:
  - content encoding (under our control)
  - CDN (can't compare different CDNs)
  - ISP (can compare different ISPs)
  - home network (can compare different home networks)
  - client software / adaptive streaming (under our control)
  - device capabilities and performance (can compare different devices)
- difficult to optimize price
- minimum commit
- negotiation
- vendor lock-in
## work with multiple CDN providers
buy or build?
how to distribute traffic between CDNs?
- DNS-based
- Manifest re-writing
- Manifest generation
- Client-based
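
A minimal sketch of the manifest re-writing option: pick a CDN by weight, then point the absolute segment URLs of an HLS media playlist at that CDN's host. The host names, the weights, and the helpers `pick_cdn` and `rewrite_manifest` are hypothetical. A real playlist would also need URI attributes inside tags (keys, init segments) rewritten.

```python
import random
from urllib.parse import urlsplit, urlunsplit

# Hypothetical CDN hosts and traffic weights (assumptions, not from the talk).
CDN_WEIGHTS = {"cdn-a.example.com": 0.7, "cdn-b.example.com": 0.3}


def pick_cdn(weights, rng=random):
    """Pick one CDN host at random, proportionally to its weight."""
    hosts = list(weights)
    return rng.choices(hosts, weights=[weights[h] for h in hosts], k=1)[0]


def rewrite_manifest(manifest, cdn_host):
    """Point every absolute segment URL in an HLS playlist at cdn_host.

    Tag lines (starting with '#') and relative URLs are left untouched.
    """
    out = []
    for line in manifest.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            parts = urlsplit(stripped)
            if parts.scheme and parts.netloc:
                line = urlunsplit(parts._replace(netloc=cdn_host))
        out.append(line)
    return "\n".join(out)


MANIFEST = """#EXTM3U
#EXT-X-TARGETDURATION:6
#EXTINF:6.0,
https://origin.example.com/video/seg1.ts
#EXTINF:6.0,
https://origin.example.com/video/seg2.ts
#EXT-X-ENDLIST"""

if __name__ == "__main__":
    print(rewrite_manifest(MANIFEST, pick_cdn(CDN_WEIGHTS)))
```

The same `pick_cdn` decision could instead be made in the DNS answer or inside the client; only the place where the choice is applied changes.
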
how to build it yourself?
reasons to build:
- commercial CDNs are not enough
- functionality
- price
- have resources and support to build
- maybe you will be able to sell it to others
## Multi origin
one origin:
- single point of failure
- cost is lower
multiple origins:
- cost is higher
- failure resilience
- CDN QoE depends only on the CDN
### traffic distribution
- minimum commit
- QoE optimization
- service failure resilience
- content-aware routing
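
One way to combine the minimum commit and QoE goals, as a rough sketch: first give every CDN its committed volume, then split the remaining traffic in proportion to a QoE score. The `commit_tb` and `qoe` fields, the numbers, and the proportional rule are assumptions for illustration, not something stated in the talk.

```python
def split_traffic(total_tb, cdns):
    """Return {cdn_name: allocated_tb} for the expected monthly traffic.

    cdns maps name -> {"commit_tb": float, "qoe": float}. Both fields are
    hypothetical inputs; qoe is any positive "higher is better" score.
    """
    committed = sum(c["commit_tb"] for c in cdns.values())
    if committed > total_tb:
        raise ValueError("commits exceed expected traffic")
    qoe_sum = sum(c["qoe"] for c in cdns.values())
    if qoe_sum <= 0:
        raise ValueError("QoE scores must be positive")
    remainder = total_tb - committed
    # Commit first, then share what is left by relative QoE.
    return {
        name: c["commit_tb"] + remainder * c["qoe"] / qoe_sum
        for name, c in cdns.items()
    }


if __name__ == "__main__":
    plan = split_traffic(1000.0, {
        "cdn-a": {"commit_tb": 200.0, "qoe": 3.0},
        "cdn-b": {"commit_tb": 100.0, "qoe": 1.0},
    })
    print(plan)
```

For failure resilience, a CDN that is down can simply be left out of `cdns` before calling `split_traffic`.
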
### analysis of QoE: dimensions for Multi-CDN
- which CDN delivers the content
- CDN Point of Presence (PoP)
- ask CDN providers
- use public data (geo lookup)
- client device type
- content-related dimensions (popularity, size, type)
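
A small sketch of slicing QoE by these dimensions. It assumes each playback session is logged as a dict with `cdn`, `pop`, `device`, `stall_s` and `watch_s` fields, and uses the rebuffer ratio (stall time / watch time) as the QoE metric. Both the field names and the metric are assumptions for illustration.

```python
from collections import defaultdict


def rebuffer_ratio_by(sessions, *dims):
    """Aggregate stall time / watch time per combination of dimensions."""
    totals = defaultdict(lambda: [0.0, 0.0])  # key -> [stall_s, watch_s]
    for s in sessions:
        key = tuple(s[d] for d in dims)
        totals[key][0] += s["stall_s"]
        totals[key][1] += s["watch_s"]
    return {
        key: stall / watch
        for key, (stall, watch) in totals.items()
        if watch > 0
    }


# Hypothetical session log.
SESSIONS = [
    {"cdn": "A", "pop": "fra", "device": "tv", "stall_s": 2.0, "watch_s": 100.0},
    {"cdn": "A", "pop": "fra", "device": "phone", "stall_s": 6.0, "watch_s": 100.0},
    {"cdn": "B", "pop": "ams", "device": "tv", "stall_s": 1.0, "watch_s": 50.0},
]

if __name__ == "__main__":
    print(rebuffer_ratio_by(SESSIONS, "cdn"))
    print(rebuffer_ratio_by(SESSIONS, "cdn", "device"))
```
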
## Build our own CDN (Netflix Open Connect example)
why?
- optimization
- reduce costs
- increase quality
- what helps optimize?
- specializations
- vertical integration
- when?
- scale
- development (software & hardware, networking, operations, ISP relations)
- what exactly to build?
- how are our tasks different from a commercial CDN's?
- how can we use that to our benefit?
- make own CDN simpler and more efficient
### Subtasks
- Deployment: where to put servers
- Steering: how to find the nearest server (PoP) for each user
- Load balancing & Failover: what to do with traffic inside a PoP and how to handle failures
- Caching: how to make caching content-aware
### Deployment
- where to put servers
- in ISP datacenters (Internet Service Providers)
- in IX (Internet Exchange Points)
### Steering
anycast
- pop1 and pop2 have the same IP address
- we announce this IP address from both pops
- internet routers will choose the nearest pop
cons:
- we cannot control the steering
- routing problems have to be solved together with providers
pros:
- the internet does the routing work for us
DNS-based
- smart authoritative DNS responds with the IP of the nearest PoP
- the authoritative DNS sees only the IP of the caching resolver
pros:
- basic solution geo-DNS
- we control the Steering
cons:
- we know only the IP of the caching resolver, which can be in a different network from the client
- caching delays, non-standard resolvers that override TTL
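
A geo-DNS sketch of the DNS-based approach: the authoritative server maps the resolver IP to a location and answers with the IP of the closest PoP. The PoP table, the geo database, and the `answer_query` helper are hypothetical (the addresses are documentation ranges). Note that the lookup key is the resolver's address, which is exactly the limitation listed above.

```python
import ipaddress
import math

# Hypothetical PoPs: name -> (service IP, latitude, longitude).
POPS = {
    "ams": ("192.0.2.10", 52.37, 4.90),
    "nyc": ("198.51.100.10", 40.71, -74.01),
}

# Hypothetical geo database: resolver prefix -> (latitude, longitude).
GEO_DB = [
    (ipaddress.ip_network("203.0.113.0/25"), (48.86, 2.35)),      # Paris
    (ipaddress.ip_network("203.0.113.128/25"), (42.36, -71.06)),  # Boston
]

DEFAULT_POP = "ams"  # fallback when the resolver is not in the geo database


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def answer_query(resolver_ip):
    """Return the PoP IP to put in the DNS answer for this resolver."""
    addr = ipaddress.ip_address(resolver_ip)
    for net, loc in GEO_DB:
        if addr in net:
            name = min(POPS, key=lambda n: haversine_km(loc, POPS[n][1:]))
            return POPS[name][0]
    return POPS[DEFAULT_POP][0]


if __name__ == "__main__":
    print(answer_query("203.0.113.5"))    # resolver located in Paris
    print(answer_query("203.0.113.200"))  # resolver located in Boston
```
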
control plane steering
pros:
- control plane knows the client location -> better steering
- we can use factors like popularity or availability
cons:
- needs a smart client that knows about our CDN and requests the nearest PoP
Open Connect uses control plane steering
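
A rough sketch of control plane steering, not Open Connect's actual algorithm: the client tells the control plane who it is and what it wants, and gets back a ranked list of PoP URLs. The `Pop` fields, the per-ISP cost table, and the `steer` function are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Pop:
    name: str
    url: str
    healthy: bool = True
    titles: set = field(default_factory=set)         # content stored at this PoP
    cost_by_isp: dict = field(default_factory=dict)  # isp -> network cost (lower = closer)


def steer(pops, client_isp, title, max_results=3):
    """Return content URLs ranked from the closest usable PoP to the farthest."""
    candidates = [
        p for p in pops
        if p.healthy and title in p.titles and client_isp in p.cost_by_isp
    ]
    candidates.sort(key=lambda p: p.cost_by_isp[client_isp])
    return [f"{p.url}/{title}" for p in candidates[:max_results]]


# Hypothetical state known to the control plane.
POPS = [
    Pop("ams", "https://ams.cdn.example", titles={"show1"}, cost_by_isp={"isp-x": 10}),
    Pop("fra", "https://fra.cdn.example", healthy=False, titles={"show1"},
        cost_by_isp={"isp-x": 5}),
    Pop("par", "https://par.cdn.example", titles={"show1", "show2"},
        cost_by_isp={"isp-x": 7}),
]

if __name__ == "__main__":
    print(steer(POPS, "isp-x", "show1"))
```

Unlike the DNS case, the decision here uses the client's own network and the requested title, so availability and content placement can be taken into account.
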
### Load balancing & Failover
Traffic inside Pop
- Dedicated load balancer
pros:
- control over traffic
- LB can use capacity and heartbeat signals from servers
cons:
- LB = bottleneck
- additional failure point
- LAN anycast with ECMP (Equal Cost Multi Path)
pros:
- don't need LB
cons:
- difficult load balancing
- DNS-based load balancing
pros:
- don't need LB
cons:
- delays in DNS Caching
- Control plane load balancing
pros:
- can return multiple server links to the client
- tracks health and capacity of servers
- failover logic in control plane
cons:
- needs a smart client that knows about our CDN and sends requests to our control plane
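
A sketch of control plane load balancing with client-side failover, under assumed data shapes: the control plane ranks the servers of a PoP by health and utilization and returns several links, and the smart client walks the list until one works. The server fields and the `fetch` callable are hypothetical.

```python
def rank_servers(servers):
    """Return URLs of healthy servers with spare capacity, least utilized first.

    Each server is a dict with url, healthy, capacity_mbps and load_mbps,
    as the control plane might track them from heartbeat signals.
    """
    usable = [
        s for s in servers
        if s["healthy"] and s["load_mbps"] < s["capacity_mbps"]
    ]
    usable.sort(key=lambda s: s["load_mbps"] / s["capacity_mbps"])
    return [s["url"] for s in usable]


def fetch_with_failover(urls, fetch):
    """Client side: try each URL in order and return the first success.

    fetch is a caller-supplied function that raises OSError on network failure.
    """
    last_error = None
    for url in urls:
        try:
            return fetch(url)
        except OSError as err:
            last_error = err
    raise RuntimeError("all servers failed") from last_error


# Hypothetical PoP state.
SERVERS = [
    {"url": "https://s1.pop.example", "healthy": True, "capacity_mbps": 100, "load_mbps": 90},
    {"url": "https://s2.pop.example", "healthy": True, "capacity_mbps": 100, "load_mbps": 20},
    {"url": "https://s3.pop.example", "healthy": False, "capacity_mbps": 100, "load_mbps": 0},
]

if __name__ == "__main__":
    print(rank_servers(SERVERS))
```
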
### Caching
- Proxy caching
client -> edge server -> cache miss -> mid-tier cache -> origin
pros:
- don't need smart client
cons:
- servers read and write a lot of data
- Directed caching with pre-positioning content
the control plane knows the client location and can pre-position content at the nearest PoP
control plane decides where content has to be pre-positioned ahead of time
smart client can ask control plane where to get content
(used in Open Connect)
predict content popularity and pre-position content accordingly
pros:
- disk writes happen outside of the critical time
- more efficient caching
cons:
- control plane must know where each file is
- errors in predictions about content popularity
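
A toy sketch of popularity-driven pre-positioning: given predicted requests per title and a disk budget for one PoP, fill the disk greedily with the titles that bring the most predicted requests per GB. The greedy rule, the tuple layout, and the numbers are assumptions for illustration, not the method used by Open Connect.

```python
def plan_prepositioning(titles, capacity_gb):
    """Choose which titles to copy to a PoP ahead of time.

    titles: list of (name, size_gb, predicted_requests) tuples.
    Returns the names to pre-position, most valuable per GB first.
    """
    ranked = sorted(titles, key=lambda t: t[2] / t[1], reverse=True)
    plan, used_gb = [], 0.0
    for name, size_gb, _requests in ranked:
        if used_gb + size_gb <= capacity_gb:
            plan.append(name)
            used_gb += size_gb
    return plan


# Hypothetical popularity forecast for one PoP.
FORECAST = [
    ("a", 10.0, 1000),
    ("b", 5.0, 900),
    ("c", 20.0, 400),
    ("d", 8.0, 100),
]

if __name__ == "__main__":
    print(plan_prepositioning(FORECAST, capacity_gb=25.0))
```

The copy itself can then run in off-peak hours, which is what keeps disk writes out of the critical time. A wrong forecast means the title is missing at the PoP, so the control plane has to send the client to another location.
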
## Optimizations
- increase offload (for ISP)
- reduce cost/Mbps (for Netflix and subscribers)
- reduce W/Mbps (for Netflix and ISP)
- increase QoE (for subscribers)