#streaming

# CDN (Netflix Open Connect example) 2021

Notes from the videotech streaming conference, 2021.

## Goals

- optimize content delivery to users
- increase the outgoing bandwidth of the system
- increase system reliability
- decrease load on the backbone network

CDN tiers, from source to viewer:

1. origin
2. mid-tier cache
3. points of presence (PoPs)
4. end users

## work with 1 CDN provider

pros:

- easy to set up
- works out of the box
- it just works

cons:

- single point of failure
- hard to tell how well it works; QoE factors:
  - content encoding (under our control)
  - CDN (can't compare different CDNs)
  - ISP (can compare different ISPs)
  - home network (can compare different home networks)
  - client software / adaptive streaming (under our control)
  - device capabilities and performance (can compare different devices)
- difficult to optimize price
  - minimum commit
  - negotiation
- vendor lock-in

## work with multiple CDN providers

buy or build?

how to distribute traffic between CDNs?

- DNS-based
- manifest rewriting
- manifest generation
- client-based

how to build it yourself?

reasons to build:

- commercial CDNs are not enough
  - functionality
  - price
- have the resources and support to build
- maybe you will be able to sell it to others

## Multi origin

one origin:

- single point of failure
- cost is lower

multiple origins:

- cost is higher
- failure resilience:
  - CDN QoE depends only on the CDN

### traffic distribution

- minimum commit
- QoE optimization
- service failure resilience
- content-aware routing

### analysis of QoE: dimensions for Multi-CDN

- which CDN delivers the content
- CDN point of presence (PoP)
  - ask CDN providers
  - use public data (geo lookup)
- client device type
- content-related dimensions (popularity, size, type)

## Build our own CDN (Netflix Open Connect example)

why?

- optimization
  - reduce costs
  - increase quality
- what helps optimize?
  - specialization
  - vertical integration
- when?
  - scale
  - development (software and hardware, networking, operations, ISP relations)
- what exactly to build?
  - how are our tasks different from a commercial CDN's?
  - how can we use that to our benefit?
  - make our own CDN simpler and more efficient

### Subtasks

- Deployment: where to put servers
- Steering: how to find the nearest server (PoP) for each user
- Load balancing & Failover: what to do with traffic inside a PoP and how to handle failures
- Caching: how to make caching content-aware

### Deployment

- where to put servers
  - in ISP (Internet Service Provider) datacenters
  - in IXs (Internet Exchange Points)

### Steering

anycast

- pop1 and pop2 have the same IP address
- we announce this IP address from both PoPs
- internet routers choose the nearest PoP

cons:

- we cannot control the steering
- problems have to be solved together with providers

pros:

- the internet does the work for us

DNS-based

- a smart authoritative DNS server responds with the IP of the nearest PoP
- the authoritative DNS server sees only the IP of the caching resolver

pros:

- basic solution: geo-DNS
- we control the steering

cons:

- we know only the IP of the caching resolver, which can be in a different network from the client
- caching delays, non-standard resolvers that override the TTL

control plane steering

pros:

- the control plane knows the client location -> better steering
- we can use factors like popularity or availability

cons:

- needs a smart client that knows about our CDN and requests the nearest PoP

Open Connect uses control plane steering.

### Load balancing & Failover

Traffic inside a PoP

- Dedicated load balancer
  - pros:
    - control over traffic
    - the LB can use capacity and heartbeat signals from servers
  - cons:
    - LB = bottleneck
    - additional failure point
- LAN anycast with ECMP (Equal-Cost Multi-Path)
  - pros:
    - no LB needed
  - cons:
    - load balancing is difficult
- DNS-based load balancing
  - pros:
    - no LB needed
  - cons:
    - delays caused by DNS caching
- Control plane load balancing
  - pros:
    - can return multiple server links for a single request
    - tracks health and capacity of servers
    - failover logic lives in the control plane
  - cons:
    - needs a smart client that knows about our CDN and sends requests to our control plane

### caching
- Proxy caching
  - client -> edge server -> (cache miss) -> mid-tier cache -> origin
  - pros:
    - no smart client needed
  - cons:
    - servers read and write a lot of data
- Directed caching with content pre-positioning
  - we have a control plane that knows the client location and can pre-position content in the nearest PoP
  - the control plane decides ahead of time where content has to be pre-positioned
  - a smart client can ask the control plane where to get content (used in Open Connect)
  - content popularity is predicted, so pre-positioning is content-aware
  - pros:
    - disk writes happen outside of the critical time
    - more efficient caching
  - cons:
    - the control plane has to know where each file is
    - errors in content popularity predictions

## Optimizations

- increase offload (for ISP)
- reduce money/Mbps (for Netflix and subscribers)
- reduce W/Mbps (for Netflix and ISP)
- increase QoE (for subscribers)
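
### sketch: control plane steering

To make the control plane notes above (steering, load balancing and failover, directed caching) a bit more concrete, here is a minimal Python sketch. It is only an illustration under simplifying assumptions, not how Open Connect is actually implemented: the `Pop` class, the `steer` function, the PoP names, and the toy x/y coordinates standing in for network distance are all hypothetical.

```python
from dataclasses import dataclass, field
from math import hypot


@dataclass
class Pop:
    """Hypothetical view of one PoP as seen by the control plane."""
    name: str
    x: float                      # toy coordinates standing in for network distance
    y: float
    healthy: bool = True          # e.g. derived from heartbeat signals
    capacity_left: float = 1.0    # fraction of spare capacity, 0.0 to 1.0
    files: set = field(default_factory=set)  # content pre-positioned here


def steer(client_xy, file_id, pops, max_links=3):
    """Return PoP names ranked for this client, nearest first.

    PoPs that are unhealthy, out of capacity, or do not hold the file
    are skipped, so failover is just "try the next link in the list".
    """
    cx, cy = client_xy
    candidates = [
        p for p in pops
        if p.healthy and p.capacity_left > 0 and file_id in p.files
    ]
    candidates.sort(key=lambda p: hypot(p.x - cx, p.y - cy))
    return [p.name for p in candidates[:max_links]]


pops = [
    Pop("isp-a", 0, 0, files={"movie-1"}),
    Pop("ix-b", 3, 4, files={"movie-1", "movie-2"}),
    Pop("ix-c", 6, 8, healthy=False, files={"movie-1"}),
]

print(steer((1, 1), "movie-1", pops))  # ['isp-a', 'ix-b']
print(steer((1, 1), "movie-2", pops))  # ['ix-b']
```

The smart client asks for a file and gets back several links. Because the control plane already knows where each file was pre-positioned and which servers are healthy, the client never hits a cache miss in this model; the cost is that the control plane must track all of that state.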