Internal Content Distribution Network

So... we had a problem a couple of years back: our network was designed around MPLS and other direct-connection links, and it did not handle streaming the company's annual meeting well. The old system had every user connecting directly to the video provider's media stream, which taxed the central uplink.

The solution:

Create an internal CDN to minimize internet and intranet traffic, and provide a frontend viewer for the streams. This solution was used for the annual company meetings in 2014 and 2016.

The pieces:

  • libav/avconv
  • varnish
  • apache2
  • php
  • jwplayer
  • pingfederate

Overview:

The incoming stream is RTMP, so we use libav to convert it to HLS. Those HLS streams are then available for Apache on the converter system to serve. Varnish caches are installed at as many distributed remote sites as possible. A base page written in PHP acts as the frontend, using the incoming IP address to determine the most local site. The landing page has an embedded JW Player so users can stream, and the page is protected by SSO with PingFederate to prevent unauthorized viewing and to provide logging.
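
Put together, the flow is: RTMP source → avconv on the transcode nodes → Apache on those same nodes → the core Varnish caches → the distributed site caches → JW Player in the user's browser, with the PHP landing page choosing the right cache for each client and PingFederate sitting in front of it all for SSO.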

Design considerations:

  • The system must be reliable
    • To us this means that we run the system in multiple data centers with automatic fail-over
  • Must work with Internet Explorer browsers, preferably IE8 and newer
  • Must be minimally invasive for end users: no plugins
  • Must be secured behind Active Directory logins to prevent unauthorized access
  • Must record user logins
  • Must provide data about where the videos were watched for analysis after the fact
  • Must be able to handle >500 concurrent users watching the streams

Libav:

Starting up the transcode is fairly simple:

#!/bin/bash

# Clean out the playlist and segments from the previous run
rm /var/www/html/video/*.m3u8
rm /var/www/html/video/*.ts

# The provider's incoming RTMP stream
STREAM=rtmp://cp122890.live.edgefcs.net/live/2a9722f7-803e-d016-f6a4-e3eba1d3efc3_livestream1@30063

# Transcode to HLS: keyframe every 180 frames, 10 second segments, keep the 6 most
# recent segments in the playlist (-strict experimental enables the built-in AAC encoder)
avconv -i "$STREAM" -strict experimental -acodec aac -vcodec h264 -g 180 -hls_time 10 -hls_list_size 6 video.m3u8

# Keep the final playlist around once the stream (and avconv) ends
mv /var/www/html/video/video.m3u8 /var/www/html/video/video.m3u8.save

Here we have avconv produce 10-second chunks of transcoded video and keep the six most recent around. This gives the caches 60 seconds of buffer to work with.

This script is started on the transcode nodes (TCD1 and TCD2), and is run in a directory served by apache2.
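
For reference, the playlist avconv maintains in that directory ends up looking roughly like this (segment names and sequence numbers are illustrative; they depend on the avconv build and how long the stream has been running):

#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:10
#EXT-X-MEDIA-SEQUENCE:142
#EXTINF:10.000000,
video142.ts
#EXTINF:10.000000,
video143.ts
#EXTINF:10.000000,
video144.ts
#EXTINF:10.000000,
video145.ts
#EXTINF:10.000000,
video146.ts
#EXTINF:10.000000,
video147.ts

Six entries at 10 seconds each is where the 60 seconds of buffer comes from; the playlist is rewritten each time a new segment is finished.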

Varnish:

We used two layers of Varnish caching: the core and the distributed network. This gives us redundancy and keeps load off the transcode servers.

Core configuration:

backend tcd1 {  
  .host = "tcd1.example.com";
  .port = "80";
  .first_byte_timeout = 30s;
  .connect_timeout = 5s;
  .probe = {
     .url = "/video/video.m3u8";
     .interval = 2s;
     .timeout = 1s;
     .window = 5;
     .threshold = 3;
     }
}

backend tcd2 {  
  .host = "tcd2.example.com";
  .port = "80";
  .first_byte_timeout = 30s;
  .connect_timeout = 5s;
  .probe = {
     .url = "/video/video.m3u8";
     .interval = 2s;
     .timeout = 1s;
     .window = 5;
     .threshold = 3;
     }
}


sub vcl_recv {
  # Pull from TCD1 by default; fail over to TCD2 if it becomes unhealthy
  set req.backend = tcd1;
  if (req.restarts == 1 || !req.backend.healthy) {
    set req.backend = tcd2;
  }
}

sub vcl_fetch {  
  if (req.url ~ "\.m3u8$") {set beresp.ttl = 5s;}
  if (req.url ~ "\.ts$") {set beresp.ttl = 1m;}
}

Here we set the cache TTLs very short, since this is live streaming video and after a minute of cache all users will be past this block. The m3u8 playlist needs a very short cache, hence the 5s.

It is set up to pull from the TCD1 backend first, and only if that server fails do we start pulling from TCD2. This is because the two transcodes are not exactly the same, so mixing segments from both transcode backends would cause issues with the video stream.

The health probes defined on each backend are what drive this failover: a transcoder is only used while its checks are passing.
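
A quick way to sanity-check this from a core cache node (hostnames as in the config above; 6081 is the port the core Varnish instances listen on, as referenced in the distributed configuration below) is to hit the probe URL on the transcoder directly and then ask the local Varnish twice; within the 5s TTL the Age header should climb between the two answers:

# Confirm the probe URL answers on the active transcoder
curl -sI http://tcd1.example.com/video/video.m3u8 | head -n 1

# Two requests through the local Varnish; the Age header should grow between them
curl -sI http://localhost:6081/video/video.m3u8 | grep -i -e '^Age' -e '^X-Varnish'
sleep 2
curl -sI http://localhost:6081/video/video.m3u8 | grep -i -e '^Age' -e '^X-Varnish'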

On the distributed caches we use a slightly different configuration:

backend varnish1 {  
 .host = "varnish1.example.com";
 .port = "6081";
 .first_byte_timeout = 30s;
 .connect_timeout = 5s;
 .probe = {
        .url = "/video/video.m3u8";
        .interval = 10s;
        .timeout = 5s;
        .window = 5;
        .threshold = 3;
 }
}

backend varnish2 {  
 .host = "varnish2.example.com";
 .port = "6081";
 .first_byte_timeout = 30s;
 .connect_timeout = 5s;
 .probe = {
        .url = "/video/video.m3u8";
        .interval = 10s;
        .timeout = 5s;
        .window = 5;
        .threshold = 3;
 }
}

director default round-robin {  
 {.backend = varnish1;}
 {.backend = varnish2;}
}

sub vcl_recv {set req.backend = default;}

sub vcl_fetch {  
  if (req.url ~ "\.m3u8$") {set beresp.ttl = 5s;}
  if (req.url ~ "\.ts$") {set beresp.ttl = 1m;}
}

Here again we see the same TTLs, but the load balancing is now round-robin between the core instances. This works because both core caches pull only from the active transcode system, so the data on them is identical. It gives us some load balancing as well as redundancy if one of the core caches fails. The same style of health checks used at the core is implemented here to watch the core caches.
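
The same spot check works from a client at a remote site against that site's cache (hostname here is illustrative). Polling the playlist shows the 5 second TTL in action: the Age header climbs between requests and drops back down each time the cache refetches the playlist from the core.

# Poll the site-local cache once a second and watch the playlist age out and refresh
for i in $(seq 1 8); do
  curl -sI http://remote1.company.com:6081/video/video.m3u8 | grep -i '^Age'
  sleep 1
done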

Apache2/PHP/JWPlayer:

In the transcode step we used Apache to serve the transcoded video as static files; in this step the setup requires a bit more work to present the video player and make sure each client pulls from its closest cache.

PHP code at start of index.php:

<?PHP

#Check an IP if it is in the designated network
function IsInNetwork($givenIP, $networkIP, $netmask)  
{
    return ((ip2long($givenIP) & ip2long($netmask)) == ip2long($networkIP));
}

#Setup some variables
$exthosts = array( "external1.example.com:6081", "external2.example.com:6081" );
$host = $exthosts[array_rand( $exthosts )];
$host2 = str_replace (":6081","",$host);
$ipaddress = $_SERVER["REMOTE_ADDR"];
$corehosts = array( "varnish1.company.com:6081", "varnish2.company.com:6081" );

#Define remote servers array with network they are responsible for
$targets = array(
    array ( "10.0.0.0", "remote1.company.com:6081" ),
    array ( "10.1.0.0", "remote2.company.com:6081" ),
    array ( "10.2.0.0", "remote3.company.com:6081" ),
# Many more entries in this table to fill out the network
);

#First set a default if there is no specific server defined.
# Check if they are internal or external at this time.
if (IsInNetwork($ipaddress, "10.0.0.0", "255.0.0.0")) {
    $host = $corehosts[array_rand( $corehosts )];
} else {
    $host = $exthosts[array_rand( $exthosts )];
}


#Now parse through the $targets list and see if the client has a local cache
foreach ($targets as &$value) {  
  if (IsInNetwork($ipaddress, $value[0], "255.255.255.0")) {
    $host = $value[1];
    $host2 = str_replace (".company.com:6081","",$host);
  }
}

#Finally setup the URL to use later on in the player definition
$url = "http://" . $host . "/video/video.m3u8";

?>
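
As a quick sanity check of the matching logic, here is what IsInNetwork() returns for a few made-up client addresses (this assumes the function and the example $targets entries above; the comments note where such a client would end up):

<?PHP
# Client inside the 10.1.0.0/24 site network: matched by the $targets loop
var_dump(IsInNetwork("10.1.0.27", "10.1.0.0", "255.255.255.0"));  # bool(true)  -> remote2.company.com:6081
# The same client also matches the internal 10.0.0.0/8 default checked first
var_dump(IsInNetwork("10.1.0.27", "10.0.0.0", "255.0.0.0"));      # bool(true)  -> a core cache, if no site entry matched
# An external viewer matches nothing internal and keeps an external host
var_dump(IsInNetwork("192.0.2.10", "10.0.0.0", "255.0.0.0"));     # bool(false) -> one of the external*.example.com hosts
?>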

Set up some logging via syslog:

<?PHP
syslog(LOG_NOTICE, "Video Started for " . $_SERVER['PHP_AUTH_USER'] . " from " . $ipaddress . " with URL: " . $url . " on " . $_SERVER['HTTP_USER_AGENT']);
?>

Player definition:

<?PHP  
#Block users from using the Terminal server farm to watch video
$bannedIPs = array("10.0.0.50", "10.0.0.51", "10.0.0.52");
if ( in_array($ipaddress, $bannedIPs) ) {
  print "<b>Access from this system is disallowed. Terminal Servers are inappropriate for streaming video.</b>";
} else {
  print '<div id="myElement">';
  print '<script type=\'text/javascript\'>';
  print 'jwplayer("myElement").setup({';
  print 'file: "'.$url.'",';
  print 'width: 640,';
  print 'height: 360,';
  print 'image: "http://video.example.com/logo.png",';
  print 'autostart: true, controls: true, skin:"stormtrooper",';
  print 'androidhls: true';
  print '});';
  print '</script>';
  print '</div>';
}
?>

PingFederate:

Above there are references to PHP_AUTH_USER, which is used for logging. This value is fed into Apache's environment by the PingFederate Apache2 agent.

Detailed installation and configuration instructions can be found in the PingFederate Apache Integration Kit 2.4.1 for Linux documentation.
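
On the PHP side nothing more is needed than reading that variable, as the syslog call above does. A small defensive sketch (not part of the production page) guards the read for the case where the agent is not in front of a particular vhost:

<?PHP
# Fall back to "unknown" if the agent has not populated the authenticated user
$user = isset($_SERVER['PHP_AUTH_USER']) ? $_SERVER['PHP_AUTH_USER'] : 'unknown';
?>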

Future?

A few things could be improved, which I will blog about soon:

  • Using service discovery to automatically populate the table of site caches
  • Smarter discovery using latency
  • Better High Availability on the avconv transcode