What do Netflix use to create videos?

If you have ever wondered what tools, protocols and encryption Netflix use, a quick peek into one of their MP4s shows that they use a lot of the usual tools in their workflow. Note that this sample was captured from the Chrome browser, and there are likely variations for different devices.

For the first chunk / preview
1. The container is an mp4 for avc1
2. The video is packaged using GPAC aka MP4Box (GPAC0.5.1-DEV-rev4944M)
3. Video is H.264 and audio is AAC
4. H.264 is encoded using x264 with the following settings:
x264 - core 118 r234 d84818a - H.264/MPEG-4 AVC codec - Copyleft 2003-2011 - http://www.videolan.org/x264.html - options: cabac=1 ref=3 deblock=1:0:0 analyse=0x1:0x111 me=umh subme=7 psy=1 psy_rd=1.00:0.00 mixed_ref=1 me_range=16 chroma_me=1 trellis=2 8x8dct=0 cqm=0 deadzone=21,11 fast_pskip=0 chroma_qp_offset=-2 threads=12 sliced_threads=0 nr=0 decimate=1 interlaced=0 bluray_compat=0 constrained_intra=0 bframes=2 b_pyramid=0 b_adapt=2 b_bias=0 direct=3 weightb=1 open_gop=0 weightp=2 keyint=250 keyint_min=25 scenecut=40 intra_refresh=0 rc_lookahead=40 rc=2pass mbtree=1 bitrate=600 ratetol=1.0 qcomp=0.50 qpmin=6 qpmax=51 qpstep=4 cplxblur=20.0 qblur=0.5 ip_ratio=1.40 aq=1:1.00

For other chunks
1. mp4/dash
2. Netflix encryption – NetflixPstrm
3. Created with Netflix Media Library Version 80.0.158
4. AAC-LC audio
(mp4a – MP4 Audio Description) at 8 (87 bytes)
| | | | | | | Channels: 2
| | | | | | | Sample Size: 16
| | | | | | | Sample Rate: 24000.0
| | | | | | | (esds – Extended Sample Description) at 28 (51 bytes)
| | | | | | | | Audio Type: 64
| | | | | | | | Buffer Size: 579
| | | | | | | | Bitrate: 65587
| | | | | | | | Max Bitrate: 107720
| | | | | | | | Audio Specifc Config:
| | | | | | | | Audio Object Type: 2 – AAC LC (Low Complexity)
| | | | | | | | Sampling Frequency: 24000
| | | | | | | | Channel Config: 2 channels – Left, Right
| | | | | | | | SBR Present: 0
5. A DASH index / SIDX box
6. AVC video
(avcC – AVC Configuration) at 78 (56 bytes)
| | | | | | | | Version: 1
| | | | | | | | Profile: 77 (Main)
| | | | | | | | Profile Compatibility: 64 (Main)
| | | | | | | | Level: 3.0
| | | | | | | | NAL Unit Length: 4
| | | | | | | | Sequence Set Count: 1
| | | | | | | | NAL unit type: 7
| | | | | | | | Profile: 77 (Main)
| | | | | | | | Level: 3.0
| | | | | | | | SPS ID: 0
| | | | | | | | Max Frame Num: 16
| | | | | | | | Max Ref Frames: 3
| | | | | | | | Chroma Format: 1 (4:2:0)
| | | | | | | | Bit Depth: 8 (luma), 8 (chroma)
| | | | | | | | Macroblock Width: 32 (512 pixels luma, 256 pixels chroma)
| | | | | | | | Map Unit Height: 24 (384 pixels luma, 192 pixels chroma)
| | | | | | | | VUI parameters present: 1
| | | | | | | | Sample Aspect Ratio: 4:3
| | | | | | | | Timing Info: num_units_in_tick=1001, time_scale=48000, fixed_frame_rate_flag=1
| | | | | | | | NAL/HRD present! (
7. Common Encryption with pssh boxes for PlayReady, Widevine and a third system that is not registered on the DASH list; presumably this is Netflix-specific encryption
| pssh – Protection System Specific Header at 372 (702 bytes)
| | System ID: 9A04F079-9840-4286-AB92-E65BE0885F95
| | Version: 0
| | Data Size: 670
| | Header: 16AESCTRAAAAANQDUfcAAAAAAAAAAA==OgsMEY/8lsI=true
| pssh – Protection System Specific Header at 1074 (52 bytes)
| | System ID: EDEF8BA9-79D6-4ACE-A3C8-27DCD51D21ED
| | Version: 0
| | Data Size: 20
| | Header: (binary data)
| pssh – Protection System Specific Header at 1126 (76 bytes)
| | System ID: 29701FE4-3CC7-4A34-8C5B-AE90C7439A47
| | Version: 0
| | Data Size: 44
| | Header: (binary data)
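
The three System IDs in the pssh boxes above can be checked against a small lookup table. The following is only a sketch: the function name drm_name is made up for illustration, and the table holds just the two IDs that are publicly registered (PlayReady and Widevine). Anything else is reported as unknown, which is where the third ID from this sample ends up.

```shell
#!/bin/bash
# Map a pssh System ID (as printed in the dump) to a DRM system name.
# drm_name is a hypothetical helper, not part of any tool used above.
drm_name() {
  local id
  # Normalise to upper case so either spelling of the UUID matches.
  id=$(printf '%s' "$1" | tr 'a-f' 'A-F')
  case "$id" in
    9A04F079-9840-4286-AB92-E65BE0885F95) echo "PlayReady" ;;
    EDEF8BA9-79D6-4ACE-A3C8-27DCD51D21ED) echo "Widevine" ;;
    *) echo "unknown (not in this table)" ;;
  esac
}

# The three System IDs found in this sample.
for id in 9A04F079-9840-4286-AB92-E65BE0885F95 \
          EDEF8BA9-79D6-4ACE-A3C8-27DCD51D21ED \
          29701FE4-3CC7-4A34-8C5B-AE90C7439A47; do
  printf '%s  %s\n' "$id" "$(drm_name "$id")"
done
```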

Encryption uses an IV size of 8 bytes, which is the best choice for cross-platform compatibility.
The KeyID looks quite short.
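
Going back to the avcC dump in item 6, the numbers there are enough to work out the coded size, the frame rate and the display width. This is a rough sketch with some assumptions: the stream is progressive (so the tick rate is twice the frame rate), no cropping is applied, and the helper names coded_size, frame_rate and display_width are invented for this example.

```shell
#!/bin/bash
# Values copied from the avcC dump above.
MB_WIDTH=32; MAP_UNIT_HEIGHT=24
SAR_NUM=4; SAR_DEN=3
NUM_UNITS_IN_TICK=1001; TIME_SCALE=48000

# Coded size in luma pixels: each macroblock is 16x16.
coded_size() { echo "$(( $1 * 16 ))x$(( $2 * 16 ))"; }

# Frame rate, assuming progressive video: time_scale / (2 * num_units_in_tick).
frame_rate() {
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.3f\n", t / (2 * n) }'
}

# Display width after applying the sample aspect ratio (assumes no cropping).
display_width() {
  awk -v mb="$1" -v sn="$2" -v sd="$3" 'BEGIN { printf "%.0f\n", mb * 16 * sn / sd }'
}

echo "coded size:    $(coded_size "$MB_WIDTH" "$MAP_UNIT_HEIGHT")"
echo "frame rate:    $(frame_rate "$NUM_UNITS_IN_TICK" "$TIME_SCALE") fps"
echo "display width: $(display_width "$MB_WIDTH" "$SAR_NUM" "$SAR_DEN") px"
```

Under those assumptions the 512x384 coded picture with a 4:3 sample aspect ratio is shown at roughly 683x384, i.e. close to 16:9, at 23.976 fps.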

Impact of adding additional keyframes on video size

The following is a quick test of how much file size grows when additional keyframes are added to an encode to make it more suitable for segmentation in adaptive bitrate delivery.

The results show that on a very hard-to-encode source file (reflections on water), adding a keyframe every 50 frames (2 seconds on a 25 fps source) increased the overall video size by only 96 kB (approx. 0.3%).

Note that the video is 1153 frames long, so we would expect at least 23 keyframes given a requested keyframe interval of 50. The source also had very few traditional scene changes, which is reflected in x264 generating only 6 keyframes when left to its defaults (scene-cut detection plus x264's default maximum interval of 250 frames).
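
The arithmetic behind those figures can be sketched in a few lines of shell. The function names are made up for this example. Dividing 1153 by 50 and rounding down gives the 23 mentioned above. Since the first frame is always a keyframe, a strict 50-frame cadence actually lands on 24, which is what the no-scenecut encode further down reports. The sizes are the video: values from the first two ffmpeg logs.

```shell
#!/bin/bash
# Keyframes on a fixed cadence, counting frame 0: ceil(frames / interval).
expected_keyframes() { echo $(( ($1 + $2 - 1) / $2 )); }

# Percentage growth from size a to size b (any unit, as long as both match).
size_increase_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", (b - a) * 100 / a }'
}

echo "keyframes at keyint=50 over 1153 frames: $(expected_keyframes 1153 50)"
echo "video size increase, 34160kB to 34256kB: $(size_increase_pct 34160 34256)%"
```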

Sample command to show GOP / I frame structure using ffprobe:

ffprobe -select_streams v:0 -show_frames goptest.mov |grep key_frame|less
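
Paging through the key_frame lines by eye gets tedious, so here is a small sketch that turns them into a list of keyframe positions instead. The helper keyframe_positions is hypothetical. It assumes its input is one key_frame=0 or key_frame=1 line per frame, in frame order, which is what the grep above produces.

```shell
#!/bin/bash
# Read "key_frame=0|1" lines (one per frame) and print the zero-based
# index of every frame that is a keyframe.
keyframe_positions() {
  awk -F= 'BEGIN { n = 0 } $1 == "key_frame" { if ($2 == 1) print n; n++ }'
}

# Stand-in input for illustration; with a real file the pipeline would be:
#   ffprobe -v error -select_streams v:0 -show_frames goptest.mov \
#     | grep key_frame | keyframe_positions
printf 'key_frame=1\nkey_frame=0\nkey_frame=0\nkey_frame=1\n' | keyframe_positions
```

Piping the result through wc -l gives the keyframe count directly.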

File with x264 selected key frames

Encode settings

/usr/local/bin/ffmpeg -y -i goptest.mov -codec:v libx264 -b:v 6000K -s 1920x1080 -preset slower -tune film -me_range 24 -bufsize 50000K -maxrate 50000K -refs 4 -profile:v high -level 4.1 -threads 0 -sn 'goptest_1080p.mp4'

ffmpeg encode results

video:34160kB audio:538kB subtitle:0 global headers:0kB muxing overhead 0.077031%
[libx264 @ 0x7fc873846800] frame I:6     Avg QP:16.50  size: 25800
[libx264 @ 0x7fc873846800] frame P:1131  Avg QP:23.22  size: 30785
[libx264 @ 0x7fc873846800] frame B:16    Avg QP: 6.54  size:   354
[libx264 @ 0x7fc873846800] consecutive B-frames: 98.0%  0.3%  0.3%  1.4%
[libx264 @ 0x7fc873846800] mb I  I16..4: 66.5% 32.0%  1.5%
[libx264 @ 0x7fc873846800] mb P  I16..4: 49.4% 40.7%  1.3%  P16..4:  6.4%  0.4%  0.1%  0.0%  0.0%    skip: 1.6%
[libx264 @ 0x7fc873846800] mb B  I16..4:  0.2%  0.1%  0.0%  B16..8:  9.5%  0.0%  0.0%  direct: 0.1%  skip:90.2%  L0:68.0% L1:31.9% BI: 0.1%
[libx264 @ 0x7fc873846800] final ratefactor: 22.37
[libx264 @ 0x7fc873846800] 8x8 transform intra:44.5% inter:93.6%
[libx264 @ 0x7fc873846800] direct mvs  spatial:62.5% temporal:37.5%
[libx264 @ 0x7fc873846800] coded y,uvDC,uvAC intra: 21.3% 67.9% 13.0% inter: 17.0% 39.8% 0.6%
[libx264 @ 0x7fc873846800] i16 v,h,dc,p: 35% 25% 18% 22%
[libx264 @ 0x7fc873846800] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 14% 19% 36%  3%  5%  5% 10%  3%  5%
[libx264 @ 0x7fc873846800] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 18% 19% 44%  1%  6%  4%  6%  1%  2%
[libx264 @ 0x7fc873846800] i8c dc,h,v,p: 43% 33% 19%  5%
[libx264 @ 0x7fc873846800] Weighted P-Frames: Y:0.3% UV:0.2%
[libx264 @ 0x7fc873846800] ref P L0: 62.8%  3.6% 17.0%  9.3%  7.3%
[libx264 @ 0x7fc873846800] ref B L0: 99.4%  0.6%
[libx264 @ 0x7fc873846800] ref B L1: 99.7%  0.3%
[libx264 @ 0x7fc873846800] kb/s:6067.44

File with keyint=50

Encode settings

Note that using keyint on its own doesn't necessarily mean that keyframes will land exactly every 50 frames: keyint only sets the maximum interval, and scene-cut detection can still insert extra keyframes. See below for the alternative of switching x264 scenecut off.

/usr/local/bin/ffmpeg -y -i goptest.mov -codec:v libx264 -b:v 6000K -s 1920x1080 -preset slower -tune film -me_range 24 -bufsize 50000K -maxrate 50000K -refs 4 -profile:v high -level 4.1 -x264opts keyint=50 -threads 0 -sn 'goptest_1080p_keyint50.mp4'

ffmpeg encode results

video:34256kB audio:538kB subtitle:0 global headers:0kB muxing overhead 0.077043%
[libx264 @ 0x7fe631846800] frame I:26    Avg QP:19.07  size: 41326
[libx264 @ 0x7fe631846800] frame P:1112  Avg QP:23.33  size: 30574
[libx264 @ 0x7fe631846800] frame B:15    Avg QP: 6.27  size:   300
[libx264 @ 0x7fe631846800] consecutive B-frames: 98.1%  0.3%  0.5%  1.0%
[libx264 @ 0x7fe631846800] mb I  I16..4: 52.5% 43.3%  4.2%
[libx264 @ 0x7fe631846800] mb P  I16..4: 49.8% 40.4%  1.3%  P16..4:  6.3%  0.4%  0.1%  0.0%  0.0%    skip: 1.8%
[libx264 @ 0x7fe631846800] mb B  I16..4:  0.2%  0.1%  0.0%  B16..8:  2.8%  0.0%  0.0%  direct: 0.0%  skip:96.8%  L0:12.8% L1:87.0% BI: 0.2%
[libx264 @ 0x7fe631846800] final ratefactor: 22.35
[libx264 @ 0x7fe631846800] 8x8 transform intra:44.1% inter:94.0%
[libx264 @ 0x7fe631846800] direct mvs  spatial:60.0% temporal:40.0%
[libx264 @ 0x7fe631846800] coded y,uvDC,uvAC intra: 21.3% 67.2% 13.0% inter: 16.4% 38.5% 0.6%
[libx264 @ 0x7fe631846800] i16 v,h,dc,p: 35% 25% 18% 22%
[libx264 @ 0x7fe631846800] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 14% 19% 37%  3%  5%  5% 10%  3%  5%
[libx264 @ 0x7fe631846800] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 19% 20% 43%  1%  5%  4%  6%  1%  2%
[libx264 @ 0x7fe631846800] i8c dc,h,v,p: 43% 33% 19%  5%
[libx264 @ 0x7fe631846800] Weighted P-Frames: Y:0.3% UV:0.2%
[libx264 @ 0x7fe631846800] ref P L0: 63.5%  3.6% 16.9%  9.1%  7.0%
[libx264 @ 0x7fe631846800] ref B L0: 89.3% 10.7%
[libx264 @ 0x7fe631846800] kb/s:6084.54
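
For segmentation, what matters is less the number of keyframes than whether they sit on predictable boundaries. With keyint=50 and scenecut still on, a keyframe placed at a scene change restarts the count, so later keyframes can drift off the 50-frame grid. The sketch below lists keyframe positions that are not a multiple of the interval. The helper name misaligned_keyframes and the sample positions are invented for illustration and are not measured from goptest.mov.

```shell
#!/bin/bash
# $1 = keyframe interval in frames; stdin = keyframe indexes, one per line.
# Prints every index that does not fall on a multiple of the interval.
misaligned_keyframes() {
  awk -v k="$1" '$1 % k != 0 { print $1 }'
}

# Hypothetical positions: a scene cut at frame 73 shifts the next keyframe to 123.
printf '0\n50\n73\n123\n' | misaligned_keyframes 50
```

An empty result would mean every keyframe lines up with a segment boundary at that interval.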

With scene cut disabled

Note that with scenecut disabled there are actually fewer keyframes than with scenecut enabled (24 versus 26). This makes sense: rather than just dropping in a keyframe every 50 frames, the scenecut-enabled encode also adds them when a scene change is detected.

/usr/local/bin/ffmpeg -y -i goptest.mov -codec:v libx264 -b:v 6000K -s 1920x1080 -preset slower -tune film -me_range 24 -bufsize 50000K -maxrate 50000K -refs 4 -profile:v high -level 4.1 -x264opts keyint=50:no-scenecut -threads 0 -sn 'goptest_1080p_keyint50_noscenecut.mp4'

ffmpeg encode results

frame= 1153 fps=1.1 q=-1.0 Lsize= 34821kB time=00:00:46.04 bitrate=6195.7kbits/s dup=15 drop=0
video:34255kB audio:538kB subtitle:0 global headers:0kB muxing overhead 0.077056%
[libx264 @ 0x7f9b9b846800] frame I:24 Avg QP:19.70 size: 45538
[libx264 @ 0x7f9b9b846800] frame P:1112 Avg QP:23.33 size: 30551
[libx264 @ 0x7f9b9b846800] frame B:17 Avg QP: 7.33 size: 643
[libx264 @ 0x7f9b9b846800] consecutive B-frames: 97.9% 0.2% 0.5% 1.4%
[libx264 @ 0x7f9b9b846800] mb I I16..4: 51.0% 45.6% 3.4%
[libx264 @ 0x7f9b9b846800] mb P I16..4: 49.8% 40.4% 1.3% P16..4: 6.3% 0.4% 0.1% 0.0% 0.0% skip: 1.7%
[libx264 @ 0x7f9b9b846800] mb B I16..4: 0.2% 0.1% 0.0% B16..8: 8.3% 0.0% 0.0% direct: 0.5% skip:90.8% L0:73.7% L1:26.2% BI: 0.0%
[libx264 @ 0x7f9b9b846800] final ratefactor: 22.35
[libx264 @ 0x7f9b9b846800] 8x8 transform intra:44.2% inter:94.3%
[libx264 @ 0x7f9b9b846800] direct mvs spatial:64.7% temporal:35.3%
[libx264 @ 0x7f9b9b846800] coded y,uvDC,uvAC intra: 21.2% 67.2% 12.9% inter: 16.5% 38.6% 0.5%
[libx264 @ 0x7f9b9b846800] i16 v,h,dc,p: 35% 25% 18% 22%
[libx264 @ 0x7f9b9b846800] i8 v,h,dc,ddl,ddr,vr,hd,vl,hu: 14% 19% 37% 3% 5% 5% 10% 3% 5%
[libx264 @ 0x7f9b9b846800] i4 v,h,dc,ddl,ddr,vr,hd,vl,hu: 18% 20% 43% 1% 5% 4% 6% 1% 2%
[libx264 @ 0x7f9b9b846800] i8c dc,h,v,p: 43% 33% 19% 5%
[libx264 @ 0x7f9b9b846800] Weighted P-Frames: Y:0.3% UV:0.2%
[libx264 @ 0x7f9b9b846800] ref P L0: 63.3% 3.6% 16.9% 9.2% 7.0%
[libx264 @ 0x7f9b9b846800] ref B L0: 98.5% 1.5% 0.0%
[libx264 @ 0x7f9b9b846800] ref B L1: 99.6% 0.4%
[libx264 @ 0x7f9b9b846800] kb/s:6084.43
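
To compare the three runs without reading the logs line by line, the per-frame-type summary lines can be reduced to a count and a total byte figure. This is a minimal sketch: frame_totals is a made-up helper, and it assumes the libx264 summary format shown above (frame type and count in the fifth field, average size in the last).

```shell
#!/bin/bash
# Read libx264 "frame X:count ... size: avg" lines and print
# "type count total_bytes" (count multiplied by average size).
frame_totals() {
  awk '$4 == "frame" { split($5, a, ":"); print a[1], a[2], a[2] * $NF }'
}

# The I-frame lines from the three encodes above, in order.
frame_totals <<'EOF'
[libx264 @ 0x7fc873846800] frame I:6     Avg QP:16.50  size: 25800
[libx264 @ 0x7fe631846800] frame I:26    Avg QP:19.07  size: 41326
[libx264 @ 0x7f9b9b846800] frame I:24 Avg QP:19.70 size: 45538
EOF
```

On these logs the I frames come to about 0.15 MB with x264's own keyframe placement, against roughly 1.07 MB and 1.09 MB for the two keyint=50 runs. Since all three encodes target the same bitrate, most of that difference seems to be absorbed elsewhere (the average P-frame size drops slightly) rather than showing up in the file size.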

Useful script for multi-rate HLS output from ffmpeg

#!/bin/bash
# Usage: pass the source video as the first argument.
VIDSOURCE="$1"
RESOLUTION="854x480"
BITRATE1="800000"
BITRATE2="600000"
BITRATE3="400000"

AUDIO_OPTS="-c:a libfaac -b:a 160000 -ac 2"
VIDEO_OPTS1="-s $RESOLUTION -c:v libx264 -b:v $BITRATE1 -vprofile baseline -preset medium -x264opts level=41"
VIDEO_OPTS2="-s $RESOLUTION -c:v libx264 -b:v $BITRATE2 -vprofile baseline -preset medium -x264opts level=41"
VIDEO_OPTS3="-s $RESOLUTION -c:v libx264 -b:v $BITRATE3 -vprofile baseline -preset medium -x264opts level=41"
OUTPUT_HLS="-hls_time 3 -hls_list_size 10 -hls_wrap 30 -start_number 1"

# One input, three HLS outputs (high, medium and low bitrate).
ffmpeg -i "$VIDSOURCE" -y -threads 4 \
              $AUDIO_OPTS $VIDEO_OPTS1 $OUTPUT_HLS stream_hi.m3u8 \
              $AUDIO_OPTS $VIDEO_OPTS2 $OUTPUT_HLS stream_med.m3u8 \
              $AUDIO_OPTS $VIDEO_OPTS3 $OUTPUT_HLS stream_low.m3u8

Credit to: jeisom@gmail.com
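
The script above writes three separate variant playlists, and a player needs a master playlist to switch between them. The following is a sketch of one way to generate it, not part of the credited script. The helper master_playlist and the output name master.m3u8 are assumptions. BANDWIDTH is approximated here as video bitrate plus the 160000 audio bitrate, whereas a real deployment would normally use the measured peak rate.

```shell
#!/bin/bash
# Values matching the script above.
AUDIO_BITRATE=160000
RESOLUTION="854x480"

# Print a master playlist. Arguments come in pairs:
# video bitrate in bits/s, then the variant playlist name.
master_playlist() {
  echo "#EXTM3U"
  while [ "$#" -ge 2 ]; do
    echo "#EXT-X-STREAM-INF:BANDWIDTH=$(( $1 + AUDIO_BITRATE )),RESOLUTION=$RESOLUTION"
    echo "$2"
    shift 2
  done
}

# Redirect to a file such as master.m3u8 (hypothetical name) to use it.
master_playlist 800000 stream_hi.m3u8 \
                600000 stream_med.m3u8 \
                400000 stream_low.m3u8
```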