Video Encoding Tips

Every so often I see a poorly encoded video on the internet: black bars, interlacing artefacts, too low a bitrate, an unnecessarily large file size, an incorrect aspect ratio, … and in the worst cases, a mix of all these things. Here are some short tips to reduce the risk of making these mistakes.

The tips give concrete instructions for the program HandBrake, which is a freely available, popular, and good tool for encoding videos.

Ensure your aspect ratio is right

In short: ensure that a circle in the original video is still a circle in your encoded video. For anamorphic material (non-square pixels), you should only convert it to square pixels when downscaling it, otherwise you should preserve the pixels as they are.

The aspect ratio is the width of the displayed video image divided by its height. For instance, classic TV shows had a ratio of 4:3; modern widescreen video typically is 16:9. Make sure that no matter how you rescale your video (e.g. if you want to reduce the resolution to make the file smaller), the proportion of width to height remains the same as in the original.

There is one caveat here regarding standard-definition TV material. Suppose you have a recording on a DVD in the standard 480p widescreen format. This means the video frames stored on the DVD have a resolution of 720×480 pixels. Mind that this does not correspond to a 16:9 ratio even though your TV displays it as such: it is actually a 3:2 ratio. What gives? Well, your TV knows that the video is supposed to be displayed as 16:9 and will therefore stretch it horizontally while rendering the image. This means the pixels are not square, unlike in recent formats like 720p and 1080p. The cinematographic term for this is ‘anamorphic’ video.

When converting such a DVD to a video file on a hard disk, you may be tempted to make the pixels square. My advice is not to do this unless you are downscaling the video. If you simply want to store the video with minimal loss of image quality, you should keep the original resolution (in this example: 720×480) and ensure the aspect ratio information is preserved, such that the media player knows how to stretch the image. In HandBrake, ensure ‘Keep Aspect Ratio’ remains checked and ‘Anamorphic’ is set to Strict.
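To make the arithmetic concrete, here is a small Python sketch (my own illustration, using the simple textbook definitions and ignoring analog overscan subtleties) that derives the pixel aspect ratio of such a disc, and the square-pixel dimensions you would use when downscaling:

    from fractions import Fraction

    # NTSC widescreen DVD: 720x480 storage, intended 16:9 display.
    stored_w, stored_h = 720, 480
    dar = Fraction(16, 9)               # display aspect ratio
    sar = Fraction(stored_w, stored_h)  # storage aspect ratio (3:2)
    par = dar / sar                     # pixel aspect ratio
    print(par)                          # 32/27: each pixel is wider than tall

    # When downscaling to square pixels, derive the width from the
    # *display* aspect ratio, not from the storage resolution.
    target_h = 360
    target_w = round(target_h * dar)    # 640, i.e. 640x360 square pixels
    print(target_w, target_h)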

Apply proper deinterlacing

In short: always deinterlace if you know the video material is interlaced; do not rely on ‘smart’ methods unless the material is a mix of interlaced and progressive. Make sure the output has the same framerate as the original source material (e.g. 24 FPS for a feature film, 50 FPS for a PAL TV show).

Interlacing is the practice of embedding two video images in one frame, with each odd line of the frame belonging to the first image and each even line to the second (or vice versa). This stems from the era of cathode ray tube televisions. The tube would first draw the first set of lines (called the first ‘field’) and then the second. This way, one could transport e.g. 50 images per second through a 25 FPS transmission. Of course, the vertical resolution of each field was halved, but in the case of a static image, each frame retains the full resolution. This was a clever trick to trade temporal resolution against spatial resolution. Mind that even if a 25 FPS interlaced video stream only has 25 unique images per second, it can still be interlaced: a frame may contain one field with half of the previous image while the other field contains half of the current image.
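To make the layout of the fields concrete, here is a tiny Python sketch (my own illustration) that splits a frame into its two fields and weaves them back together:

    import numpy as np

    frame = np.arange(8 * 4).reshape(8, 4)  # a fake frame with 8 lines

    top_field = frame[0::2]     # even lines: one moment in time
    bottom_field = frame[1::2]  # odd lines: 1/50th or 1/60th of a second later

    # Weaving the fields back together restores the frame exactly;
    # combing appears where the two moments differ, i.e. during motion.
    rewoven = np.empty_like(frame)
    rewoven[0::2] = top_field
    rewoven[1::2] = bottom_field
    assert (rewoven == frame).all()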

Interlacing does not play well, however, with flat-panel displays, which lack the memory effect of a typical CRT that provided a natural smoothing of interlaced material. Many media players simply dump the interlaced frames on-screen, resulting in the typical ‘combing’ artefacts shown in the image. Some media players can perform deinterlacing on-the-fly. If you are re-encoding a video anyway, it is better to apply high-quality deinterlacing to obtain a progressive video that can be readily played.

Deinterlace vs. decomb

There are two main approaches here; in HandBrake, for instance, they are called ‘deinterlace’ and ‘decomb’. The former will always perform deinterlacing; the latter tries to be smart and only deinterlaces when it detects combing. My advice is to avoid the decomb method unless you are encoding material that is a mix of interlaced and progressive video that is not easy to separate. My experience with the decomb filter is that its auto-detection fails quite often and produces jittery artefacts and stuttering. When encoding material that is certain to be interlaced anyway, it makes much more sense to deinterlace every frame. I recommend using the ‘slower’ setting in HandBrake.

Inverse telecine

Besides decomb vs. deinterlace, the second question is what the framerate of the result should be. This is not a trivial question, because a 30 FPS video might for instance represent a 24 FPS feature film, 60 FPS video, or even a 25 FPS European TV show. When converting a 24 FPS film to the NTSC format, the so-called telecine process is used: it repeats fields in a specific pattern to stretch the material to 30 FPS. The good news is that many DVDs do not use a ‘hard telecine’, where the process is applied to the encoded video, but instead contain progressive video with markers that tell the playback device to perform the telecine. Such material requires neither inverse telecine nor deinterlacing when encoding. If you encounter a hard telecined video however, you need to enable an inverse telecine filter.
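For the curious, the classic ‘3:2 pulldown’ pattern can be written out in a few lines of Python (my own illustration): four film frames are spread over ten fields, i.e. five interlaced frames, which is exactly the 24-to-30 ratio:

    # 3:2 pulldown: film frames A, B, C, D become fields in a 2,3,2,3 pattern.
    film = ['A', 'B', 'C', 'D']
    pattern = [2, 3, 2, 3]

    fields = []
    for frame, repeats in zip(film, pattern):
        fields += [frame] * repeats

    # Pair consecutive fields into interlaced frames (top + bottom field).
    frames = [fields[i] + fields[i + 1] for i in range(0, len(fields), 2)]
    print(frames)  # ['AA', 'BB', 'BC', 'CD', 'DD']: 'BC' and 'CD' show combing

An inverse telecine filter recognizes this pattern and reassembles the original four film frames from the ten fields.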

(Note: in Europe this is done differently: because the 25 FPS framerate of PAL TV is close enough to 24, the film is simply sped up by 4.2%. The audio is sped up as well, which is why you would hear, for instance, Walter White on a European TV speak at a pitch almost a semitone higher than in the USA.)
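The numbers check out; a quick back-of-the-envelope calculation in Python:

    import math

    speedup = 25 / 24
    print(f'{(speedup - 1) * 100:.1f}%')               # 4.2% faster
    print(f'{12 * math.log2(speedup):.2f} semitones')  # 0.71: almost a semitone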

Regular deinterlace vs. ‘Bob’

Until recently, deinterlace methods only tried to produce one deinterlaced output frame per input frame. As I said above, video may contain two unique fields per frame, and in that case one would want two progressive deinterlaced frames per interlaced frame. In HandBrake, both the deinterlace and decomb filters have a ‘Bob’ setting (no idea where the name comes from). This is the one you need if you want to get both fields out of a single interlaced frame. Only enable it if the video really has two unique fields per frame, otherwise you will just get every frame repeated twice.
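A naive Python sketch of what ‘Bob’ does (my own illustration; real filters interpolate the missing lines much more cleverly than this plain line doubling):

    import numpy as np

    def bob_deinterlace(frame):
        """Turn one interlaced frame into two progressive frames by
        stretching each field back to full height (naive line doubling)."""
        top, bottom = frame[0::2], frame[1::2]
        return np.repeat(top, 2, axis=0), np.repeat(bottom, 2, axis=0)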

It is unfortunately difficult to give a set of rules for determining the correct deinterlace and framerate settings for your output video. When in doubt, my advice is to try the ‘Bob’ filter first, force the framerate to twice that of the source, and do a test run on a fragment of the video. In a player that can advance frame-by-frame, check the output of a scene with a lot of movement. If the settings are right, there should be no repeated frames and no ‘ghosting’ due to mixed frames.
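Checking for repeated frames by eye is tedious, but it can be partly automated. A hedged sketch using OpenCV (the filename and the duplicate threshold are made-up example values):

    import cv2

    cap = cv2.VideoCapture('test_fragment.mkv')  # hypothetical test encode
    prev, index = None, 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # A near-zero mean difference suggests a repeated frame.
        if prev is not None and cv2.absdiff(frame, prev).mean() < 0.5:
            print(f'frame {index} looks like a duplicate of frame {index - 1}')
        prev, index = frame, index + 1
    cap.release()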

As an extreme example, I once managed to recover the original 25 FPS video frames from an NTSC conversion of a PAL TV show. Although this conversion is similar to telecine, it is not the same, and an inverse telecine filter cannot be used. In HandBrake, I enabled the ‘Bob’ deinterlace filter, forced the framerate to 25, and then hoped that the offset of the frame resampler was in sync with the process that had duplicated frames for the NTSC conversion. This was the case for many of the episodes, but for quite a few of them I had to manually add or repeat the first few frames to get the sync right. It did not take long before I gave up and simply bought the PAL DVDs. This scenario only makes sense if the PAL source is unavailable.

Remove letterboxing (‘black bars’)

In short: your output file should not have any black bars above or below the image, except in a film with mixed aspect ratios.

Remember the part about aspect ratio? What happens when a film has been made in e.g. a 21:9 ratio and needs to be stored in a format that is forced to always have a 16:9 ratio? The image is not tall enough to fill the entire height. The solution is simple: add black bars or so-called ‘mattes’ above and below the image. This is called ‘letterboxing’. In principle you could just re-encode a letterboxed video as-is, but there are a few potential problems. First, if the black areas are not perfectly black but contain a bit of noise, you will be wasting precious bits on encoding this useless noise. Second, the sharp edges between the black areas and the image require more bits to encode, and can cause compression artefacts at lower bitrates. Therefore it is highly recommended to crop away any letterboxing when there is no strict requirement on the aspect ratio of the encoded file.

The good news is that programs like HandBrake have a pretty good automatic crop feature. In HandBrake it is enabled by default. You should always have a quick look at the image preview window, however, to check that the program did not by some freak chance pick only dark frames for detecting the crop, and therefore cuts away too much of the image. You still need to ensure that the aspect ratio of the result is correct and matches that of the true source material, not of the letterboxed image.
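As a rough idea of how such auto-crop detection works (my own sketch, not HandBrake's actual algorithm), scan rows from the top and bottom until they are no longer near-black:

    import numpy as np

    def detect_letterbox(gray_frame, threshold=16):
        """Return (top, bottom) crop in pixels for one grayscale frame.
        The threshold tolerates noise in the 'black' bars."""
        row_brightness = gray_frame.mean(axis=1)
        top = int(np.argmax(row_brightness > threshold))
        bottom = int(np.argmax(row_brightness[::-1] > threshold))
        return top, bottom

To avoid the dark-scene pitfall mentioned above, a robust implementation would take the smallest crop found across many sampled frames.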

There is one caveat: some films have a variable letterbox: they switch aspect ratios to achieve a certain effect. Films like The Dark Knight were shot in a mix of IMAX (1.43:1 aspect ratio) and regular anamorphic film (2.39:1); on a 16:9 home release the IMAX segments fill the frame while the regular segments are letterboxed. If the autocrop only looked at the regular segments, part of the IMAX image would be cut away. In this case there is no way around leaving the letterbox around the segments with the wider aspect ratio.

Use an appropriate bitrate, or a constant quality

In short: use quality-based encoding whenever you can. Otherwise, use two-pass encoding.

How much bitrate a certain video requires to be encoded with sufficient fidelity, meaning no visible quality loss or at least no obvious degradation, depends entirely on the content of the video. A video where every frame is a perfectly black area requires almost no information to describe (in fact, I just gave the description and it only took 51 bytes). The opposite extreme would be a video where every frame is perfectly random noise that cannot be predicted. Real videos are anywhere in between. Generally, the more things move, the more noise there is in the image, and the harder the visual content is to describe, the more bits are needed to represent it adequately.

There are two main ways to encode a video file:

  1. Enforce a certain bitrate. Either the bitrate is enforced every second, or as an average across the entire file.
  2. Enforce a certain minimum quality. The encoder program will dynamically vary the bitrate to ensure the quality meets the minimum.

For some reason, a fixed or average bitrate has become the typical way in which most people encode their videos, although it makes little sense for most use cases. The only case where enforcing a certain bitrate per second makes sense is when streaming the file over a channel of limited capacity, like a digital TV broadcast. The only case where enforcing an average bitrate across the entire file makes sense is when the video file must fit on a medium of limited size, like a DVD. The average user nowadays stores films on hard drives and has a network connection that far exceeds the bandwidth of HDTV, and therefore should not really care about the bitrate as long as it stays within the limits of their playback device.

Nevertheless, if you need to ensure a video file has a certain size, a fixed average bitrate is the way to go. My video bitrate calculator can be of use to determine the required bitrate. Very important: when encoding this way, you should always enable two-pass encoding. This will first do a quick run over your video file to determine where to spend the most bits to keep the overall quality as constant as possible, and then do the actual encoding. It takes more time, but the end result will be of much better quality and much closer to the desired file size than if you did only a single pass.
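The calculation behind such a calculator is simple enough; a minimal sketch in Python, assuming a single audio track and ignoring container overhead:

    def video_bitrate_kbps(target_size_mib, duration_s, audio_kbps=160):
        """Average video bitrate that makes the file fit the target size."""
        total_kbit = target_size_mib * 1024 * 8  # MiB -> kbit
        return total_kbit / duration_s - audio_kbps

    # e.g. a 2-hour film that must fit on a single-layer 4.3 GiB DVD:
    print(round(video_bitrate_kbps(4.3 * 1024, 2 * 3600)))  # ~4850 kbps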

If you have no strict requirements on how large the video file must be, quality-based encoding makes a lot more sense. It has the additional advantage that it only requires a single pass. In HandBrake, select ‘Constant Quality’. The slider for H.264 is exponential: for every notch you move it, the output size of your video file is roughly multiplied by the same factor. An RF value of 0 actually means lossless encoding. The lowest sensible RF value is widely considered to be 18 (any lower and you are wasting bits on encoding invisible details). The highest sensible RF value is about 28. For high-definition movies, 22 is generally a good value; for DVD material it is better to stay around 20.
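To get a feel for the exponential behaviour: a commonly cited rule of thumb for x264 (my assumption, not a measurement from this article) is that adding 6 to the RF value roughly halves the bitrate, i.e. each notch scales the size by about 2^(1/6) ≈ 1.12:

    # Rough size model; the exact factor depends heavily on the content.
    factor_per_rf = 2 ** (1 / 6)

    size_at_rf22 = 2000  # hypothetical file size in MB at RF 22
    for rf in (18, 20, 22, 24, 26):
        estimate = size_at_rf22 * factor_per_rf ** (22 - rf)
        print(f'RF {rf}: ~{estimate:.0f} MB')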

Denoise if necessary

In short: if you want to preserve film grain, you will need a very high bitrate. If you want a small file, apply denoising to get good image quality at a low bitrate. NLMeans works best.

Modern codecs like H.264 are pretty good at keeping quality acceptable even at lower bitrates. Although these codecs do have a kind of denoising effect at low bitrates, below a certain point this breaks down and the codec makes a mess of it. If you have a noisy video source (e.g. low-quality VHS tapes, a DVD of an old TV show, a film with a lot of ‘grain’) and you cannot afford to encode it at the extremely high bitrate that would correctly preserve all the noise, it is a better idea to filter out as much of the noise as possible before the actual encoding starts. The codec will then have a much easier job producing a good image at a low bitrate.

Recent versions of HandBrake have two types of denoise filters: the old ‘HQDN3D’ (which has nothing to do with Duke Nukem 3D, by the way) and the new ‘NLMeans’. The HQDN3D filter is fast and appropriate for material with mild high-frequency noise, like minor film grain. If it still does not give good results at the ‘medium’ setting, try the NLMeans filter instead: it is much slower, but in general it performs much, much better. When properly configured, NLMeans can remove most of the noise while preserving details in the image, whereas HQDN3D will inevitably degrade the entire image. You should first do a few test runs on a fragment of the video to see what kind of result you obtain with various settings, and how much of a reduction in bitrate they offer compared to an encode without denoising.

Above is an example of denoising on a fragment from a film with quite a bit of grain. Hover over the titles to see the different results (requires JavaScript). The bitrate figures shown are for a fixed-quality encode at RF 21. This shows several things: first, the bitrate requirement is excessive without denoising. Second, although the ‘strong’ HQDN3D setting achieves denoising performance similar to ‘medium’ NLMeans, it destroys most of the fine details while the latter does not. Third, even though NLMeans preserves more detail, overall it still removes more noise and results in the lowest bitrate at this RF setting.
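If you want a quick preview of what non-local-means denoising does to a single frame before committing to a long test encode, OpenCV ships an implementation (note: this is OpenCV's NLMeans, not HandBrake's exact filter, and the strength values are just a starting point):

    import cv2

    frame = cv2.imread('noisy_frame.png')  # a frame grabbed from the source
    # h / hColor set the luma/chroma filtering strength; higher = smoother.
    denoised = cv2.fastNlMeansDenoisingColored(frame, None, h=6, hColor=6,
                                               templateWindowSize=7,
                                               searchWindowSize=21)
    cv2.imwrite('denoised_frame.png', denoised)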

©2015/05 Alexander Thomas