How to improve video resolution without pixelation

Peter Huh
4 min read · Jun 1, 2021

Resolution refers to the number of distinct pixels that can be displayed in a given dimension. Increasing resolution is, in fact, quite simple: any video editing tool, or even a couple of lines of code, can turn a 256x256 image into a 512x512 image very quickly.

Since “more information” has to be filled in to turn 256x256 pixels into 512x512, these tools typically rely on a variety of interpolation methods. One of the most common is bicubic interpolation, which can be done with a single call in Python.

import cv2

img = cv2.imread('path/to/img')
resized_img = cv2.resize(
    img, (512, 512), interpolation=cv2.INTER_CUBIC
)

The above lines will produce an output like this.

256x256 original image
512x512 resolution using bicubic interpolation

As you can see, the image got bigger. However, the quality got worse: it became pixelated. So how can we enhance the actual quality, and not just the resolution? Interpolation methods may not be enough for this; we may need a deep learning solution.

First Attempt — OpenCV Contrib dnn_superres

Since my ultimate goal is to improve video resolution, I searched for a very light model. The first thing I found was OpenCV contrib's dnn_superres.

To run dnn_superres, you first need to install OpenCV with the contrib modules.

pip install opencv-contrib-python

Then, you must manually download one of the following four pre-trained models: EDSR, ESPCN, FSRCNN, or LapSRN. EDSR is the heaviest model, while ESPCN and FSRCNN are both very light.

import cv2
from cv2 import dnn_superres

# Initialize the super resolution object
sr = dnn_superres.DnnSuperResImpl_create()

# Read the image
image = cv2.imread('path/to/image')

# Read the pre-trained model
path_to_model = "FSRCNN_x2.pb"
sr.readModel(path_to_model)

# Load the model into the SR object
sr.setModel("fsrcnn", 2)

# Upsample the image
result = sr.upsample(image)

# Save the image
cv2.imwrite("./result.jpeg", result)
Left: resized to 512x512 using bicubic interpolation. Right: resized to 512x512 using FSRCNN_x2 super resolution.

To be honest, the left and right images look almost identical. I tried all four models (EDSR, ESPCN, FSRCNN, and LapSRN), but the results were all similar. Since these are light models trained on small datasets, performance may not be so great. However, FSRCNN can process 24 fps on a CPU, so it is definitely a good candidate for real-time video enhancement with some additional fine-tuning.
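As a rough sketch of what that could look like, the snippet below runs FSRCNN_x2 frame by frame over a video and measures throughput. The video path is a placeholder.

import time
import cv2
from cv2 import dnn_superres

# Set up the FSRCNN x2 model once, outside the frame loop
sr = dnn_superres.DnnSuperResImpl_create()
sr.readModel("FSRCNN_x2.pb")
sr.setModel("fsrcnn", 2)

cap = cv2.VideoCapture("path/to/video")
frames, start = 0, time.perf_counter()
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    upscaled = sr.upsample(frame)  # 2x super resolution on each frame
    frames += 1
cap.release()

elapsed = time.perf_counter() - start
print(f"Processed {frames} frames at {frames / elapsed:.2f} fps")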

Second Attempt — Face Restoration Using DFDNet

Since I am dealing with a face image of President Trump, I searched for face restoration techniques to achieve what I need: something that could reduce the blur on a face image. Then I came across DFDNet, implemented here.

It's quite simple to run inference on images by following the instructions in the above link. The pipeline first detects the face in an image and crops it out, restores the face with DFDNet, and then recombines the restored face with the original image while upscaling it.
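Conceptually, the per-frame flow looks something like this sketch, where the function names are hypothetical stand-ins for the repo's actual routines:

def enhance_frame(frame):
    # Hypothetical stand-ins for the repo's detection/restoration steps
    face_crop, landmarks = detect_and_crop_face(frame)  # detect and crop the face
    restored_face = dfdnet_restore(face_crop)           # restore with DFDNet
    upscaled_frame = upscale(frame)                     # upscale the whole frame
    return paste_face_back(upscaled_frame, restored_face, landmarks)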

Since disk I/O operations are expensive, instead of unpacking the video into frame images and recombining them at the end, I modified the inference file to loop through the video using cv2.VideoCapture().

cap = cv2.VideoCapture('path/to/video')
video_result = []
while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    # ... do DFDNet stuff on frame ...
    video_result.append(final_res)

This occupies more RAM, but can process video files faster.
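Once the loop finishes, the accumulated frames still need to be written back to disk. Here is a minimal sketch using cv2.VideoWriter, assuming the frames in video_result are same-sized BGR arrays:

import cv2

h, w = video_result[0].shape[:2]
fps = cap.get(cv2.CAP_PROP_FPS)  # reuse the source clip's frame rate
fourcc = cv2.VideoWriter_fourcc(*"mp4v")
out = cv2.VideoWriter("enhanced.mp4", fourcc, fps, (w, h))
for frame in video_result:
    out.write(frame)
out.release()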

One problem was that DFDNet caused a strange color shift. I was able to resolve this issue with histogram matching.

from skimage import exposure

# restored_face - DFDNet output
# init_crop - cropped face
# matched - color-corrected image through histogram matching
matched = exposure.match_histograms(
    restored_face, init_crop, multichannel=True  # True for color (RGB) images
)
left: cropped face original / middle: DFDNet output / right: color corrected through histogram matching
left: original image / right: face restoration using DFDNet

DFDNet clearly improves the quality of the image. The effects are especially noticeable around President Trump's eyes and forehead.

I ran this inference on a video file with 137 frames, and it took a total of 84.56 seconds, which works out to around 1.62 frames per second. This is very slow, and the process also occupies a lot of RAM, which suggests DFDNet is not suitable for real-time video enhancement. The quality of its results, however, is impressive.
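For reference, that throughput number comes from simple wall-clock timing around the loop, along these lines:

import time

start = time.perf_counter()
# ... run the DFDNet loop over all frames, filling video_result ...
elapsed = time.perf_counter() - start
print(f"{len(video_result) / elapsed:.2f} frames per second")  # 137 / 84.56 ≈ 1.62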

Conclusion

Overall, I realized that improving video resolution without pixelation is a very difficult task. It is challenging to tackle both speed and quality at the same time, but I believe a solution will emerge, or already exists. I look forward to experimenting with more open source super resolution models, and to fine-tuning the ones I have tried so far.
