
Image Compression
We live in an unprecedented time, surrounded by digital devices that help us carry out daily activities such as working, playing, and caring for others far more effectively than in the past. Many of these activities involve streaming vast amounts of photo and video data, and the entertainment industry keeps raising the bar on user experience, for example HD vs. 4K UHD vs. 8K UHD, or SDR vs. HDR.
International standards organizations invest considerable effort in creating new coding standards to address these needs, yet they still cannot meet every demand from the market and the growing number of applications. As a consequence, video quality is constantly being pushed up at the expense of compression efficiency.
A fair balance between quality and compression efficiency must therefore be struck against factors such as bandwidth and service delivery costs. In parallel, deep-learning techniques are being proposed either as end-to-end approaches that significantly improve the overall quality of the compression pipeline itself, or as replacements for individual blocks (e.g., intra coding, downsampling/upsampling) within an existing compression standard.
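To make the end-to-end idea concrete, the sketch below shows its simplest form: a convolutional encoder maps the image to a compact latent, the latent is quantized, and a convolutional decoder reconstructs the image. It is only an illustrative toy under our own assumptions (layer sizes, the noise-based quantization proxy, and the omitted rate term are choices made here, not part of any standard or of our codec work).

import torch
import torch.nn as nn

class TinyCompressionAutoencoder(nn.Module):
    def __init__(self, latent_channels: int = 32):
        super().__init__()
        # Convolutional analysis transform: image -> compact latent.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(64, latent_channels, kernel_size=5, stride=2, padding=2),
        )
        # Convolutional synthesis transform: latent -> reconstructed image.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(latent_channels, 64, kernel_size=5, stride=2,
                               padding=2, output_padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 3, kernel_size=5, stride=2,
                               padding=2, output_padding=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        y = self.encoder(x)
        # Additive uniform noise stands in for hard rounding during training so
        # gradients can flow; at inference time the latent is actually rounded.
        y_hat = (y + (torch.rand_like(y) - 0.5)) if self.training else torch.round(y)
        return self.decoder(y_hat)

model = TinyCompressionAutoencoder()
x = torch.rand(1, 3, 128, 128)                 # dummy 128x128 RGB image
x_hat = model(x)
distortion = nn.functional.mse_loss(x_hat, x)  # rate term omitted in this sketch

Replacing only one block of a classic codec (for instance the upsampling stage) follows the same logic, but keeps the rest of the standardized pipeline untouched.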

AI-Based Media Coding
As a founding member of MPAI (Moving Picture, Audio and Data Coding by Artificial Intelligence), our group is taking part in the exploration phase to assess whether the performance of Essential Video Coding (EVC) can be improved by enhancing or replacing existing video coding tools with AI-based tools, while keeping the complexity increase at an acceptable level.
Related Papers
2023

NoR-VDPNet++: Real-Time No-Reference Image Quality Metrics
Efficiency and efficacy are desirable properties for any evaluation metric having to do with Standard Dynamic Range (SDR) imaging or with High Dynamic Range (HDR) imaging. However, it is a daunting task to satisfy both properties simultaneously. On the one side, existing evaluation metrics like HDR-VDP 2.2 can accurately mimic the Human Visual System (HVS), but this typically comes at a very high computational cost. On the other side, computationally cheaper alternatives (e.g., PSNR and MSE) fail to capture many crucial aspects of the HVS. In this work, we present NoR-VDPNet++, a deep-learning architecture for converting accurate full-reference metrics into no-reference metrics, thus reducing the computational burden. We show that NoR-VDPNet++ can be successfully employed in different application scenarios.
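The core idea in the abstract above can be pictured as a small regressor that sees only the distorted image and outputs the quality index that the full-reference metric would have produced. The toy network below illustrates that input/output contract; it is not the published NoR-VDPNet++ architecture, and all layer choices here are our own.

import torch
import torch.nn as nn

class NoReferenceQualityNet(nn.Module):
    def __init__(self):
        super().__init__()
        # Small convolutional feature extractor followed by global pooling.
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        # Single linear head regressing one quality index per image.
        self.regressor = nn.Linear(64, 1)

    def forward(self, img: torch.Tensor) -> torch.Tensor:
        return self.regressor(self.features(img).flatten(1)).squeeze(1)

net = NoReferenceQualityNet()
distorted = torch.rand(4, 1, 256, 256)  # batch of luminance images
predicted_q = net(distorted)            # quality scores, no reference needed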
@ARTICLE{10089442,
author={Banterle, Francesco and Artusi, Alessandro and Moreo, Alejandro and Carrara, Fabio and Cignoni, Paolo},
journal={IEEE Access},
title={NoR-VDPNet++: Real-Time No-Reference Image Quality Metrics},
year={2023},
volume={11},
number={},
pages={34544-34553},
doi={10.1109/ACCESS.2023.3263496}}
2023

Modern High Dynamic Range Imaging at the Time of Deep Learning
In this tutorial, we introduce how the High Dynamic Range (HDR) imaging field has evolved in this new era in which machine learning approaches have become dominant. The main reason for this success is that machine learning and deep learning have automated many tedious tasks while achieving high-quality results that outperform classic methods. After an introduction to classic HDR imaging and its open problems, we summarize the main approaches for merging multiple exposures, single-image reconstruction (inverse tone mapping), tone mapping, and display visualization.
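As a pointer to the kind of material covered, the snippet below sketches the classic multi-exposure merge in its textbook form: linearized exposures are divided by their exposure times and averaged with a hat-shaped weight that discounts under- and over-exposed pixels. Camera-response recovery and alignment are assumed to have been handled beforehand, and this weighting function is one common choice among several.

import torch

def merge_exposures(images: torch.Tensor, exposure_times: torch.Tensor) -> torch.Tensor:
    """images: N x C x H x W linearized exposures in [0, 1]; exposure_times: N."""
    # Hat-shaped weight peaking at mid-gray, discounting clipped pixels.
    weights = 1.0 - (2.0 * images - 1.0) ** 2
    # Divide by exposure time to bring every shot to relative radiance.
    radiance = images / exposure_times.view(-1, 1, 1, 1)
    eps = 1e-6
    return (weights * radiance).sum(dim=0) / (weights.sum(dim=0) + eps)

exposures = torch.rand(3, 3, 64, 64)           # three toy RGB exposures
times = torch.tensor([1 / 60, 1 / 15, 1 / 4])  # corresponding shutter times
hdr = merge_exposures(exposures, times)        # C x H x W HDR estimate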
@inproceedings{10.2312:egt.20231033,
booktitle = {Eurographics 2023 – Tutorials},
editor = {Serrano, Ana and Slusallek, Philipp},
title = {{Modern High Dynamic Range Imaging at the Time of Deep Learning}},
author = {Banterle, Francesco and Artusi, Alessandro},
year = {2023},
publisher = {The Eurographics Association},
ISSN = {1017-4656},
ISBN = {978-3-03868-212-7},
DOI = {10.2312/egt.20231033}
}
2022

IBC2022 Tech Papers: Towards an AI-enhanced video coding standard
This paper describes the ongoing activities of the Enhanced Video Coding (EVC) project of the Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) group. The project investigates how the performance of existing codecs can be improved by enhancing or replacing specific encoding tools with AI-based counterparts. The MPEG EVC baseline profile has been chosen as the reference because it relies on encoding tools that are at least 20 years mature yet offers compression efficiency close to HEVC. A framework has been developed to interface the encoder/decoder with neural networks, independently of the specific learning toolkit, simplifying experimentation. So far, the EVC project has investigated the intra-prediction and super-resolution coding tools. The standard intra-prediction modes have been complemented by a learnable predictor: experiments under standard test conditions show rate reductions for intra-coded frames in excess of 4% over the reference. The use of super-resolution at the decoder side, based on a state-of-the-art deep-learning approach named the Densely Residual Laplacian Network (DRLN), has been found to provide further gains over the reference, on the order of 3%, in the SD-to-HD context.
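The decoder-side super-resolution experiment can be summarized as: code the video at a lower resolution, decode it as usual, then upscale the decoded frames with a learned model. The tiny sub-pixel upscaler below only illustrates that data flow; it is a placeholder for a real network such as DRLN, and the frame sizes are illustrative.

import torch
import torch.nn as nn

class SimpleUpscaler(nn.Module):
    def __init__(self, scale: int = 2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale),  # rearranges channels into a larger image
        )

    def forward(self, low_res: torch.Tensor) -> torch.Tensor:
        return self.body(low_res)

upscaler = SimpleUpscaler(scale=2)
decoded_sd_frame = torch.rand(1, 3, 360, 640)  # decoded low-resolution frame
hd_frame = upscaler(decoded_sd_frame)          # -> 1 x 3 x 720 x 1280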
2022

AI-Based Media Coding Standards
Moving Picture, Audio and Data Coding by Artificial Intelligence (MPAI) is the first standards organization to develop data coding standards that have artificial intelligence (AI) as their core technology. MPAI believes that universally accessible standards for AI-based data coding can have the same positive effects on AI as standards had on digital media. Elementary components of MPAI standards, AI modules (AIMs), expose standard interfaces for operation in a standard AI framework (AIF). As their performance may depend on the technologies used, MPAI expects that competing developers providing AIMs will promote horizontal markets of AI solutions that build on and further promote AI innovation. Finally, the MPAI framework licenses (FWLs) provide guidelines to intellectual property right (IPR) holders, facilitating the availability of compatible licenses to standard users.
@article{basso2022ai,
title={AI-Based Media Coding Standards},
author={Basso, Andrea and Ribeca, Paolo and Bosi, Marina and Pretto, Niccol{\`o} and Chollet, G{\'e}rard and Guarise, Michelangelo and Choi, Miran and Chiariglione, Leonardo and Iacoviello, Roberto and Banterle, Francesco and others},
journal={SMPTE Motion Imaging Journal},
volume={131},
number={4},
pages={10–20},
year={2022},
publisher={SMPTE}
}
2021

AI-based Media Coding and Beyond
MPAI (Moving Picture, Audio and Data Coding by Artificial Intelligence) is the first body developing data coding standards that have Artificial Intelligence (AI) as their core technology. MPAI believes that universally accessible standards for AI-based data coding can have the same positive effects on AI as standards had on digital media.
Elementary components of MPAI standards, AI Modules (AIMs), expose standard interfaces for operation in a standard AI Framework (AIF). As their performance may depend on the technologies used, MPAI expects that competing developers providing AIMs will promote horizontal markets of AI solutions that build on and further promote AI innovation.
Finally, the MPAI Framework Licenses provide guidelines to IPR holders, facilitating the availability of compatible licenses to standard users.
2021

NoR-VDPNet++: Efficient Training and Architecture for Deep No-Reference Image Quality Metrics
Efficiency and efficacy are two desirable properties of the utmost importance for any evaluation metric having to do with Standard Dynamic Range (SDR) or High Dynamic Range (HDR) imaging. However, these properties are hard to achieve simultaneously. On the one side, metrics like HDR-VDP 2.2 are known to mimic the human visual system (HVS) very accurately, but their high computational cost prevents widespread use in large evaluation campaigns. On the other side, computationally cheaper alternatives like PSNR or MSE fail to capture many of the crucial aspects of the HVS. In this work, we try to get the best of both worlds: we present NoR-VDPNet++, an improved variant of a previous deep learning-based metric for distilling HDR-VDP 2.2 into a convolutional neural network (CNN).
@incollection{banterle2021nor,
title={NoR-VDPNet++: Efficient Training and Architecture for Deep No-Reference Image Quality Metrics},
author={Banterle, Francesco and Artusi, Alessandro and Moreo, Alejandro and Carrara, Fabio},
booktitle={ACM SIGGRAPH 2021 Talks},
pages={1–2},
year={2021}
}
2020

NoR-VDPNet: A No-Reference High-Dynamic-Range Quality Metric Trained on HDR-VDP 2
HDR-VDP 2 has been convincingly shown to be a reliable metric for image quality assessment, and it currently plays a remarkable role in the evaluation of complex image processing algorithms. However, HDR-VDP 2 is known to be computationally expensive (both in terms of time and memory) and is constrained by the availability of a ground-truth image (the so-called reference) against which the quality of a processed image is quantified. These aspects impose severe limitations on the applicability of HDR-VDP 2 to real-world scenarios involving large quantities of data or requiring real-time responses. To address these issues, we propose the Deep No-Reference Quality Metric (NoR-VDPNet), a deep-learning approach that learns to predict the global image quality feature (i.e., the mean-opinion-score index Q) that HDR-VDP 2 computes. NoR-VDPNet is no-reference (i.e., it operates without a ground-truth reference) and its computational cost is substantially lower than that of HDR-VDP 2 (by more than an order of magnitude). We demonstrate the performance of NoR-VDPNet in a variety of scenarios, including the optimization of the parameters of a denoiser and of JPEG-XT.
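Training such a distilled metric follows a simple recipe: run the expensive full-reference metric offline to label a set of distorted images with its quality index Q, then fit a CNN to regress Q from the distorted image alone. The sketch below shows that recipe under stated assumptions; the model, data, and hyperparameters are placeholders, not the published training setup.

import torch
import torch.nn as nn

def train_distilled_metric(model, images, target_q, epochs=10, lr=1e-4):
    """images: N x 1 x H x W distorted images; target_q: N precomputed HDR-VDP scores."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        pred = model(images)            # the network never sees a reference image
        loss = loss_fn(pred, target_q)  # match the full-reference metric's output
        loss.backward()
        optimizer.step()
    return model

# Toy usage with a stand-in model and random data (labels would normally come
# from running the full-reference metric offline).
toy_model = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 1), nn.Flatten(0))
toy_images = torch.rand(8, 1, 32, 32)
toy_scores = torch.rand(8)
train_distilled_metric(toy_model, toy_images, toy_scores, epochs=2)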
@INPROCEEDINGS{9191202,
author={F. {Banterle} and A. {Artusi} and A. {Moreo} and F. {Carrara}},
booktitle={2020 IEEE International Conference on Image Processing (ICIP)},
title={NoR-VDPNet: A No-Reference High-Dynamic-Range Quality Metric Trained on HDR-VDP 2},
year={2020},
volume={},
number={},
pages={126-130},
doi={10.1109/ICIP40778.2020.9191202}}
2019

Efficient Evaluation of Image Quality via Deep-Learning Approximation of Perceptual Metrics
Image metrics based on the Human Visual System (HVS) play a remarkable role in the evaluation of complex image processing algorithms. However, mimicking the HVS is known to be complex and computationally expensive (both in terms of time and memory), so its use is limited to a few applications and to small input data. All of this makes such metrics not fully attractive in real-world scenarios. To address these issues, we propose the Deep Image Quality Metric (DIQM), a deep-learning approach to learning the global image quality feature (mean opinion score). DIQM can emulate existing visual metrics efficiently, reducing the computational cost by more than an order of magnitude with respect to existing implementations.
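One reason a fast approximation matters is that it can sit inside an optimization loop, where the original metric would be far too slow. The sketch below shows the pattern with a toy box-blur "denoiser" whose kernel size is selected by a learned quality model; quality_net is assumed to be any trained no-reference scorer (such as the toy regressor sketched earlier on this page), and the parameter sweep itself is purely illustrative.

import torch
import torch.nn.functional as F

def box_blur(img: torch.Tensor, k: int) -> torch.Tensor:
    """Toy denoiser: depthwise box filter with odd kernel size k."""
    c = img.shape[1]
    kernel = torch.ones(c, 1, k, k) / (k * k)
    return F.conv2d(img, kernel, padding=k // 2, groups=c)

def pick_best_kernel(quality_net, noisy_image, sizes=(1, 3, 5, 7)):
    best_q, best_k = float("-inf"), None
    for k in sizes:
        q = quality_net(box_blur(noisy_image, k)).item()  # one cheap forward pass
        if q > best_q:
            best_q, best_k = q, k
    return best_k, best_q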
@ARTICLE{8861304,
author={A. {Artusi} and F. {Banterle} and F. {Carrara} and A. {Moreo}},
journal={IEEE Transactions on Image Processing},
title={Efficient Evaluation of Image Quality via Deep-Learning Approximation of Perceptual Metrics},
year={2020},
volume={29},
number={},
pages={1843-1855},
doi={10.1109/TIP.2019.2944079},
ISSN={1941-0042},
month={9}}