From December 1–4, 2025, the 2025 IEEE International Conference on Visual Communications and Image Processing (IEEE VCIP 2025) was held in Klagenfurt, Austria. This year's theme was "SUSTAINABLE AND TRUSTWORTHY VISUAL COMMUNICATIONS IN THE AGE OF AI".
Deep-Learning-Based Image and Video Compression
Tencent Media Lab won first place in the "Ultra Low-Bitrate Video Compression" grand challenge, and expert researcher Dr. Liang Zhao was invited to give a talk on behalf of Tencent titled "Practical End-to-End Image/Video Compression", presenting the Lab's research progress on end-to-end deep-learning-based image and video compression. The proposed methods achieve significant breakthroughs in both performance and efficiency, substantially surpassing traditional codecs in visual quality as well as encoding/decoding speed. This also reflects the Lab's continued effort, under its "AI+" strategy, to move end-to-end learned image and video coding from theory to practical deployment.

A major theme of this year's challenge was "Practical": for end-to-end image/video coding, bringing it to real-world deployment is a direct challenge facing academia and, even more so, industry. Common learned-codec designs rely on deep network stacks, floating-point computation, and autoregressive dependencies in the entropy model to preserve visual quality.

Figure: Example of an end-to-end image codec
Such network designs, however, make practical on-device decoding problematic: on one hand, cross-platform consistency cannot be guaranteed; on the other, the autoregressive dependencies prevent full use of GPU parallelism. To address these issues, the Lab's researchers started from a more efficient network design and added substantial engineering optimizations. The resulting scheme has already been deployed in AI image codec scenarios, delivering PSNR/SSIM quality far beyond H.265 while its decoding speed on mobile devices surpasses that of H.265-based HEIF.
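The parallelism issue can be illustrated with a toy decode loop (a minimal sketch for intuition, not the Lab's actual design; `predict` stands in for the context-model/entropy-decode step). An autoregressive entropy model forces one sequential step per latent symbol, whereas a two-pass checkerboard-style context model, a common alternative in learned codecs, needs only two sequential passes:

```python
import numpy as np

def decode_autoregressive(shape, predict):
    """Serial raster-scan decode: each symbol's context depends on
    previously decoded symbols, so no two steps can run in parallel."""
    h, w = shape
    out = np.zeros((h, w))
    steps = 0
    for y in range(h):
        for x in range(w):
            out[y, x] = predict(out, y, x)  # depends on all prior symbols
            steps += 1                      # h*w sequential steps total
    return out, steps

def decode_checkerboard(shape, predict):
    """Two-pass checkerboard decode: 'anchor' half first, then the other
    half conditioned on the anchors. Symbols within one pass have no
    mutual dependencies, so each pass could run fully in parallel on a GPU."""
    h, w = shape
    out = np.zeros((h, w))
    passes = 0
    for parity in (0, 1):                   # only 2 sequential passes
        for y in range(h):
            for x in range(w):
                if (y + x) % 2 == parity:
                    out[y, x] = predict(out, y, x)
        passes += 1
    return out, passes
```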

Figure: On-device quality and decoding latency (substantially outperforming the HEIF codec)
Accepted Papers
Beyond the challenge win, Tencent Media Lab had four papers accepted at VCIP 2025, covering block partitioning, intra prediction, and intra block copy in video compression, as well as 3D scene reconstruction, demonstrating the Lab's technical capability and innovation in video compression and 3D scene reconstruction.
An overview of the accepted papers follows:
01
Lossless Coding Improvement beyond AV1
Tianqi Liu, Liang Zhao, Madhu Peringassery Krishnan, Shan Liu, Minhao Tang
Abstract:
Lossless compression plays an important role in the storage and transmission of data with stringent quality requirements. There is a substantial demand for enhancing the lossless compression performance of the current AV1 codec. In this paper, two novel techniques, named Residual Block Refinement (RBR) mode and Multi-Residual Blocks (MRB) mode, are introduced to improve the lossless coding performance beyond AV1. For the RBR mode, the main idea is to perform a lossless block refinement within the residual block to further reduce redundancy. For the MRB mode, the first partial residual block utilizes a traditional transform and quantization process to generate a lossy representation of the original residual samples with efficient energy compaction. The second partial residual block is further coded to achieve a perfect representation of the difference between the original residual block and the reconstructed first residual block. The experimental results reveal that average coding gains of -2.76%, -1.04%, and -1.18% in bitrate are achieved on top of the AOMedia Video Model (AVM) v6.0.0 for the all intra (AI), random access (RA), and low delay (LD) configurations, respectively.

Figure 1. Proposed Residual Block Refinement Mode

Figure 2. Proposed Multi Residual Blocks Mode
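The two-stage split behind the MRB mode can be sketched as follows (a toy illustration only: plain quantization stands in for the paper's transform-and-quantization stage, and `qstep` is a hypothetical parameter). Stage 1 gives a compact lossy representation of the residual; stage 2 losslessly codes whatever stage 1 missed, so reconstruction is exact:

```python
import numpy as np

def mrb_encode(residual, qstep=8):
    """Split a residual block into a lossy part and a lossless remainder."""
    stage1 = np.round(residual / qstep).astype(int)  # lossy, energy-compact
    recon1 = stage1 * qstep                          # decoder-side stage-1 recon
    stage2 = residual - recon1                       # small-magnitude difference
    return stage1, stage2

def mrb_decode(stage1, stage2, qstep=8):
    """Exact reconstruction: stage-1 recon plus the coded difference."""
    return stage1 * qstep + stage2
```

Because stage-2 samples are bounded by half the quantization step, they are cheap to code losslessly while stage 1 carries most of the energy.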
02
Improved Intra Block Copy Mode Beyond AV1
Qiangyang Zhou, Liang Zhao, Wei Kuang, Madhu Peringassery Krishnan, Tianqi Liu, Jayasingam Adhuran, Shan Liu
Abstract:
Intra Block Copy (IntraBC) is a key coding tool for screen content video, enabling blocks to reference previously reconstructed regions within the same frame. While IntraBC was introduced in AV1 and further enhanced in VVC, the AVM development framework offers new opportunities to improve its coding efficiency and hardware compatibility. This paper proposes three enhancements adopted into the AVM reference software: (1) a unified and extended local reference buffer design supporting multiple superblock sizes with fixed memory constraints; (2) a decoupled global-local search strategy that improves block vector prediction efficiency; and (3) the integration of intra Block-Adaptive Weighted Prediction (BAWP) into local IntraBC. Experimental results show significant screen content coding gains: -3.53%, -2.25%, -1.43% (YUV-PSNR) for all intra, random access and low delay, respectively, with minimal runtime overhead. These contributions have been adopted into the AVM standard and reference software.
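As a rough illustration of the block-vector idea behind IntraBC (a toy search, not the AVM algorithm; the legality rule below is a deliberate simplification of the real reference-region constraints):

```python
import numpy as np

def sad(a, b):
    """Sum of absolute differences between two equal-sized blocks."""
    return int(np.abs(a.astype(int) - b.astype(int)).sum())

def intra_bc_search(frame, y, x, size, search_range=8):
    """Find the block vector (dy, dx) pointing at an already-reconstructed
    region of the SAME frame that best matches the current block."""
    cur = frame[y:y + size, x:x + size]
    best_bv, best_cost = None, None
    for dy in range(-search_range, 1):
        for dx in range(-search_range, search_range + 1):
            # simplified legality: source block fully above, or fully to
            # the left in the current row (i.e. already decoded)
            if not (dy <= -size or (dy == 0 and dx <= -size)):
                continue
            sy, sx = y + dy, x + dx
            if sy < 0 or sx < 0 or sx + size > frame.shape[1]:
                continue
            cost = sad(cur, frame[sy:sy + size, sx:sx + size])
            if best_cost is None or cost < best_cost:
                best_bv, best_cost = (dy, dx), cost
    return best_bv, best_cost
```

For repetitive screen content (text, UI elements), such a search often finds an exact match, which is why IntraBC is so effective there.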

03
Low latency scheme for semi decoupled tree partition in AVM
Jayasingam Adhuran, Liang Zhao, Madhu Peringassery Krishnan, Tianqi Liu, Shan Liu
Abstract:
The latest initiative of the Alliance for Open Media (AOM), named the AOM Video Model (AVM), is expected to introduce new coding tools to enhance compression benefits. Semi-Decoupled Partitioning (SDP) in AVM decouples the shared tree to support separate block partitioning for the luma and chroma channels from 64×64. Further, Chroma from Luma (CfL) is a chroma-only coding tool in AVM that applies collocated reconstructed luma samples in predicting chroma samples. The dependency on reconstructed luma samples in CfL can delay the decoding of chroma blocks under separate tree partitioning and introduce a worst-case latency of 4096 luma samples. In response, this study proposes a CfL-constrained strategy that reduces the worst-case latency by selectively disallowing the CfL mode in a given chroma partition tree. A detailed latency analysis is also provided to confirm the reduction of the worst-case latency to 2048 luma samples. The experiments are implemented on top of research-v10.0.0 under Common Test Conditions (CTC) V7. The experimental results show that when the worst-case decoder latency is constrained to 2048 luma samples, the coding loss is kept minimal, with an average loss of 0.02% for the YUV components in the random access configuration and no change in encoder and decoder timings.

Figure 1. Cases when maximum decoder latency is greater than 2048 luma samples.

Figure 2. Cases when maximum decoder latency is not greater than 2048 luma samples.
04
FPS: A Novel Test View Selection Strategy for Gaussian Splatting Quality Evaluation
Xueshi Hou, Shan Liu
Abstract:
Current quality evaluations of 3D Gaussian Splatting typically rely on fixed-interval view sampling (e.g., selecting every 8th view as test view) to split training and test datasets, which may lead to biased assessments of reconstruction quality due to limited spatial coverage and view diversity. To address this issue, we propose a novel view selection strategy based on Farthest Point Sampling (FPS) that optimizes test view selection by maximizing the spatial coverage of camera positions and the diversity of viewing directions. In our experiments, the Baseline refers to the official implementation of the classic 3D Gaussian Splatting method, and our FPS method modifies the train/test split selection strategy within the same framework for fair comparison. We conduct a subjective experiment by rendering Gaussian Splatting results into video sequences and collecting subjective scores from 16 viewers on 10 scenes, including both real-world and synthetic datasets. Experimental results demonstrate that our FPS-based method provides a more comprehensive and reliable evaluation of 3D Gaussian Splatting quality, outperforming the conventional fixed-interval view sampling approach and establishing a robust framework for performance assessment.
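Farthest Point Sampling itself is a standard greedy procedure; a minimal sketch over camera positions follows (view-direction diversity, which the paper also maximizes, is omitted here for brevity, and the starting index is an arbitrary choice):

```python
import numpy as np

def farthest_point_sampling(points, k, start=0):
    """Greedily pick k indices: each new pick maximizes its distance
    to the nearest already-selected point, spreading selections over space."""
    points = np.asarray(points, dtype=float)
    selected = [start]
    # distance of every point to the nearest selected point so far
    dists = np.linalg.norm(points - points[start], axis=1)
    while len(selected) < k:
        nxt = int(np.argmax(dists))          # farthest from current set
        selected.append(nxt)
        dists = np.minimum(dists, np.linalg.norm(points - points[nxt], axis=1))
    return selected
```

Applied to camera centers, this yields test views that cover the capture trajectory far more evenly than taking every 8th view.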

Conference program:
Feel free to contact us and share your needs.
Tencent Media Lab
medialab@tencent.com