Tencent and Google Co-organize a Special Session on Volumetric Visual Media at SPIE 2025; Five Multimedia Lab Papers Accepted

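The SPIE conference was held in San Diego from August 3 to 7, 2025. Tencent and Google co-organized a Special Session on volumetric visual media. At the conference, the Multimedia Lab presented five papers: four independent papers and one jointly published paper.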

Senior researchers at Tencent Multimedia Lab: Dr. Chao Huang, Dr. Fang-Yi Chao, and Dr. Pranav Kadam

Tencent & Google

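Volumetric visual media is an emerging 3D immersive technology that captures and presents objects or environments with depth and volume. This lets users interact with digital content from multiple viewpoints, creating a more realistic holographic experience. Volumetric visual media is central to augmented reality (AR), virtual reality (VR), and mixed reality (MR) applications, and it drives innovation in gaming, film production, animation, manufacturing, and cultural heritage. As market interest in extended reality, spatial, and immersive media applications grows, demand for efficient coding and processing of volumetric visual data continues to rise.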

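Five papers from Tencent Multimedia Lab were accepted at SPIE 2025. They cover the coding, processing, and rendering of volumetric visual media, and demonstrate the lab's technical capabilities and innovations in 3D data compression and processing.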

The accepted papers are summarized below:

Efficient Coding of Repeated Mesh Connected Components via Connectivity Skip and Prediction Refinement

Pranav Kadam, Chao Huang, Shan Liu

Abstract:

Repeated connected components exist in meshes, especially in digitally created content (DCC). These connected components share the same connectivity and are typically related by a rigid transformation (rotation and/or translation). Encoding the same connectivity multiple times is redundant. In addition, the existence of a rigid transform among repeated connected components can be exploited to better predict vertex positions. In this paper, we first propose to concurrently encode mesh connected components that share the same connectivity. The input mesh is examined and connected components with the same connectivity are grouped together. Then, for each group, the shared connectivity is coded, followed by the positions of all the vertices within the group. Secondly, a prediction refinement scheme is proposed in which the residue of an already encoded vertex improves the prediction of its corresponding vertex in another connected component within the same group.
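The grouping step described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes a simplified notion of "same connectivity": two components match only if their face lists become identical after vertices are relabeled in order of first appearance, which holds for copy-pasted objects but is weaker than a general isomorphism test. The function names `connected_components`, `connectivity_signature`, and `group_by_connectivity` are hypothetical.

```python
from collections import defaultdict

def connected_components(faces):
    """Group face indices into components that share vertices (union-find)."""
    parent = {}

    def find(v):
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    for face in faces:
        root = find(face[0])
        for v in face[1:]:
            parent[find(v)] = root
    comps = defaultdict(list)
    for i, face in enumerate(faces):
        comps[find(face[0])].append(i)
    return list(comps.values())

def connectivity_signature(faces, face_ids):
    """Relabel vertices by first appearance (assumption: copies keep face order)."""
    relabel = {}
    return tuple(
        tuple(relabel.setdefault(v, len(relabel)) for v in faces[i])
        for i in face_ids
    )

def group_by_connectivity(faces):
    """Return groups of components; each group shares one connectivity signature."""
    groups = defaultdict(list)
    for comp in connected_components(faces):
        groups[connectivity_signature(faces, comp)].append(comp)
    return list(groups.values())

# Two copies of a two-triangle patch plus one lone triangle.
faces = [(0, 1, 2), (1, 3, 2), (4, 5, 6), (5, 7, 6), (8, 9, 10)]
groups = group_by_connectivity(faces)
```

In such a scheme, an encoder would code the connectivity once per group and then the vertex positions of every component in the group, as the abstract outlines.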

Link:

https://spie.org/optics-photonics/presentation/Efficient-coding-of-repeated-mesh-connected-components-via-connectivity-skip/13605-20

Efficient texture coordinate prediction guided by mesh geometry

Fang-Yi Chao, Chao Huang, Shan Liu

Abstract:

Texture coordinates are an essential mesh attribute for rendering the texture image onto a 3D mesh surface. They are 2D coordinates that describe the mapping of the 3D mesh surface into a 2D texture space. Texture coordinates can be generated by parameterization algorithms that preserve the inner angle and area of the faces in 3D mesh geometry to minimize distortion. These conformal or quasi-conformal properties are leveraged in the geometry-guided texture coordinates prediction, which takes account of the associated vertices in the 3D geometry to compute the prediction weights for 2D texture coordinates. However, since only the tangential direction of the 3D geometry is considered, the prediction performs poorly when the surface is bumpy. This paper proposes an effective algorithm to take account of normal directions of the 3D geometry surface for more accurate 2D texture coordinate prediction.
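For orientation, the baseline idea of geometry-guided prediction can be sketched as follows. This is one common formulation, assumed here for illustration, and it is not the normal-aware algorithm the paper proposes. Given a triangle whose 3D positions are all known and whose UVs are known at two vertices, the third UV is predicted by assuming the 3D triangle and the UV triangle are similar. The function name `predict_uv_candidates` and its parameters are hypothetical.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict_uv_candidates(pos_a, pos_b, pos_c, uv_a, uv_b):
    """Predict the UV of vertex C from the 3D triangle (A, B, C) and the UVs of A, B.

    Returns two candidates, one on each side of the UV edge A-B; a codec
    would need extra information (for example one bit) to pick the side.
    """
    edge = _sub(pos_b, pos_a)
    to_c = _sub(pos_c, pos_a)
    edge_len2 = _dot(edge, edge)
    if edge_len2 == 0:
        return (uv_a, uv_a)  # degenerate edge: fall back to a known UV
    # s: position of C's projection along A-B, as a fraction of the edge length
    s = _dot(to_c, edge) / edge_len2
    # t: distance of C from the line A-B, as a fraction of the edge length
    perp = tuple(c - s * e for c, e in zip(to_c, edge))
    t = math.sqrt(_dot(perp, perp) / edge_len2)
    uv_edge = _sub(uv_b, uv_a)
    base = (uv_a[0] + s * uv_edge[0], uv_a[1] + s * uv_edge[1])
    rot = (-uv_edge[1], uv_edge[0])  # UV edge rotated by 90 degrees
    return (
        (base[0] + t * rot[0], base[1] + t * rot[1]),
        (base[0] - t * rot[0], base[1] - t * rot[1]),
    )

# A triangle tilted out of the xy-plane, with UVs that preserve its shape.
cand = predict_uv_candidates((0, 0, 0), (2, 0, 0), (1, 0, 1), (0.0, 0.0), (2.0, 0.0))
```

The sketch uses only quantities measured within the triangle's own plane, which is the limitation the abstract points to for bumpy surfaces.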

Link:

https://spie.org/optics-photonics/presentation/Efficient-texture-coordinate-prediction-guided-by-mesh-geometry/13605-21

Efficient determination of orientable manifold polygon meshes

Chao Huang, Shan Liu

Abstract:

Many polygon mesh processing methods, including compression methods like the Edgebreaker and the Dual-Degree algorithms, require the input mesh to be an orientable 2-manifold. In this paper, we present an algorithm to efficiently determine whether a polygon mesh or any of its connected components (CCs) is an orientable 2-manifold. We show the effectiveness of the proposed method on polygon mesh compression by applying it to adaptively select the connectivity coding algorithms for different CCs in the input mesh.
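To make the property concrete, here is a textbook-style check in Python. It is a sketch under stated assumptions, not the efficient algorithm the paper presents. It tests three conditions: every edge is used by at most two faces, face orientations can be made consistent by flipping some faces, and the faces around each vertex form a single fan. The function name `is_orientable_manifold` is hypothetical, and faces are assumed to be simple polygons given as vertex-index tuples.

```python
from collections import defaultdict, deque

def is_orientable_manifold(faces):
    edge_uses = defaultdict(list)   # undirected edge -> [(face, direction flag)]
    vert_edges = defaultdict(set)   # vertex -> incident undirected edges
    for f, poly in enumerate(faces):
        for k, a in enumerate(poly):
            b = poly[(k + 1) % len(poly)]
            if a == b:
                return False  # degenerate edge
            key = (min(a, b), max(a, b))
            edge_uses[key].append((f, a < b))
            vert_edges[a].add(key)
            vert_edges[b].add(key)
    # Edge condition: no edge may be shared by more than two faces.
    if any(len(uses) > 2 for uses in edge_uses.values()):
        return False
    # Orientability: neighbors must traverse a shared edge in opposite
    # directions, possibly after flipping some faces (checked by BFS).
    neighbors = defaultdict(list)
    for uses in edge_uses.values():
        if len(uses) == 2:
            (f1, d1), (f2, d2) = uses
            neighbors[f1].append((f2, d1 == d2))
            neighbors[f2].append((f1, d1 == d2))
    flip = {}
    for start in range(len(faces)):
        if start in flip:
            continue
        flip[start] = False
        queue = deque([start])
        while queue:
            f = queue.popleft()
            for g, must_differ in neighbors[f]:
                want = flip[f] ^ must_differ
                if g not in flip:
                    flip[g] = want
                    queue.append(g)
                elif flip[g] != want:
                    return False  # no consistent orientation exists
    # Vertex condition: faces around each vertex must form one connected fan.
    for v, edges in vert_edges.items():
        incident = {f for e in edges for f, _ in edge_uses[e]}
        seen, stack = set(), [next(iter(incident))]
        while stack:
            f = stack.pop()
            if f in seen:
                continue
            seen.add(f)
            for e in edges:
                fs = [g for g, _ in edge_uses[e]]
                if f in fs:
                    stack.extend(g for g in fs if g not in seen)
        if seen != incident:
            return False
    return True

tetra = [(0, 1, 2), (0, 3, 1), (1, 3, 2), (0, 2, 3)]
mobius = [(i, (i + 1) % 5, (i + 2) % 5) for i in range(5)]  # 5-vertex Mobius strip
```

Applied per connected component, a result like this could drive the choice of connectivity coder, in the spirit of the adaptive selection the abstract mentions.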

Link:

https://spie.org/optics-photonics/presentation/Efficient-determination-of-orientable-manifold-polygon-meshes/13605-17

Encoding, Decoding and Rendering Pipeline for Multiview Streaming and Display in VR using VVC

Xueshi Hou, Shan Liu

Abstract:

Virtual reality (VR) head-mounted displays (HMDs) require high bitrates for immersive, high-quality visuals. Versatile Video Coding (VVC) significantly improves compression efficiency, making it ideal for VR multiview streaming and display. However, VR multiview streaming and display face two challenges: ultra-high bandwidth and ultra-low latency requirements for an optimal user experience. This paper presents an integrated, end-to-end pipeline to address these challenges, including (i) field of view (FOV)-based bitrate allocation, (ii) adaptive streaming for dynamic network conditions, and (iii) real-time decoding and rendering optimizations. We explore FOV-based bitrate allocation to prioritize the user’s focus area and adaptive streaming algorithms to handle network fluctuations. For high-end devices like Vision Pro, we implement a software-based decoding and rendering approach with VVC, minimizing latency and improving rendering efficiency. This work presents a VVC-based pipeline for multiview streaming and display in VR, focusing on bitrate allocation, adaptive streaming, and real-time decoding and rendering optimizations.
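The abstract does not detail how FOV-based bitrate allocation is performed, so the following is only a toy illustration of the general idea. It splits a total bitrate budget across views (or tiles) in proportion to a weight that is highest inside the user's field of view and falls off linearly outside it. The function `allocate_bitrate`, its parameters, and the weighting rule are all assumptions made for this sketch.

```python
def allocate_bitrate(total_kbps, angles_deg, fov_deg=90.0, floor_weight=0.1):
    """Split total_kbps across views given each view's angle from the gaze direction.

    Hypothetical rule: weight 1.0 inside the FOV, then a linear falloff down to
    floor_weight at 180 degrees, so views behind the user still get some bits.
    """
    half = fov_deg / 2.0
    weights = []
    for angle in angles_deg:
        if angle <= half:
            weights.append(1.0)
        else:
            frac = (angle - half) / (180.0 - half)
            weights.append(max(floor_weight, 1.0 - frac * (1.0 - floor_weight)))
    total_weight = sum(weights)
    return [total_kbps * w / total_weight for w in weights]

# Four views at 0, 30, 120 and 180 degrees from the gaze direction.
rates = allocate_bitrate(10000.0, [0.0, 30.0, 120.0, 180.0])
```

A real system would also have to react to head motion and network changes, which the abstract covers under adaptive streaming.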

Link:

https://spie.org/optics-photonics/presentation/Encoding-decoding-and-rendering-pipeline-for-multiview-streaming-and-display/13605-32

Connectivity regularization for triangular meshes

Igor Vytyaz, Chao Huang, Ondrej Stava, Pranav Kadam, Roshan Baliga, Shan Liu, Frank Galligan

Abstract:

This paper introduces novel connectivity regularization techniques for triangle-dominated meshes aimed at improving lossless compression efficiency. It addresses the irregular connectivity that arises from the triangulation of quad meshes, where vertex degrees typically range from four to eight. The proposed regularization detects triangle pairs that comprise the original quads and applies one of two regularization techniques. The first technique flips diagonal edges of certain quads to achieve consistent diagonal edge orientations. The second technique restores the original quads by removing their diagonal edges. This results in a connectivity where most vertices are surrounded either by six triangles or by four quads. To ensure lossless compression, the affected edge data can be encoded in the bitstream, allowing the decoder to restore the original mesh connectivity. The results demonstrate compression gains for meshes with position, texture coordinate, and normal attributes. Adaptively applying the two techniques yielded an average compression gain of 1.9% when preserving the affected edge data for lossless compression and 5.9% when discarding the affected edge data.
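The effect of diagonal orientation on vertex degrees can be seen on a toy example. The Python sketch below is an illustration only and does not reproduce the paper's detection or coding steps. It triangulates an n-by-n grid of quads with a chosen diagonal per quad and counts the degrees of interior vertices. The function names `triangulate_grid` and `interior_degrees` are hypothetical.

```python
from collections import Counter

def triangulate_grid(n, use_main_diagonal):
    """Split each quad of an n x n grid into two triangles.

    use_main_diagonal(i, j) chooses the diagonal of quad (i, j): True splits
    along corner (i, j) to (i+1, j+1), False along (i, j+1) to (i+1, j).
    """
    def vid(i, j):
        return i * (n + 1) + j

    tris = []
    for i in range(n):
        for j in range(n):
            a, b, c, d = vid(i, j), vid(i, j + 1), vid(i + 1, j + 1), vid(i + 1, j)
            if use_main_diagonal(i, j):
                tris += [(a, b, c), (a, c, d)]
            else:
                tris += [(a, b, d), (b, c, d)]
    return tris

def interior_degrees(n, tris):
    """Histogram of vertex degrees (number of incident edges) for interior vertices."""
    edges = {frozenset((t[k], t[(k + 1) % 3])) for t in tris for k in range(3)}
    degree = Counter(v for e in edges for v in e)
    interior = [i * (n + 1) + j for i in range(1, n) for j in range(1, n)]
    return Counter(degree[v] for v in interior)

# Alternating diagonals give irregular degrees; consistent diagonals give all sixes.
irregular = interior_degrees(4, triangulate_grid(4, lambda i, j: (i + j) % 2 == 0))
regular = interior_degrees(4, triangulate_grid(4, lambda i, j: True))
```

With alternating diagonals the interior degrees are four and eight, while flipping every second diagonal to a consistent orientation makes every interior vertex degree six, matching the regular case the abstract describes.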
