🎉 [CVPR’25] Three papers have been accepted!

Tags: Academic
Date: 2025/02/25

  Three papers have been accepted to CVPR 2025

CVPR 2025 received 13,008 valid submissions that underwent the review process. The program committee recommended 2,878 papers for acceptance, resulting in an acceptance rate of 22.1%.

Title: Dr. Splat: Directly Referring 3D Gaussian Splatting via Direct Language Embedding Registration

Authors: Kim Jun-Seong (POSTECH), GeonU Kim (POSTECH), Kim Yu-Ji (POSTECH), Yu-Chiang Frank Wang (NVIDIA), Jaesung Choe (NVIDIA), Tae-Hyun Oh (KAIST, POSTECH)
Abstract: We introduce Dr. Splat, a novel approach for open-vocabulary 3D scene understanding leveraging 3D Gaussian Splatting. Unlike existing language-embedded 3DGS methods, which rely on a rendering process, our method directly associates language-aligned CLIP embeddings with 3D Gaussians for holistic 3D scene understanding. The key to our method is a language feature registration technique where CLIP embeddings are assigned to the dominant Gaussians intersected by each pixel-ray. Moreover, we integrate Product Quantization (PQ) trained on general large-scale image data to compactly represent embeddings without per-scene optimization. Experiments demonstrate that our approach significantly outperforms existing methods on 3D perception benchmarks, such as open-vocabulary 3D semantic segmentation, 3D object localization, and 3D object selection tasks.
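
For readers curious how such a registration step might look, below is a minimal sketch assuming we already have a pixel-aligned CLIP feature map and, for each pixel-ray, the intersected Gaussians with their blending weights. All function and variable names (register_language_features, gaussian_ids, blend_weights, ...) are illustrative, not the paper's actual implementation, and the PQ compression step is omitted.

```python
# Hedged sketch: accumulate each pixel's CLIP embedding onto the dominant
# Gaussians along that pixel's ray, weighted by their blending contribution.
import numpy as np

def register_language_features(clip_features, gaussian_ids, blend_weights,
                               num_gaussians, top_k=3):
    """clip_features : (P, D) CLIP embedding per pixel
    gaussian_ids  : (P, K) indices of Gaussians intersected by each pixel-ray
    blend_weights : (P, K) alpha-blending weights of those Gaussians
    Returns one aggregated language embedding per Gaussian."""
    D = clip_features.shape[1]
    feat_sum = np.zeros((num_gaussians, D))
    weight_sum = np.zeros(num_gaussians)

    for p in range(clip_features.shape[0]):
        # keep only the top-k most dominant Gaussians for this ray
        order = np.argsort(-blend_weights[p])[:top_k]
        for j in order:
            g, w = gaussian_ids[p, j], blend_weights[p, j]
            feat_sum[g] += w * clip_features[p]
            weight_sum[g] += w

    # normalize accumulated features to get one embedding per Gaussian
    mask = weight_sum > 0
    feat_sum[mask] /= weight_sum[mask, None]
    return feat_sum
```

Because the embeddings then live directly on the Gaussians, open-vocabulary queries can be answered by comparing a text embedding against per-Gaussian features, without going through a rendering pass.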

Title: Perceptually Accurate 3D Talking Head Generation: New Definitions, Speech-Mesh Representation, and Evaluation Metrics

Authors: Lee Chae-Yeon* (POSTECH), Oh Hyun-Bin* (POSTECH), Han EunGi (POSTECH), Kim Sung-Bin (POSTECH), Suekyeong Nam (KRAFTON), Tae-Hyun Oh (KAIST, POSTECH)
Abstract: Recent speech-driven 3D talking head generation methods have achieved impressive advances in lip synchronization. However, existing models still fall short of capturing the perceptual alignment between diverse speech characteristics and lip movements. In this work, we define three essential criteria for perceptually accurate lip movements in response to speech signals: temporal synchronization, lip readability, and expressiveness. We also introduce a speech-mesh synchronized representation that captures the intricate correspondence between speech and facial mesh. We plug in this representation as a perceptual loss to guide lip movements, ensuring they are perceptually aligned with the given speech. Additionally, we utilize this representation as a perceptual metric and introduce two further physically grounded lip synchronization metrics to evaluate these three criteria. Experiments demonstrate that training 3D talking head models with our perceptual loss significantly enhances all three aspects of perceptually accurate lip synchronization.
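
As a rough illustration of how a speech-mesh synchronized representation could serve as a perceptual loss, the sketch below computes a cosine-distance term between speech and mesh embeddings produced by two pretrained, frozen encoders. The encoder names and interfaces are assumptions for this sketch, not the paper's code.

```python
# Hedged sketch of a speech-mesh perceptual loss term.
import torch
import torch.nn.functional as F

def perceptual_sync_loss(speech_encoder, mesh_encoder, audio, pred_vertices):
    """1 - cosine similarity between speech and mesh embeddings.

    audio         : (B, T_audio) raw waveform or acoustic features
    pred_vertices : (B, T, V, 3) predicted face-mesh vertex sequence
    """
    # the frozen speech branch only provides the target embedding
    with torch.no_grad():
        speech_emb = speech_encoder(audio)      # (B, D)
    # gradients flow through the (frozen) mesh branch back to the generator
    mesh_emb = mesh_encoder(pred_vertices)      # (B, D)
    return (1.0 - F.cosine_similarity(speech_emb, mesh_emb, dim=-1)).mean()

# In training, such a term would typically be added to a vertex reconstruction
# loss, e.g.  loss = recon_loss + lambda_perc * perceptual_sync_loss(...)
```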

Title: Robust 3D Shape Reconstruction in Zero-Shot from a Single Image in the Wild

Authors: Junhyeong Cho (ADD), Kim Youwang (POSTECH), Hunmin Yang (ADD, KAIST), Tae-Hyun Oh (KAIST, POSTECH)
Abstract: Recent monocular 3D shape reconstruction methods have shown promising zero-shot results on object-segmented images without any occlusions. However, their effectiveness is significantly compromised in real-world settings due to imperfect object segmentation by off-the-shelf models and the prevalence of occlusions. To address these issues, we propose a unified regression model that integrates segmentation and reconstruction, specifically designed for occlusion-aware 3D shape reconstruction. To facilitate reconstruction in the wild, we also introduce a scalable data synthesis pipeline that simulates a wide range of variations in objects, occluders, and backgrounds. Training on our synthesized data enables the proposed model to achieve state-of-the-art zero-shot results on real-world images, using significantly fewer model parameters than competing approaches.
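
Below is a hedged sketch of what one compositing step of such a data synthesis pipeline could look like: an object render is placed over a random background and partially covered by a random occluder, producing an image together with the full (amodal) and visible object masks. All names are hypothetical and not the authors' pipeline.

```python
# Hedged sketch: synthesize one occlusion-aware training sample.
import numpy as np

def synthesize_sample(object_rgba, occluder_rgba, background_rgb, rng):
    """object_rgba, occluder_rgba : (H, W, 4) float images in [0, 1]
    background_rgb               : (H, W, 3) float image in [0, 1]
    rng                          : numpy Generator, e.g. np.random.default_rng()
    Returns the composite image, the amodal object mask, and the visible mask."""
    H, W, _ = background_rgb.shape
    img = background_rgb.copy()

    # 1) composite the object over the background
    obj_alpha = object_rgba[..., 3:]
    img = obj_alpha * object_rgba[..., :3] + (1 - obj_alpha) * img
    amodal_mask = obj_alpha[..., 0] > 0.5

    # 2) drop a randomly shifted occluder on top of the object
    occ = np.roll(occluder_rgba,
                  shift=(rng.integers(-H // 4, H // 4),
                         rng.integers(-W // 4, W // 4)),
                  axis=(0, 1))
    occ_alpha = occ[..., 3:]
    img = occ_alpha * occ[..., :3] + (1 - occ_alpha) * img
    visible_mask = amodal_mask & ~(occ_alpha[..., 0] > 0.5)

    return img, amodal_mask, visible_mask
```

Training on composites like this gives the model supervision for both the visible and the occluded parts of the object, which is the setting an occlusion-aware segmentation-plus-reconstruction model has to handle.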