Zero-Shot Rankability: Revealing Latent Ordinal Structure in Multimodal Large Language Models via Language
Authors: Nam Hyeon-Woo (POSTECH), Moon Ye-Bin (POSTECH), Sohwi Lim (KAIST), Kwon Byung-Ki (POSTECH), Tae-Hyun Oh (KAIST)
Recent work shows that vision encoders capture ordinal attributes along linear axes, which can be recovered from as few as two labeled images. However, in the zero-shot setting, the text-driven rank axis for Vision-Language Models (VLMs) like CLIP remains suboptimal. In this work, we study the embeddings of Multimodal LLMs (MLLMs). We hypothesize that MLLMs can overcome this limitation due to three potential advantages: their inherent ordinal understanding, capacity for conditional embeddings, and a small cross-modal gap. We show that MLLMs are rankable using only text prompts. Experiments demonstrate that a text-driven rank axis for MLLM embeddings achieves 90% of the performance of the supervised linear rank axis, significantly outperforming the 61% observed for VLM embeddings. We validate that this capability stems from MLLMs' conditional embeddings and a smaller modality gap than that of VLMs. Furthermore, we demonstrate that this property generalizes to the audio domain. Our findings suggest that language provides a direct interface for probing latent ordinal structures in MLLMs.
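As a rough illustration of the text-driven rank axis, the sketch below builds an axis from two prompts describing the attribute extremes and ranks images by their projection onto it. The `embed_text` and `embed_image` functions are random stand-ins for the MLLM's encoders (the paper's actual models and prompts are not reproduced here), so the sketch is self-contained but illustrative only.

```python
import numpy as np

# Minimal sketch of a text-driven rank axis, assuming an MLLM that maps text
# and images into a shared embedding space. The embedding functions below are
# deterministic random stubs so the example runs end to end; in practice they
# would call the MLLM's encoders.
DIM = 64

def embed_text(prompt: str) -> np.ndarray:
    return np.random.default_rng(abs(hash(prompt)) % 2**32).standard_normal(DIM)

def embed_image(image_id: int) -> np.ndarray:
    return np.random.default_rng(image_id).standard_normal(DIM)

def text_rank_axis(low_prompt: str, high_prompt: str) -> np.ndarray:
    """Difference of the two prompt embeddings, normalized to unit length."""
    low, high = embed_text(low_prompt), embed_text(high_prompt)
    axis = high - low
    return axis / np.linalg.norm(axis)

def rank_by_axis(image_ids, axis):
    """Score each image by its projection onto the axis and sort ascending."""
    scores = {i: float(embed_image(i) @ axis) for i in image_ids}
    return sorted(image_ids, key=scores.get)

axis = text_rank_axis("a photo of a very young person",
                      "a photo of a very old person")
print(rank_by_axis(range(5), axis))
```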
Measurement-Consistent Langevin Corrector for Stabilizing Latent Diffusion Inverse Problem Solvers
Authors: Lee Hyoseok (KAIST), Sohwi Lim (KAIST), Eunju Cha (Sookmyung Women's University), Tae-Hyun Oh (KAIST)
While latent diffusion models (LDMs) have emerged as powerful priors for inverse problems, existing LDM-based solvers frequently suffer from instability. In this work, we first identify this instability as a discrepancy between the solver dynamics and the stable reverse diffusion dynamics learned by the diffusion model, and show that reducing this gap stabilizes the solver. Building on this, we introduce the Measurement-Consistent Langevin Corrector (MCLC), a theoretically grounded plug-and-play stabilization module that remedies the instability of LDM-based inverse problem solvers through measurement-consistent Langevin updates. Unlike prior approaches that rely on linear manifold assumptions, which often fail to hold in latent space, MCLC provides a principled stabilization mechanism, leading to more stable and reliable solver behavior in latent space.
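To make the corrector structure concrete, the toy sketch below runs a Langevin update whose drift combines a prior score with a measurement-consistency gradient. It assumes a standard-Gaussian prior (so the score is analytic), an identity decoder, and a linear measurement operator; the paper's actual score network, decoder, and step-size schedule are not reproduced, only the shape of the update.

```python
import numpy as np

# Toy sketch of a measurement-consistent Langevin corrector step under stated
# assumptions: standard-Gaussian prior (score(z) = -z), identity decoder, and
# a linear measurement y = A @ z. Each step adds the prior score, a gradient
# pulling the latent toward measurement consistency, and Langevin noise.
rng = np.random.default_rng(0)
d, m = 8, 4
A = rng.standard_normal((m, d))
z_true = rng.standard_normal(d)
y = A @ z_true

def score(z):
    return -z  # analytic score of a standard Gaussian prior (stand-in)

def mclc_step(z, step=1e-2, guidance=1.0):
    """One Langevin corrector update with a measurement-consistency term."""
    residual = y - A @ z
    data_grad = A.T @ residual            # gradient of -0.5 * ||y - A z||^2
    drift = score(z) + guidance * data_grad
    noise = np.sqrt(2 * step) * rng.standard_normal(z.shape)
    return z + step * drift + noise

z = rng.standard_normal(d)
for _ in range(2000):
    z = mclc_step(z)
print("measurement residual norm:", np.linalg.norm(y - A @ z))
```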
A Language-Guided Bayesian Optimization for Efficient LoRA Hyperparameter Search
Authors: Baek Seong-Eun (POSTECH), Lee Jung-Mok (KAIST), Kim Sung-Bin (POSTECH), Tae-Hyun Oh (KAIST)
Fine-tuning Large Language Models (LLMs) with Low-Rank Adaptation (LoRA) offers a resource-efficient way to personalize or specialize them. However, LoRA is highly sensitive to hyperparameter choices, and an exhaustive hyperparameter search remains computationally intensive. To address these challenges, we propose a framework that integrates the domain knowledge of pre-trained LLMs into the Bayesian Optimization (BO) process to efficiently search for LoRA hyperparameters. To leverage this knowledge, our approach repurposes a pre-trained LLM as a discrete-to-continuous mapping module that links hyperparameters and their domain knowledge to a continuous vector space, where BO is conducted. We design and control this mapping via language prompting, providing a domain-aware textual prompt that describes the relationships among hyperparameters and their respective roles. This allows us to explicitly inject domain knowledge about LoRA into the LLM in natural language. We also introduce an additional learnable token to capture residual information that is difficult to express linguistically in the prompt, which helps BO sample higher-performing hyperparameters. In addition, leveraging the strong correlation observed between performance obtained from the full training dataset and from a subset in LoRA training regimes, we introduce proxy training and evaluation on a data subset, which significantly improves the efficiency of our method. We demonstrate that the hyperparameter configuration discovered within only about 30 iterations achieves more than a 20% performance improvement over standard hyperparameters selected from a search space of about 45,000 combinations. Code will be released.
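The sketch below illustrates the overall search loop under stated assumptions: `llm_embed` is a random stand-in for the prompt-conditioned LLM mapping of configurations to continuous vectors, `proxy_eval` is a stand-in for LoRA fine-tuning and evaluation on a data subset, and the Gaussian-process surrogate with a UCB acquisition is one common BO choice, not necessarily the paper's.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

# Minimal sketch of language-guided BO over LoRA hyperparameters. `llm_embed`
# and `proxy_eval` are deterministic random stubs so the loop runs end to end;
# in practice the former would query a pre-trained LLM with the domain-aware
# prompt and the latter would run LoRA fine-tuning on a data subset.
PROMPT = "LoRA rank controls adapter capacity; alpha scales updates; lr interacts with both."

def llm_embed(prompt: str, config: dict) -> np.ndarray:
    seed = abs(hash((prompt, tuple(sorted(config.items()))))) % 2**32
    return np.random.default_rng(seed).standard_normal(16)

def proxy_eval(config: dict) -> float:
    seed = abs(hash(tuple(sorted(config.items())))) % 2**32
    return float(np.random.default_rng(seed).uniform())

# Discrete search space of hyperparameter configurations (illustrative values).
candidates = [{"rank": r, "alpha": a, "lr": lr}
              for r in (4, 8, 16, 64)
              for a in (8, 16, 32)
              for lr in (1e-5, 1e-4, 1e-3)]
X = np.stack([llm_embed(PROMPT, c) for c in candidates])

observed_idx, observed_y = [0], [proxy_eval(candidates[0])]
for _ in range(10):                          # BO iterations (paper uses ~30)
    gp = GaussianProcessRegressor().fit(X[observed_idx], observed_y)
    mean, std = gp.predict(X, return_std=True)
    ucb = mean + 1.0 * std                   # upper-confidence-bound acquisition
    ucb[observed_idx] = -np.inf              # do not re-evaluate seen configs
    nxt = int(np.argmax(ucb))
    observed_idx.append(nxt)
    observed_y.append(proxy_eval(candidates[nxt]))

best = candidates[observed_idx[int(np.argmax(observed_y))]]
print("best config found:", best)
```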
