Related papers: Learning to Approximate: Auto Direction Vector Set…
Deep learning (DL) has shown state-of-the-art performance in trajectory prediction, which is critical to safe navigation in autonomous driving (AD). However, most DL-based methods suffer from catastrophic forgetting, where adapting to a new…
In this letter, we propose HV-Net, a new method for hypervolume approximation in evolutionary multi-objective optimization. The basic idea of HV-Net is to use DeepSets, a deep neural network with permutation invariant property, to…
Vector Symbolic Architecture (VSA) is emerging in machine learning due to its efficiency, but they are hindered by issues of hyperdimensionality and accuracy. As a promising mitigation, the Low-Dimensional Computing (LDC) method…
We propose L2T, an advancement of visual instruction tuning (VIT). While VIT equips Multimodal LLMs (MLLMs) with promising multimodal capabilities, the current design choices for VIT often result in overfitting and shortcut learning,…
Reinforcement learning from human feedback (RLHF) with reward models has advanced alignment of generative models to human aesthetic and perceptual preferences. However, jointly optimizing multiple rewards often incurs an alignment tax,…
Imitation learning is a promising approach for training autonomous vehicles (AV) to navigate complex traffic environments by mimicking expert driver behaviors. While existing imitation learning frameworks focus on leveraging expert…
Conventional video compression (VC) methods are based on motion compensated transform coding, and the steps of motion estimation, mode and quantization parameter selection, and entropy coding are optimized individually due to the…
Distributed trajectory optimization via ADMM-DDP is a powerful approach for coordinating multi-agent systems, but it requires extensive tuning of tightly coupled hyperparameters that jointly govern local task performance and global…
Activation steering has emerged as a promising approach for efficiently adapting large language models (LLMs) to downstream behaviors. However, most existing steering methods rely on a single static direction per task or concept, making…
We introduce LLaVA-Reward, an efficient reward model designed to automatically evaluate text-to-image (T2I) generations across multiple perspectives, leveraging pretrained multimodal large language models (MLLMs). Existing MLLM-based…
Text summarization is a crucial task that requires the simultaneous optimization of multiple objectives, including consistency, coherence, relevance, and fluency, which presents considerable challenges. Although large language models (LLMs)…
Low-Rank Adaptation (LoRA) has emerged as one of the most widely used parameter-efficient fine-tuning (PEFT) methods for adapting large language models (LLMs) to downstream tasks. While highly effective in single-task settings, it struggles…
Reinforcement learning has shown great potential in developing high-level autonomous driving. However, for high-dimensional tasks, current RL methods suffer from low data efficiency and oscillation in the training process. This paper…
Efficient Visual Instruction Fine-Tuning (EVIT) seeks to adapt Multimodal Large Language Models (MLLMs) to downstream tasks with minimal computational overhead. However, as task diversity and complexity increase, EVIT faces significant…
Vector graphics are widely used in digital art and highly favored by designers due to their scalability and layer-wise properties. However, the process of creating and editing vector graphics requires creativity and design expertise, making…
Hyperdimensional computing (HDC) is an emerging computing paradigm that imitates the brain's structure to offer a powerful and efficient processing and learning model. In HDC, the data are encoded with long vectors, called hypervectors,…
In deep learning tasks, the learning rate determines the update step size in each iteration, which plays a critical role in gradient-based optimization. However, the determination of the appropriate learning rate in practice typically…
Variational autoencoders (VAEs) are popular likelihood-based generative models which can be efficiently trained by maximizing an Evidence Lower Bound (ELBO). There has been much progress in improving the expressiveness of the variational…
The hypervolume indicator is one of the most used set-quality indicators for the assessment of stochastic multiobjective optimizers, as well as for selection in evolutionary multiobjective optimization algorithms. Its theoretical properties…
Vision-Language-Action (VLA) models are emerging as a promising paradigm for end-to-end autonomous driving, valued for their potential to leverage world knowledge and reason about complex driving scenes. However, existing methods suffer…