Related papers: MIREncoder: Multi-modal IR-based Pretrained Embedd…

Performance Optimization using Multimodal Modeling and Heterogeneous GNN

Growing heterogeneity and configurability in HPC architectures has made auto-tuning applications and runtime parameters on these systems very complex. Users are presented with a multitude of options to configure parameters. In addition to…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-04-28 Akash Dutta, Jordi Alcaraz, Ali TehraniJamsaz, Eduardo Cesar, Anna Sikora, Ali Jannesari

Unleashing the Power of Compiler Intermediate Representation to Enhance Neural Program Embeddings

Neural program embeddings have demonstrated considerable promise in a range of program analysis tasks, including clone identification, program repair, code completion, and program synthesis. However, most existing methods generate neural…

Software Engineering · Computer Science 2022-04-21 Zongjie Li, Pingchuan Ma, Huaijin Wang, Shuai Wang, Qiyi Tang, Sen Nie, Shi Wu

A Shared Encoder Approach to Multimodal Representation Learning

Multimodal representation learning has demonstrated remarkable potential in enabling models to process and integrate diverse data modalities, such as text and images, for improved understanding and performance. While the medical domain can…

Computer Vision and Pattern Recognition · Computer Science 2025-03-04 Shuvendu Roy, Franklin Ogidi, Ali Etemad, Elham Dolatabadi, Arash Afkanpour

IRCoder: Intermediate Representations Make Language Models Robust Multilingual Code Generators

Code understanding and generation have fast become some of the most popular applications of language models (LMs). Nonetheless, research on multilingual aspects of Code-LMs (i.e., LMs for code generation) such as cross-lingual transfer…

Artificial Intelligence · Computer Science 2024-04-16 Indraneil Paul, Goran Glavaš, Iryna Gurevych

LightningDOT: Pre-training Visual-Semantic Embeddings for Real-Time Image-Text Retrieval

Multimodal pre-training has propelled great advancement in vision-and-language research. These large-scale pre-trained models, although successful, fatefully suffer from slow inference speed due to enormous computation cost mainly from…

Computation and Language · Computer Science 2021-04-13 Siqi Sun, Yen-Chun Chen, Linjie Li, Shuohang Wang, Yuwei Fang, Jingjing Liu

Multimodal Deep Learning for Low-Resource Settings: A Vector Embedding Alignment Approach for Healthcare Applications

Large-scale multi-modal deep learning models have revolutionized domains such as healthcare, highlighting the importance of computational power. However, in resource-constrained regions like Low and Middle-Income Countries (LMICs), limited…

Machine Learning · Computer Science 2024-06-06 David Restrepo, Chenwei Wu, Sebastián Andrés Cajas, Luis Filipe Nakayama, Leo Anthony Celi, Diego M López

M$^{3}$3D: Learning 3D priors using Multi-Modal Masked Autoencoders for 2D image and video understanding

We present a new pre-training strategy called M$^{3}$3D ($\underline{M}$ulti-$\underline{M}$odal $\underline{M}$asked $\underline{3D}$) built based on Multi-modal masked autoencoders that can leverage 3D priors and learned cross-modal…

Computer Vision and Pattern Recognition · Computer Science 2023-09-28 Muhammad Abdullah Jamal, Omid Mohareri

MLComp: A Methodology for Machine Learning-based Performance Estimation and Adaptive Selection of Pareto-Optimal Compiler Optimization Sequences

Embedded systems have proliferated in various consumer and industrial applications with the evolution of Cyber-Physical Systems and the Internet of Things. These systems are subjected to stringent constraints so that embedded software must…

Machine Learning · Computer Science 2021-10-12 Alessio Colucci, Dávid Juhász, Martin Mosbeck, Alberto Marchisio, Semeen Rehman, Manfred Kreutzer, Guenther Nadbath, Axel Jantsch, Muhammad Shafique

FOLDER: Accelerating Multi-modal Large Language Models with Enhanced Performance

Recently, Multi-modal Large Language Models (MLLMs) have shown remarkable effectiveness for multi-modal tasks due to their abilities to generate and understand cross-modal data. However, processing long sequences of visual tokens extracted…

Computer Vision and Pattern Recognition · Computer Science 2025-04-11 Haicheng Wang, Zhemeng Yu, Gabriele Spadaro, Chen Ju, Victor Quétu, Shuai Xiao, Enzo Tartaglione

Magic-MM-Embedding: Towards Visual-Token-Efficient Universal Multimodal Embedding with MLLMs

Multimodal Large Language Models (MLLMs) have shown immense promise in universal multimodal retrieval, which aims to find relevant items of various modalities for a given query. But their practical application is often hindered by the…

Computer Vision and Pattern Recognition · Computer Science 2026-02-06 Qi Li, Yanzhe Zhao, Yongxin Zhou, Yameng Wang, Yandong Yang, Yuanjia Zhou, Jue Wang, Zuojian Wang, Jinxiang Liu

Do Recommender Systems Really Leverage Multimodal Content? A Comprehensive Analysis on Multimodal Representations for Recommendation

Multimodal Recommender Systems aim to improve recommendation accuracy by integrating heterogeneous content, such as images and textual metadata. While effective, it remains unclear whether their gains stem from true multimodal understanding…

Information Retrieval · Computer Science 2025-08-07 Claudio Pomo, Matteo Attimonelli, Danilo Danese, Fedelucio Narducci, Tommaso Di Noia

Towards Robust Low-Resource Fine-Tuning with Multi-View Compressed Representations

Due to the huge amount of parameters, fine-tuning of pretrained language models (PLMs) is prone to overfitting in the low resource scenarios. In this work, we present a novel method that operates on the hidden representations of a PLM to…

Computation and Language · Computer Science 2023-05-29 Linlin Liu, Xingxuan Li, Megh Thakkar, Xin Li, Shafiq Joty, Luo Si, Lidong Bing

Multi-task Learning based Pre-trained Language Model for Code Completion

Code completion is one of the most useful features in the Integrated Development Environments (IDEs), which can accelerate software development by suggesting the next probable token based on the contextual code in real-time. Recent studies…

Software Engineering · Computer Science 2021-01-01 Fang Liu, Ge Li, Yunfei Zhao, Zhi Jin

PromptEmbedder: Efficient and Transferable Text Embedding via Dual-LLM Soft Prompting

Large Language Models (LLMs) have demonstrated remarkable efficacy in text embedding, yet current adaptation methods like LoRA face significant bottlenecks in computational efficiency and cross-architecture transferability. Whenever a new…

Computation and Language · Computer Science 2026-05-28 Yu-Che Tsai, Kuan-Yu Chen, Yuan-Hao Chen, Yu-Han Chang, Ching-Yu Tsai, Yu-Hsiang Chuang, Shou-De Lin

Multi-Modality Expansion and Retention for LLMs through Parameter Merging and Decoupling

Fine-tuning Large Language Models (LLMs) with multimodal encoders on modality-specific data expands the modalities that LLMs can handle, leading to the formation of Multimodal LLMs (MLLMs). However, this paradigm heavily relies on…

Computation and Language · Computer Science 2025-05-26 Junlin Li, Guodong DU, Jing Li, Sim Kuan Goh, Wenya Wang, Yequan Wang, Fangming Liu, Ho-Kin Tang, Saleh Alharbi, Daojing He, Min Zhang

MOVE: A Mixture-of-Vision-Encoders Approach for Domain-Focused Vision-Language Processing

Multimodal language models (MLMs) integrate visual and textual information by coupling a vision encoder with a large language model through the specific adapter. While existing approaches commonly rely on a single pre-trained vision…

Computer Vision and Pattern Recognition · Computer Science 2025-02-24 Matvey Skripkin, Elizaveta Goncharova, Dmitrii Tarasov, Andrey Kuznetsov

Investigating Redundancy in Multimodal Large Language Models with Multiple Vision Encoders

Recent multimodal large language models (MLLMs) increasingly integrate multiple vision encoders to improve performance on various benchmarks, assuming that diverse pretraining objectives yield complementary visual signals. However, we show…

Computer Vision and Pattern Recognition · Computer Science 2026-02-16 Yizhou Wang, Song Mao, Yang Chen, Yufan Shen, Yinqiao Yan, Pinlong Cai, Ding Wang, Guohang Yan, Zhi Yu, Xuming Hu, Botian Shi

Elevating Visual Perception in Multimodal LLMs with Visual Embedding Distillation

In recent times, the standard practice for developing MLLMs is to feed features from vision encoder(s) into the LLM and train with natural language supervision. This approach often causes models to lean towards language comprehension and…

Computer Vision and Pattern Recognition · Computer Science 2025-10-20 Jitesh Jain, Zhengyuan Yang, Humphrey Shi, Jianfeng Gao, Jianwei Yang

Meta Large Language Model Compiler: Foundation Models of Compiler Optimization

Large Language Models (LLMs) have demonstrated remarkable capabilities across a variety of software engineering and coding tasks. However, their application in the domain of code and compiler optimization remains underexplored. Training…

Programming Languages · Computer Science 2024-07-04 Chris Cummins, Volker Seeker, Dejan Grubisic, Baptiste Roziere, Jonas Gehring, Gabriel Synnaeve, Hugh Leather

Memory Reviving, Continuing Learning and Beyond: Evaluation of Pre-trained Encoders and Decoders for Multimodal Machine Translation

Multimodal Machine Translation (MMT) aims to improve translation quality by leveraging auxiliary modalities such as images alongside textual input. While recent advances in large-scale pre-trained language and vision models have…

Computation and Language · Computer Science 2025-04-28 Zhuang Yu, Shiliang Sun, Jing Zhao, Tengfei Song, Hao Yang