Related papers: Task Me Anything

200 papers

Comprehensive evaluation of Multimodal Large Language Models (MLLMs) has recently garnered widespread attention in the research community. However, we observe that existing benchmarks present several common barriers that make it difficult…

Computer Vision and Pattern Recognition · Computer Science · 2025-02-06 · Yi-Fan Zhang, Huanyu Zhang, Haochen Tian, Chaoyou Fu, Shuangqing Zhang, Junfei Wu, Feng Li, Kun Wang, Qingsong Wen, Zhang Zhang, Liang Wang, Rong Jin, Tieniu Tan

Since the release of ChatGPT, the field of Natural Language Processing has experienced rapid advancements, particularly in Large Language Models (LLMs) and their multimodal counterparts, Large Multimodal Models (LMMs). Despite their…

Computation and Language · Computer Science · 2024-08-27 · Florian Schneider, Sunayana Sitaram

Real-world multi-modal problems are rarely solved by a single machine learning model, and often require multi-step computational plans that involve stitching several models. Tool-augmented LLMs hold tremendous promise for automating the…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-24 · Zixian Ma, Weikai Huang, Jieyu Zhang, Tanmay Gupta, Ranjay Krishna

This paper presents a comprehensive evaluation of cost-efficient Large Language Models (LLMs) for diverse biomedical tasks spanning both text and image modalities. We evaluated a range of closed-source and open-source LLMs on tasks such as…

Computation and Language · Computer Science · 2025-07-21 · Israt Jahan, Md Tahmid Rahman Laskar, Chun Peng, Jimmy Huang

Large Language Models (LLMs) have emerged as powerful tools for automating complex reasoning and decision-making tasks. In telecommunications, they hold the potential to transform network optimization, automate troubleshooting, enhance…

Large Vision-Language Models (LVLMs) show significant strides in general-purpose multimodal applications such as visual dialogue and embodied navigation. However, existing multimodal evaluation benchmarks cover a limited number of…

Large Language Models (LLMs) excel in code-related tasks like code generation, but benchmark evaluations often overlook task characteristics, such as difficulty. Moreover, benchmarks are usually built using tasks described with a single…

Software Engineering · Computer Science · 2025-10-27 · Florian Tambon, Amin Nikanjam, Cyrine Zid, Foutse Khomh, Giuliano Antoniol

Evaluating generative foundation models on open-ended multimodal understanding (MMU) and generation (MMG) tasks across diverse modalities (e.g., images, audio, video) poses significant challenges due to the complexity of cross-modal…

Computation and Language · Computer Science · 2025-03-25 · Shu Pu, Yaochen Wang, Dongping Chen, Yuhang Chen, Guohao Wang, Qi Qin, Zhongyi Zhang, Zhiyuan Zhang, Zetong Zhou, Shuang Gong, Yi Gui, Yao Wan, Philip S. Yu

Multimodal large language models (MLLMs), building upon the foundation of powerful large language models (LLMs), have recently demonstrated exceptional capabilities in generating not only texts but also images given interleaved multimodal…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-30 · Bohao Li, Yuying Ge, Yixiao Ge, Guangzhi Wang, Rui Wang, Ruimao Zhang, Ying Shan

Recent evaluations of Large Multimodal Models (LMMs) have explored their capabilities in various domains, with only a few benchmarks specifically focusing on urban environments. Moreover, existing urban benchmarks have been limited to…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-11 · Baichuan Zhou, Haote Yang, Dairong Chen, Junyan Ye, Tianyi Bai, Jinhua Yu, Songyang Zhang, Dahua Lin, Conghui He, Weijia Li

Large multimodal models (LMMs) have demonstrated impressive capabilities in understanding various types of image, including text-rich images. Most existing text-rich image benchmarks are simple extraction-based question answering, and many…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-28 · Jian Chen, Ruiyi Zhang, Yufan Zhou, Ryan Rossi, Jiuxiang Gu, Changyou Chen

Built on the power of LLMs, numerous multimodal large language models (MLLMs) have recently achieved remarkable performance on various vision-language tasks. However, most existing MLLMs and benchmarks primarily focus on single-image input…

Computer Vision and Pattern Recognition · Computer Science · 2024-10-10 · Haowei Liu, Xi Zhang, Haiyang Xu, Yaya Shi, Chaoya Jiang, Ming Yan, Ji Zhang, Fei Huang, Chunfeng Yuan, Bing Li, Weiming Hu

Existing MLLM benchmarks face significant challenges in evaluating Unified MLLMs (U-MLLMs) due to: 1) lack of standardized benchmarks for traditional tasks, leading to inconsistent comparisons; 2) absence of benchmarks for mixed-modality…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-08 · Wulin Xie, Yi-Fan Zhang, Chaoyou Fu, Yang Shi, Bingyan Nie, Hongkai Chen, Zhang Zhang, Liang Wang, Tieniu Tan

An increasing number of organizations are deploying Large Language Models (LLMs) for a wide range of tasks. Despite their general utility, LLMs are prone to errors, ranging from inaccuracies to hallucinations. To objectively assess the…

Artificial Intelligence · Computer Science · 2024-10-15 · Kiran Busch, Henrik Leopold

Multimodal Large Language Models (MLLMs) have made notable advances in visual understanding, yet their abilities to recognize objects modified by specific attributes remain an open question. To address this, we explore MLLMs' reasoning…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-28 · Jiaxuan Li, Junwen Mo, MinhDuc Vo, Akihiro Sugimoto, Hideki Nakayama

Recent breakthroughs in large multimodal models (LMMs) have significantly advanced both text-to-image (T2I) generation and image-to-text (I2T) interpretation. However, many generated images still suffer from issues related to perceptual…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-14 · Jiarui Wang, Huiyu Duan, Yu Zhao, Juntong Wang, Guangtao Zhai, Xiongkuo Min

Large Multimodal Models (LMMs) are typically trained on vast corpora of image-text data but are often limited in linguistic coverage, leading to biased and unfair outputs across languages. While prior work has explored multimodal…

Computer Vision and Pattern Recognition · Computer Science · 2025-07-11 · Ananya Raval, Aravind Narayanan, Vahid Reza Khazaie, Shaina Raza

Recent advances in large language models (LLMs) have enabled the emergence of general-purpose agents for automating end-to-end machine learning (ML) workflows, including data analysis, feature engineering, model training, and competition…

Artificial Intelligence · Computer Science · 2025-09-12 · Hangyi Jia, Yuxi Qian, Hanwen Tong, Xinhui Wu, Lin Chen, Feng Wei

Thousands of diverse benchmarks have been developed to measure the quality of large language models (LLMs). Yet prior work has demonstrated that LLM performance is often sufficiently explained by a small set of latent factors, or abilities.…

Computation and Language · Computer Science · 2026-04-03 · Michael Krumdick, Adam Wiemerslage, Seth Ebner, Charles Lovering, Chris Tanner

Recent breakthroughs in large multimodal models (LMMs), such as the impressive GPT-4o-Native, have demonstrated remarkable proficiency in following general-purpose instructions for image generation. However, current benchmarks often lack…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-27 · Jiayu Wang, Yang Jiao, Yue Yu, Tianwen Qian, Shaoxiang Chen, Jingjing Chen, Yu-Gang Jiang