Related papers: Curriculum Learning with Quality-Driven Data Selec…

Exploring Large Language Models for Feature Selection: A Data-centric Perspective

The rapid advancement of Large Language Models (LLMs) has significantly influenced various domains, leveraging their exceptional few-shot and zero-shot learning capabilities. In this work, we aim to explore and understand the LLMs-based…

Artificial Intelligence · Computer Science 2024-10-24 Dawei Li, Zhen Tan, Huan Liu

CLUES: Collaborative High-Quality Data Selection for LLMs via Training Dynamics

Recent research has highlighted the importance of data quality in scaling large language models (LLMs). However, automated data quality control faces unique challenges in collaborative settings where sharing is not allowed directly between…

Computation and Language · Computer Science 2025-07-08 Wanru Zhao, Hongxiang Fan, Shell Xu Hu, Wangchunshu Zhou, Bofan Chen, Nicholas D. Lane

Open-source Large Language Models are Strong Zero-shot Query Likelihood Models for Document Ranking

In the field of information retrieval, Query Likelihood Models (QLMs) rank documents based on the probability of generating the query given the content of a document. Recently, advanced large language models (LLMs) have emerged as effective…

Information Retrieval · Computer Science 2023-10-23 Shengyao Zhuang, Bing Liu, Bevan Koopman, Guido Zuccon

Large Language Model-guided Document Selection

Large Language Model (LLM) pre-training exhausts an ever growing compute budget, yet recent research has demonstrated that careful document selection enables comparable model quality with only a fraction of the FLOPs. Inspired by efforts…

Computation and Language · Computer Science 2024-06-10 Xiang Kong, Tom Gunter, Ruoming Pang

Visual Error Patterns in Multi-Modal AI: A Statistical Approach

Multi-modal large language models (MLLMs), such as GPT-4o, excel at integrating text and visual data but face systematic challenges when interpreting ambiguous or incomplete visual stimuli. This study leverages statistical modeling to…

Machine Learning · Computer Science 2024-12-09 Ching-Yi Wang

Is One-Shot In-Context Learning Helpful for Data Selection in Task-Specific Fine-Tuning of Multimodal LLMs?

Injecting world knowledge into pretrained multimodal large language models (MLLMs) is essential for domain-specific applications. Task-specific fine-tuning achieves this by tailoring MLLMs to high-quality in-domain data but encounters…

Multimedia · Computer Science 2026-03-31 Xiao An, Jiaxing Sun, Ting Hu, Wei He

Taking the Next Step with Generative Artificial Intelligence: The Transformative Role of Multimodal Large Language Models in Science Education

The integration of Artificial Intelligence (AI), particularly Large Language Model (LLM)-based systems, in education has shown promise in enhancing teaching and learning experiences. However, the advent of Multimodal Large Language Models…

Artificial Intelligence · Computer Science 2025-02-10 Arne Bewersdorff, Christian Hartmann, Marie Hornberger, Kathrin Seßler, Maria Bannert, Enkelejda Kasneci, Gjergji Kasneci, Xiaoming Zhai, Claudia Nerdel

Boosting Zero-Shot Crosslingual Performance using LLM-Based Augmentations with Effective Data Selection

Large language models (LLMs) are very proficient text generators. We leverage this capability of LLMs to generate task-specific data via zero-shot prompting and promote cross-lingual transfer for low-resource target languages. Given…

Computation and Language · Computer Science 2024-07-16 Barah Fazili, Ashish Sunil Agrawal, Preethi Jyothi

Strategic Data Ordering: Enhancing Large Language Model Performance through Curriculum Learning

The rapid advancement of Large Language Models (LLMs) has improved text understanding and generation but poses challenges in computational resources. This study proposes a curriculum learning-inspired, data-centric training strategy that…

Computation and Language · Computer Science 2024-05-14 Jisu Kim, Juhwan Lee

Curriculum Meta-Learning for Few-shot Classification

We propose an adaptation of the curriculum training framework, applicable to state-of-the-art meta learning techniques for few-shot classification. Curriculum-based training popularly attempts to mimic human learning by progressively…

Machine Learning · Computer Science 2021-12-07 Emmanouil Stergiadis, Priyanka Agrawal, Oliver Squire

Train a Unified Multimodal Data Quality Classifier with Synthetic Data

The Multimodal Large Language Models (MLLMs) are continually pre-trained on a mixture of image-text caption data and interleaved document data, while the high-quality data filtering towards image-text interleaved document data is…

Computer Vision and Pattern Recognition · Computer Science 2025-10-20 Weizhi Wang, Rongmei Lin, Shiyang Li, Colin Lockard, Ritesh Sarkhel, Sanket Lokegaonkar, Jingbo Shang, Xifeng Yan, Nasser Zalmout, Xian Li

Indexing Multimodal Language Models for Large-scale Image Retrieval

Multimodal Large Language Models (MLLMs) have demonstrated strong cross-modal reasoning capabilities, yet their potential for vision-only tasks remains underexplored. We investigate MLLMs as training-free similarity estimators for…

Computer Vision and Pattern Recognition · Computer Science 2026-04-16 Bahey Tharwat, Giorgos Kordopatis-Zilos, Pavel Suma, Ian Reid, Giorgos Tolias

Scaling Evidence-based Instructional Design Expertise through Large Language Models

This paper presents a comprehensive exploration of leveraging Large Language Models (LLMs), specifically GPT-4, in the field of instructional design. With a focus on scaling evidence-based instructional design expertise, our research aims…

Computation and Language · Computer Science 2023-06-27 Gautam Yadav

CLewR: Curriculum Learning with Restarts for Machine Translation Preference Learning

Large language models (LLMs) have demonstrated competitive performance in zero-shot multilingual machine translation (MT). Some follow-up works further improved MT performance via preference optimization, but they leave a key aspect largely…

Computation and Language · Computer Science 2026-04-20 Alexandra Dragomir, Florin Brad, Radu Tudor Ionescu

A Comprehensive Study of Multimodal Large Language Models for Image Quality Assessment

While Multimodal Large Language Models (MLLMs) have experienced significant advancement in visual understanding and reasoning, their potential to serve as powerful, flexible, interpretable, and text-driven models for Image Quality…

Computer Vision and Pattern Recognition · Computer Science 2024-07-12 Tianhe Wu, Kede Ma, Jie Liang, Yujiu Yang, Lei Zhang

Meta-rater: A Multi-dimensional Data Selection Method for Pre-training Language Models

The composition of pre-training datasets for large language models (LLMs) remains largely undisclosed, hindering transparency and efforts to optimize data quality, a critical driver of model performance. Current data selection methods, such…

Computation and Language · Computer Science 2025-08-07 Xinlin Zhuang, Jiahui Peng, Ren Ma, Yinfan Wang, Tianyi Bai, Xingjian Wei, Jiantao Qiu, Chi Zhang, Ying Qian, Conghui He

Your Vision-Language Model Itself Is a Strong Filter: Towards High-Quality Instruction Tuning with Data Selection

Data selection in instruction tuning emerges as a pivotal process for acquiring high-quality data and training instruction-following large language models (LLMs), but it is still a new and unexplored research area for vision-language models…

Computation and Language · Computer Science 2024-02-21 Ruibo Chen, Yihan Wu, Lichang Chen, Guodong Liu, Qi He, Tianyi Xiong, Chenxi Liu, Junfeng Guo, Heng Huang

Exploring Curriculum Learning for Vision-Language Tasks: A Study on Small-Scale Multimodal Training

For specialized domains, there is often not a wealth of data with which to train large machine learning models. In such limited data / compute settings, various methods exist aiming to "do more with less", such as finetuning from a…

Machine Learning · Computer Science 2024-10-22 Rohan Saha, Abrar Fahim, Alona Fyshe, Alex Murphy

Exploring Instruction Data Quality for Explainable Image Quality Assessment

In recent years, with the rapid development of powerful multimodal large language models (MLLMs), explainable image quality assessment (IQA) has gradually become popular, aiming at providing quality-related descriptions and answers of…

Computer Vision and Pattern Recognition · Computer Science 2025-10-07 Yunhao Li, Sijing Wu, Huiyu Duan, Yucheng Zhu, Qi Jia, Guangtao Zhai

XAI4LLM. Let Machine Learning Models and LLMs Collaborate for Enhanced In-Context Learning in Healthcare

Clinical decision support systems require models that are not only highly accurate but also equitable and sensitive to the implications of missed diagnoses. In this study, we introduce a knowledge-guided in-context learning (ICL) framework…

Machine Learning · Computer Science 2025-07-28 Fatemeh Nazary, Yashar Deldjoo, Tommaso Di Noia, Eugenio di Sciascio