Related papers: The MERIT Dataset: Modelling and Efficiently Rende…

200 papers

Semantic retrieval is crucial for modern applications yet remains underexplored in current research. Existing datasets are limited to single languages, single images, or singular retrieval conditions, often failing to fully exploit the…

Computer Vision and Pattern Recognition · Computer Science · 2025-12-04 · Wei Chow, Yuan Gao, Linfeng Li, Xian Wang, Qi Xu, Hang Song, Lingdong Kong, Ran Zhou, Yi Zeng, Yidong Cai, Botian Jiang, Shilin Xu, Jiajun Zhang, Minghui Qiu, Xiangtai Li, Tianshu Yang, Siliang Tang, Juncheng Li

The milestone improvements brought about by deep representation learning and pre-training techniques have led to large performance gains across downstream NLP, IR and Vision tasks. Multimodal modeling techniques aim to leverage large…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-21 · Krishna Srinivasan, Karthik Raman, Jiecao Chen, Michael Bendersky, Marc Najork

In this paper, we introduce the MLM (Multiple Languages and Modalities) dataset - a new resource to train and evaluate multitask systems on samples in multiple modalities and three languages. The generation process and inclusion of semantic…

Machine Learning · Computer Science · 2020-10-27 · Jason Armitage, Endri Kacupaj, Golsa Tahmasebzadeh, Swati, Maria Maleshkova, Ralph Ewerth, Jens Lehmann

Machine learning (ML) datasets, often perceived as neutral, inherently encapsulate abstract and disputed social constructs. Dataset curators frequently employ value-laden terms such as diversity, bias, and quality to characterize datasets.…

Machine Learning · Computer Science · 2024-07-12 · Dora Zhao, Jerone T. A. Andrews, Orestis Papakyriakopoulos, Alice Xiang

Document understanding tasks, in particular, Visually-rich Document Entity Retrieval (VDER), have gained significant attention in recent years thanks to their broad applications in enterprise AI. However, publicly available data have been…

Computation and Language · Computer Science · 2023-10-27 · Lijun Yu, Jin Miao, Xiaoyu Sun, Jiayi Chen, Alexander G. Hauptmann, Hanjun Dai, Wei Wei

Lecture slide presentations, a sequence of pages that contain text and figures accompanied by speech, are constructed and presented carefully in order to optimally transfer knowledge to students. Previous studies in multimedia and…

Artificial Intelligence · Computer Science · 2022-08-18 · Dong Won Lee, Chaitanya Ahuja, Paul Pu Liang, Sanika Natu, Louis-Philippe Morency

Currently, data and model size dominate the narrative in the training of super-large, powerful models. However, there has been a lack of exploration on the effect of other attributes of the training dataset on model performance. We…

Machine Learning · Computer Science · 2025-01-22 · Kavita Selva, Satita Vittayaareekul, Brando Miranda

As large language models continue to advance, their application in educational contexts remains underexplored and under-optimized. In this paper, we address this gap by introducing the first diverse benchmark tailored for educational…

Computation and Language · Computer Science · 2026-01-07 · Bin Xu, Yu Bai, Huashan Sun, Yiguan Lin, Siming Liu, Xinyue Liang, Yaolin Li, Zhuangzhi Dong, Jingren Zhang, Yufan Deng, Xinyu Zou, Yang Gao, Heyan Huang

One of the most significant challenges in Music Emotion Recognition (MER) comes from the fact that emotion labels can be heterogeneous across datasets with regard to the emotion representation, including categorical (e.g., happy, sad)…

Sound · Computer Science · 2025-04-14 · Jaeyong Kang, Dorien Herremans

Multimodal reward models (MRMs) play a crucial role in aligning Multimodal Large Language Models (MLLMs) with human preferences. Training a good MRM requires high-quality multimodal preference data. However, existing preference datasets…

Artificial Intelligence · Computer Science · 2026-04-22 · Zhihong Zhang, Jie Zhao, Xiaojian Huang, Jin Xu, Zhuodong Luo, Xin Liu, Jiansheng Wei, Xuejin Chen

Representation learning is a critical ingredient for natural language processing systems. Recent Transformer language models like BERT learn powerful textual representations, but these models are targeted towards token- and sentence-level…

Computation and Language · Computer Science · 2020-05-21 · Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel S. Weld

Knowledge Tracing (KT) models students' evolving knowledge states to predict future performance, serving as a foundation for personalized education. While traditional deep learning models achieve high accuracy, they often lack…

Computation and Language · Computer Science · 2026-03-25 · Runze Li, Kedi Chen, Guwei Feng, Mo Yu, Jun Wang, Wei Zhang

There is an emerging line of research on multimodal instruction tuning, and a line of benchmarks has been proposed for evaluating these models recently. Instead of evaluating the models directly, in this paper, we try to evaluate the…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-02 · Ning Liao, Shaofeng Zhang, Renqiu Xia, Min Cao, Yu Qiao, Junchi Yan

Language models can be trained to recognize the moral sentiment of text, creating new opportunities to study the role of morality in human life. As interest in language and morality has grown, several ground truth datasets with moral…

Computation and Language · Computer Science · 2023-04-06 · Siyi Guo, Negar Mokhberian, Kristina Lerman

Accurate staging of liver fibrosis from magnetic resonance imaging (MRI) is crucial in clinical practice. While conventional methods often focus on a specific sub-region, multi-view learning captures more information by analyzing multiple…

Computer Vision and Pattern Recognition · Computer Science · 2025-03-04 · Yuanye Liu, Zheyao Gao, Nannan Shi, Fuping Wu, Yuxin Shi, Qingchao Chen, Xiahai Zhuang

Matching submissions with suitable reviewers at scale is a growing challenge for major venues, yet existing approaches either rely on coarse proxy signals that conflate general relatedness with true suitability, or require expensive human…

Computation and Language · Computer Science · 2026-05-28 · Zixuan Yang, Yibo Zhao, Weicong Liu, Xiang Li

Machine learning advances in the last decade have relied significantly on large-scale datasets that continue to grow in size. Increasingly, those datasets also contain different data modalities. However, large multi-modal datasets are hard…

Machine Learning · Computer Science · 2021-10-28 · Itai Gat, Idan Schwartz, Alexander Schwing

In this study, we compared the performance of four different methods for multi-label text classification using a specific imbalanced business dataset. The four methods we evaluated were fine-tuned BERT, Binary Relevance, Classifier Chains,…

Information Retrieval · Computer Science · 2023-06-13 · Muhammad Arslan, Christophe Cruz

With the advent of technology and the use of the latest devices, voluminous data are being produced. Of these data, 80% are unstructured and the remaining 20% are structured and semi-structured. The produced data are in heterogeneous format and…

This paper embarks on an exploration into the Large Language Model (LLM) datasets, which play a crucial role in the remarkable advancements of LLMs. The datasets serve as the foundational infrastructure analogous to a root system that…

Computation and Language · Computer Science · 2024-02-29 · Yang Liu, Jiahuan Cao, Chongyu Liu, Kai Ding, Lianwen Jin