Related papers: LLMStructBench: Benchmarking Large Language Model …

200 papers

The rapid advancement of large language models (LLMs) demands robust, unbiased, and scalable evaluation methods. However, human annotations are costly to scale, model-based evaluations are susceptible to stylistic biases, and…

Extracting structured information from text, such as key-value pairs that could augment tabular data, is quite useful in many enterprise use cases. Although large language models (LLMs) have enabled numerous automated pipelines for…

Computation and Language · Computer Science 2025-07-30 Satyananda Kashyap, Sola Shirai, Nandana Mihindukulasooriya, Horst Samulowitz

As Large Language Models (LLMs) become integral to software development workflows, their ability to generate structured outputs has become critically important. We introduce StructEval, a comprehensive benchmark for evaluating LLMs'…

The evaluation of large language models (LLMs) is crucial to assess their performance and mitigate potential security risks. In this paper, we introduce PromptBench, a unified library to evaluate LLMs. It consists of several key components…

Artificial Intelligence · Computer Science 2024-08-21 Kaijie Zhu, Qinlin Zhao, Hao Chen, Jindong Wang, Xing Xie

Unstructured documents like PDFs contain valuable structured information, but downstream systems require this data in reliable, standardized formats. LLMs are increasingly deployed to automate this extraction, making accuracy and…

Machine Learning · Computer Science 2026-02-17 Nick Ferguson, Josh Pennington, Narek Beghian, Aravind Mohan, Douwe Kiela, Sheshansh Agrawal, Thien Hang Nguyen

Large language models (LLMs) are becoming attractive as few-shot reasoners to solve Natural Language (NL)-related tasks. However, the understanding of their capability to process structured data like tables remains an under-explored area.…

Computation and Language · Computer Science 2024-07-18 Yuan Sui, Mengyu Zhou, Mingjie Zhou, Shi Han, Dongmei Zhang

Large Language Models (LLMs) have emerged as a powerful tool in advancing the Text-to-SQL task, significantly outperforming traditional methods. Nevertheless, as a nascent research field, there is still no consensus on the optimal prompt…

Computation and Language · Computer Science 2026-03-20 Bin Zhang, Yuxiao Ye, Guoqing Du, Xiaoru Hu, Zhishuai Li, Chi Harold Liu, Zhiwei Xu, Guoliang Fan, Rui Zhao, Ziyue Li, Hangyu Mao

Despite the remarkable capabilities of Large Language Models (LLMs) like GPT-4, producing complex, structured tabular data remains challenging. Our study assesses LLMs' proficiency in structuring tables and introduces a novel fine-tuning…

Computation and Language · Computer Science 2024-04-08 Xiangru Tang, Yiming Zong, Jason Phang, Yilun Zhao, Wangchunshu Zhou, Arman Cohan, Mark Gerstein

Multi-turn instruction following capability constitutes a core competency of large language models (LLMs) in real-world applications. Existing evaluation benchmarks predominantly focus on fine-grained constraint satisfaction and…

Computation and Language · Computer Science 2025-06-02 Jinnan Li, Jinzhe Li, Yue Wang, Yi Chang, Yuan Wu

Although large language models (LLMs) have demonstrated their strong intelligence ability, the high demand for computation and storage hinders their practical application. To this end, many model compression techniques are proposed to…

Computation and Language · Computer Science 2024-11-01 Ge Yang, Changyi He, Jinyang Guo, Jianyu Wu, Yifu Ding, Aishan Liu, Haotong Qin, Pengliang Ji, Xianglong Liu

The effective utilization of structured data, integral to corporate data strategies, has been challenged by the rise of large language models (LLMs) capable of processing unstructured information. This shift prompts the question: can LLMs…

Computation and Language · Computer Science 2024-10-22 Zhouhong Gu, Haoning Ye, Xingzhou Chen, Zeyang Zhou, Hongwei Feng, Yanghua Xiao

Recently, there has been a growing interest among large language model (LLM) developers in LLM-based document reading systems, which enable users to upload their own documents and pose questions related to the document contents, going…

Computation and Language · Computer Science 2024-07-16 Anni Zou, Wenhao Yu, Hongming Zhang, Kaixin Ma, Deng Cai, Zhuosheng Zhang, Hai Zhao, Dong Yu

Software testing is a crucial phase in the software life cycle, helping identify potential risks and reduce maintenance costs. With the advancement of Large Language Models (LLMs), researchers have proposed an increasing number of LLM-based…

Software Engineering · Computer Science 2024-09-27 Quanjun Zhang, Ye Shang, Chunrong Fang, Siqi Gu, Jianyi Zhou, Zhenyu Chen

Extracting structured information from visual documents (Visual Information Extraction, VIE) is a cornerstone of business automation. While recent Multimodal Large Language Models (MLLMs) have shown promising capabilities, existing…

Computer Vision and Pattern Recognition · Computer Science 2026-05-22 Yandi Wang, Libin Zhan, Ziwei Huang, Tiancheng Luo, Yuxuan Jiang, Wang Dong, Leilei Gan, Jun Chen

Multimodal Large Language Models (MLLM) have made significant progress in the field of document analysis. Despite this, existing benchmarks typically focus only on extracting text and simple layout information, neglecting the complex…

Computer Vision and Pattern Recognition · Computer Science 2024-07-04 Lei Chen, Feng Yan, Yujie Zhong, Shaoxiang Chen, Zequn Jie, Lin Ma

LLM development has aroused great interest in Sequential Recommendation (SR) applications. However, comprehensive evaluation of SR models remains lacking due to the limitations of the existing benchmarks: 1) an overemphasis on accuracy,…

Information Retrieval · Computer Science 2026-04-14 Jianhong Li, Zeheng Qian, Wangze Ni, Haoyang Li, Hongwei Yao, Yang Bai, Kui Ren

The ability to follow instructions is crucial for Large Language Models (LLMs) to handle various real-world applications. Existing benchmarks primarily focus on evaluating pure response quality, rather than assessing whether the response…

Computation and Language · Computer Science 2024-06-06 Yuxin Jiang, Yufei Wang, Xingshan Zeng, Wanjun Zhong, Liangyou Li, Fei Mi, Lifeng Shang, Xin Jiang, Qun Liu, Wei Wang

Multimodal large language models (MLLMs) are increasingly deployed in real-world, agentic settings where outputs must not only be correct, but also conform to predefined data schemas. Despite recent progress in structured generation in…

Computer Vision and Pattern Recognition · Computer Science 2026-03-19 Di Feng, Kaixin Ma, Feng Nan, Haofeng Chen, Bohan Zhai, David Griffiths, Mingfei Gao, Zhe Gan, Eshan Verma, Yinfei Yang, Zhifeng Chen, Afshin Dehghan

We present a benchmark targeting a novel class of systems: semantic query processing engines. Those systems rely inherently on generative and reasoning capabilities of state-of-the-art large language models (LLMs). They extend SQL with…

The recent development and success of Large Language Models (LLMs) necessitate an evaluation of their performance across diverse NLP tasks in different languages. Although several frameworks have been developed and made publicly available,…
