Related papers: InData: Towards Secure Multi-Step, Tool-Based Data…

Why Do Open-Source LLMs Struggle with Data Analysis? A Systematic Empirical Study

Large Language Models (LLMs) hold promise in automating data analysis tasks, yet open-source models face significant limitations in these kinds of reasoning-intensive scenarios. In this work, we investigate strategies to enhance the data…

Computation and Language · Computer Science · 2025-11-14 · Yuqi Zhu, Yi Zhong, Jintian Zhang, Ziheng Zhang, Shuofei Qiao, Yujie Luo, Lun Du, Da Zheng, Ningyu Zhang, Huajun Chen

Evaluating Implicit Regulatory Compliance in LLM Tool Invocation via Logic-Guided Synthesis

The integration of large language models (LLMs) into autonomous agents has enabled complex tool use, yet in high-stakes domains, these systems must strictly adhere to regulatory standards beyond simple functional correctness. However,…

Computation and Language · Computer Science · 2026-01-14 · Da Song, Yuheng Huang, Boqi Chen, Tianshuo Cong, Randy Goebel, Lei Ma, Foutse Khomh

Large Language Models as Robust Data Generators in Software Analytics: Are We There Yet?

Large Language Model (LLM)-generated data is increasingly used in software analytics, but it is unclear how this data compares to human-written data, particularly when models are exposed to adversarial scenarios. Adversarial attacks can…

Software Engineering · Computer Science · 2025-05-07 · Md. Abdul Awal, Mrigank Rochan, Chanchal K. Roy

LAMDAS: LLM as an Implicit Classifier for Domain-specific Data Selection

Adapting large language models (LLMs) to specific domains often faces a critical bottleneck: the scarcity of high-quality, human-curated data. While large volumes of unchecked data are readily available, indiscriminately using them for…

Computation and Language · Computer Science · 2025-09-09 · Jian Wu, Hang Yu, Bingchang Liu, Wenjie Yang, Peng Di, Jianguo Li, Yue Zhang

An Empirical Study of Reasoning Steps in Thinking Code LLMs

Thinking Large Language Models (LLMs) generate explicit intermediate reasoning traces before final answers, potentially improving transparency, interpretability, and solution accuracy for code generation. However, the quality of these…

Artificial Intelligence · Computer Science · 2025-11-11 · Haoran Xue, Gias Uddin, Song Wang

Entropy-based Exploration Conduction for Multi-step Reasoning

Multi-step processes via large language models (LLMs) have proven effective for solving complex reasoning tasks. However, the depth of exploration of the reasoning procedure can significantly affect the task performance. Existing methods to…

Artificial Intelligence · Computer Science · 2025-06-19 · Jinghan Zhang, Xiting Wang, Fengran Mo, Yeyang Zhou, Wanfu Gao, Kunpeng Liu

MLLM-DataEngine: An Iterative Refinement Approach for MLLM

Despite the great advance of Multimodal Large Language Models (MLLMs) in both instruction dataset building and benchmarking, the independence of training and evaluation makes it hard for current MLLMs to further improve their capability under the…

Machine Learning · Computer Science · 2023-09-12 · Zhiyuan Zhao, Linke Ouyang, Bin Wang, Siyuan Huang, Pan Zhang, Xiaoyi Dong, Jiaqi Wang, Conghui He

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety

The last two years have seen a rapid growth in concerns around the safety of large language models (LLMs). Researchers and practitioners have met these concerns by creating an abundance of datasets for evaluating and improving LLM safety.…

Computation and Language · Computer Science · 2025-01-13 · Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy

Multi-LogiEval: Towards Evaluating Multi-Step Logical Reasoning Ability of Large Language Models

As Large Language Models (LLMs) continue to exhibit remarkable performance in natural language understanding tasks, there is a crucial need to measure their ability for human-like multi-step logical reasoning. Existing logical reasoning…

Computation and Language · Computer Science · 2024-10-08 · Nisarg Patel, Mohith Kulkarni, Mihir Parmar, Aashna Budhiraja, Mutsumi Nakamura, Neeraj Varshney, Chitta Baral

Knowledge-to-Data: LLM-Driven Synthesis of Structured Network Traffic for Testbed-Free IDS Evaluation

Realistic, large-scale, and well-labeled cybersecurity datasets are essential for training and evaluating Intrusion Detection Systems (IDS). However, they remain difficult to obtain due to privacy constraints, data sensitivity, and the cost…

Cryptography and Security · Computer Science · 2026-01-09 · Konstantinos E. Kampourakis, Vyron Kampourakis, Efstratios Chatzoglou, Georgios Kambourakis, Stefanos Gritzalis

Data Advisor: Dynamic Data Curation for Safety Alignment of Large Language Models

Data is a crucial element in large language model (LLM) alignment. Recent studies have explored using LLMs for efficient data collection. However, LLM-generated data often suffers from quality issues, with underrepresented or absent aspects…

Computation and Language · Computer Science · 2024-10-08 · Fei Wang, Ninareh Mehrabi, Palash Goyal, Rahul Gupta, Kai-Wei Chang, Aram Galstyan

SciDA: Scientific Dynamic Assessor of LLMs

Advances in the reasoning capabilities of Large Language Models (LLMs) enable them to solve scientific problems more effectively. A high-quality benchmark for comprehensive and appropriate assessment is therefore significant, while…

Computation and Language · Computer Science · 2025-06-17 · Junting Zhou, Tingjia Miao, Yiyan Liao, Qichao Wang, Zhoufutu Wen, Yanqin Wang, Yunjie Huang, Ge Yan, Leqi Wang, Yucheng Xia, Hongwan Gao, Yuansong Zeng, Renjie Zheng, Chen Dun, Yitao Liang, Tong Yang, Wenhao Huang, Ge Zhang

Leveraging Large Language Models for Command Injection Vulnerability Analysis in Python: An Empirical Study on Popular Open-Source Projects

Command injection vulnerabilities are a significant security threat in dynamic languages like Python, particularly in widely used open-source projects where security issues can have extensive impact. With the proven effectiveness of Large…

Software Engineering · Computer Science · 2025-05-22 · Yuxuan Wang, Jingshu Chen, Qingyang Wang

ORACLE: Optimizing Reasoning Abilities of Large Language Models via Constraint-Led Synthetic Data Elicitation

Training large language models (LLMs) with synthetic reasoning data has become a popular approach to enhancing their reasoning capabilities, while a key factor influencing the effectiveness of this paradigm is the quality of the generated…

Artificial Intelligence · Computer Science · 2026-03-24 · Zhuojie Yang, Wentao Wan, Keze Wang

Automated Framework to Evaluate and Harden LLM System Instructions against Encoding Attacks

System Instructions in Large Language Models (LLMs) are commonly used to enforce safety policies, define agent behavior, and protect sensitive operational context in agentic AI applications. These instructions may contain sensitive…

Cryptography and Security · Computer Science · 2026-04-02 · Anubhab Sahu, Diptisha Samanta, Reza Soosahabi

HARDMath: A Benchmark Dataset for Challenging Problems in Applied Mathematics

Advanced applied mathematics problems are underrepresented in existing Large Language Model (LLM) benchmark datasets. To address this, we introduce HARDMath, a dataset inspired by a graduate course on asymptotic methods, featuring…

Machine Learning · Computer Science · 2024-12-17 · Jingxuan Fan, Sarah Martinson, Erik Y. Wang, Kaylie Hausknecht, Jonah Brenner, Danxian Liu, Nianli Peng, Corey Wang, Michael P. Brenner

To Err is Machine: Vulnerability Detection Challenges LLM Reasoning

In this paper, we present a challenging code reasoning task: vulnerability detection. Large Language Models (LLMs) have shown promising results in natural-language and math reasoning, but state-of-the-art (SOTA) models reported only 54.5%…

Software Engineering · Computer Science · 2025-01-09 · Benjamin Steenhoek, Md Mahbubur Rahman, Monoshi Kumar Roy, Mirza Sanjida Alam, Hengbo Tong, Swarna Das, Earl T. Barr, Wei Le

CyberThreat-Eval: Can Large Language Models Automate Real-World Threat Research?

Analyzing Open Source Intelligence (OSINT) from large volumes of data is critical for drafting and publishing comprehensive CTI reports. This process usually follows a three-stage workflow -- triage, deep search and TI drafting. While Large…

Cryptography and Security · Computer Science · 2026-03-11 · Xiangsen Chen, Xuan Feng, Shuo Chen, Matthieu Maitre, Sudipto Rakshit, Diana Duvieilh, Ashley Picone, Nan Tang

Text Annotation via Inductive Coding: Comparing Human Experts to LLMs in Qualitative Data Analysis

This paper investigates the automation of qualitative data analysis, focusing on inductive coding using large language models (LLMs). Unlike traditional approaches that rely on deductive methods with predefined labels, this research…

Computation and Language · Computer Science · 2025-12-02 · Angelina Parfenova, Andreas Marfurt, Alexander Denzler, Juergen Pfeffer

Increasing LLM Coding Capabilities through Diverse Synthetic Coding Tasks

Large language models (LLMs) have shown impressive promise in code generation, yet their progress remains limited by the shortage of large-scale datasets that are both diverse and well-aligned with human reasoning. Most existing resources…

Machine Learning · Computer Science · 2025-10-28 · Amal Abed, Ivan Lukic, Jörg K. H. Franke, Frank Hutter