Related papers: ControlMath: Controllable Data Generation Promotes…

Controlled Text Generation via Language Model Arithmetic

As Large Language Models (LLMs) are deployed more widely, customization with respect to vocabulary, style, and character becomes more important. In this work, we introduce model arithmetic, a novel inference framework for composing and…

Computation and Language · Computer Science 2024-03-07 Jasper Dekoninck, Marc Fischer, Luca Beurer-Kellner, Martin Vechev

Neuro-Symbolic Data Generation for Math Reasoning

A critical question about Large Language Models (LLMs) is whether their apparent deficiency in mathematical reasoning is inherent, or merely a result of insufficient exposure to high-quality mathematical data. To explore this, we developed…

Artificial Intelligence · Computer Science 2024-12-09 Zenan Li, Zhi Zhou, Yuan Yao, Yu-Feng Li, Chun Cao, Fan Yang, Xian Zhang, Xiaoxing Ma

LLM-Agent-Controller: A Universal Multi-Agent Large Language Model System as a Control Engineer

This study presents the LLM-Agent-Controller, a multi-agent large language model (LLM) system developed to address a wide range of problems in control engineering (Control Theory). The system integrates a central controller agent with…

Artificial Intelligence · Computer Science 2025-05-27 Rasoul Zahedifar, Sayyed Ali Mirghasemi, Mahdieh Soleymani Baghshah, Alireza Taheri

Synthesis by Design: Controlled Data Generation via Structural Guidance

Mathematical reasoning remains challenging for LLMs due to complex logic and the need for precise computation. Existing methods enhance LLM reasoning by synthesizing datasets through problem rephrasing, but face issues with generation…

Computation and Language · Computer Science 2025-06-12 Lei Xu, Sirui Chen, Yuxuan Huang, Chaochao Lu

Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models

The instruction-following ability of large language models enables humans to interact with AI agents in a natural way. However, when required to generate responses of a specific length, large language models often struggle to meet users'…

Computation and Language · Computer Science 2024-10-02 Jiaming Li, Lei Zhang, Yunshui Li, Ziqiang Liu, yuelin bai, Run Luo, Longze Chen, Min Yang

AgenticMath: Enhancing LLM Reasoning via Agentic-based Math Data Generation

The creation of high-quality datasets to improve Large Language Model (LLM) reasoning remains a significant challenge, as current methods often suffer from generating low-quality/incorrect answers and limited information richness from…

Computation and Language · Computer Science 2026-01-09 Xianyang Liu, Yilin Liu, Shuai Wang, Hao Cheng, Andrew Estornell, Yuzhi Zhao, Jun Shu, Jiaheng Wei

Seeking Diverse Reasoning Logic: Controlled Equation Expression Generation for Solving Math Word Problems

To solve Math Word Problems, human students leverage diverse reasoning logic that reaches different possible equation solutions. However, the mainstream sequence-to-sequence approach of automatic solvers aims to decode a fixed solution…

Computation and Language · Computer Science 2022-12-01 Yibin Shen, Qianying Liu, Zhuoyuan Mao, Zhen Wan, Fei Cheng, Sadao Kurohashi

Generating Realistic Tabular Data with Large Language Models

While most generative models show achievements in image data generation, few are developed for tabular data generation. Recently, due to success of large language models (LLM) in diverse tasks, they have also been used for tabular data…

Machine Learning · Computer Science 2024-10-30 Dang Nguyen, Sunil Gupta, Kien Do, Thin Nguyen, Svetha Venkatesh

Diffusion-LM Improves Controllable Text Generation

Controlling the behavior of language models (LMs) without re-training is a major open problem in natural language generation. While recent works have demonstrated successes on controlling simple sentence attributes (e.g., sentiment), there…

Computation and Language · Computer Science 2022-05-31 Xiang Lisa Li, John Thickstun, Ishaan Gulrajani, Percy Liang, Tatsunori B. Hashimoto

Prompt-Based Length Controlled Generation with Reinforcement Learning

Large language models (LLMs) like ChatGPT and GPT-4 have attracted great attention given their surprising performance on a wide range of NLP tasks. Length controlled generation of LLMs emerges as an important topic, which enables users to…

Computation and Language · Computer Science 2023-10-03 Renlong Jie, Xiaojun Meng, Lifeng Shang, Xin Jiang, Qun Liu

Logic Contrastive Reasoning with Lightweight Large Language Model for Math Word Problems

This study focuses on improving the performance of lightweight Large Language Models (LLMs) in mathematical reasoning tasks. We introduce a novel method for measuring mathematical logic similarity and design an automatic screening mechanism…

Computation and Language · Computer Science 2024-09-04 Ding Kai, Ma Zhenguo, Yan Xiaoran

Training and Evaluating Language Models with Template-based Data Generation

The rapid advancement of large language models (LLMs) such as GPT-3, PaLM, and Llama has significantly transformed natural language processing, showcasing remarkable capabilities in understanding and generating language. However, a…

Computation and Language · Computer Science 2026-05-15 Yifan Zhang

A Lightweight Multi Aspect Controlled Text Generation Solution For Large Language Models

Large language models (LLMs) show remarkable abilities with instruction tuning. However, they fail to achieve ideal tasks when lacking high-quality instruction tuning data on target tasks. Multi-Aspect Controllable Text Generation (MCTG) is…

Computation and Language · Computer Science 2024-10-21 Chenyang Zhang, Jiayi Lin, Haibo Tong, Bingxuan Hou, Dongyu Zhang, Jialin Li, Junli Wang

Improving Small Language Models on PubMedQA via Generative Data Augmentation

Large Language Models (LLMs) have made remarkable advancements in the field of natural language processing. However, their increasing size poses challenges in terms of computational cost. On the other hand, Small Language Models (SLMs) are…

Computation and Language · Computer Science 2023-08-03 Zhen Guo, Peiqi Wang, Yanwei Wang, Shangdi Yu

Math Multiple Choice Question Generation via Human-Large Language Model Collaboration

Multiple choice questions (MCQs) are a popular method for evaluating students' knowledge due to their efficiency in administration and grading. Crafting high-quality math MCQs is a labor-intensive process that requires educators to…

Computation and Language · Computer Science 2024-05-03 Jaewook Lee, Digory Smith, Simon Woodhead, Andrew Lan

ControllableGPT: A Ground-Up Designed Controllable GPT for Molecule Optimization

Large Language Models (LLMs) employ three popular training approaches: Masked Language Models (MLM), Causal Language Models (CLM), and Sequence-to-Sequence Models (seq2seq). However, each approach has its strengths and limitations, and…

Machine Learning · Computer Science 2025-02-18 Xuefeng Liu, Songhao Jiang, Bo Li, Rick Stevens

Solving Math Word Problems by Combining Language Models With Symbolic Solvers

Automatically generating high-quality step-by-step solutions to math word problems has many applications in education. Recently, combining large language models (LLMs) with external tools to perform complex reasoning and calculation has…

Computation and Language · Computer Science 2023-04-19 Joy He-Yueya, Gabriel Poesia, Rose E. Wang, Noah D. Goodman

MuggleMath: Assessing the Impact of Query and Response Augmentation on Math Reasoning

In math reasoning with large language models (LLMs), fine-tuning data augmentation by query evolution and diverse reasoning paths is empirically verified effective, profoundly narrowing the gap between open-sourced LLMs and cutting-edge…

Computation and Language · Computer Science 2024-07-18 Chengpeng Li, Zheng Yuan, Hongyi Yuan, Guanting Dong, Keming Lu, Jiancan Wu, Chuanqi Tan, Xiang Wang, Chang Zhou

Memory-Augmented LLM-based Multi-Agent System for Automated Feature Generation on Tabular Data

Automated feature generation extracts informative features from raw tabular data without manual intervention and is crucial for accurate, generalizable machine learning. Traditional methods rely on predefined operator libraries and cannot…

Artificial Intelligence · Computer Science 2026-04-23 Fengxian Dong, Zhi Zheng, Xiao Han, Wei Chen, Jingqing Ruan, Tong Xu, Yong Chen, Enhong Chen

Curated LLM: Synergy of LLMs and Data Curation for tabular augmentation in low-data regimes

Machine Learning (ML) in low-data settings remains an underappreciated yet crucial problem. Hence, data augmentation methods to increase the sample size of datasets needed for ML are key to unlocking the transformative potential of ML in…

Machine Learning · Computer Science 2024-07-02 Nabeel Seedat, Nicolas Huynh, Boris van Breugel, Mihaela van der Schaar