Related papers: Rethinking Scientific Modeling: Toward Physically …

Modeling and Simulation Based Engineering in the Context of Cyber-Physical Systems

Cyber-Physical Systems (CPS) produce behavior through execution on substrates coupling computation with physical processes. However, usual engineering approaches do not treat execution semantics as first-class engineering entities. Formal…

Software Engineering · Computer Science 2026-04-16 Alexandre Muzy

EngTrace: A Symbolic Benchmark for Verifiable Process Supervision of Engineering Reasoning

Large Language Models (LLMs) are increasingly entering specialized, safety-critical engineering workflows governed by strict quantitative standards and immutable physical laws, making rigorous evaluation of their reasoning capabilities…

Computation and Language · Computer Science 2026-01-08 Ayesha Gull, Muhammad Usman Safder, Rania Elbadry, Fan Zhang, Veselin Stoyanov, Preslav Nakov, Zhuohan Xie

StructEval: Benchmarking LLMs' Capabilities to Generate Structural Outputs

As Large Language Models (LLMs) become integral to software development workflows, their ability to generate structured outputs has become critically important. We introduce StructEval, a comprehensive benchmark for evaluating LLMs'…

Software Engineering · Computer Science 2026-04-06 Jialin Yang, Dongfu Jiang, Lipeng He, Sherman Siu, Yuxuan Zhang, Disen Liao, Zhuofeng Li, Huaye Zeng, Yiming Jia, Haozhe Wang, Benjamin Schneider, Chi Ruan, Wentao Ma, Zhiheng Lyu, Yifei Wang, Yi Lu, Quy Duc Do, Ziyan Jiang, Ping Nie, Wenhu Chen

Self-consistent Validation for Machine Learning Electronic Structure

Machine learning has emerged as a significant approach to efficiently tackle electronic structure problems. Despite its potential, there is less guarantee for the model to generalize to unseen data that hinders its application in real-world…

Machine Learning · Computer Science 2024-02-16 Gengyuan Hu, Gengchen Wei, Zekun Lou, Philip H. S. Torr, Wanli Ouyang, Han-sen Zhong, Chen Lin

Causal Consistency of Structural Equation Models

Complex systems can be modelled at various levels of detail. Ideally, causal models of the same system should be consistent with one another in the sense that they agree in their predictions of the effects of interventions. We formalise…

Machine Learning · Statistics 2020-09-03 Paul K. Rubenstein, Sebastian Weichwald, Stephan Bongers, Joris M. Mooij, Dominik Janzing, Moritz Grosse-Wentrup, Bernhard Schölkopf

What You See Is What It Does: A Structural Pattern for Legible Software

The opportunities offered by LLM coders (and their current limitations) demand a reevaluation of how software is structured. Software today is often "illegible" - lacking a direct correspondence between code and observed behavior - and…

Software Engineering · Computer Science 2025-08-29 Eagon Meng, Daniel Jackson

The Benchmarking Epistemology: Construct Validity for Evaluating Machine Learning Models

Predictive benchmarking, the evaluation of machine learning models based on predictive performance and competitive ranking, is a central epistemic practice in machine learning research and an increasingly prominent method for scientific…

Machine Learning · Computer Science 2025-10-28 Timo Freiesleben, Sebastian Zezulka

Image-Based Structural Analysis Using Computer Vision and LLMs: PhotoBeamSolver

This paper presents the development of a documented program capable of solving idealized beam models, such as those commonly used in textbooks and academic exercises, from drawings made by a person. The system is based on computer vision…

Computer Vision and Pattern Recognition · Computer Science 2026-03-24 Altamirano-Muñiz Emilio Fernando

LexInstructEval: Lexical Instruction Following Evaluation for Large Language Models

The ability of Large Language Models (LLMs) to precisely follow complex and fine-grained lexical instructions is a cornerstone of their utility and controllability. However, evaluating this capability remains a significant challenge.…

Computation and Language · Computer Science 2026-03-24 Huimin Ren, Yan Liang, Baiqiao Su, Chaobo Sun, Hengtong Lu, Kaike Zhang, Chen Wei

StructTest: Benchmarking LLMs' Reasoning through Compositional Structured Outputs

The rapid advancement of large language models (LLMs) demands robust, unbiased, and scalable evaluation methods. However, human annotations are costly to scale, model-based evaluations are susceptible to stylistic biases, and…

Computation and Language · Computer Science 2025-03-21 Hailin Chen, Fangkai Jiao, Mathieu Ravaut, Nawshad Farruque, Xuan Phi Nguyen, Chengwei Qin, Manan Dey, Bosheng Ding, Caiming Xiong, Shafiq Joty, Yingbo Zhou

Formal Model-Driven Engineering: Generating Data and Behavioural Components

Model-driven engineering is the automatic production of software artefacts from abstract models of structure and functionality. By targeting a specific class of system, it is possible to automate aspects of the development process, using…

Software Engineering · Computer Science 2013-01-03 Chen-Wei Wang, Jim Davies

Toward Explaining Large Language Models in Software Engineering Tasks

Recent progress in Large Language Models (LLMs) has substantially advanced the automation of software engineering (SE) tasks, enabling complex activities such as code generation and code summarization. However, the black-box nature of LLMs…

Software Engineering · Computer Science 2025-12-24 Antonio Vitale, Khai-Nguyen Nguyen, Denys Poshyvanyk, Rocco Oliveto, Simone Scalabrino, Antonio Mastropaolo

Building Information Modeling Using Constraint Logic Programming

Building Information Modeling (BIM) produces three-dimensional models of buildings combining the geometrical information with a wide range of properties. BIM is slowly but inevitably revolutionizing the architecture, engineering, and…

Logic in Computer Science · Computer Science 2022-05-19 Joaquín Arias, Seppo Törmä, Manuel Carro, Gopal Gupta

PCEval: A Benchmark for Evaluating Physical Computing Capabilities of Large Language Models

Large Language Models (LLMs) have demonstrated remarkable capabilities across various domains, including software development, education, and technical assistance. Among these, software development is one of the key areas where LLMs are…

Computation and Language · Computer Science 2026-01-07 Inpyo Song, Eunji Jeon, Jangwon Lee

FEM-Bench: A Structured Scientific Reasoning Benchmark for Evaluating Code-Generating LLMs

As LLMs advance their reasoning capabilities about the physical world, the absence of rigorous benchmarks for evaluating their ability to generate scientifically valid physical models has become a critical gap. Computational mechanics,…

Machine Learning · Computer Science 2025-12-25 Saeed Mohammadzadeh, Erfan Hamdi, Joel Shor, Emma Lejeune

FEABench: Evaluating Language Models on Multiphysics Reasoning Ability

Building precise simulations of the real world and invoking numerical solvers to answer quantitative problems is an essential requirement in engineering and science. We present FEABench, a benchmark to evaluate the ability of large language…

Artificial Intelligence · Computer Science 2025-04-09 Nayantara Mudur, Hao Cui, Subhashini Venugopalan, Paul Raccuglia, Michael P. Brenner, Peter Norgaard

MathConstruct: Challenging LLM Reasoning with Constructive Proofs

While Large Language Models (LLMs) demonstrate impressive performance in mathematics, existing math benchmarks come with significant limitations. Many focus on problems with fixed ground-truth answers, and are often saturated due to problem…

Artificial Intelligence · Computer Science 2025-10-02 Mislav Balunović, Jasper Dekoninck, Nikola Jovanović, Ivo Petrov, Martin Vechev

A Large Language Model-Empowered Agent for Reliable and Robust Structural Analysis

Large language models (LLMs) have exhibited remarkable capabilities across diverse open-domain tasks, yet their application in specialized domains such as civil engineering remains largely unexplored. This paper starts bridging this gap by…

Computation and Language · Computer Science 2025-07-08 Jiachen Liu, Ziheng Geng, Ran Cao, Lu Cheng, Paolo Bocchini, Minghui Cheng

Medical Large Language Model Benchmarks Should Prioritize Construct Validity

Medical large language models (LLMs) research often makes bold claims, from encoding clinical knowledge to reasoning like a physician. These claims are usually backed by evaluation on competitive benchmarks; a tradition inherited from…

Computation and Language · Computer Science 2025-03-17 Ahmed Alaa, Thomas Hartvigsen, Niloufar Golchini, Shiladitya Dutta, Frances Dean, Inioluwa Deborah Raji, Travis Zack

SimBench: Benchmarking the Ability of Large Language Models to Simulate Human Behaviors

Large language model (LLM) simulations of human behavior have the potential to revolutionize the social and behavioral sciences, if and only if they faithfully reflect real human behaviors. Current evaluations of simulation fidelity are…

Computation and Language · Computer Science 2026-04-14 Tiancheng Hu, Joachim Baumann, Lorenzo Lupo, Nigel Collier, Dirk Hovy, Paul Röttger