English
Related papers

Related papers: A Benchmarking Framework for Model Datasets

200 papers

In the last couple of years, Model Driven Engineering (MDE) gained a prominent role in the context of software engineering. In the MDE paradigm, models are considered first level artifacts which are iteratively developed by teams of…

Software Engineering · Computer Science 2014-08-26 Pit Pietsch , Klaus Müller , Bernhard Rumpe

In machine learning research, it is common to evaluate algorithms via their performance on standard benchmark datasets. While a growing body of work establishes guidelines for -- and levies criticisms at -- data and benchmarking practices…

Machine Learning · Computer Science 2024-11-01 Rachel Longjohn , Markelle Kelly , Sameer Singh , Padhraic Smyth

Commonly, AI or machine learning (ML) models are evaluated on benchmark datasets. This practice supports innovative methodological research, but benchmark performance can be poorly correlated with performance in real-world applications -- a…

Machine Learning · Computer Science 2024-06-18 Olivier Binette , Jerome P. Reiter

The advancement of large language models (LLMs) has led to a greater challenge of having a rigorous and systematic evaluation of complex tasks performed, especially in enterprise applications. Therefore, LLMs need to be able to benchmark…

Computation and Language · Computer Science 2024-10-18 Bing Zhang , Mikio Takeuchi , Ryo Kawahara , Shubhi Asthana , Md. Maruf Hossain , Guang-Jie Ren , Kate Soule , Yada Zhu

Machine Learning (ML) and Deep Learning (DL) innovations are being introduced at such a rapid pace that researchers are hard-pressed to analyze and study them. The complicated procedures for evaluating innovations, along with the lack of…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-20 Abdul Dakkak , Cheng Li , Jinjun Xiong , Wen-mei Hwu

The selection, development, or comparison of machine learning methods in data mining can be a difficult task based on the target problem and goals of a particular study. Numerous publicly available real-world and simulated benchmark…

Machine Learning · Computer Science 2017-03-03 Randal S. Olson , William La Cava , Patryk Orzechowski , Ryan J. Urbanowicz , Jason H. Moore

We introduce a machine-learning (ML) framework for high-throughput benchmarking of diverse representations of chemical systems against datasets of materials and molecules. The guiding principle underlying the benchmarking approach is to…

Machine Learning · Computer Science 2021-12-07 Carl Poelking , Felix A. Faber , Bingqing Cheng

Modern language models (LMs) pose a new challenge in capability assessment. Static benchmarks inevitably saturate without providing confidence in the deployment tolerances of LM-based systems, but developers nonetheless claim that their…

Software Engineering · Computer Science 2024-07-31 Michael Saxon , Ari Holtzman , Peter West , William Yang Wang , Naomi Saphra

There is a rapidly growing number of open-source Large Language Models (LLMs) and benchmark datasets to compare them. While some models dominate these benchmarks, no single model typically achieves the best accuracy in all tasks and use…

Computation and Language · Computer Science 2023-09-28 Tal Shnitzer , Anthony Ou , Mírian Silva , Kate Soule , Yuekai Sun , Justin Solomon , Neil Thompson , Mikhail Yurochkin

Large language models for code are advancing fast, yet our ability to evaluate them lags behind. Current benchmarks focus on narrow tasks and single metrics, which hide critical gaps in robustness, interpretability, fairness, efficiency,…

We present a benchmark for large language models designed to tackle one of the most knowledge-intensive tasks in data science: writing feature engineering code, which requires domain knowledge in addition to a deep understanding of the…

Computation and Language · Computer Science 2024-11-01 Michał Pietruszka , Łukasz Borchmann , Aleksander Jędrosz , Paweł Morawiecki

Machine Learning for Software Engineering (ML4SE) is an actively growing research area that focuses on methods that help programmers in their work. In order to apply the developed methods in practice, they need to achieve reasonable quality…

Software Engineering · Computer Science 2022-06-08 Egor Bogomolov , Sergey Zhuravlev , Egor Spirin , Timofey Bryksin

As large language models (LLMs) continue to advance, the need for up-to-date and well-organized benchmarks becomes increasingly critical. However, many existing datasets are scattered, difficult to manage, and make it challenging to perform…

Machine Learning · Computer Science 2025-06-03 Eunsu Kim , Haneul Yoo , Guijin Son , Hitesh Patel , Amit Agarwal , Alice Oh

Data is a cornerstone of empirical software engineering (ESE) research and practice. Data underpin numerous process and project management activities, including the estimation of development effort and the prediction of the likely location…

Software Engineering · Computer Science 2020-12-22 Michael F. Bosu , Stephen G. MacDonell

By virtue of its great utility in solving real-world problems, optimization modeling has been widely employed for optimal decision-making across various sectors, but it requires substantial expertise from operations research professionals.…

The rise of large language models (LLMs) has introduced transformative potential in automated code generation, addressing a wide range of software engineering challenges. However, empirical evaluation of LLM-based code generation lacks…

Software Engineering · Computer Science 2025-10-07 Nathalia Nascimento , Everton Guimaraes , Paulo Alencar

Large language models (LLMs) are powerful tools capable of handling diverse tasks. Comparing and selecting appropriate LLMs for specific tasks requires systematic evaluation methods, as models exhibit varying capabilities across different…

Computation and Language · Computer Science 2025-06-04 Anna Sokol , Elizabeth Daly , Michael Hind , David Piorkowski , Xiangliang Zhang , Nuno Moniz , Nitesh Chawla

Excel is a pervasive yet often complex tool, particularly for novice users, where runtime errors arising from logical mistakes or misinterpretations of functions pose a significant challenge. While large language models (LLMs) offer…

Data-driven science is an emerging paradigm where scientific discoveries depend on the execution of computational AI models against rich, discipline-specific datasets. With modern machine learning frameworks, anyone can develop and execute…

Machine Learning · Computer Science 2022-08-09 Seth Ockerman , John Wu , Christopher Stewart

Predictive benchmarking, the evaluation of machine learning models based on predictive performance and competitive ranking, is a central epistemic practice in machine learning research and an increasingly prominent method for scientific…

Machine Learning · Computer Science 2025-10-28 Timo Freiesleben , Sebastian Zezulka
‹ Prev 1 2 3 10 Next ›