Related papers: A consistent multi-user framework for assessing sy…

Armor: A Benchmark for Meta-evaluation of Artificial Music

Objective evaluation (OE) is essential to artificial music, but it's often very hard to determine the quality of OEs. Hitherto, subjective evaluation (SE) remains reliable and prevailing but suffers inevitable disadvantages that OEs may…

Sound · Computer Science 2021-08-31 Songhe Wang, Zheng Bao, Jingtong E

Spot the Difference: Accuracy of Numerical Simulations via the Human Visual System

Comparative evaluation lies at the heart of science, and determining the accuracy of a computational method is crucial for evaluating its potential as well as for guiding future efforts. However, metrics that are typically used have…

Data Analysis, Statistics and Probability · Physics 2019-07-10 Kiwon Um, Xiangyu Hu, Bing Wang, Nils Thuerey

MOSS: Multi-Objective Optimization for Stable Rule Sets

We present MOSS, a multi-objective optimization framework for constructing stable sets of decision rules. MOSS incorporates three important criteria for interpretability: sparsity, accuracy, and stability, into a single multi-objective…

Optimization and Control · Mathematics 2025-07-31 Brian Liu, Rahul Mazumder

MoEScore: Mixture-of-Experts-Based Text-Audio Relevance Score Prediction for Text-to-Audio System Evaluation

Recent advances in generative models have enabled modern Text-to-Audio (TTA) systems to synthesize audio with high perceptual quality. However, TTA systems often struggle to maintain semantic consistency with the input text, leading to…

Sound · Computer Science 2026-01-13 Bochao Sun, Yang Xiao, Han Yin

QA-MoE: Towards a Continuous Reliability Spectrum with Quality-Aware Mixture of Experts for Robust Multimodal Sentiment Analysis

Multimodal Sentiment Analysis (MSA) aims to infer human sentiment from textual, acoustic, and visual signals. In real-world scenarios, however, multimodal inputs are often compromised by dynamic noise or modality missingness. Existing…

Artificial Intelligence · Computer Science 2026-04-09 Yitong Zhu, Yuxuan Jiang, Guanxuan Jiang, Bojing Hou, Peng Yuan Zhou, Ge Lin Kan, Yuyang Wang

Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms

Mixture-of-Experts (MoE) architectures have emerged as a promising direction, offering efficiency and scalability by activating only a subset of parameters during inference. However, current research remains largely performance-centric,…

Machine Learning · Computer Science 2025-09-30 Jiahao Ying, Mingbao Lin, Qianru Sun, Yixin Cao

MONCE Tracking Metrics: a comprehensive quantitative performance evaluation methodology for object tracking

Evaluating tracking model performance is a complicated task, particularly for non-contiguous, multi-object trackers that are crucial in defense applications. While there are various excellent tracking benchmarks available, this work expands…

Computer Vision and Pattern Recognition · Computer Science 2022-06-16 Kenneth Rapko, Wanlin Xie, Andrew Walsh

Selecting a classification performance measure: matching the measure to the problem

The problem of identifying to which of a given set of classes objects belong is ubiquitous, occurring in many research domains and application areas, including medical diagnosis, financial decision making, online commerce, and national…

Machine Learning · Computer Science 2024-09-20 David J. Hand, Peter Christen, Sumayya Ziyad

Multi-Objective Multi-Agent Decision Making: A Utility-based Analysis and Survey

The majority of multi-agent system (MAS) implementations aim to optimise agents' policies with respect to a single objective, despite the fact that many real-world problem domains are inherently multi-objective in nature. Multi-objective…

Multiagent Systems · Computer Science 2020-11-17 Roxana Rădulescu, Patrick Mannion, Diederik M. Roijers, Ann Nowé

Design and Implementation of Performance Metrics for Evaluation of Assessments Data

The objective of this paper is to design performance metrics and respective formulas to quantitatively evaluate the achievement of set objectives and expected outcomes both at the course and program levels. Evaluation is defined as one or…

Physics Education · Physics 2015-09-16 Irfan Ahmed, Arif Bhatti

MOTChallenge 2015: Towards a Benchmark for Multi-Target Tracking

In the recent past, the computer vision community has developed centralized benchmarks for the performance evaluation of a variety of tasks, including generic object and pedestrian detection, 3D reconstruction, optical flow, single-object…

Computer Vision and Pattern Recognition · Computer Science 2015-04-09 Laura Leal-Taixé, Anton Milan, Ian Reid, Stefan Roth, Konrad Schindler

On Statistical Analysis of MOEAs with Multiple Performance Indicators

Assessing the empirical performance of Multi-Objective Evolutionary Algorithms (MOEAs) is vital when we extensively test a set of MOEAs and aim to determine a proper ranking thereof. Multiple performance indicators, e.g., the generational…

Neural and Evolutionary Computing · Computer Science 2020-12-03 Hao Wang, Carlos Ignacio Hernández Castellanos, Tome Eftimov

MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems

The sparse Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large Language Models (LLMs) efficiently, but it depends on heterogeneous compute and memory resources. These factors jointly affect system Cost, Accuracy,…

Machine Learning · Computer Science 2025-05-22 Yinsicheng Jiang, Yao Fu, Yeqi Huang, Ping Nie, Zhan Lu, Leyang Xue, Congjie He, Man-Kit Sit, Jilong Xue, Li Dong, Ziming Miao, Dayou Du, Tairan Xu, Kai Zou, Edoardo Ponti, Luo Mai

MoE-CAP: Benchmarking Cost, Accuracy and Performance of Sparse Mixture-of-Experts Systems

The sparse Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large Language Models (LLMs) efficiently, but it depends on heterogeneous compute and memory resources. These factors jointly affect system Cost, Accuracy,…

Machine Learning · Computer Science 2025-11-21 Yinsicheng Jiang, Yao Fu, Yeqi Huang, Ping Nie, Zhan Lu, Leyang Xue, Congjie He, Man-Kit Sit, Jilong Xue, Li Dong, Ziming Miao, Dayou Du, Tairan Xu, Kai Zou, Edoardo Ponti, Luo Mai

Unsatisfied Today, Satisfied Tomorrow: a simulation framework for performance evaluation of crowdsourcing-based network monitoring

Network operators need to continuously upgrade their infrastructures in order to keep their customer satisfaction levels high. Crowdsourcing-based approaches are generally adopted, where customers are directly asked to answer surveys about…

Networking and Internet Architecture · Computer Science 2020-11-02 Andrea Pimpinella, Marianna Repossi, Alessandro Enrico Cesare Redondi

Evaluating prediction systems in software project estimation

Context: Software engineering has a problem in that when we empirically evaluate competing prediction systems we obtain conflicting results. Objective: To reduce the inconsistency amongst validation study results and provide a more formal…

Software Engineering · Computer Science 2021-01-15 Martin Shepperd, Stephen G. MacDonell

A performance spectrum for parallel computational frameworks that solve PDEs

Important computational physics problems are often large-scale in nature, and it is highly desirable to have robust and high performing computational frameworks that can quickly address these problems. However, it is no trivial task to…

Mathematical Software · Computer Science 2017-09-18 J. Chang, K. B. Nakshatrala, M. G. Knepley, L. Johnsson

A Human and Group Behaviour Simulation Evaluation Framework utilising Composition and Video Analysis

In this work we present the modular Crowd Simulation Evaluation through Composition framework (CSEC) which provides a quantitative comparison between different pedestrian and crowd simulation approaches. Evaluation is made based on the…

Computer Vision and Pattern Recognition · Computer Science 2018-11-27 Rob Dupre, Vasileios Argyriou

From QoS Distributions to QoE Distributions: a System's Perspective

In the context of QoE management, network and service providers commonly rely on models that map system QoS conditions (e.g., system response time, packet loss, etc.) to estimated end user QoE values. Observable QoS conditions in the system…

Multimedia · Computer Science 2020-03-31 Tobias Hossfeld, Poul E. Heegaard, Martin Varela, Lea Skorin-Kapov, Markus Fiedler

Testing Compositionality

Compositionality supports the manipulation of large systems by working on their components. For model-based testing, this means that large systems can be tested by modelling and testing their components: passing tests for all components…

Software Engineering · Computer Science 2025-08-01 Gijs van Cuyck, Lars van Arragon, Jan Tretmans