Related papers: A consistent multi-user framework for assessing sy…
Objective evaluation (OE) is essential to artificial music, but it's often very hard to determine the quality of OEs. Hitherto, subjective evaluation (SE) remains reliable and prevailing but suffers inevitable disadvantages that OEs may…
Comparative evaluation lies at the heart of science, and determining the accuracy of a computational method is crucial for evaluating its potential as well as for guiding future efforts. However, metrics that are typically used have…
We present MOSS, a multi-objective optimization framework for constructing stable sets of decision rules. MOSS incorporates three important criteria for interpretability: sparsity, accuracy, and stability, into a single multi-objective…
Recent advances in generative models have enabled modern Text-to-Audio (TTA) systems to synthesize audio with high perceptual quality. However, TTA systems often struggle to maintain semantic consistency with the input text, leading to…
Multimodal Sentiment Analysis (MSA) aims to infer human sentiment from textual, acoustic, and visual signals. In real-world scenarios, however, multimodal inputs are often compromised by dynamic noise or modality missingness. Existing…
Mixture-of-Experts (MoE) architectures have emerged as a promising direction, offering efficiency and scalability by activating only a subset of parameters during inference. However, current research remains largely performance-centric,…
Evaluating tracking model performance is a complicated task, particularly for non-contiguous, multi-object trackers that are crucial in defense applications. While there are various excellent tracking benchmarks available, this work expands…
The problem of identifying to which of a given set of classes objects belong is ubiquitous, occurring in many research domains and application areas, including medical diagnosis, financial decision making, online commerce, and national…
The majority of multi-agent system (MAS) implementations aim to optimise agents' policies with respect to a single objective, despite the fact that many real-world problem domains are inherently multi-objective in nature. Multi-objective…
The objective of this paper is to design performance metrics and respective formulas to quantitatively evaluate the achievement of set objectives and expected outcomes both at the course and program levels. Evaluation is defined as one or…
In the recent past, the computer vision community has developed centralized benchmarks for the performance evaluation of a variety of tasks, including generic object and pedestrian detection, 3D reconstruction, optical flow, single-object…
Assessing the empirical performance of Multi-Objective Evolutionary Algorithms (MOEAs) is vital when we extensively test a set of MOEAs and aim to determine a proper ranking thereof. Multiple performance indicators, e.g., the generational…
The sparse Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large Language Models (LLMs) efficiently, but it depends on heterogeneous compute and memory resources. These factors jointly affect system Cost, Accuracy,…
The sparse Mixture-of-Experts (MoE) architecture is increasingly favored for scaling Large Language Models (LLMs) efficiently, but it depends on heterogeneous compute and memory resources. These factors jointly affect system Cost, Accuracy,…
Network operators need to continuosly upgrade their infrastructures in order to keep their customer satisfaction levels high. Crowdsourcing-based approaches are generally adopted, where customers are directly asked to answer surveys about…
Context: Software engineering has a problem in that when we empirically evaluate competing prediction systems we obtain conflicting results. Objective: To reduce the inconsistency amongst validation study results and provide a more formal…
Important computational physics problems are often large-scale in nature, and it is highly desirable to have robust and high performing computational frameworks that can quickly address these problems. However, it is no trivial task to…
In this work we present the modular Crowd Simulation Evaluation through Composition framework (CSEC) which provides a quantitative comparison between different pedestrian and crowd simulation approaches. Evaluation is made based on the…
In the context of QoE management, network and service providers commonly rely on models that map system QoS conditions (e.g., system response time, paket loss, etc.) to estimated end user QoE values. Observable QoS conditions in the system…
Compositionality supports the manipulation of large systems by working on their components. For model-based testing, this means that large systems can be tested by modelling and testing their components: passing tests for all components…