Related papers: Benchmarking Simulation-Based Inference
Test functions are important to validate and compare the performance of optimization algorithms. There have been many test or benchmark functions reported in the literature; however, there is no standard list or set of benchmark functions.…
Probabilistic programming is a rapidly developing programming paradigm which enables the formulation of Bayesian models as programs and the automation of posterior inference. It facilitates the development of models and conducting Bayesian…
Scaling up data, parameters, and test-time computation has been the mainstream methods to improve LLM systems (LLMsys), but their upper bounds are almost reached due to the gradual depletion of high-quality data and marginal gains obtained…
Procedures in assessing the impact of serial dependency on performance analysis are usually built on parametrically specified models. In this paper, we propose a robust, nonparametric approach to carry out this assessment, by computing the…
We propose network benchmarking: a procedure to efficiently benchmark the quality of a quantum network link connecting quantum processors in a quantum network. This procedure is based on the standard randomized benchmarking protocol and…
Modern real-world application scenarios like Internet services consist of a diversity of AI and non-AI modules with huge code sizes and long and complicated execution paths, which raises serious benchmarking or evaluating challenges. Using…
Complex phenomena in engineering and the sciences are often modeled with computationally intensive feed-forward simulations for which a tractable analytic likelihood does not exist. In these cases, it is sometimes necessary to estimate an…
In many statistical problems, the data distribution is specified through a generative process for which the likelihood function is analytically intractable, yet inference on the associated model parameters remains of primary interest. We…
Deep Bayesian neural network has aroused a great attention in recent years since it combines the benefits of deep neural network and probability theory. Because of this, the network can make predictions and quantify the uncertainty of the…
Benchmarks are standards that allow to identify opportunities for improvement among comparable units. This study suggests a 2-step methodology for calculating probabilistic benchmarks in noisy data sets: (i) double-hyperbolic undersampling…
We propose an computational framework for real-time risk assessment and prioritizing for random outcomes without prior information on probability distributions. The basic model is built based on satisficing measure (SM) which yields a…
Probabilistic programming is a growing area that strives to make statistical analysis more accessible, by separating probabilistic modelling from probabilistic inference. In practice this decoupling is difficult. No single inference…
Quantitative Artificial Intelligence (AI) Benchmarks have emerged as fundamental tools for evaluating the performance, capability, and safety of AI models and systems. Currently, they shape the direction of AI development and are playing an…
In the NLP community, recent years have seen a surge of research activities that address machines' ability to perform deep language understanding which goes beyond what is explicitly stated in text, rather relying on reasoning and knowledge…
Clinical researchers often select among and evaluate risk prediction models using standard machine learning metrics based on confusion matrices. However, if these models are used to allocate interventions to patients, standard metrics…
Inverse problems and, in particular, inferring unknown or latent parameters from data are ubiquitous in engineering simulations. A predominant viewpoint in identifying unknown parameters is Bayesian inference where both prior information…
The evaluation of clustering algorithms can involve running them on a variety of benchmark problems, and comparing their outputs to the reference, ground-truth groupings provided by experts. Unfortunately, many research papers and graduate…
This paper studies decision-making and statistical inference for two-sided matching markets via matrix completion. In contrast to the independent sampling assumed in classical matrix completion literature, the observed entries, which arise…
As models become increasingly sophisticated, conventional algorithm benchmarks are increasingly saturated, underscoring the need for more challenging benchmarks to guide future improvements in algorithmic reasoning. This paper introduces…
During the past sixty years, a lot of effort has been made regarding the productive efficiency. Such endeavours provided an extensive bibliography on this subject, culminating in two main methods, named the Stochastic Frontier Analysis…