Computer Science
Safety applications in vehicle-to-everything communications and Cooperative Intelligent Transport Systems rely on reliable and timely message exchange, which in turn depends on accurate modeling of wireless signal propagation. Simulation…
Adjacent GEMM problems that differ by a single 128-element step in N can show 30% different throughput on the same GPU. This pervasive performance ruggedness - invisible to roofline analysis and peak-FLOPs intuition, yet dominant for every…
Large language models have achieved remarkable capabilities through scaling, and this paper does not challenge that. It instead investigates a different question: once large models already exist, can they become more accessible to…
Half precision (FP16) promises to double FFT throughput on GPUs, but the prevailing view is that its 10-bit mantissa makes it unsuitable for radar-grade signal processing. We show this framing is wrong on Apple Silicon: the binding…
In a computer system, multiple indispensable components (such as the CPU and memory) work together to produce an overall effect, which can only be measured on an independently running system. Since the…
We consider the problem of computing sample points in each connected component of a semi-algebraic set defined by the non-vanishing or the positivity of an n-variate polynomial of degree d, with rational coefficients of bit size bounded by…
Large Language Models (LLMs) have demonstrated impressive progress in complex reasoning tasks, largely driven by the Chain-of-Thought (CoT) paradigm, which decomposes difficult problems into intermediate steps. However, CoT reasoning…
Recurring industrial analytics and machine-learning workflows are becoming a major computational burden in modern engineering practice. Large parametric database generation, scheduled model retraining, repeated evaluation pipelines, and…
We study the problem of computing the isolated regular solutions of a system \((f_1,\ldots,f_n)\) of \(n\) polynomial equations in \(n\) variables \((X_1, \dots, X_n)\) over a field of characteristic zero \(k\). We focus on systems with a…
Modern computing systems process jobs with resource requirements such as CPU and memory, which are described by multiresource jobs (MRJ) queueing models. In practice, job resource requirements are spread out over so many values that it is…
We present a new algorithm for fast matrix multiplication using tensor decompositions with special features. Thanks to these features, we obtain exponents lower than what the rank of the tensor decomposition suggests. In particular for…
This paper presents a generalised symbolic algorithm for solving systems of linear algebraic equations with multi-diagonal coefficient matrices. The algorithm is given in pseudocode. A theorem which gives the condition for correctness of…
JPEG decode is routine ML infrastructure, but Python decoder choices are often justified by single-process, single-thread microbenchmarks. We audit this evaluation assumption with thirteen Python-accessible JPEG decode paths on five matched…
We use discrete-event simulation to quantify the impact of fiber latency on the efficacy of geo-distributed AI model training with data parallelism. We conclude that the optimum distance between two AI clusters is 10-100 km, over which…
Medical tourists face a scheduling problem that differs from that of local patients. Treatment delays extend not just care delivery time, but also accommodation and travel costs. This study develops a hybrid agent-based and discrete-event…
A new symbolic algorithm to compute sums of squares multipliers (certificates) to witness the membership of non-negative univariate polynomials in a saturated univariate quadratic module is presented. Certificates are first computed in…
Scalable vector instruction sets such as Arm SVE enable vector-length-agnostic (VLA) execution, allowing a single implementation to adapt across hardware with different vector lengths. However, they complicate compiler code generation, as…
Objective: Acute mountain sickness (AMS) is the most prevalent altitude illness, affecting unacclimatized individuals ascending above 2,500 m and potentially escalating to life-threatening cerebral or pulmonary edema. Conventional machine…
This work extends a framework for predicting the performance of High-Performance Computing (HPC) workloads using Machine Learning (ML). A common limitation in performance modeling is the restricted number of hardware counters that can be…
The positivity of the Gram-Charlier probability density function has been a subject of extensive study for decades. Since Barton and Dennis (1952) introduced numerical positivity conditions, no analytic closed-form expression was available…