Computer Science
Artificial intelligence assistants deployed in online learning environments create new opportunities to collect large volumes of learner interaction data and generate insights to improve student outcomes. Architecture for AI-Augmented…
We present RaFI, a CUDA and MPI based software framework that simplifies the task of building GPU-enabled data-parallel software where rays or similar work items need to migrate between different GPUs. RaFI provides a simple interface for…
As autonomous language model agents proliferate, forming an emerging agentic web with real-world consequences, what credibility signals can you use to decide whether to trust an unfamiliar agent in the wild and delegate to it? A natural…
We present and show how to implement a non-trivial all-to-all communication algorithm for arbitrary $d$-dimensional tori effectively in MPI. Given a factorization of the number of processes $p$ into $d$ factors that can be mapped onto a…
AI researchers have been advancing socially intelligent AI agents (Social-AI) across embodiments, from chatbots to physical robots. As Social-AI is increasingly deployed in everyday settings, decisions about the roles these agents should…
In recent years, HPC systems and CPU architectures as their central components, have become increasingly complex, making application development and optimization quite challenging. In this respect, intuitive performance models like the…
Sparse tensors are the most used representation of sparse multidimensional data. Operations that decompose them, selecting their most important features while reducing their dimension, have become prevalent procedures in machine learning.…
Pipeline parallelism is essential for large-scale model training, but existing asynchronous approaches often degrade convergence due to parameter mismatch between forward and backward passes. We propose Asynchronous Multi-Directional…
We examine the information security practices of Ugandan climate activists protesting the development of the East African Crude Oil Pipeline (EACOP). We conducted five-week fieldwork in Kampala, Uganda, which included interviews with 13…
Maximal Independent Set (MIS) in a graph is a fundamental problem with applications in resource allocation, scheduling, and network optimization. Although graphs are inherently un-structured and challenging for GPU parallelism due to…
Modern logistics systems tend to generate continuous streams of data from sources such as GPS, IoT sensors, and logistics management systems. The aggregation, processing, and analysis of data have become vital for monitoring operations,…
The trend of increasing cluster sizes of supercomputers leads to a growing susceptibility to Silent Data Corruption (SDC) that can invalidate program results. A common strategy for SDC protection is replication, where the computation is…
Since public access to generative AI tools became widespread, federal civil litigation has seen a marked increase in pro se (self-represented) plaintiffs. This paper analyzes that shift using ~2.8 million filings, asking whether the…
Compute governance proposals often rely on the assumption that frontier AI training requires large, detectable computing clusters. However, recent advances in distributed training algorithms could allow developers to conduct frontier-scale…
Modern deep learning workloads increasingly exhibit dynamic, metadata-driven execution, where runtime-generated information determines memory provisioning and kernel launch decisions. In sampling-based graph neural network (GNN) training,…
As AI systems increasingly shape political views, defining and evaluating AI political neutrality is an urgent problem. Here, we propose a new definition of AI political neutrality and design a large-scale user study to test it, releasing a…
Device-aware quantum simulation increasingly requires HPC-scale accelerators, yet secure supercomputers expose batch-scheduled execution environments rather than the interactive, backend-oriented interfaces expected by quantum software. The…
The Monte Cimone project provides a RISC-V testbed for High-Performacne Computing cluster. This paper presents Monte Cimone v3 (MCv3), the third iteration of the Monte Cimone RISC-V HPC cluster, integrating the SOPHGO Sophon SG2044…
Speculative Decoding (SD) has emerged as a critical technique for accelerating Large Language Model (LLM) inference. Unlike deterministic system optimizations, SD performance is inherently data-dependent, meaning that diverse and…
Gaussian processes are widely used in machine learning domains but remain computationally demanding, limiting their efficient scalability across emerging hardware platforms. The GPRat library addresses these challenges using the HPX…