Related papers: OMI4papps: Optimisation, Modelling and Implementat…
Monolithic Active Pixel Sensors (MAPS) combine the sensing part and the front-end electronics in the same silicon layer, making use of CMOS technology. Profiting from the progress of this commercial process, MAPS have been undergoing…
Exascale computing systems will exhibit high degrees of hierarchical parallelism, with thousands of computing nodes and hundreds of cores per node. Efficiently exploiting hierarchical parallelism is challenging due to load imbalance that…
Reactive molecular dynamics simulations are computationally demanding. Reaching spatial and temporal scales where interesting scientific phenomena can be observed requires efficient and scalable implementations on modern hardware. In this…
Nowadays, latency-critical, high-performance applications are parallelized even on power-constrained client systems to improve performance. However, an important scenario of fine-grained tasking on simultaneous multithreading CPU cores in…
The Intel Xeon Phi manycore processor is designed to provide high performance matrix computations of the type often performed in data analysis. Common data analysis environments include Matlab, GNU Octave, Julia, Python, and R. Achieving…
Efficient implementations of the classical molecular dynamics (MD) method for Lennard-Jones particle systems are considered. Not only general algorithms but also techniques that are efficient for some specific CPU architectures are also…
Heterogeneous computing is emerging as a mandatory requirement for power-efficient system design. With this aim, modern heterogeneous platforms like the Zynq All-Programmable SoC, which integrates ARM-based SMP and programmable logic, have been…
Finding the sparsest solution to the underdetermined system $\mathbf{y}=\mathbf{Ax}$, given a tolerance, is known to be NP-hard. Many approximate solutions to this problem exist, and Orthogonal Matching Pursuit (OMP) is one of the most…
This paper presents implementation details and empirical results for a hybrid message-passing and shared-memory parallelization of the adaptive integral method (AIM). AIM is implemented on a (near) petaflop supercomputing cluster of…
Mixed-precision algorithms have been proposed as a way for scientific computing to benefit from some of the gains seen for artificial intelligence (AI) on recent high performance computing (HPC) platforms. A few applications dominated by…
Kernel matrix-vector product is ubiquitous in many science and engineering applications. However, a naive method requires $O(N^2)$ operations, which becomes prohibitive for large-scale problems. We introduce a parallel method that provably…
GPUs and other accelerators are increasingly used for scientific computing. In the future, we want to add GPU support to parallel adaptive mesh refinement (AMR) codes written in Fortran. To understand which changes are necessary to obtain…
Recent advances in machine learning force fields (MLFF) have significantly extended the reach of atomistic simulations. Continuous progress in this field requires reliable reference datasets, accurate MLFF architectures, and efficient…
As fusion energy devices advance, plasma simulations are crucial for reactor design. Our work extends BIT1 hybrid parallelization by integrating MPI with OpenMP and OpenACC, focusing on asynchronous multi-GPU programming. Results show…
Heterogeneous computing is becoming mainstream in all scopes. This new era in computer architecture brings a new paradigm called Accelerator Level Parallelism (ALP). In ALP, accelerators are used concurrently to provide unprecedented levels…
Machine-learned interatomic potentials can offer near first-principles accuracy but are computationally expensive, limiting their application to large-scale molecular dynamics simulations. Inspired by quantum mechanics/molecular mechanics…
The trend towards highly parallel multi-processing is ubiquitous in all modern computer architectures, ranging from handheld devices to large-scale HPC systems; yet many applications are struggling to fully utilise the multiple levels of…
Scientific computing in the exascale era demands increased computational power to solve complex problems across various domains. With the rise of heterogeneous computing architectures, the need for vendor-agnostic, performance portability…
The Simplex tableau has been broadly used and investigated in the industry and academia. With the advent of the big data era, ever larger problems are posed to be solved in ever larger machines whose architecture type did not exist in the…
Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and…