Related papers: Large scale ab initio calculations based on three …
The plane wave method is most widely used for solving the Kohn-Sham equations in first-principles materials science computations. In this procedure, the three-dimensional (3-dim) trial wave functions' fast Fourier transform (FFT) is a…
We report on the GPU port of the Abinit high-performance simulation code for plane-wave DFT calculations. Large-scale electronic structure calculations require computing the electronic wave function by solving the Kohn-Sham equations…
Current algorithms for large-scale industrial optimization problems typically face a trade-off: they either require exponential time to reach optimal solutions, or employ problem-specific heuristics. To overcome these limitations, we…
We present a parallel algorithm for computing the approximate factorization of an $N$-by-$N$ kernel matrix. Once this factorization has been constructed (with $N \log^2 N$ work), we can solve linear systems with this matrix with $N \log N…
The fast Fourier transform (FFT) is a primitive kernel in numerous fields of science and engineering. OpenFFT is an open-source parallel package for 3-D FFTs, built on a communication-optimal domain decomposition method for achieving…
Radiative transfer modelling is part of many astrophysical simulations and is used to make synthetic observations and to assist analysis of observations. We concentrate on the modelling of the radio lines emitted by the interstellar medium.…
Large language models (LLMs), with their billions of parameters, pose substantial challenges for deployment on edge devices, straining both memory capacity and computational resources. Block Floating Point (BFP) quantisation reduces memory…
Balanced partitioning is often a crucial first step in solving large-scale graph optimization problems, e.g., in some cases, a big graph can be chopped into pieces that fit on one machine to be processed independently before stitching the…
Specialized function gradient computing hardware could greatly improve the performance of state-of-the-art optimization algorithms, e.g., based on gradient descent or conjugate gradient methods that are at the core of control, machine…
In the field of High Performance Computing, communications among processes represent a typical bottleneck for massively parallel scientific applications. The object of this research is the development of a network interface card with specific…
This research work focuses on the design of high-resolution fast Fourier transform (FFT)/inverse fast Fourier transform (IFFT) processors for constraint analysis purposes. Among the major setbacks associated with such high resolution,…
Large language models (LLMs) have achieved near-human performance across diverse reasoning tasks, yet their deployment on resource-constrained Internet-of-Things (IoT) devices remains impractical due to massive parameter footprints and…
We consider the problem of parallelizing electronic structure computations in plane-wave Density Functional Theory. Because of the limited scalability of Fourier transforms, parallelism has to be found at the eigensolver level. We show how…
We present a shared-memory parallelization of flow-based refinement, which is considered the most powerful iterative improvement technique for hypergraph partitioning at the moment. Flow-based refinement works on bipartitions, so current…
Parallel algorithms for ab initio calculations of vibrational modes of solids are presented and implemented under PVM. Load balancing and communication problems are dealt with in order to increase parallel efficiency. For accurate time…
The precise analysis and accurate measurement of harmonics supports reliable scientific and industrial applications, and the high-performance DSP processor is an important tool for electrical harmonic analysis. Hence, in this research…
We introduce a novel superpoint-based transformer architecture for efficient semantic segmentation of large-scale 3D scenes. Our method incorporates a fast algorithm to partition point clouds into a hierarchical superpoint structure, which…
Due to the great difficulty in scalability, quantum computers are limited in the number of qubits during the early stages of the quantum computing regime. In addition to the required qubits for storing the corresponding eigenvector, suppose…
Among the objectives toward large-scale quantum computation is the quantum interconnect: a device which uses photons to interface qubits that otherwise could not interact. However, current approaches require photons indistinguishable in…
A finite element method is presented to compute time harmonic microwave fields in three dimensional configurations. Nodal-based finite elements have been coupled with an absorbing boundary condition to solve open boundary problems. This…