Related papers: Twinkle: A GPU-based binary-lens microlensing code…
To assess how future progress in gravitational microlensing computation at high optical depth will rely on both hardware and software solutions, we compare a direct inverse ray-shooting code implemented on a graphics processing unit (GPU)…
Strong gravitational lensing is a powerful probe of cosmology and the dark matter distribution. Efficient lensing software is already a necessity to fully use its potential and the performance demands will only increase with the upcoming…
We present an improved inverse ray-shooting code based on GPUs for generating microlensing magnification maps. In addition to introducing GPUs for acceleration, we put the efforts in two aspects: (i) A standard circular lens plane is…
Galaxy-galaxy strong lensing provides a powerful probe of galaxy formation, evolution, and the properties of dark matter and dark energy. However, conventional lens-modeling approaches are computationally expensive and require fine-tuning…
Robust modelling of strong lensing systems is fundamental to exploit the information they contain about the distribution of matter in galaxies and clusters. In this work, we present Lensed, a new code which performs forward parametric…
The impending discovery and monitoring of hundreds of new gravitationally lensed quasars and supernovae from upcoming ground and space based large area surveys such as LSST, \textit{Euclid}, and \textit{Roman} necessitates the development…
We introduce Mirage, the first multi-level superoptimizer for tensor programs. A key idea in Mirage is $\mu$Graphs, a uniform representation of tensor programs at the kernel, thread block, and thread levels of the GPU compute hierarchy.…
A modern graphics processing unit (GPU) is able to perform massively parallel scientific computations at low cost. We extend our implementation of the checkerboard algorithm for the two dimensional Ising model [T. Preis et al., J. Comp.…
Recent studies have shown that Binary Graph Neural Networks (GNNs) are promising for saving computations of GNNs through binarized tensors. Prior work, however, mainly focused on algorithm designs or training techniques, leaving it open to…
Document understanding and GUI interaction are among the highest-value applications of Vision-Language Models (VLMs), yet they impose exceptionally heavy computational burden: fine-grained text and small UI elements demand high-resolution…
Recently, deep learning has been an area of intense research. However, as a kind of computing-intensive task, deep learning highly relies on the scale of GPU memory, which is usually prohibitive and scarce. Although some extensive works…
The conservative Post-Newtonian (PN) Hamiltonian formulation of spinning compact binaries has six integrals of motion including the total energy, the total angular momentum and the constant unit lengths of spins. The manifold correction…
Convolutional neural networks have recently achieved significant breakthroughs in various image classification tasks. However, they are computationally expensive,which can make their feasible mplementation on embedded and low-power devices…
Single-cell sequencing technologies reveal cellular heterogeneity at high resolution, advancing our understanding of biological complexity. As datasets start to scale to tens of millions of cells, computational workflows face substantial…
The rapid evolution of programming languages and software systems has necessitated the implementation of multilingual and scalable clone detection tools. However, it is difficult to achieve the above requirements at the same time. Most…
Binarization is an attractive strategy for implementing lightweight Deep Convolutional Neural Networks (CNNs). Despite the unquestionable savings offered, memory footprint above all, it may induce an excessive accuracy loss that prevents a…
Modern computer systems typically conbine multicore CPUs with accelerators like GPUs for inproved performance and energy efficiency. However, these sys- tems suffer from poor performance portability, code tuned for one device must be…
In the domain of recommendation and collaborative filtering, Graph Contrastive Learning (GCL) has become an influential approach. Nevertheless, the reasons for the effectiveness of contrastive learning are still not well understood. In this…
With the upcoming generation of telescopes, cluster scale strong gravitational lenses will act as an increasingly relevant probe of cosmology and dark matter. The better resolved data produced by current and future facilities requires…
This paper studies the problem of designing compact binary architectures for vision multi-layer perceptrons (MLPs). We provide extensive analysis on the difficulty of binarizing vision MLPs and find that previous binarization methods…