Related papers: Scalable Stellar Parameter Inference Using Python-…
To improve the accuracy and efficiency of high-dimensional stellar parameter inference in large spectroscopic datasets, we propose a projection-assisted parameter-inference framework -- Projected-Space Inference of Stellar Parameters…
We introduce the LAMOST Stellar Parameter Pipeline at Peking University --- LSP3, developed and implemented for the determinations of radial velocity $V_{\rm r}$ and stellar atmospheric parameters (effective temperature $T_{\rm eff}$,…
The Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) project performed its five year formal survey since Sep. 2012, already fulfilled the pilot survey and the 1st two years general survey with an output - spectroscopic…
Integer Linear Programming (ILP) is widely used for solving real-world optimization problems, including network routing, map routing, and traffic scheduling. However, ILP algorithms are sparse and branch-intensive, making them inefficient…
The accuracy of the estimated stellar atmospheric parameter decreases evidently with the decreasing of spectral signal-to-noise ratio (SNR) and there are a huge amount of this kind observations, especially in case of SNR$<$30. Therefore, it…
We present a versatile GPU-based parallel version of Logistic Regression (LR), aiming to address the increasing demand for faster algorithms in binary classification due to large data sets. Our implementation is a direct translation of the…
Sequence parallelism (SP) serves as a prevalent strategy to handle long sequences that exceed the memory limit of a single device. However, for linear sequence modeling methods like linear attention, existing SP approaches do not take…
This work introduces an ensemble parameter estimation framework that enables the Lumped Parameter Linear Superposition (LPLSP) method to generate reduced order thermal models from a single transient dataset. Unlike earlier implementations…
Large language models (LLMs) are popular around the world due to their powerful understanding capabilities. As the core component of LLMs, accelerating Transformer through parallelization has gradually become a hot research topic. Mask…
Speculative decoding is an effective and lossless method for Large Language Model (LLM) inference acceleration. It employs a smaller model to generate a draft token sequence, which is then verified by the original base model. In multi-GPU…
Sparse fine-tuning techniques adapt LLMs to downstream tasks by only tuning a sparse subset of model parameters. However, the effectiveness of sparse adaptation depends on optimally selecting the model parameters to be fine-tuned. In this…
Scientific software is often driven by multiple parameters that affect both accuracy and performance. Since finding the optimal configuration of these parameters is a highly complex task, it extremely common that the software is used…
Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST) acquired tens of millions of low-resolution stellar spectra. The large amount of the spectra result in the urgency to explore automatic atmospheric parameter estimation…
Large-scale linear programs (LPs) arise in many decision systems, including ranking, allocation, and matching problems that must be solved repeatedly at massive scale. Prior work such as ECLIPSE and LinkedIn's open-source DuaLip showed that…
Accelerating large language model (LLM) inference is critical for real-world deployments requiring high throughput and low latency. Contextual sparsity, where each token dynamically activates only a small subset of the model parameters,…
Bayesian inference tasks continue to pose a computational challenge. This especially holds for spatial-temporal modeling where high-dimensional latent parameter spaces are ubiquitous. The methodology of integrated nested Laplace…
Fine-tuning large language models (LLMs) requires significant memory, often exceeding the capacity of a single GPU. A common solution to this memory challenge is offloading compute and data from the GPU to the CPU. However, this approach is…
Sequential robot manipulation tasks require finding collision-free trajectories that satisfy geometric constraints across multiple object interactions in potentially high-dimensional configuration spaces. Solving these problems in real-time…
The increasing scale and complexity of large language models (LLMs) pose significant inference latency challenges, primarily due to their autoregressive decoding paradigm characterized by the sequential nature of next-token prediction. By…
Inference of Large Language Models (LLMs) across computer clusters has become a focal point of research in recent times, with many acceleration techniques taking inspiration from CPU speculative execution. These techniques reduce…