Mathematical Software
We present CppSim, a C++ 2D Ising simulator for Heisenberg-picture tensor network time evolution on GPUs. The key computational contributions are: first, a zero-malloc GPU workspace that pre-allocates all buffers at startup; second, a…
Ozaki scheme II emulates high-precision matrix multiplication using low-precision integer matrix operations based on the Chinese remainder theorem (CRT). It first scales the high-precision matrices to convert them into integer matrices. For…
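The CRT idea mentioned above can be sketched in a few lines. This is only an illustration of reconstructing an integer matrix product from residues modulo pairwise coprime moduli, not the scheme described in the paper: the scaling step is omitted (the inputs are assumed to be integer matrices already), plain Python integers stand in for low-precision hardware arithmetic, and the names `matmul_mod` and `crt_matmul` are hypothetical.

```python
from math import prod

def matmul_mod(A, B, m):
    """Integer matrix product with every entry reduced modulo m."""
    inner = len(B)
    return [[sum(A[i][t] * B[t][j] for t in range(inner)) % m
             for j in range(len(B[0]))]
            for i in range(len(A))]

def crt_matmul(A, B, moduli):
    """Recover A @ B from its residues modulo pairwise coprime moduli.

    Assumption: every entry of the true product lies in the symmetric
    range (-M/2, M/2], where M is the product of the moduli.
    """
    M = prod(moduli)
    rows, cols = len(A), len(B[0])
    C = [[0] * cols for _ in range(rows)]
    for m in moduli:
        # Each residue product only ever sees operands smaller than m.
        A_m = [[a % m for a in row] for row in A]
        B_m = [[b % m for b in row] for row in B]
        R = matmul_mod(A_m, B_m, m)
        Mi = M // m
        inv = pow(Mi, -1, m)  # modular inverse (Python 3.8+)
        for i in range(rows):
            for j in range(cols):
                C[i][j] = (C[i][j] + R[i][j] * Mi * inv) % M
    # Map representatives from [0, M) back to the symmetric range.
    return [[c - M if c > M // 2 else c for c in row] for row in C]

A = [[3, -2], [5, 7]]
B = [[-4, 1], [6, -9]]
print(crt_matmul(A, B, [7, 11, 13]))
```

With the moduli 7, 11, and 13 the representable range is roughly ±500, which is enough for this small example; larger entries would need more or larger moduli.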
Heavy-tailed distributions are increasingly found to better fit empirical data in engineering, finance, physics, network science, and related fields. Among them, $\alpha$-stable distributions play a central role, being limiting laws in the…
We describe libhmm, a C++20 library for Hidden Markov Model parameter estimation, sequence decoding, and model selection. libhmm addresses two gaps in existing software: the absence of a well-maintained, zero-dependency C++ HMM library…
Neural networks are increasingly deployed in scientific, safety-critical, and mission-critical pipelines, yet verification and analysis are often performed outside the programming environment that defines and runs the model. This creates a…
Efficient solution of large-scale, ill-conditioned, and indefinite algebraic equations is needed across many computational fields, including multiphysics simulations, machine learning, and data science. Because of their…
A formulation of elliptic boundary value problems is used to develop the first discrete exterior calculus (DEC) library for massively parallel computations with 3D domains. This can be used for steady-state analysis of any physical process…
This paper presents an experimental performance study of implementations of three symbolic algorithms for solving band matrix systems of linear algebraic equations with heptadiagonal, pentadiagonal, and tridiagonal coefficient matrices. The…
We present the Matlab toolbox MacaulayLab, which implements numerical linear algebra algorithms for solving multivariate polynomial systems and rectangular multiparameter eigenvalue problems. Its structure and functionality are the result…
We describe a C implementation of the Las Vegas algorithm of Birmpilis, Labahn and Storjohann from 2020 for computing the Smith normal form of a nonsingular integer matrix. The algorithm computes a Smith massager for the input matrix using…
Multimodal density estimation is a fundamental problem in scientific computing. Determining the number of modes in a distribution is a core numerical challenge with applications across ecology, economics, genomics, and astronomy. While the…
The upcoming IEEE-P3109 standard for low-precision floating-point arithmetic can become the foundation of future machine learning hardware and software. Unlike IEEE-754, P3109 introduces a parametric framework defined by bitwidth,…
Following recent interest in correctly rounded math library functions (as currently recommended by the IEEE 754 standard), we have designed several SIMD algorithms for one-input single-precision functions and integrated them into our CPU…
This paper proposes sufficient, yet more general, conditions for applying FastTwoSum as an error-free transformation (EFT) under all faithful rounding modes. It also identifies guarantees tailored to round-to-odd for…
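For readers unfamiliar with FastTwoSum, a minimal sketch follows. It shows only the classical setting, binary64 with round-to-nearest (what Python floats use) and the textbook precondition |a| >= |b|; it does not reproduce the more general conditions the paper proposes. The function name `fast_two_sum` is chosen here for illustration.

```python
from fractions import Fraction

def fast_two_sum(a, b):
    """Return (s, t) with s = fl(a + b) and t the rounding error.

    Classical precondition assumed here: |a| >= |b|, round-to-nearest.
    Under that assumption s + t equals a + b exactly.
    """
    s = a + b
    z = s - a   # the part of b that actually made it into s
    t = b - z   # the part of b that was rounded away
    return s, t

a, b = 1.0, 2.0 ** -60
s, t = fast_two_sum(a, b)
# Exact rational arithmetic confirms that no information was lost.
assert Fraction(s) + Fraction(t) == Fraction(a) + Fraction(b)
print(s, t)
```

The point of an EFT is that the pair (s, t) represents the sum exactly, even though s alone does not.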
While interior point methods have been the centerpiece of nonlinear programming tools used in science and engineering, their reliance on linear solvers that can tackle sparse symmetric indefinite and highly ill-conditioned problems made it…
Most numerical solvers and libraries nowadays are implemented to use mathematical models created with language-specific built-in data types (e.g. real in Fortran or double in C) and their respective elementary algebra implementations.…
Optimal transport (OT) has emerged as a fundamental tool in modern machine learning, yet its computational cost remains a significant bottleneck for large-scale applications. While harnessing the massive parallelism of modern GPU hardware…
Multiple-precision floating-point branch-free algorithms can significantly accelerate multi-component arithmetic implemented by combining hardware-based binary64 and binary32, particularly for triple- and quadruple-precision computations.…
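As background on what "branch-free" and "multi-component" mean here, the sketch below uses the classical TwoSum (six floating-point operations, no magnitude comparison) to add a binary64 value to a two-component number. It is a generic textbook construction under round-to-nearest, not the algorithms of the paper, and the names `two_sum` and `dd_add_double` are hypothetical.

```python
from fractions import Fraction

def two_sum(a, b):
    """Branch-free error-free addition: s + err == a + b exactly.

    Unlike FastTwoSum, no ordering of |a| and |b| is required, so no
    comparison or swap is needed.
    """
    s = a + b
    bb = s - a
    err = (a - (s - bb)) + (b - bb)
    return s, err

def dd_add_double(hi, lo, x):
    """Add the double x to the two-component value hi + lo.

    Returns a renormalized (hi, lo) pair; the result is accurate to
    roughly twice binary64 precision, not exact in general.
    """
    s, e = two_sum(hi, x)
    e += lo
    return two_sum(s, e)

hi, lo = dd_add_double(1.0, 0.0, 2.0 ** -60)
# The tiny addend survives in the low component instead of being lost.
assert Fraction(hi) + Fraction(lo) == 1 + Fraction(2) ** -60
print(hi, lo)
```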
We present CombOL (Combinatorial Objects Library), an open-source library for the enumeration and Boltzmann sampling of combinatorial classes. Classes can be specified by a concise string syntax, and may depend on an arbitrary number of…
The factorization of skew-symmetric matrices is a critically understudied area of dense linear algebra, particularly in comparison to that of general and symmetric matrices. While some algorithms can be adapted from the symmetric case, the…