Related papers: Accelerating Machine Learning Primitives on Commod…

200 papers

Sliding window sums are widely used for string indexing, hashing and time series analysis. We have developed a family of generic vectorized sliding sum algorithms that provide a speedup of $O(P/w)$ for window size $w$ and number of…

Machine Learning · Computer Science 2023-05-29 Roman Snytsar

Deep neural networks (DNNs) require very large amounts of computation both for training and for inference when deployed in the field. A common approach to implementing DNNs is to recast the most computationally expensive operations as…

Computer Vision and Pattern Recognition · Computer Science 2017-09-12 Andrew Anderson, Aravind Vasudevan, Cormac Keane, David Gregg

Sliding window convolutional networks (ConvNets) have become a popular approach to computer vision problems such as image segmentation, and object detection and localization. Here we consider the problem of inference, the application of a…

Distributed, Parallel, and Cluster Computing · Computer Science 2016-06-21 Aleksandar Zlateski, Kisuk Lee, H. Sebastian Seung

This paper addresses the problem of efficiently classifying high-dimensional data over decentralized networks. Penalized support vector machines (SVMs) are widely used for high-dimensional classification tasks. However, the double…

Machine Learning · Statistics 2025-03-11 Canyi Chen, Nan Qiao, Liping Zhu

Quantization has emerged as an effective way to significantly boost the performance of deep neural networks (DNNs) by utilizing low-bit computations. Despite having lower numerical precision, quantized DNNs are able to reduce both memory…

Machine Learning · Computer Science 2019-11-15 Wenlei Bao, Li-Wen Chang, Yang Chen, Ke Deng, Amit Agarwal, Emad Barsoum, Abe Taha

Sliding window sums are widely used in bioinformatics applications, including sequence assembly, k-mer generation, hashing and compression. New vector algorithms which utilize the advanced vector extension (AVX) instructions available on…

Data Structures and Algorithms · Computer Science 2019-09-04 Roman Snytsar, Yatish Turakhia

Deep Neural Networks (DNNs) have transformed the field of machine learning and are widely deployed in many applications involving image, video, speech and natural language processing. The increasing compute demands of DNNs have been widely…

Machine Learning · Computer Science 2021-08-17 Sourjya Roy, Mustafa Ali, Anand Raghunathan

Deep learning frameworks commonly implement convolution operators with GEMM-based algorithms. In these algorithms, convolution is implemented on top of matrix-matrix multiplication (GEMM) functions, provided by highly optimized BLAS…

Computer Vision and Pattern Recognition · Computer Science 2019-07-05 Marat Dukhan

Generative neural networks are a new category of neural networks that have been widely utilized in applications such as content generation, unsupervised learning, segmentation and pose estimation. It typically involves massive…

Machine Learning · Computer Science 2020-04-30 Dawen Xu, Ying Wang, Kaijie Tu, Cheng Liu, Bingsheng He, Lei Zhang

We present a novel approach for accelerating convolutions during inference for CPU-based architectures. The most common method of computation involves packing the image into the columns of a matrix (im2col) and performing general matrix…

Computer Vision and Pattern Recognition · Computer Science 2024-11-26 Amir Ofir, Gil Ben-Artzi

Convolutional neural networks (CNNs) have found many applications in tasks involving two-dimensional (2D) data, such as image classification and image processing. Therefore, 2D convolution layers have been heavily optimized on CPUs and…

Depth-wise pruning accelerates LLM inference in resource-constrained scenarios but suffers from performance degradation due to direct removal of entire Transformer layers. This paper reveals "Patch-like" redundancy across layers via…

Computer Vision and Pattern Recognition · Computer Science 2026-03-20 Xuan Ding, Rui Sun, Yunjian Zhang, Xiu Yan, Yueqi Zhou, Kaihao Huang, Suzhong Fu, Angelica I Aviles-Rivero, Chuanlong Xie, Yao Zhu

Deploying neural networks on constrained hardware platforms such as 32-bit microcontrollers is a challenging task because of the large memory, computing and energy requirements of their inference process. To tackle these issues, several…

Machine Learning · Computer Science 2023-03-21 Baptiste Nguyen, Pierre-Alain Moellic, Sylvain Blayac

An important linear algebra routine, GEneral Matrix Multiplication (GEMM), is a fundamental operator in deep learning. Compilers need to translate these routines into low-level code optimized for specific hardware. Compiler-level…

Machine Learning · Computer Science 2019-09-25 Huaqing Zhang, Xiaolin Cheng, Hui Zang, Dae Hoon Park

Convolution is the most time-consuming operation in deep neural network operations, so its performance is critical to the overall performance of the neural network. The commonly used methods for convolution on GPU include the general matrix…

Neural and Evolutionary Computing · Computer Science 2023-06-27 Shuai Lu, Jun Chu, Luanzheng Guo, Xu T. Liu

With their high energy efficiency, processing-in-memory (PIM) arrays are increasingly used for convolutional neural network (CNN) inference. In PIM-based CNN inference, the computational latency and energy are dependent on how the CNN…

Machine Learning · Computer Science 2021-12-22 Johnny Rhe, Sungmin Moon, Jong Hwan Ko

Deep learning (DL) is one of the most prominent branches of machine learning. Due to the immense computational cost of DL workloads, industry and academia have developed DL libraries with highly-specialized kernels for each…

The generic matrix multiply (GEMM) function is the core element of high-performance linear algebra libraries used in many computationally-demanding digital signal processing (DSP) systems. We propose an acceleration technique for GEMM based…

Mathematical Software · Computer Science 2015-05-30 Davide Anastasia, Yiannis Andreopoulos

The GEneral Matrix Multiplication (GEMM) is one of the essential algorithms in scientific computing. Single-thread GEMM implementations are well-optimised with techniques like blocking and autotuning. However, due to the complexity of…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-01-15 Yufan Xia, Marco De La Pierre, Amanda S. Barnard, Giuseppe Maria Junior Barca

We show how to utilize machine learning approaches to improve sliding window algorithms for approximate frequency estimation problems, under the "algorithms with predictions" framework. In this dynamic environment, previous…

Data Structures and Algorithms · Computer Science 2024-09-19 Rana Shahout, Ibrahim Sabek, Michael Mitzenmacher