English
Related papers

Related papers: SMaLL: A Software Framework for portable Machine L…

200 papers

Deep Learning Library (DLL) is a new library for machine learning with deep neural networks that focuses on speed. It supports feed-forward neural networks such as fully-connected Artificial Neural Networks (ANNs) and Convolutional Neural…

Machine Learning · Computer Science 2018-04-15 Baptiste Wicht , Jean Hennebert , Andreas Fischer

Large language models (LLMs) have demonstrated exceptional performance across a variety of tasks. However, their substantial scale leads to significant computational resource consumption during inference, resulting in high costs.…

Machine Learning · Computer Science 2025-06-13 Zhaode Wang , Jingbang Yang , Xinyu Qian , Shiwen Xing , Xiaotang Jiang , Chengfei Lv , Shengyu Zhang

Deep Neural Networks (DNNs) have emerged as the core enabler of many major applications on mobile devices. To achieve high accuracy, DNN models have become increasingly deep with hundreds or even thousands of operator layers, leading to…

Machine Learning · Computer Science 2021-12-02 Wei Niu , Jiexiong Guan , Yanzhi Wang , Gagan Agrawal , Bin Ren

Deep learning inference on embedded devices is a burgeoning field with myriad applications because tiny embedded devices are omnipresent. But we must overcome major challenges before we can benefit from this opportunity. Embedded processors…

To break the bottlenecks of mainstream cloud-based machine learning (ML) paradigm, we adopt device-cloud collaborative ML and build the first end-to-end and general-purpose system, called Walle, as the foundation. Walle consists of a…

Deep Neural Network (DNN) based inference at the edge is challenging as these compute and data-intensive algorithms need to be implemented at low cost and low power while meeting the latency constraints of the target applications. Sparsity,…

Neural and Evolutionary Computing · Computer Science 2023-06-13 Adithya Krishna , Srikanth Rohit Nudurupati , Chandana D G , Pritesh Dwivedi , André van Schaik , Mahesh Mehendale , Chetan Singh Thakur

With the rapid emergence of a spectrum of high-end mobile devices, many applications that required desktop-level computation capability formerly can now run on these devices without any problem. However, without a careful optimization,…

Machine Learning · Computer Science 2019-05-03 Wei Niu , Xiaolong Ma , Yanzhi Wang , Bin Ren

Advancing research in the emerging field of deep graph learning requires new tools to support tensor computation over graphs. In this paper, we present the design principles and implementation of Deep Graph Library (DGL). DGL distills the…

The scaling of large language models (LLMs) is currently bottlenecked by the rigidity of distributed programming. While high-performance libraries like CuBLAS and NCCL provide optimized primitives, they lack the flexibility required for…

Deep neural networks (DNNs) form the cornerstone of modern AI services, supporting a wide range of applications, including autonomous driving, chatbots, and recommendation systems. As models increase in size and complexity, DNN workloads…

Machine Learning · Computer Science 2025-11-14 Xiaokai Wang , Shaoyuan Huang , Yuting Li , Xiaofei Wang

Remarkable achievements have been attained by deep neural networks in various applications. However, the increasing depth and width of such models also lead to explosive growth in both storage and computation, which has restricted the…

Machine Learning · Computer Science 2019-06-11 Linfeng Zhang , Zhanhong Tan , Jiebo Song , Jingwei Chen , Chenglong Bao , Kaisheng Ma

Large language models (LLMs) have demonstrated exceptional proficiency in understanding and generating human language, but efficient inference on resource-constrained embedded devices remains challenging due to large model sizes and…

Hardware Architecture · Computer Science 2025-07-15 Weihong Xu , Haein Choi , Po-kai Hsu , Shimeng Yu , Tajana Rosing

Datacenters are increasingly becoming heterogeneous, and are starting to include specialized hardware for networking, video processing, and especially deep learning. To leverage the heterogeneous compute capability of modern datacenters, we…

Machine Learning · Computer Science 2023-08-03 Yassine Ghannane , Mohamed S. Abdelfattah

Currently, Machine Learning (ML) is becoming ubiquitous in everyday life. Deep Learning (DL) is already present in many applications ranging from computer vision for medicine to autonomous driving of modern cars as well as other sectors in…

Hardware Architecture · Computer Science 2020-12-22 Maurizio Capra , Beatrice Bussolino , Alberto Marchisio , Guido Masera , Maurizio Martina , Muhammad Shafique

High-level synthesis (HLS) has been widely adopted as it significantly improves the hardware design productivity and enables efficient design space exploration (DSE). Existing HLS tools are built using compiler infrastructures largely based…

Programming Languages · Computer Science 2021-12-23 Hanchen Ye , Cong Hao , Jianyi Cheng , Hyunmin Jeong , Jack Huang , Stephen Neuendorffer , Deming Chen

Mapping deep neural networks (DNNs) to hardware is critical for optimizing latency, energy consumption, and resource utilization, making it a cornerstone of high-performance accelerator design. Due to the vast and complex mapping space,…

We present a versatile open-source framework designed to facilitate efficient, numerically-tailored Matrix-Matrix Multiplications (MMMs). The framework offers two primary contributions: first, a fine-tuned, automated pipeline for arithmetic…

Mathematical Software · Computer Science 2024-06-06 Louis Ledoux , Marc Casas

Deep Neural Networks (DNNs) have achieved great success in a variety of machine learning (ML) applications, delivering high-quality inferencing solutions in computer vision, natural language processing, and virtual reality, etc. However,…

Machine Learning · Computer Science 2022-08-29 Xiaofan Zhang , Yao Chen , Cong Hao , Sitao Huang , Yuhong Li , Deming Chen

We describe a new software framework for fast training of generalized linear models. The framework, named Snap Machine Learning (Snap ML), combines recent advances in machine learning systems and algorithms in a nested manner to reflect the…

The research interest in specialized hardware accelerators for deep neural networks (DNN) spikes recently owing to their superior performance and efficiency. However, today's DNN accelerators primarily focus on accelerating specific…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-11 Cong Guo , Yangjie Zhou , Jingwen Leng , Yuhao Zhu , Zidong Du , Quan Chen , Chao Li , Bin Yao , Minyi Guo
‹ Prev 1 2 3 10 Next ›