Related papers: TENET: A Framework for Modeling Tensor Dataflow Ba…

200 papers

Ternary quantization has emerged as a powerful technique for reducing both the computational and memory footprint of large language models (LLMs), enabling efficient real-time inference deployment without significantly compromising model…

Hardware Architecture · Computer Science 2025-09-18 Zhirui Huang, Rui Ma, Shijie Cao, Ran Shu, Ian Wang, Ting Cao, Chixiao Chen, Yongqiang Xiong

Efficient execution of deep learning workloads on dataflow architectures is crucial for overcoming memory bottlenecks and maximizing performance. While streaming intermediate results between computation kernels can significantly improve…

Hardware Architecture · Computer Science 2025-09-24 Hanchen Ye, Deming Chen

Modern machine learning accelerators are designed to efficiently execute deep neural networks (DNNs) by optimizing data movement, memory hierarchy, and compute throughput. However, emerging DNN models such as large language models, state…

Hardware Architecture · Computer Science 2025-09-03 Shubham Negi, Manik Singhal, Aayush Ankit, Sudeep Bhoja, Kaushik Roy

Tensor computations, with matrix multiplication being the primary operation, serve as the fundamental basis for data analysis, physics, machine learning, and deep learning. As the scale and complexity of data continue to grow rapidly, the…

Hardware Architecture · Computer Science 2024-10-24 Qizhe Wu, Yuchen Gui, Zhichen Zeng, Xiaotian Wang, Huawen Liang, Xi Jin

While hardware-software co-design has significantly improved the efficiency of neural network inference, modeling the training phase remains a critical yet underexplored challenge. Training workloads impose distinct constraints,…

Machine Learning · Computer Science 2026-03-17 Jérémy Morlier, Robin Geens, Stef Cuyckens, Arne Symons, Marian Verhelst, Vincent Gripon, Mathieu Léonardon

The development of efficient machine learning models for molecular systems representation is becoming crucial in scientific research. We introduce TensorNet, an innovative O(3)-equivariant message-passing neural network architecture that…

Machine Learning · Computer Science 2023-10-31 Guillem Simeon, Gianni de Fabritiis

TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of…

We present "GEMM-like Tensor-Tensor multiplication" (GETT), a novel approach to tensor contractions that mirrors the design of a high-performance general matrix-matrix multiplication (GEMM). The critical insight behind GETT is the…

Mathematical Software · Computer Science 2017-11-08 Paul Springer, Paolo Bientinesi

Transformers have attained superior performance in natural language processing and computer vision. Their self-attention and feedforward layers are overparameterized, limiting inference speed and energy efficiency. Tensor decomposition is a…

Machine Learning · Computer Science 2022-12-01 Jiaqi Gu, Ben Keller, Jean Kossaifi, Anima Anandkumar, Brucek Khailany, David Z. Pan

Accurate prediction of resource consumption and runtime for cloud workflow jobs is critical for scheduling efficiency, yet remains challenging due to the semi-structured nature of job configurations -- comprising shell commands,…

Machine Learning · Computer Science 2026-05-18 Yuxuan Yin, Shengke Zhou, Yunjie Zhang, Ajay Mohindra, Boxun Xu, Peng Li

This technical report presents an effective method for motion prediction in autonomous driving. We develop a Transformer-based method for input encoding and trajectory prediction. Besides, we propose the Temporal Flow Header to enhance the…

Computer Vision and Pattern Recognition · Computer Science 2022-07-04 Yuting Wang, Hangning Zhou, Zhigang Zhang, Chen Feng, Huadong Lin, Chaofei Gao, Yizhi Tang, Zhenting Zhao, Shiyu Zhang, Jie Guo, Xuefeng Wang, Ziyao Xu, Chi Zhang

In this research, we propose a new low-precision framework, TENT, to leverage the benefits of a tapered fixed-point numerical format in TinyML models. We introduce a tapered fixed-point quantization algorithm that matches the numerical…

Machine Learning · Computer Science 2021-04-07 Hamed F. Langroudi, Vedant Karia, Tej Pandit, Dhireesha Kudithipudi

Training neural networks often uses a machine learning framework such as TensorFlow or Caffe2. These frameworks employ a dataflow model where the NN training is modeled as a directed graph composed of a set of nodes. Operations in neural…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-02-20 Jiawen Liu, Dong Li, Gokcen Kestor, Jeffrey Vetter

Reliable weather forecasting is of great importance in science, business, and society. The best performing data-driven models for weather prediction tasks rely on recurrent or convolutional neural networks, some of which incorporate…

Machine Learning · Computer Science 2022-02-23 Onur Bilgin, Paweł Mąka, Thomas Vergutz, Siamak Mehrkanoon

As deep learning models nowadays are widely adopted by both cloud services and edge devices, reducing the latency of deep learning model inferences becomes crucial to provide efficient model serving. However, it is challenging to develop…

Machine Learning · Computer Science 2023-02-16 Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko

Computationally intensive deep neural networks (DNNs) are well-suited to run on GPUs, but newly developed algorithms usually require the heavily optimized DNN routines to work efficiently, and this problem could be even more difficult for…

Distributed, Parallel, and Cluster Computing · Computer Science 2019-11-12 Yu-Sheng Lin, Wei-Chao Chen, Shao-Yi Chien

To enhance the real-time performance of convolutional neural networks (CNNs), more and more researchers are focusing on improving the efficiency of CNNs. Based on the analysis of some CNN architectures, such as ResNet, DenseNet,…

Computer Vision and Pattern Recognition · Computer Science 2018-03-16 Qiuyu Zhu, Ruixin Zhang

Tensor algebra finds applications in various domains, and these applications, especially when accelerated on spatial hardware accelerators, can deliver high performance and low power. Spatial hardware accelerators exhibit complex design…

Hardware Architecture · Computer Science 2021-04-27 Liancheng Jia, Zizhang Luo, Liqiang Lu, Yun Liang

Modern sensing and metrology systems now stream terabytes of heterogeneous, high-dimensional (HD) data profiles, images, and dense point clouds, whose natural representation is multi-way tensors. Understanding such data requires regression…

Machine Learning · Computer Science 2025-10-08 Qian Wang, Mohammad N. Bisheh, Kamran Paynabar

Engagement analysis finds various applications in healthcare, education, advertisement, and services. Deep neural networks used for this analysis have complex architectures and need large amounts of input data, computational power, inference…

Computer Vision and Pattern Recognition · Computer Science 2024-05-15 Alexander Vedernikov, Puneet Kumar, Haoyu Chen, Tapio Seppanen, Xiaobai Li