Related papers: Holonorm

200 papers

Normalization layers are ubiquitous in modern neural networks and have long been considered essential. This work demonstrates that Transformers without normalization can achieve the same or better performance using a remarkably simple…

Machine Learning · Computer Science 2025-06-17 Jiachen Zhu, Xinlei Chen, Kaiming He, Yann LeCun, Zhuang Liu

Pre-Layer Normalization (Pre-LN) is the de facto choice for large language models (LLMs) and is crucial for stable pretraining and effective transfer learning. However, Pre-LN is inefficient due to repeated statistical calculations and…

Computation and Language · Computer Science 2026-02-04 Hoyoon Byun, Youngjun Choi, Taero Kim, Sungrae Park, Kyungwoo Song

Layer normalization (LN) is an essential component of modern neural networks. While many alternative techniques have been proposed, none of them have succeeded in replacing LN so far. The latest suggestion in this line of research is a…

Machine Learning · Computer Science 2026-04-15 Felix Stollenwerk

This work analyzes the training dynamics of Image Restoration (IR) Transformers and uncovers a critical yet overlooked issue: conventional LayerNorm (LN) drives feature magnitudes to diverge to a million scale and collapses channel-wise…

Computer Vision and Pattern Recognition · Computer Science 2026-02-23 MinKyu Lee, Sangeek Hyun, Woojin Jun, Hyunjun Kim, Jiwoo Chung, Jae-Pil Heo

Large language models (LLMs) have revolutionized natural language processing (NLP) tasks by achieving state-of-the-art performance across a range of benchmarks. Central to the success of these models is the integration of sophisticated…

Hardware Architecture · Computer Science 2025-02-18 Tianfan Peng, Jiajun Qin, Tianhua Xia, Sai Qian Zhang

Remote sensing image restoration aims to reconstruct missing or corrupted areas within images. To date, low-rank based models have garnered significant interest in this field. This paper proposes a novel low-rank regularization term, named…

Image and Video Processing · Electrical Eng. & Systems 2024-12-17 Shuang Xu, Chang Yu, Jiangjun Peng, Xiangyong Cao, Deyu Meng

In this paper, we propose a simple yet effective method to stabilize extremely deep Transformers. Specifically, we introduce a new normalization function (DeepNorm) to modify the residual connection in Transformer, accompanying with…

Computation and Language · Computer Science 2022-03-02 Hongyu Wang, Shuming Ma, Li Dong, Shaohan Huang, Dongdong Zhang, Furu Wei

Integral transforms are invaluable mathematical tools to map functions into spaces where they are easier to characterize. We introduce the hyperdimensional transform as a new kind of integral transform. It converts square-integrable…

Machine Learning · Computer Science 2023-10-26 Pieter Dewulf, Michiel Stock, Bernard De Baets

Although normalization layers have long been viewed as indispensable components of deep learning architectures, the recent introduction of Dynamic Tanh (DyT) has demonstrated that alternatives are possible. The point-wise function DyT…

Machine Learning · Computer Science 2026-04-01 Mingzhi Chen, Taiming Lu, Jiachen Zhu, Mingjie Sun, Zhuang Liu

Generalisation of a deep neural network (DNN) is one major concern when employing the deep learning approach for solving practical problems. In this paper we propose a new technique, named approximated orthonormal normalisation (AON), to…

Machine Learning · Computer Science 2020-01-15 Guoqiang Zhang, Kenta Niwa, W. B. Kleijn

Transformers have become the de facto architecture for a wide range of machine learning tasks, particularly in large language models (LLMs). Despite their remarkable performance, many challenges remain in training deep transformer networks,…

Computation and Language · Computer Science 2025-12-09 Zhijian Zhuo, Yutao Zeng, Ya Wang, Sijun Zhang, Jian Yang, Xiaoqing Li, Xun Zhou, Jinwen Ma

Dynamic Tanh (DyT) removes LayerNorm by bounding activations with a learned tanh(alpha x). We show that this bounding is a regime-dependent implicit regularizer, not a uniformly beneficial replacement. Across GPT-2-family models spanning…

Machine Learning · Computer Science 2026-04-28 Lucky Verma

Low-rankness is important in the hyperspectral image (HSI) denoising tasks. The tensor nuclear norm (TNN), defined based on the tensor singular value decomposition, is a state-of-the-art method to describe the low-rankness of HSI. However,…

Image and Video Processing · Electrical Eng. & Systems 2022-06-22 Xiaozhen Xie, Sheng Liu

Leveled Homomorphic Encryption (LHE) offers a potential solution that could allow sectors with sensitive data to utilize the cloud and securely deploy their models for remote inference with Deep Neural Networks (DNN). However, this…

Machine Learning · Computer Science 2019-02-07 Moustafa AboulAtta, Matthias Ossadnik, Seyed-Ahmad Ahmadi

Deep learning, at its core, contains functions that are compositions of a linear transformation with a non-linear function known as an activation function. In the past few years, there has been increasing interest in the construction of novel activation…

Neural and Evolutionary Computing · Computer Science 2020-09-09 Koushik Biswas, Sandeep Kumar, Shilpak Banerjee, Ashish Kumar Pandey

Vision Transformer (ViT) and its variants (e.g., Swin, PVT) have achieved great success in various computer vision tasks, owing to their capability to learn long-range contextual information. Layer Normalization (LN) is an essential…

Computer Vision and Pattern Recognition · Computer Science 2022-10-17 Wenqi Shao, Yixiao Ge, Zhaoyang Zhang, Xuyuan Xu, Xiaogang Wang, Ying Shan, Ping Luo

Transformers have achieved remarkable success in a wide range of natural language processing and computer vision applications. However, the representation capacity of a deep transformer model is degraded due to the over-smoothing issue in…

Computation and Language · Computer Science 2023-12-04 Tam Nguyen, Tan M. Nguyen, Richard G. Baraniuk

Batch Normalization (BN) has been proven to be quite effective at accelerating and improving the training of deep neural networks (DNNs). However, BN brings additional computation, consumes more memory and generally slows down the training…

Machine Learning · Computer Science 2019-05-23 Shuang Wu, Guoqi Li, Lei Deng, Liu Liu, Yuan Xie, Luping Shi

The success of Large Language Models (LLMs) hinges on the stable training of deep Transformer architectures. A critical design choice is the placement of normalization layers, leading to a fundamental trade-off: the "PreNorm" architecture…

Computation and Language · Computer Science 2026-02-02 Chao Wang, Bei Li, Jiaqi Zhang, Xinyu Liu, Yuchun Fan, Linkun Lyu, Xin Chen, Jingang Wang, Tong Xiao, Peng Pei, Xunliang Cai

Higher-Order Hypergraph Learning (HOHL) was recently introduced as a principled alternative to classical hypergraph regularization, enforcing higher-order smoothness via powers of multiscale Laplacians induced by the hypergraph structure.…

Machine Learning · Computer Science 2025-11-25 Adrien Weihs, Andrea L. Bertozzi, Matthew Thorpe