Related papers: Hysteresis Activation Function for Efficient Infer…
We propose the Hyperbolic Tangent Exponential Linear Unit (TeLU), a neural network hidden activation function defined as TeLU(x)=xtanh(exp(x)). TeLU's design is grounded in the core principles of key activation functions, achieving strong…
The application of the deep learning model in classification plays an important role in the accurate detection of the target objects. However, the accuracy is affected by the activation function in the hidden and output layer. In this…
Large Language Models (LLMs) with billions of parameters have drastically transformed AI applications. However, their demanding computation during inference has raised significant challenges for deployment on resource-constrained devices.…
Rectified linear unit (ReLU) is a widely used activation function for deep convolutional neural networks. However, because of the zero-hard rectification, ReLU networks miss the benefits from negative values. In this paper, we propose a…
Selecting the most suitable activation function is a critical factor in the effectiveness of deep learning models, as it influences their learning capacity, stability, and computational efficiency. In recent years, the Gaussian Error Linear…
In past few years, linear rectified unit activation functions have shown its significance in the neural networks, surpassing the performance of sigmoid activations. RELU (Nair & Hinton, 2010), ELU (Clevert et al., 2015), PRELU (He et al.,…
Activation functions play a key role in providing remarkable performance in deep neural networks, and the rectified linear unit (ReLU) is one of the most widely used activation functions. Various new activation functions and improvements on…
Activation functions influence behavior and performance of DNNs. Nonlinear activation functions, like Rectified Linear Units (ReLU), Exponential Linear Units (ELU) and Scaled Exponential Linear Units (SELU), outperform the linear…
In this paper, we introduce the Hyperbolic Tangent Exponential Linear Unit (TeLU), a novel neural network activation function, represented as $f(x) = x{\cdot}tanh(e^x)$. TeLU is designed to overcome the limitations of conventional…
We propose the Gaussian Error Linear Unit (GELU), a high-performing neural network activation function. The GELU activation function is $x\Phi(x)$, where $\Phi(x)$ the standard Gaussian cumulative distribution function. The GELU…
Sparse computation offers a compelling solution for the inference of Large Language Models (LLMs) in low-resource scenarios by dynamically skipping the computation of inactive neurons. While traditional approaches focus on ReLU-based LLMs,…
ReLU, a commonly used activation function in deep neural networks, is prone to the issue of "Dying ReLU". Several enhanced versions, such as ELU, SeLU, and Swish, have been introduced and are considered to be less commonly utilized.…
Deep networks are gradually penetrating almost every domain in our lives due to their amazing success. However, with substantive performance accuracy improvements comes the price of \emph{irreproducibility}. Two identical models, trained on…
Successive linear transforms followed by nonlinear "activation" functions can approximate nonlinear functions to arbitrary precision given sufficient layers. The number of necessary layers is dependent on, in part, by the nature of the…
Activation functions are fundamental elements of deep learning architectures as they significantly influence training dynamics. ReLU, while widely used, is prone to the dying neuron problem, which has been mitigated by variants such as…
Appropriate weight initialization settings, along with the ReLU activation function, have become cornerstones of modern deep learning, enabling the training and deployment of highly effective and efficient neural network models across…
The nonlinearity of activation functions used in deep learning models are crucial for the success of predictive models. There are several commonly used simple nonlinear functions, including Rectified Linear Unit (ReLU) and Leaky-ReLU…
Rectified Linear Units (ReLU) are the default choice for activation functions in deep neural networks. While they demonstrate excellent empirical performance, ReLU activations can fall victim to the dead neuron problem. In these cases, the…
Amongst others, the adoption of Rectified Linear Units (ReLUs) is regarded as one of the ingredients of the success of deep learning. ReLU activation has been shown to mitigate the vanishing gradient issue, to encourage sparsity in the…
Our work proposes a novel approach to designing activation functions by focusing on their gradients and deriving the corresponding activation functions using integration. We introduce the Expanded Integral of the Exponential Linear Unit…