English
Related papers

Related papers: Binary Hypothesis Testing for Softmax Models and L…

200 papers

Large language models (LLMs) have made transformed changes for human society. One of the key computation in LLMs is the softmax unit. This operation is important in LLMs because it allows the model to generate a distribution over possible…

Machine Learning · Computer Science 2023-04-27 Yichuan Deng , Zhihang Li , Zhao Song

Binary classification involves predicting the label of an instance based on whether the model score for the positive class exceeds a threshold chosen based on the application requirements (e.g., maximizing recall for a precision bound).…

Machine Learning · Computer Science 2023-11-21 Gundeep Arora , Srujana Merugu , Anoop Saladi , Rajeev Rastogi

Large language models rely on attention mechanisms with a softmax activation. Yet the dominance of softmax over alternatives (e.g., component-wise or linear) remains poorly understood, and many theoretical works have focused on the…

Machine Learning · Computer Science 2026-02-27 O. Duranthon , P. Marion , C. Boyer , B. Loureiro , L. Zdeborová

Sampling-based methods, e.g., Deep Ensembles and Bayesian Neural Nets have become promising approaches to improve the quality of uncertainty estimation and robust generalization. However, they suffer from a large model size and high latency…

Machine Learning · Computer Science 2024-05-29 Ha Manh Bui , Anqi Liu

The softmax function is widely used in artificial neural networks for the multiclass classification problems, where the softmax transformation enforces the output to be positive and sum to one, and the corresponding loss function allows to…

Machine Learning · Computer Science 2021-12-24 Shaoshi Sun , Zhenyuan Zhang , BoCheng Huang , Pengbin Lei , Jianlin Su , Shengfeng Pan , Jiarun Cao

Softmax is the most commonly used output function for multiclass problems and is widely used in areas such as vision, natural language processing, and recommendation. A softmax model has linear costs in the number of classes which makes it…

Machine Learning · Computer Science 2018-08-03 Guy Blanc , Steffen Rendle

The problem of robust binary hypothesis testing is studied. Under both hypotheses, the data-generating distributions are assumed to belong to uncertainty sets constructed through moments; in particular, the sets contain distributions whose…

Statistics Theory · Mathematics 2024-01-09 Akshayaa Magesh , Zhongchang Sun , Venugopal V. Veeravalli , Shaofeng Zou

The softmax representation of probabilities for categorical variables plays a prominent role in modern machine learning with numerous applications in areas such as large scale classification, neural language modeling and recommendation…

Machine Learning · Statistics 2016-11-01 Michalis K. Titsias

The soft-margin support vector machine (SVM) is a ubiquitous tool for prediction of binary-response data. However, the SVM is characterized entirely via a numerical optimization problem, rather than a probability model, and thus does not…

Methodology · Statistics 2020-07-24 Hien D Nguyen , Daniel V Fryer

There has long been debates on how we could interpret neural networks and understand the decisions our models make. Specifically, why deep neural networks tend to be error-prone when dealing with samples that output low softmax scores. We…

Computer Vision and Pattern Recognition · Computer Science 2018-12-04 Simiao Zuo , Jialin Wu

Out-of-vocabulary words account for a large proportion of errors in machine translation systems, especially when the system is used on a different domain than the one where it was trained. In order to alleviate the problem, we propose to…

Computation and Language · Computer Science 2016-08-08 Pranava Swaroop Madhyastha , Cristina España-Bonet

Recently, the robustness of deep learning models has received widespread attention, and various methods for improving model robustness have been proposed, including adversarial training, model architecture modification, design of loss…

Machine Learning · Computer Science 2023-03-23 Hao Wang , Chen Li , Jinzhe Jiang , Xin Zhang , Yaqian Zhao , Weifeng Gong

Large language models (LLMs) have brought significant and transformative changes in human society. These models have demonstrated remarkable capabilities in natural language understanding and generation, leading to various advancements and…

Machine Learning · Computer Science 2023-07-06 Yeqi Gao , Zhao Song , Shenghao Xie

Binary classification is a common statistical learning problem in which a model is estimated on a set of covariates for some outcome indicating the membership of one of two classes. In the literature, there exists a distinction between hard…

Machine Learning · Statistics 2014-11-20 Patrick K. Kimes , D. Neil Hayes , J. S. Marron , Yufeng Liu

The Softmax function on top of a final linear layer is the de facto method to output probability distributions in neural networks. In many applications such as language models or text generation, this model has to produce distributions over…

Machine Learning · Computer Science 2019-05-15 Octavian-Eugen Ganea , Sylvain Gelly , Gary Bécigneul , Aliaksei Severyn

Score-based statistical models play an important role in modern machine learning, statistics, and signal processing. For hypothesis testing, a score-based hypothesis test is proposed in \cite{wu2022score}. We analyze the performance of this…

Signal Processing · Electrical Eng. & Systems 2024-02-06 Enmao Diao , Taposh Banerjee , Vahid Tarokh

While the Large Language Models (LLMs) dominate a majority of language understanding tasks, previous work shows that some of these results are supported by modelling spurious correlations of training datasets. Authors commonly assess model…

Computation and Language · Computer Science 2024-02-07 Lukáš Mikula , Michal Štefánik , Marek Petrovič , Petr Sojka

Widely adopted in modern Vision Transformer designs, Softmax attention can effectively capture long-range visual information; however, it incurs excessive computational cost when dealing with high-resolution inputs. In contrast, linear…

Computer Vision and Pattern Recognition · Computer Science 2024-12-10 Dongchen Han , Yifan Pu , Zhuofan Xia , Yizeng Han , Xuran Pan , Xiu Li , Jiwen Lu , Shiji Song , Gao Huang

We consider Bayesian multiple statistical classification problem in the case where the unknown source distributions are estimated from the labeled training sequences, then the estimates are used as nominal distributions in a robust…

Information Theory · Computer Science 2021-10-11 Hüseyin Afşer

The effectiveness of large language models (LLMs) is often hindered by duplicated data in their extensive pre-training datasets. Current approaches primarily focus on detecting and removing duplicates, which risks the loss of valuable…

Computation and Language · Computer Science 2024-07-10 Nan He , Weichen Xiong , Hanwen Liu , Yi Liao , Lei Ding , Kai Zhang , Guohua Tang , Xiao Han , Wei Yang
‹ Prev 1 2 3 10 Next ›