Related papers: TabNSA: Native Sparse Attention for Efficient Tabu…

200 papers

Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant computational challenges. Sparse attention offers a promising direction for improving…

Tabular data, widely used in industries like healthcare, finance, and transportation, presents unique challenges for deep learning due to its heterogeneous nature and lack of spatial structure. This survey reviews the evolution of deep…

Machine Learning · Computer Science 2024-10-17 Shriyank Somvanshi, Subasish Das, Syed Aaqib Javed, Gian Antariksa, Ahmed Hossain

Tabular datasets are widely used in scientific disciplines such as biology. While these disciplines have already adopted AI methods to enhance their findings and analysis, they mainly use tree-based methods due to their interpretability. At…

Machine Learning · Computer Science 2025-04-16 Salvatore Raieli, Nathalie Jeanray, Stéphane Gerart, Sebastien Vachenc, Abdulrahman Altahhan

In this work, we conduct a systematic analysis of Native Sparse Attention (NSA) and propose targeted improvements that enhance long-context modeling. A key insight is that alternating between local (sliding-window) and global (compression,…

Computation and Language · Computer Science 2025-11-04 Yuxuan Hu, Jianchao Tan, Jiaqi Zhang, Wen Zan, Pingwei Sun, Yifan Lu, Yerui Sun, Yuchen Xie, Xunliang Cai, Jing Zhang

Recent advances in sparse attention mechanisms have demonstrated strong potential for reducing the computational cost of long-context training and inference in large language models (LLMs). Native Sparse Attention (NSA), one state-of-the-art…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-14 Ran Yan, Youhe Jiang, Zhuoming Chen, Haohui Mai, Beidi Chen, Binhang Yuan

Medical data analysis often combines both imaging and tabular data processing using machine learning algorithms. While previous studies have investigated the impact of attention mechanisms on deep learning models, few have explored…

Although Transformer-based architectures excel at processing textual information, their naive adaptation for tabular data often involves flattening the table structure. This simplification can lead to the loss of essential…

Computation and Language · Computer Science 2025-03-04 Raphaël Mouravieff, Benjamin Piwowarski, Sylvain Lamprier

Tabular datasets with low-sample-size or many variables are prevalent in biomedicine. Practitioners in this domain prefer linear or tree-based models over neural networks since the latter are harder to interpret and tend to overfit when…

Machine Learning · Computer Science 2022-02-09 Junchen Yang, Ofir Lindenbaum, Yuval Kluger

Effective analysis of tabular data still poses a significant problem in deep learning, mainly because features in tabular datasets are often heterogeneous and have different levels of relevance. This work introduces TabSeq, a novel…

Machine Learning · Computer Science 2024-10-22 Al Zadid Sultan Bin Habib, Kesheng Wang, Mary-Anne Hartley, Gianfranco Doretto, Donald A. Adjeroh

We propose a novel high-performance and interpretable canonical deep tabular data learning architecture, TabNet. TabNet uses sequential attention to choose which features to reason from at each decision step, enabling interpretability and…

Machine Learning · Computer Science 2020-12-10 Sercan O. Arik, Tomas Pfister

The quadratic cost of attention limits the scalability of long-context LLMs, especially under limited hardware memory budgets. While attention is often sparse, existing static sparse methods cannot adapt to task- or input-dependent…

Computation and Language · Computer Science 2026-05-29 Siheng Xiong, Joe Zou, Faramarz Fekri, Yae Jee Cho

Neural networks equipped with self-attention have parallelizable computation, light-weight structure, and the ability to capture both long-range and local dependencies. Further, their expressive power and performance can be boosted by using…

Computation and Language · Computer Science 2019-03-27 Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang

Sparse attention reduces the quadratic complexity of full self-attention but faces two challenges: (1) an attention gap, where applying sparse attention to full-attention-trained models causes performance degradation due to train-inference…

Computation and Language · Computer Science 2026-02-02 Zhenyi Shen, Junru Lu, Lin Gui, Jiazheng Li, Yulan He, Di Yin, Xing Sun

Transformers are the mainstream of NLP applications and are becoming increasingly popular in other domains such as Computer Vision. Despite the improvements in model quality, the enormous computation costs make Transformers difficult at…

Machine Learning · Computer Science 2021-10-22 Liu Liu, Zheng Qu, Zhaodong Chen, Yufei Ding, Yuan Xie

Despite their dominance in vision and language, deep neural networks often underperform relative to tree-based models on tabular data. To bridge this gap, we incorporate five key inductive biases into deep learning: robustness to irrelevant…

Machine Learning · Statistics 2026-03-24 Kry Yik Chau Lui, Cheng Chi, Kishore Basu, Yanshuai Cao

Handling heterogeneous data in tabular datasets poses a significant challenge for deep learning models. While attention-based architectures and self-supervised learning have achieved notable success, their application to tabular data…

Machine Learning · Computer Science 2025-02-27 Anay Majee, Maria Xenochristou, Wei-Peng Chen

Dynamic sparse attention (DSA) reduces the per-token attention bandwidth by restricting computation to a top-k subset of cached key-value (KV) entries, but its token-dependent selection pattern introduces a system-level challenge: the KV…

Hardware Architecture · Computer Science 2026-03-17 Noam Levy

A key advantage of Recurrent Neural Networks (RNNs) over Transformers is that their linear computational and space complexity enables faster training and inference for long sequences. However, RNNs are fundamentally unable to randomly access…

Computation and Language · Computer Science 2025-11-04 Xiang Hu, Jiaqi Leng, Jun Zhao, Kewei Tu, Wei Wu

We present TabMixNN, a flexible PyTorch-based deep learning framework that synthesizes classical mixed-effects modeling with modern neural network architectures for tabular data analysis. TabMixNN addresses the growing need for methods that…

Machine Learning · Computer Science 2026-01-01 Deniz Akdemir

We have described a novel approach for training on tabular data using the TabTransformer model with self-supervised learning. Traditional machine learning models for tabular data, such as GBDT, are widely used, though our paper examines…

Machine Learning · Computer Science 2024-01-30 Tirth Kiranbhai Vyas