Related papers: TabNSA: Native Sparse Attention for Efficient Tabu…

Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention

Long-context modeling is crucial for next-generation language models, yet the high computational cost of standard attention mechanisms poses significant computational challenges. Sparse attention offers a promising direction for improving…

Computation and Language · Computer Science 2025-02-28 Jingyang Yuan, Huazuo Gao, Damai Dai, Junyu Luo, Liang Zhao, Zhengyan Zhang, Zhenda Xie, Y. X. Wei, Lean Wang, Zhiping Xiao, Yuqing Wang, Chong Ruan, Ming Zhang, Wenfeng Liang, Wangding Zeng

A Survey on Deep Tabular Learning

Tabular data, widely used in industries like healthcare, finance, and transportation, presents unique challenges for deep learning due to its heterogeneous nature and lack of spatial structure. This survey reviews the evolution of deep…

Machine Learning · Computer Science 2024-10-17 Shriyank Somvanshi, Subasish Das, Syed Aaqib Javed, Gian Antariksa, Ahmed Hossain

A Neural Network Alternative to Tree-based Models

Tabular datasets are widely used in scientific disciplines such as biology. While these disciplines have already adopted AI methods to enhance their findings and analysis, they mainly use tree-based methods due to their interpretability. At…

Machine Learning · Computer Science 2025-04-16 Salvatore Raieli, Nathalie Jeanray, Stéphane Gerart, Sebastien Vachenc, Abdulrahman Altahhan

Optimizing Native Sparse Attention with Latent Attention and Local Global Alternating Strategies

In this work, we conduct a systematic analysis of Native Sparse Attention (NSA) and propose targeted improvements that enhance long-context modeling. A key insight is that alternating between local (sliding-window) and global (compression,…

Computation and Language · Computer Science 2025-11-04 Yuxuan Hu, Jianchao Tan, Jiaqi Zhang, Wen Zan, Pingwei Sun, Yifan Lu, Yerui Sun, Yuchen Xie, Xunliang Cai, Jing Zhang

FSA: An Alternative Efficient Implementation of Native Sparse Attention Kernel

Recent advances in sparse attention mechanisms have demonstrated strong potential for reducing the computational cost of long-context training and inference in large language models (LLMs). Native Sparse Attention (NSA), one state-of-the-art…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-10-14 Ran Yan, Youhe Jiang, Zhuoming Chen, Haohui Mai, Beidi Chen, Binhang Yuan

TabAttention: Learning Attention Conditionally on Tabular Data

Medical data analysis often combines both imaging and tabular data processing using machine learning algorithms. While previous studies have investigated the impact of attention mechanisms on deep learning models, few have explored…

Image and Video Processing · Electrical Eng. & Systems 2023-10-30 Michal K. Grzeszczyk, Szymon Płotka, Beata Rebizant, Katarzyna Kosińska-Kaczyńska, Michał Lipa, Robert Brawura-Biskupski-Samaha, Przemysław Korzeniowski, Tomasz Trzciński, Arkadiusz Sitek

Structural Deep Encoding for Table Question Answering

Although Transformers-based architectures excel at processing textual information, their naive adaptation for tabular data often involves flattening the table structure. This simplification can lead to the loss of essential…

Computation and Language · Computer Science 2025-03-04 Raphaël Mouravieff, Benjamin Piwowarski, Sylvain Lamprier

Locally Sparse Neural Networks for Tabular Biomedical Data

Tabular datasets with low-sample-size or many variables are prevalent in biomedicine. Practitioners in this domain prefer linear or tree-based models over neural networks since the latter are harder to interpret and tend to overfit when…

Machine Learning · Computer Science 2022-02-09 Junchen Yang, Ofir Lindenbaum, Yuval Kluger

TabSeq: A Framework for Deep Learning on Tabular Data via Sequential Ordering

Effective analysis of tabular data still poses a significant problem in deep learning, mainly because features in tabular datasets are often heterogeneous and have different levels of relevance. This work introduces TabSeq, a novel…

Machine Learning · Computer Science 2024-10-22 Al Zadid Sultan Bin Habib, Kesheng Wang, Mary-Anne Hartley, Gianfranco Doretto, Donald A. Adjeroh

TabNet: Attentive Interpretable Tabular Learning

We propose a novel high-performance and interpretable canonical deep tabular data learning architecture, TabNet. TabNet uses sequential attention to choose which features to reason from at each decision step, enabling interpretability and…

Machine Learning · Computer Science 2020-12-10 Sercan O. Arik, Tomas Pfister

Long-Context Modeling with Dynamic Hierarchical Sparse Attention for Memory-Constrained LLM Inference

The quadratic cost of attention limits the scalability of long-context LLMs, especially under limited hardware memory budgets. While attention is often sparse, existing static sparse methods cannot adapt to task- or input-dependent…

Computation and Language · Computer Science 2026-05-29 Siheng Xiong, Joe Zou, Faramarz Fekri, Yae Jee Cho

Tensorized Self-Attention: Efficiently Modeling Pairwise and Global Dependencies Together

Neural networks equipped with self-attention have parallelizable computation, light-weight structure, and the ability to capture both long-range and local dependencies. Further, their expressive power and performance can be boosted by using…

Computation and Language · Computer Science 2019-03-27 Tao Shen, Tianyi Zhou, Guodong Long, Jing Jiang, Chengqi Zhang

SSA: Sparse Sparse Attention by Aligning Full and Sparse Attention Outputs in Feature Space

Sparse attention reduces the quadratic complexity of full self-attention but faces two challenges: (1) an attention gap, where applying sparse attention to full-attention-trained models causes performance degradation due to train-inference…

Computation and Language · Computer Science 2026-02-02 Zhenyi Shen, Junru Lu, Lin Gui, Jiazheng Li, Yulan He, Di Yin, Xing Sun

Transformer Acceleration with Dynamic Sparse Attention

Transformers are the mainstream of NLP applications and are becoming increasingly popular in other domains such as Computer Vision. Despite the improvements in model quality, the enormous computation costs make Transformers difficult at…

Machine Learning · Computer Science 2021-10-22 Liu Liu, Zheng Qu, Zhaodong Chen, Yufei Ding, Yuan Xie

LassoFlexNet: Flexible Neural Architecture for Tabular Data

Despite their dominance in vision and language, deep neural networks often underperform relative to tree-based models on tabular data. To bridge this gap, we incorporate five key inductive biases into deep learning: robustness to irrelevant…

Machine Learning · Statistics 2026-03-24 Kry Yik Chau Lui, Cheng Chi, Kishore Basu, Yanshuai Cao

TabGLM: Tabular Graph Language Model for Learning Transferable Representations Through Multi-Modal Consistency Minimization

Handling heterogeneous data in tabular datasets poses a significant challenge for deep learning models. While attention-based architectures and self-supervised learning have achieved notable success, their application to tabular data…

Machine Learning · Computer Science 2025-02-27 Anay Majee, Maria Xenochristou, Wei-Peng Chen

Dynamic Sparse Attention: Access Patterns and Architecture

Dynamic sparse attention (DSA) reduces the per-token attention bandwidth by restricting computation to a top-k subset of cached key-value (KV) entries, but its token-dependent selection pattern introduces a system-level challenge: the KV…

Hardware Architecture · Computer Science 2026-03-17 Noam Levy
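The snippet above says dynamic sparse attention restricts computation to a top-k subset of cached key-value (KV) entries. The following is a minimal single-query sketch of that idea in plain Python. It is an illustration under stated assumptions, not the paper's method. The function name `topk_sparse_attention` is hypothetical. The sketch also assumes selection uses full scaled dot-product scores over every cached key, whereas practical systems typically use a cheaper selection signal that the snippet does not describe. Only the k selected entries take part in the softmax and the weighted sum of values.

```python
import math
from typing import List, Sequence, Tuple


def topk_sparse_attention(
    query: Sequence[float],
    keys: Sequence[Sequence[float]],
    values: Sequence[Sequence[float]],
    k: int,
) -> Tuple[List[float], List[int]]:
    """Single-query attention restricted to the k highest-scoring cached KV entries.

    Hypothetical helper for illustration only. Selection here uses full scaled
    dot-product scores, which is an assumption, not the paper's selector.
    """
    if not keys or k <= 0:
        raise ValueError("need at least one cached entry and k >= 1")
    scale = 1.0 / math.sqrt(len(query))
    # Score every cached key (dense scoring in this sketch).
    scores = [scale * sum(q * x for q, x in zip(query, key)) for key in keys]
    # Indices of the k largest scores; this set depends on the query token.
    k = min(k, len(keys))
    selected = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:k]
    # Numerically stable softmax over the selected scores only.
    m = max(scores[i] for i in selected)
    weights = [math.exp(scores[i] - m) for i in selected]
    total = sum(weights)
    # Weighted sum of the selected values; unselected values are never read.
    out = [0.0] * len(values[0])
    for w, i in zip(weights, selected):
        for j, v in enumerate(values[i]):
            out[j] += (w / total) * v
    return out, selected


if __name__ == "__main__":
    out, sel = topk_sparse_attention(
        query=[1.0, 0.0],
        keys=[[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]],
        values=[[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]],
        k=1,
    )
    print(sel, out)  # prints "[0] [1.0, 0.0]"
```

In this example only entry 0 is selected, so the output is exactly its value vector. Because `selected` can differ for every query, the set of KV entries read from memory changes token by token. This is the token-dependent access pattern the snippet identifies as a system-level challenge.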

Hardware-aligned Hierarchical Sparse Attention for Efficient Long-term Memory Access

A key advantage of Recurrent Neural Networks (RNNs) over Transformers is that their linear computational and space complexity enables faster training and inference for long sequences. However, RNNs are fundamentally unable to randomly access…

Computation and Language · Computer Science 2025-11-04 Xiang Hu, Jiaqi Leng, Jun Zhao, Kewei Tu, Wei Wu

TabMixNN: A Unified Deep Learning Framework for Structural Mixed Effects Modeling on Tabular Data

We present TabMixNN, a flexible PyTorch-based deep learning framework that synthesizes classical mixed-effects modeling with modern neural network architectures for tabular data analysis. TabMixNN addresses the growing need for methods that…

Machine Learning · Computer Science 2026-01-01 Deniz Akdemir

Deep Learning with Tabular Data: A Self-supervised Approach

We have described a novel approach for training on tabular data using the TabTransformer model with self-supervised learning. Traditional machine learning models for tabular data, such as GBDT, are widely used, though our paper examines…

Machine Learning · Computer Science 2024-01-30 Tirth Kiranbhai Vyas