Related papers: A Transformer-based Approach for Source Code Summa…

Transformers in Time-series Analysis: A Tutorial

Transformer architecture has widespread applications, particularly in Natural Language Processing and computer vision. Recently Transformers have been employed in various aspects of time-series analysis. This tutorial provides an overview…

Machine Learning · Computer Science 2023-07-27 Sabeen Ahmed, Ian E. Nielsen, Aakash Tripathi, Shamoon Siddiqui, Ghulam Rasool, Ravi P. Ramachandran

Text Summarization with Pretrained Encoders

Bidirectional Encoder Representations from Transformers (BERT) represents the latest incarnation of pretrained language models which have recently advanced a wide range of natural language processing tasks. In this paper, we showcase how…

Computation and Language · Computer Science 2019-09-06 Yang Liu, Mirella Lapata

LongCoder: A Long-Range Pre-trained Language Model for Code Completion

In this paper, we introduce a new task for code completion that focuses on handling long code input and propose a sparse Transformer model, called LongCoder, to address this task. LongCoder employs a sliding window mechanism for…

Software Engineering · Computer Science 2023-06-27 Daya Guo, Canwen Xu, Nan Duan, Jian Yin, Julian McAuley

Enriching Source Code with Contextual Data for Code Completion Models: An Empirical Study

Transformer-based pre-trained models have recently achieved great results in solving many software engineering tasks including automatic code completion which is a staple in a developer's toolkit. While many have striven to improve the…

Computation and Language · Computer Science 2023-04-25 Tim van Dam, Maliheh Izadi, Arie van Deursen

An Extractive-and-Abstractive Framework for Source Code Summarization

(Source) Code summarization aims to automatically generate summaries/comments for a given code snippet in the form of natural language. Such summaries play a key role in helping developers understand and maintain source code. Existing code…

Software Engineering · Computer Science 2023-11-07 Weisong Sun, Chunrong Fang, Yuchen Chen, Quanjun Zhang, Guanhong Tao, Tingxu Han, Yifei Ge, Yudu You, Bin Luo

Z-Code++: A Pre-trained Language Model Optimized for Abstractive Summarization

This paper presents Z-Code++, a new pre-trained language model optimized for abstractive text summarization. The model extends the state of the art encoder-decoder model using three techniques. First, we use a two-phase pre-training process…

Computation and Language · Computer Science 2023-06-08 Pengcheng He, Baolin Peng, Liyang Lu, Song Wang, Jie Mei, Yang Liu, Ruochen Xu, Hany Hassan Awadalla, Yu Shi, Chenguang Zhu, Wayne Xiong, Michael Zeng, Jianfeng Gao, Xuedong Huang

Sketchformer: Transformer-based Representation for Sketched Structure

Sketchformer is a novel transformer-based representation for encoding free-hand sketches input in a vector form, i.e. as a sequence of strokes. Sketchformer effectively addresses multiple tasks: sketch classification, sketch based image…

Computer Vision and Pattern Recognition · Computer Science 2020-02-25 Leo Sampaio Ferraz Ribeiro, Tu Bui, John Collomosse, Moacir Ponti

Enhanced Graph Transformer with Serialized Graph Tokens

Transformers have demonstrated success in graph learning, particularly for node-level tasks. However, existing methods encounter an information bottleneck when generating graph-level representations. The prevalent single token paradigm…

Machine Learning · Computer Science 2026-02-11 Ruixiang Wang, Yuyang Hong, Shiming Xiang, Chunhong Pan

A Formal Framework for Understanding Length Generalization in Transformers

A major challenge for transformers is generalizing to sequences longer than those observed during training. While previous works have empirically shown that transformers can either succeed or fail at length generalization depending on the…

Machine Learning · Computer Science 2025-05-01 Xinting Huang, Andy Yang, Satwik Bhattamishra, Yash Sarrof, Andreas Krebs, Hattie Zhou, Preetum Nakkiran, Michael Hahn

Investigating Text Shortening Strategy in BERT: Truncation vs Summarization

The parallelism of Transformer-based models comes at the cost of their input max-length. Some studies proposed methods to overcome this limitation, but none of them reported the effectiveness of summarization as an alternative. In this…

Computation and Language · Computer Science 2024-03-20 Mirza Alim Mutasodirin, Radityo Eko Prasojo

Exploring Length Generalization For Transformer-based Speech Enhancement

Transformer network architecture has proven effective in speech enhancement. However, as its core module, self-attention suffers from quadratic complexity, making it infeasible for training on long speech utterances. In practical scenarios,…

Audio and Speech Processing · Electrical Eng. & Systems 2025-06-10 Qiquan Zhang, Hongxu Zhu, Xinyuan Qian, Eliathamby Ambikairajah, Haizhou Li

Applying Transformer-based Text Summarization for Keyphrase Generation

Keyphrases are crucial for searching and systematizing scholarly documents. Most current methods for keyphrase extraction are aimed at the extraction of the most significant words in the text. But in practice, the list of keyphrases often…

Computation and Language · Computer Science 2024-10-23 Anna Glazkova, Dmitry Morozov

Investigating Efficiently Extending Transformers for Long Input Summarization

While large pretrained Transformer models have proven highly capable at tackling natural language tasks, handling long sequence inputs continues to be a significant challenge. One such task is long input summarization, where inputs are…

Computation and Language · Computer Science 2022-08-10 Jason Phang, Yao Zhao, Peter J. Liu

Compositional Generalization and Decomposition in Neural Program Synthesis

When writing programs, people have the ability to tackle a new complex task by decomposing it into smaller and more familiar subtasks. While it is difficult to measure whether neural program synthesis methods have similar capabilities, what…

Machine Learning · Computer Science 2023-10-31 Kensen Shi, Joey Hong, Manzil Zaheer, Pengcheng Yin, Charles Sutton

Transformers are efficient hierarchical chemical graph learners

Transformers, adapted from natural language processing, are emerging as a leading approach for graph representation learning. Contemporary graph transformers often treat nodes or edges as separate tokens. This approach leads to…

Machine Learning · Computer Science 2023-10-04 Zihan Pengmei, Zimu Li, Chih-chan Tien, Risi Kondor, Aaron R. Dinner

ScaleFormer: Span Representation Cumulation for Long-Context Transformer

The quadratic complexity of standard self-attention severely limits the application of Transformer-based models to long-context tasks. While efficient Transformer variants exist, they often require architectural changes and costly…

Computation and Language · Computer Science 2025-11-14 Jiangshu Du, Wenpeng Yin, Philip Yu

Revisiting Transformers with Insights from Image Filtering and Boosting

The self-attention mechanism, a cornerstone of Transformer-based state-of-the-art deep learning architectures, is largely heuristic-driven and fundamentally challenging to interpret. Establishing a robust theoretical foundation to explain…

Computer Vision and Pattern Recognition · Computer Science 2026-02-10 Laziz U. Abdullaev, Maksim Tkachenko, Tan M. Nguyen

Transformers and Slot Encoding for Sample Efficient Physical World Modelling

World modelling, i.e. building a representation of the rules that govern the world so as to predict its evolution, is an essential ability for any agent interacting with the physical world. Recent applications of the Transformer…

Machine Learning · Computer Science 2024-05-31 Francesco Petri, Luigi Asprino, Aldo Gangemi

Contextually Structured Token Dependency Encoding for Large Language Models

Token representation strategies within large-scale neural architectures often rely on contextually refined embeddings, yet conventional approaches seldom encode structured relationships explicitly within token interactions. Self-attention…

Computation and Language · Computer Science 2025-03-27 James Blades, Frederick Somerfield, William Langley, Susan Everingham, Maurice Witherington

A Meta-Learning Perspective on Transformers for Causal Language Modeling

The Transformer architecture has become prominent in developing large causal language models. However, mechanisms to explain its capabilities are not well understood. Focused on the training process, here we establish a meta-learning view…

Machine Learning · Computer Science 2024-03-26 Xinbo Wu, Lav R. Varshney