Related papers: TreeBERT: A Tree-Based Pre-Trained Model for Progr…

200 papers

Pre-trained models for programming languages have achieved dramatic empirical improvements on a variety of code-related tasks such as code search, code completion, and code summarization. However, existing pre-trained models regard a code…

Pre-trained language models like BERT achieve superior performances in various NLP tasks without explicit consideration of syntactic information. Meanwhile, syntactic information has been proved to be crucial for the success of NLP…

Computation and Language · Computer Science 2021-03-09 Jiangang Bai, Yujing Wang, Yiren Chen, Yaming Yang, Jing Bai, Jing Yu, Yunhai Tong

Transformer-based models have demonstrated significant success in various source code representation tasks. Nonetheless, traditional positional embeddings employed by these models inadequately capture the hierarchical structure intrinsic to…

Machine Learning · Computer Science 2025-07-08 Patryk Bartkowiak, Filip Graliński

Recently, many pre-trained language models for source code have been proposed to model the context of code and serve as a basis for downstream code intelligence tasks such as code completion, code search, and code summarization. These…

Software Engineering · Computer Science 2022-02-15 Yao Wan, Wei Zhao, Hongyu Zhang, Yulei Sui, Guandong Xu, Hai Jin

Using pre-trained language models to understand source code has attracted increasing attention from financial institutions owing to its great potential to uncover financial risks. However, there are several challenges in applying these…

Artificial Intelligence · Computer Science 2022-10-12 Rong Liang, Tiehua Zhang, Yujie Lu, Yuze Liu, Zhen Huang, Xin Chen

Pre-trained Programming Language Models (PPLMs) have achieved many recent state-of-the-art results on code-related software engineering tasks. Though some studies use data flow or propose tree-based models that utilize Abstract Syntax…

Software Engineering · Computer Science 2023-03-14 Iman Saberi, Fatemeh H. Fard

Large-scale pre-trained models like BERT have achieved great success in various Natural Language Processing (NLP) tasks, but adapting them to math-related tasks remains a challenge. Current pre-trained models neglect the…

Computation and Language · Computer Science 2021-05-04 Shuai Peng, Ke Yuan, Liangcai Gao, Zhi Tang

We present CodeBERT, a bimodal pre-trained model for programming language (PL) and natural language (NL). CodeBERT learns general-purpose representations that support downstream NL-PL applications such as natural language code search, code…

Computation and Language · Computer Science 2020-09-21 Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou

Code representation learning, which aims to encode the semantics of source code into distributed vectors, plays an important role in recent deep-learning-based models for code intelligence. Recently, many pre-trained language models for…

Computation and Language · Computer Science 2021-09-10 Xin Wang, Yasheng Wang, Fei Mi, Pingyi Zhou, Yao Wan, Xiao Liu, Li Li, Hao Wu, Jin Liu, Xin Jiang

Current language models tailored for code tasks often adopt the pre-training-then-fine-tuning paradigm from natural language processing, modeling source code as plain text. This approach, however, overlooks the unambiguous structures…

Computation and Language · Computer Science 2024-01-22 Mayank Agarwal, Yikang Shen, Bailin Wang, Yoon Kim, Jie Chen

Learning vector representations for programs is a critical step in applying deep learning techniques for program understanding tasks. Various neural network models are proposed to learn from tree-structured program representations, e.g.,…

Software Engineering · Computer Science 2023-01-10 Wenhan Wang, Kechi Zhang, Ge Li, Shangqing Liu, Anran Li, Zhi Jin, Yang Liu

Pre-trained models are now widely used in natural language processing tasks. However, in the specific field of text simplification, research on improving pre-trained models is still lacking. In this work, we propose a…

Computation and Language · Computer Science 2022-04-19 Renliang Sun, Xiaojun Wan

Recently, the pre-trained language model, BERT (and its robustly optimized version RoBERTa), has attracted a lot of attention in natural language understanding (NLU), and achieved state-of-the-art accuracy in various NLU tasks, such as…

Computation and Language · Computer Science 2019-09-30 Wei Wang, Bin Bi, Ming Yan, Chen Wu, Zuyi Bao, Jiangnan Xia, Liwei Peng, Luo Si

Programming language understanding and representation (a.k.a. code representation learning) has always been a hot and challenging task in software engineering. It aims to apply deep learning techniques to produce numerical representations of…

Software Engineering · Computer Science 2023-12-04 Weisong Sun, Chunrong Fang, Yun Miao, Yudu You, Mengzhe Yuan, Yuchen Chen, Quanjun Zhang, An Guo, Xiang Chen, Yang Liu, Zhenyu Chen

Neural networks with tree-based sentence encoders have shown better results on many downstream tasks. Most existing tree-based encoders adopt syntactic parsing trees as the explicit structure prior. To study the effectiveness of…

Computation and Language · Computer Science 2018-08-30 Haoyue Shi, Hao Zhou, Jiaze Chen, Lei Li

A code generation system generates programming language code based on an input natural language description. State-of-the-art approaches rely on neural networks for code generation. However, these code generators suffer from two problems.…

Machine Learning · Computer Science 2019-12-02 Zeyu Sun, Qihao Zhu, Yingfei Xiong, Yican Sun, Lili Mou, Lu Zhang

Many common sequential data sources, such as source code and natural language, have a natural tree-structured representation. These trees can be generated by fitting a sequence to a grammar, yielding a hierarchical ordering of the tokens in…

Machine Learning · Computer Science 2019-08-02 Jacob Harer, Chris Reale, Peter Chin

Learning representations for source code is a foundation of many program analysis tasks. In recent years, neural networks have shown success in this area, but most existing models do not make full use of the unique structural…

Software Engineering · Computer Science 2021-04-02 Wenhan Wang, Ge Li, Sijie Shen, Xin Xia, Zhi Jin

Recent advances in Neural Machine Translation (NMT) show that adding syntactic information to NMT systems can improve the quality of their translations. Most existing work utilizes some specific types of linguistically-inspired tree…

Computation and Language · Computer Science 2018-08-29 Xinyi Wang, Hieu Pham, Pengcheng Yin, Graham Neubig

Language models generate reasoning sequentially, preventing them from decoupling irrelevant exploration paths during search. We introduce Tree-Structured Language Modeling (TSLM), which uses special tokens to encode branching structure,…

Computation and Language · Computer Science 2026-02-02 Doyoung Kim, Jaehyeok Doo, Minjoon Seo