Related papers: Text Classification for Task-based Source Code Rel…

200 papers

The ability to generate natural language sequences from source code snippets has a variety of applications such as code summarization, documentation, and retrieval. Sequence-to-sequence (seq2seq) models, adopted from neural machine…

Machine Learning · Computer Science · 2019-02-22 · Uri Alon, Shaked Brody, Omer Levy, Eran Yahav

Sequence-to-Sequence (S2S) models have achieved remarkable success on various text generation tasks. However, learning complex structures with S2S models remains challenging as external neural modules and additional lexicons are often…

Computation and Language · Computer Science · 2023-02-07 · Han He, Jinho D. Choi

Sequence-to-sequence (seq2seq) learning is a popular fashion for large-scale pretraining language models. However, the prior seq2seq pretraining models generally focus on reconstructive objectives on the decoder side and neglect the effect…

Computation and Language · Computer Science · 2024-01-10 · Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

Training datasets for semantic parsing are typically small due to the higher expertise required for annotation than most other NLP tasks. As a result, models for this application usually need additional prior knowledge to be built into the…

Computation and Language · Computer Science · 2021-06-11 · Sajad Norouzi, Keyi Tang, Yanshuai Cao

A significant amount of the world's knowledge is stored in relational databases. However, the ability for users to retrieve facts from a database is limited due to a lack of understanding of query languages such as SQL. We propose Seq2SQL,…

Computation and Language · Computer Science · 2017-11-13 · Victor Zhong, Caiming Xiong, Richard Socher

Millions of repetitive code snippets are submitted to code repositories every day. To search from these large codebases using simple natural language queries would allow programmers to ideate, prototype, and develop easier and faster.…

Code search is a widely used technique by developers during software development. It provides semantically similar implementations from a large code corpus to developers based on their queries. Existing techniques leverage deep learning…

Software Engineering · Computer Science · 2022-02-17 · Weisong Sun, Chunrong Fang, Yuchen Chen, Guanhong Tao, Tingxu Han, Quanjun Zhang

Deep learning methods, which have found successful applications in fields like image classification and natural language processing, have recently been applied to source code analysis too, due to the enormous amount of freely available…

Software Engineering · Computer Science · 2021-11-18 · Rocìo Cabrera Lozoya, Arnaud Baumann, Antonino Sabetta, Michele Bezzi

In this paper, we introduce the use of Semantic Hashing as embedding for the task of Intent Classification and achieve state-of-the-art performance on three frequently used benchmarks. Intent Classification on a small dataset is a…

Semantic parsing has emerged as a significant and powerful paradigm for natural language interface and question answering systems. Traditional methods of building a semantic parser rely on high-quality lexicons, hand-crafted grammars and…

Computation and Language · Computer Science · 2017-05-10 · Liang Li, Pengyu Li, Yifan Liu, Tao Wan, Zengchang Qin

Semantic code search is the task of retrieving relevant code snippet given a natural language query. Different from typical information retrieval tasks, code search requires to bridge the semantic gap between the programming language and…

Computation and Language · Computer Science · 2022-01-28 · Chen Wu, Ming Yan

With the rapid development of Natural Language Processing (NLP) technology, the accuracy and efficiency of machine translation have become hot topics of research. This paper proposes a novel Seq2Seq model aimed at improving translation…

Computation and Language · Computer Science · 2024-11-01 · Yuxu Wu, Yiren Xing

Semantic sentence embedding models encode natural language sentences into vectors, such that closeness in embedding space indicates closeness in the semantics between the sentences. Bilingual data offers a useful signal for learning such…

Computation and Language · Computer Science · 2020-11-20 · John Wieting, Graham Neubig, Taylor Berg-Kirkpatrick

In this paper, we propose Cross-Thought, a novel approach to pre-training sequence encoder, which is instrumental in building reusable sequence embeddings for large-scale NLP tasks such as question answering. Instead of using the original…

Computation and Language · Computer Science · 2020-10-09 · Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jing Jiang, Jingjing Liu

Retrieving binary code via natural language queries is a pivotal capability for downstream tasks in the software security domain, such as vulnerability detection and malware analysis. However, it is challenging to identify binary functions…

Software Engineering · Computer Science · 2026-01-06 · Guoqiang Chen, Lingyun Ying, Ziyang Song, Daguang Liu, Qiang Wang, Zhiqi Wang, Li Hu, Shaoyin Cheng, Weiming Zhang, Nenghai Yu

In this work, we begin to investigate the possibility of training a deep neural network on the task of binary code understanding. Specifically, the network would take, as input, features derived directly from binaries and output English…

Machine Learning · Computer Science · 2024-05-01 · Alexander Interrante-Grant, Andy Davis, Heather Preslier, Tim Leek

Many natural language generation tasks, such as abstractive summarization and text simplification, are paraphrase-orientated. In these tasks, copying and rewriting are two main writing modes. Most previous sequence-to-sequence (Seq2Seq)…

Computation and Language · Computer Science · 2016-11-29 · Ziqiang Cao, Chuwei Luo, Wenjie Li, Sujian Li

Much effort has been devoted to evaluate whether multi-task learning can be leveraged to learn rich representations that can be used in various Natural Language Processing (NLP) down-stream applications. However, there is still a lack of…

Computation and Language · Computer Science · 2018-11-27 · Victor Sanh, Thomas Wolf, Sebastian Ruder

Pre-trained encoder-only and sequence-to-sequence (seq2seq) models each have advantages, however training both model types from scratch is computationally expensive. We explore recipes to improve pre-training efficiency by initializing one…

Computation and Language · Computer Science · 2023-06-16 · Saleh Soltan, Andy Rosenbaum, Tobias Falke, Qin Lu, Anna Rumshisky, Wael Hamza

Detecting structural similarity between queries is essential for selecting examples in in-context learning models. However, assessing structural similarity based solely on the natural language expressions of queries, without considering SQL…

Computation and Language · Computer Science · 2024-03-26 · Mohammadreza Pourreza, Davood Rafiei, Yuxi Feng, Raymond Li, Zhenan Fan, Weiwei Zhang