Related papers: Text Classification for Task-based Source Code Rel…

code2seq: Generating Sequences from Structured Representations of Code

The ability to generate natural language sequences from source code snippets has a variety of applications such as code summarization, documentation, and retrieval. Sequence-to-sequence (seq2seq) models, adopted from neural machine…

Machine Learning · Computer Science · 2019-02-22 · Uri Alon, Shaked Brody, Omer Levy, Eran Yahav

Unleashing the True Potential of Sequence-to-Sequence Models for Sequence Tagging and Structure Parsing

Sequence-to-Sequence (S2S) models have achieved remarkable success on various text generation tasks. However, learning complex structures with S2S models remains challenging as external neural modules and additional lexicons are often…

Computation and Language · Computer Science · 2023-02-07 · Han He, Jinho D. Choi

E2S2: Encoding-Enhanced Sequence-to-Sequence Pretraining for Language Understanding and Generation

Sequence-to-sequence (seq2seq) learning is a popular fashion for large-scale pretraining language models. However, the prior seq2seq pretraining models generally focus on reconstructive objectives on the decoder side and neglect the effect…

Computation and Language · Computer Science · 2024-01-10 · Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao

Code Generation from Natural Language with Less Prior and More Monolingual Data

Training datasets for semantic parsing are typically small due to the higher expertise required for annotation than most other NLP tasks. As a result, models for this application usually need additional prior knowledge to be built into the…

Computation and Language · Computer Science · 2021-06-11 · Sajad Norouzi, Keyi Tang, Yanshuai Cao

Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning

A significant amount of the world's knowledge is stored in relational databases. However, the ability for users to retrieve facts from a database is limited due to a lack of understanding of query languages such as SQL. We propose Seq2SQL,…

Computation and Language · Computer Science · 2017-11-13 · Victor Zhong, Caiming Xiong, Richard Socher

BERT2Code: Can Pretrained Language Models be Leveraged for Code Search?

Millions of repetitive code snippets are submitted to code repositories every day. To search from these large codebases using simple natural language queries would allow programmers to ideate, prototype, and develop easier and faster.…

Software Engineering · Computer Science · 2021-04-19 · Abdullah Al Ishtiaq, Masum Hasan, Md. Mahim Anjum Haque, Kazi Sajeed Mehrab, Tanveer Muttaqueen, Tahmid Hasan, Anindya Iqbal, Rifat Shahriyar

Code Search based on Context-aware Code Translation

Code search is a widely used technique by developers during software development. It provides semantically similar implementations from a large code corpus to developers based on their queries. Existing techniques leverage deep learning…

Software Engineering · Computer Science · 2022-02-17 · Weisong Sun, Chunrong Fang, Yuchen Chen, Guanhong Tao, Tingxu Han, Quanjun Zhang

Commit2Vec: Learning Distributed Representations of Code Changes

Deep learning methods, which have found successful applications in fields like image classification and natural language processing, have recently been applied to source code analysis too, due to the enormous amount of freely available…

Software Engineering · Computer Science · 2021-11-18 · Rocío Cabrera Lozoya, Arnaud Baumann, Antonino Sabetta, Michele Bezzi

Subword Semantic Hashing for Intent Classification on Small Datasets

In this paper, we introduce the use of Semantic Hashing as embedding for the task of Intent Classification and achieve state-of-the-art performance on three frequently used benchmarks. Intent Classification on a small dataset is a…

Computation and Language · Computer Science · 2020-01-15 · Kumar Shridhar, Ayushman Dash, Amit Sahu, Gustav Grund Pihlgren, Pedro Alonso, Vinaychandran Pondenkandath, Gyorgy Kovacs, Foteini Simistira, Marcus Liwicki

Logical Parsing from Natural Language Based on a Neural Translation Model

Semantic parsing has emerged as a significant and powerful paradigm for natural language interface and question answering systems. Traditional methods of building a semantic parser rely on high-quality lexicons, hand-crafted grammars and…

Computation and Language · Computer Science · 2017-05-10 · Liang Li, Pengyu Li, Yifan Liu, Tao Wan, Zengchang Qin

Learning Deep Semantic Model for Code Search using CodeSearchNet Corpus

Semantic code search is the task of retrieving relevant code snippet given a natural language query. Different from typical information retrieval tasks, code search requires to bridge the semantic gap between the programming language and…

Computation and Language · Computer Science · 2022-01-28 · Chen Wu, Ming Yan

Efficient Machine Translation with a BiLSTM-Attention Approach

With the rapid development of Natural Language Processing (NLP) technology, the accuracy and efficiency of machine translation have become hot topics of research. This paper proposes a novel Seq2Seq model aimed at improving translation…

Computation and Language · Computer Science · 2024-11-01 · Yuxu Wu, Yiren Xing

A Bilingual Generative Transformer for Semantic Sentence Embedding

Semantic sentence embedding models encode natural language sentences into vectors, such that closeness in embedding space indicates closeness in the semantics between the sentences. Bilingual data offers a useful signal for learning such…

Computation and Language · Computer Science · 2020-11-20 · John Wieting, Graham Neubig, Taylor Berg-Kirkpatrick

Cross-Thought for Sentence Encoder Pre-training

In this paper, we propose Cross-Thought, a novel approach to pre-training sequence encoder, which is instrumental in building reusable sequence embeddings for large-scale NLP tasks such as question answering. Instead of using the original…

Computation and Language · Computer Science · 2020-10-09 · Shuohang Wang, Yuwei Fang, Siqi Sun, Zhe Gan, Yu Cheng, Jing Jiang, Jingjing Liu

Cross-modal Retrieval Models for Stripped Binary Analysis

Retrieving binary code via natural language queries is a pivotal capability for downstream tasks in the software security domain, such as vulnerability detection and malware analysis. However, it is challenging to identify binary functions…

Software Engineering · Computer Science · 2026-01-06 · Guoqiang Chen, Lingyun Ying, Ziyang Song, Daguang Liu, Qiang Wang, Zhiqi Wang, Li Hu, Shaoyin Cheng, Weiming Zhang, Nenghai Yu

On Training a Neural Network to Explain Binaries

In this work, we begin to investigate the possibility of training a deep neural network on the task of binary code understanding. Specifically, the network would take, as input, features derived directly from binaries and output English…

Machine Learning · Computer Science · 2024-05-01 · Alexander Interrante-Grant, Andy Davis, Heather Preslier, Tim Leek

Joint Copying and Restricted Generation for Paraphrase

Many natural language generation tasks, such as abstractive summarization and text simplification, are paraphrase-orientated. In these tasks, copying and rewriting are two main writing modes. Most previous sequence-to-sequence (Seq2Seq)…

Computation and Language · Computer Science · 2016-11-29 · Ziqiang Cao, Chuwei Luo, Wenjie Li, Sujian Li

A Hierarchical Multi-task Approach for Learning Embeddings from Semantic Tasks

Much effort has been devoted to evaluate whether multi-task learning can be leveraged to learn rich representations that can be used in various Natural Language Processing (NLP) down-stream applications. However, there is still a lack of…

Computation and Language · Computer Science · 2018-11-27 · Victor Sanh, Thomas Wolf, Sebastian Ruder

Recipes for Sequential Pre-training of Multilingual Encoder and Seq2Seq Models

Pre-trained encoder-only and sequence-to-sequence (seq2seq) models each have advantages, however training both model types from scratch is computationally expensive. We explore recipes to improve pre-training efficiency by initializing one…

Computation and Language · Computer Science · 2023-06-16 · Saleh Soltan, Andy Rosenbaum, Tobias Falke, Qin Lu, Anna Rumshisky, Wael Hamza

SQL-Encoder: Improving NL2SQL In-Context Learning Through a Context-Aware Encoder

Detecting structural similarity between queries is essential for selecting examples in in-context learning models. However, assessing structural similarity based solely on the natural language expressions of queries, without considering SQL…

Computation and Language · Computer Science · 2024-03-26 · Mohammadreza Pourreza, Davood Rafiei, Yuxi Feng, Raymond Li, Zhenan Fan, Weiwei Zhang