Related papers: Contrastive Code Representation Learning

ContraBERT: Enhancing Code Pre-trained Models via Contrastive Learning

Large-scale pre-trained models such as CodeBERT and GraphCodeBERT have earned widespread attention from both academia and industry. Owing to their superior ability in code representation, they have been further applied in multiple…

Software Engineering · Computer Science · 2023-01-24 · Shangqing Liu, Bozhi Wu, Xiaofei Xie, Guozhu Meng, Yang Liu

CONCORD: Clone-aware Contrastive Learning for Source Code

Deep Learning (DL) models to analyze source code have shown immense promise during the past few years. More recently, self-supervised pre-training has gained traction for learning generic code representations valuable for many downstream SE…

Software Engineering · Computer Science · 2023-06-07 · Yangruibo Ding, Saikat Chakraborty, Luca Buratti, Saurabh Pujar, Alessandro Morari, Gail Kaiser, Baishakhi Ray

Self-Supervised Contrastive Learning for Code Retrieval and Summarization via Semantic-Preserving Transformations

We propose Corder, a self-supervised contrastive learning framework for source code models. Corder is designed to alleviate the need for labeled data for code retrieval and code summarization tasks. The pre-trained model of Corder can be used…

Software Engineering · Computer Science · 2021-05-25 · Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang

SynCoBERT: Syntax-Guided Multi-Modal Contrastive Pre-Training for Code Representation

Code representation learning, which aims to encode the semantics of source code into distributed vectors, plays an important role in recent deep-learning-based models for code intelligence. Recently, many pre-trained language models for…

Computation and Language · Computer Science · 2021-09-10 · Xin Wang, Yasheng Wang, Fei Mi, Pingyi Zhou, Yao Wan, Xiao Liu, Li Li, Hao Wu, Jin Liu, Xin Jiang

CODE-MVP: Learning to Represent Source Code from Multiple Views with Contrastive Pre-Training

Recent years have witnessed increasing interest in code representation learning, which aims to encode the semantics of source code into distributed vectors. Currently, various works have been proposed to represent the complex semantics…

Programming Languages · Computer Science · 2022-05-05 · Xin Wang, Yasheng Wang, Yao Wan, Jiawei Wang, Pingyi Zhou, Li Li, Hao Wu, Jin Liu

ContraCLM: Contrastive Learning For Causal Language Model

Despite exciting progress in causal language models, the expressiveness of the representations is largely limited due to poor discrimination ability. To remedy this issue, we present ContraCLM, a novel contrastive learning framework at both…

Computation and Language · Computer Science · 2023-05-04 · Nihal Jain, Dejiao Zhang, Wasi Uddin Ahmad, Zijian Wang, Feng Nan, Xiaopeng Li, Ming Tan, Ramesh Nallapati, Baishakhi Ray, Parminder Bhatia, Xiaofei Ma, Bing Xiang

CERT: Contrastive Self-supervised Learning for Language Understanding

Pretrained language models such as BERT and GPT have shown great effectiveness in language understanding. The auxiliary predictive tasks in existing pretraining approaches are mostly defined on tokens, and thus may not be able to capture…

Computation and Language · Computer Science · 2020-06-19 · Hongchao Fang, Sicheng Wang, Meng Zhou, Jiayuan Ding, Pengtao Xie

CoNT: Contrastive Neural Text Generation

Recently, contrastive learning has attracted increasing interest in neural text generation as a new solution to alleviate the exposure bias problem. It introduces a sequence-level training signal which is crucial to generation tasks that always…

Computation and Language · Computer Science · 2023-02-06 · Chenxin An, Jiangtao Feng, Kai Lv, Lingpeng Kong, Xipeng Qiu, Xuanjing Huang

CAPT: Contrastive Pre-Training for Learning Denoised Sequence Representations

Pre-trained self-supervised models such as BERT have achieved striking success in learning sequence representations, especially for natural language processing. These models typically corrupt the given sequences with certain types of noise,…

Computation and Language · Computer Science · 2020-11-02 · Fuli Luo, Pengcheng Yang, Shicheng Li, Xuancheng Ren, Xu Sun

Pre-Training Representations of Binary Code Using Contrastive Learning

Binary code analysis and comprehension is critical to applications in reverse engineering and computer security tasks where source code is not available. Unfortunately, unlike source code, binary code lacks semantics and is more difficult…

Software Engineering · Computer Science · 2025-09-29 · Yifan Zhang, Chen Huang, Yueke Zhang, Huajie Shao, Kevin Leach, Yu Huang

Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding

With the great success of pre-trained models, the pretrain-then-finetune paradigm has been widely adopted on downstream tasks for source code understanding. However, compared to the costly training of a large-scale model from scratch, how to…

Software Engineering · Computer Science · 2022-03-16 · Deze Wang, Zhouyang Jia, Shanshan Li, Yue Yu, Yun Xiong, Wei Dong, Xiangke Liao

Soft-Labeled Contrastive Pre-training for Function-level Code Representation

Code contrastive pre-training has recently achieved significant progress on code-related tasks. In this paper, we present SCodeR, a Soft-labeled contrastive pre-training framework with two positive sample construction…

Computation and Language · Computer Science · 2022-10-27 · Xiaonan Li, Daya Guo, Yeyun Gong, Yun Lin, Yelong Shen, Xipeng Qiu, Daxin Jiang, Weizhu Chen, Nan Duan

Code Representation Learning At Scale

Recent studies have shown that code language models at scale demonstrate significant performance gains on downstream tasks, e.g., code generation. However, most of the existing works on code representation learning train models at a hundred…

Computation and Language · Computer Science · 2024-02-06 · Dejiao Zhang, Wasi Ahmad, Ming Tan, Hantian Ding, Ramesh Nallapati, Dan Roth, Xiaofei Ma, Bing Xiang

Text and Code Embeddings by Contrastive Pre-Training

Text embeddings are useful features in many applications such as semantic search and computing text similarity. Previous work typically trains models customized for different use cases, varying in dataset choice, training objective and…

Computation and Language · Computer Science · 2022-01-26 · Arvind Neelakantan, Tao Xu, Raul Puri, Alec Radford, Jesse Michael Han, Jerry Tworek, Qiming Yuan, Nikolas Tezak, Jong Wook Kim, Chris Hallacy, Johannes Heidecke, Pranav Shyam, Boris Power, Tyna Eloundou Nekoul, Girish Sastry, Gretchen Krueger, David Schnurr, Felipe Petroski Such, Kenny Hsu, Madeleine Thompson, Tabarak Khan, Toki Sherbakov, Joanne Jang, Peter Welinder, Lilian Weng

Commit2Vec: Learning Distributed Representations of Code Changes

Deep learning methods, which have found successful applications in fields like image classification and natural language processing, have recently been applied to source code analysis too, due to the enormous amount of freely available…

Software Engineering · Computer Science · 2021-11-18 · Rocío Cabrera Lozoya, Arnaud Baumann, Antonino Sabetta, Michele Bezzi

Supervised Contrastive Learning for Product Matching

Contrastive learning has moved the state of the art for many tasks in computer vision and information retrieval in recent years. This poster is the first work that applies supervised contrastive learning to the task of product matching in…

Machine Learning · Computer Science · 2022-05-03 · Ralph Peeters, Christian Bizer

Revisiting Contrastive Methods for Unsupervised Learning of Visual Representations

Contrastive self-supervised learning has outperformed supervised pretraining on many downstream tasks like segmentation and object detection. However, current methods are still primarily applied to curated datasets like ImageNet. In this…

Computer Vision and Pattern Recognition · Computer Science · 2021-12-15 · Wouter Van Gansbeke, Simon Vandenhende, Stamatios Georgoulis, Luc Van Gool

INSPECT: Intrinsic and Systematic Probing Evaluation for Code Transformers

Pre-trained models of source code have recently been successfully applied to a wide variety of Software Engineering tasks; they have also seen some practical adoption, e.g. for code completion. Yet, we still know very little…

Software Engineering · Computer Science · 2023-12-11 · Anjan Karmakar, Romain Robbes

Contrastive Learning of Visual-Semantic Embeddings

Contrastive learning is a powerful technique to learn representations that are semantically distinctive and geometrically invariant. While most of the earlier approaches have demonstrated its effectiveness on single-modality learning tasks…

Computer Vision and Pattern Recognition · Computer Science · 2021-10-19 · Anurag Jain, Yashaswi Verma

CodeRetriever: Unimodal and Bimodal Contrastive Learning for Code Search

In this paper, we propose the CodeRetriever model, which learns the function-level code semantic representations through large-scale code-text contrastive pre-training. We adopt two contrastive learning schemes in CodeRetriever: unimodal…

Computation and Language · Computer Science · 2022-10-27 · Xiaonan Li, Yeyun Gong, Yelong Shen, Xipeng Qiu, Hang Zhang, Bolun Yao, Weizhen Qi, Daxin Jiang, Weizhu Chen, Nan Duan