Related papers: InferCode: Self-Supervised Learning of Code Repres…

TransformCode: A Contrastive Learning Framework for Code Embedding via Subtree Transformation

Artificial intelligence (AI) has revolutionized software engineering (SE) by enhancing software development efficiency. The advent of pre-trained models (PTMs) leveraging transfer learning has significantly advanced AI for SE. However,…

Software Engineering · Computer Science 2024-04-25 Zixiang Xian, Rubing Huang, Dave Towey, Chunrong Fang, Zhenyu Chen

A Survey on Self-supervised Pre-training for Sequential Transfer Learning in Neural Networks

Deep neural networks are typically trained under a supervised learning framework where a model learns a single task using labeled data. Instead of relying solely on labeled data, practitioners can harness unlabeled or related data to…

Machine Learning · Computer Science 2020-07-03 Huanru Henry Mao

Commit2Vec: Learning Distributed Representations of Code Changes

Deep learning methods, which have found successful applications in fields like image classification and natural language processing, have recently been applied to source code analysis too, due to the enormous amount of freely available…

Software Engineering · Computer Science 2021-11-18 Rocìo Cabrera Lozoya, Arnaud Baumann, Antonino Sabetta, Michele Bezzi

Code Representation Learning with Prüfer Sequences

An effective and efficient encoding of the source code of a computer program is critical to the success of sequence-to-sequence deep neural network models for tasks in computer program comprehension, such as automated code summarization and…

Artificial Intelligence · Computer Science 2021-11-16 Tenzin Jinpa, Yong Gao

Self-Supervised Learning via Maximum Entropy Coding

A mainstream type of current self-supervised learning methods pursues a general-purpose representation that can be well transferred to downstream tasks, typically by optimizing on a given pretext task such as instance discrimination. In…

Computer Vision and Pattern Recognition · Computer Science 2022-10-21 Xin Liu, Zhongdao Wang, Yali Li, Shengjin Wang

Multi-Task Self-Supervised Pre-Training for Music Classification

Deep learning is very data hungry, and supervised learning especially requires massive labeled data to work well. Machine listening research often suffers from limited labeled data problem, as human annotations are costly to acquire, and…

Sound · Computer Science 2021-02-08 Ho-Hsiang Wu, Chieh-Chi Kao, Qingming Tang, Ming Sun, Brian McFee, Juan Pablo Bello, Chao Wang

Aligned Unsupervised Pretraining of Object Detectors with Self-training

The unsupervised pretraining of object detectors has recently become a key component of object detector training, as it leads to improved performance and faster convergence during the supervised fine-tuning stage. Existing unsupervised…

Computer Vision and Pattern Recognition · Computer Science 2024-07-09 Ioannis Maniadis Metaxas, Adrian Bulat, Ioannis Patras, Brais Martinez, Georgios Tzimiropoulos

Supervised Pretraining for Material Property Prediction

Accurate prediction of material properties facilitates the discovery of novel materials with tailored functionalities. Deep learning models have recently shown superior accuracy and flexibility in capturing structure-property relationships.…

Machine Learning · Computer Science 2025-04-30 Chowdhury Mohammad Abid Rahman, Aldo H. Romero, Prashnna K. Gyawali

Don't freeze: Finetune encoders for better Self-Supervised HAR

Recently self-supervised learning has been proposed in the field of human activity recognition as a solution to the labelled data availability problem. The idea being that by using pretext tasks such as reconstruction or contrastive…

Machine Learning · Computer Science 2023-07-04 Vitor Fortes Rey, Dominique Nshimyimana, Paul Lukowicz

Self-Training with Weak Supervision

State-of-the-art deep neural networks require large-scale labeled training data that is often expensive to obtain or not available for many tasks. Weak supervision in the form of domain-specific rules has been shown to be useful in such…

Computation and Language · Computer Science 2021-04-13 Giannis Karamanolakis, Subhabrata Mukherjee, Guoqing Zheng, Ahmed Hassan Awadallah

Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning

The success of deep learning in computer vision is rooted in the ability of deep networks to scale up model complexity as demanded by challenging visual tasks. As complexity is increased, so is the need for large amounts of labeled data to…

Computer Vision and Pattern Recognition · Computer Science 2017-08-22 Gustav Larsson

Cross-Language Source Code Clone Detection Using Deep Learning with InferCode

Software clones are beneficial to detect security gaps and software maintenance in one programming language or across multiple languages. The existing work on source clone detection performs well but in a single programming language.…

Software Engineering · Computer Science 2022-05-11 Mohammad A. Yahya, Dae-Kyoo Kim

Code Representation Learning At Scale

Recent studies have shown that code language models at scale demonstrate significant performance gains on downstream tasks, i.e., code generation. However, most of the existing works on code representation learning train models at a hundred…

Computation and Language · Computer Science 2024-02-06 Dejiao Zhang, Wasi Ahmad, Ming Tan, Hantian Ding, Ramesh Nallapati, Dan Roth, Xiaofei Ma, Bing Xiang

Learning Invariant World State Representations with Predictive Coding

Self-supervised learning methods overcome the key bottleneck for building more capable AI: limited availability of labeled data. However, one of the drawbacks of self-supervised architectures is that the representations that they learn are…

Machine Learning · Computer Science 2022-07-08 Avi Ziskind, Sujeong Kim, Giedrius T. Burachas

Bridging Pre-trained Models and Downstream Tasks for Source Code Understanding

With the great success of pre-trained models, the pretrain-then-finetune paradigm has been widely adopted on downstream tasks for source code understanding. However, compared to costly training a large-scale model from scratch, how to…

Software Engineering · Computer Science 2022-03-16 Deze Wang, Zhouyang Jia, Shanshan Li, Yue Yu, Yun Xiong, Wei Dong, Xiangke Liao

Effective training of deep convolutional neural networks for hyperspectral image classification through artificial labeling

Hyperspectral imaging is a rich source of data, allowing for multitude of effective applications. However, such imaging remains challenging because of large data dimension and, typically, small pool of available training examples. While…

Neural and Evolutionary Computing · Computer Science 2020-10-23 Wojciech Masarczyk, Przemysław Głomb, Bartosz Grabowski, Mateusz Ostaszewski

Self-supervised Pre-training of Text Recognizers

In this paper, we investigate self-supervised pre-training methods for document text recognition. Nowadays, large unlabeled datasets can be collected for many research tasks, including text recognition, but it is costly to annotate them.…

Computer Vision and Pattern Recognition · Computer Science 2024-05-02 Martin Kišš, Michal Hradiš

UniXcoder: Unified Cross-Modal Pre-training for Code Representation

Pre-trained models for programming languages have recently demonstrated great success on code intelligence. To support both code-related understanding and generation tasks, recent works attempt to pre-train unified encoder-decoder models.…

Computation and Language · Computer Science 2022-03-09 Daya Guo, Shuai Lu, Nan Duan, Yanlin Wang, Ming Zhou, Jian Yin

Deep Distributed Random Samplings for Supervised Learning: An Alternative to Random Forests?

In (\cite{zhang2014nonlinear,zhang2014nonlinear2}), we have viewed machine learning as a coding and dimensionality reduction problem, and further proposed a simple unsupervised dimensionality reduction method, entitled deep distributed…

Machine Learning · Computer Science 2015-01-29 Xiao-Lei Zhang

Analysing the Behaviour of Tree-Based Neural Networks in Regression Tasks

The landscape of deep learning has vastly expanded the frontiers of source code analysis, particularly through the utilization of structural representations such as Abstract Syntax Trees (ASTs). While these methodologies have demonstrated…

Machine Learning · Computer Science 2024-06-18 Peter Samoaa, Mehrdad Farahani, Antonio Longa, Philipp Leitner, Morteza Haghir Chehreghani