Related papers: CORAL: COde RepresentAtion Learning with Weakly-Su…

Weakly Supervised Representation Learning with Coarse Labels

With the development of computational power and techniques for data collection, deep learning demonstrates a superior performance over most existing algorithms on visual benchmark data sets. Many efforts have been devoted to studying the…

Computer Vision and Pattern Recognition · Computer Science 2021-08-25 Yuanhong Xu, Qi Qian, Hao Li, Rong Jin, Juhua Hu

Inferring Generative Model Structure with Static Analysis

Obtaining enough labeled data to robustly train complex discriminative models is a major bottleneck in the machine learning pipeline. A popular solution is combining multiple sources of weak supervision using generative models. The…

Machine Learning · Computer Science 2017-09-11 Paroma Varma, Bryan He, Payal Bajaj, Imon Banerjee, Nishith Khandwala, Daniel L. Rubin, Christopher Ré

Using Large-scale Heterogeneous Graph Representation Learning for Code Review Recommendations at Microsoft

Code review is an integral part of any mature software development process, and identifying the best reviewer for a code change is a well-accepted problem within the software engineering community. Selecting a reviewer who lacks expertise…

Software Engineering · Computer Science 2023-02-06 Jiyang Zhang, Chandra Maddila, Ram Bairi, Christian Bird, Ujjwal Raizada, Apoorva Agrawal, Yamini Jhawar, Kim Herzig, Arie van Deursen

Weakly supervised causal representation learning

Learning high-level causal representations together with a causal model from unstructured low-level data such as pixels is impossible from observational data alone. We prove under mild assumptions that this representation is however…

Machine Learning · Statistics 2022-10-12 Johann Brehmer, Pim de Haan, Phillip Lippe, Taco Cohen

Corpus Considerations for Annotator Modeling and Scaling

Recent trends in natural language processing research and annotation tasks affirm a paradigm shift from the traditional reliance on a single ground truth to a focus on individual perspectives, particularly in subjective tasks. In scenarios…

Computation and Language · Computer Science 2024-04-18 Olufunke O. Sarumi, Béla Neuendorf, Joan Plepi, Lucie Flek, Jörg Schlötterer, Charles Welch

Self-supervised Multi-scale Consistency for Weakly Supervised Segmentation Learning

Collecting large-scale medical datasets with fine-grained annotations is time-consuming and requires experts. For this reason, weakly supervised learning aims at optimising machine learning models using weaker forms of annotations, such as…

Computer Vision and Pattern Recognition · Computer Science 2021-08-27 Gabriele Valvano, Andrea Leo, Sotirios A. Tsaftaris

Scaling Experiments in Self-Supervised Cross-Table Representation Learning

To analyze the scaling potential of deep tabular representation learning models, we introduce a novel Transformer-based architecture specifically tailored to tabular data and cross-table representation learning by utilizing table-specific…

Machine Learning · Computer Science 2023-10-02 Maximilian Schambach, Dominique Paul, Johannes S. Otterbach

Extracting Fine-Grained Knowledge Graphs of Scientific Claims: Dataset and Transformer-Based Results

Recent transformer-based approaches demonstrate promising results on relational scientific information extraction. Existing datasets focus on high-level description of how research is carried out. Instead we focus on the subtleties of how…

Computation and Language · Computer Science 2021-09-23 Ian H. Magnusson, Scott E. Friedman

COLA: COarse LAbel pre-training for 3D semantic segmentation of sparse LiDAR datasets

Transfer learning is a proven technique in 2D computer vision to leverage the large amount of data available and achieve high performance with datasets limited in size due to the cost of acquisition or annotation. In 3D, annotation is known…

Computer Vision and Pattern Recognition · Computer Science 2023-03-22 Jules Sanchez, Jean-Emmanuel Deschaud, François Goulette

Weakly Supervised Cross-Lingual Named Entity Recognition via Effective Annotation and Representation Projection

State-of-the-art named entity recognition (NER) systems are supervised machine learning models that require large amounts of manually annotated data to achieve high accuracy. However, having humans annotate NER data is expensive and…

Computation and Language · Computer Science 2019-11-04 Jian Ni, Georgiana Dinu, Radu Florian

Observing Fine-Grained Changes in Jupyter Notebooks During Development Time

In software engineering, numerous studies have focused on the analysis of fine-grained logs, leading to significant innovations in areas such as refactoring, security, and code completion. However, no similar studies have been conducted for…

Software Engineering · Computer Science 2025-07-28 Sergey Titov, Konstantin Grotov, Cristina Sarasua, Yaroslav Golubev, Dhivyabharathi Ramasamy, Alberto Bacchelli, Abraham Bernstein, Timofey Bryksin

A Confidence-based Acquisition Model for Self-supervised Active Learning and Label Correction

Supervised neural approaches are hindered by their dependence on large, meticulously annotated datasets, a requirement that is particularly cumbersome for sequential tasks. The quality of annotations tends to deteriorate with the transition…

Computation and Language · Computer Science 2025-03-10 Carel van Niekerk, Christian Geishauser, Michael Heck, Shutong Feng, Hsien-chin Lin, Nurul Lubis, Benjamin Ruppik, Renato Vukovic, Milica Gašić

Self-Supervised Contrastive Learning for Code Retrieval and Summarization via Semantic-Preserving Transformations

We propose Corder, a self-supervised contrastive learning framework for source code models. Corder is designed to alleviate the need for labeled data in code retrieval and code summarization tasks. The pre-trained model of Corder can be used…

Software Engineering · Computer Science 2021-05-25 Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang

WHATSNEXT: Guidance-enriched Exploratory Data Analysis with Interactive, Low-Code Notebooks

Computational notebooks such as Jupyter are popular for exploratory data analysis and insight finding. Despite the module-based structure, notebooks visually appear as a single thread of interleaved cells containing text, code,…

Human-Computer Interaction · Computer Science 2023-08-22 Chen Chen, Jane Hoffswell, Shunan Guo, Ryan Rossi, Yeuk-Yin Chan, Fan Du, Eunyee Koh, Zhicheng Liu

Deep CORAL: Correlation Alignment for Deep Domain Adaptation

Deep neural networks are able to learn powerful representations from large quantities of labeled input data, however they cannot always generalize well across changes in input distributions. Domain adaptation algorithms have been proposed…

Computer Vision and Pattern Recognition · Computer Science 2016-07-07 Baochen Sun, Kate Saenko

Data Consistency for Weakly Supervised Learning

In many applications, training machine learning models involves using large amounts of human-annotated data. Obtaining precise labels for the data is expensive. Instead, training with weak supervision provides a low-cost alternative. We…

Machine Learning · Computer Science 2022-02-09 Chidubem Arachie, Bert Huang

Predicting the Understandability of Computational Notebooks through Code Metrics Analysis

Computational notebooks are the primary coding tools for data scientists, but their code quality remains understudied and often poor. Given the importance of maintainability and reusability, enhancing code understandability is essential.…

Software Engineering · Computer Science 2025-06-19 Mojtaba Mostafavi Ghahfarokhi, Alireza Asadi, Arash Asgari, Bardia Mohammadi, Abbas Heydarnoori, Masih Beigi Rizi

Named Entity Recognition with Small Strongly Labeled and Large Weakly Labeled Data

Weak supervision has shown promising results in many natural language processing tasks, such as Named Entity Recognition (NER). Existing work mainly focuses on learning deep NER models only with weak supervision, i.e., without any human…

Computation and Language · Computer Science 2021-08-03 Haoming Jiang, Danqing Zhang, Tianyu Cao, Bing Yin, Tuo Zhao

Semi-weakly Supervised Contrastive Representation Learning for Retinal Fundus Images

We explore the value of weak labels in learning transferable representations for medical images. Compared to hand-labeled datasets, weak or inexact labels can be acquired in large quantities at significantly lower cost and can provide…

Computer Vision and Pattern Recognition · Computer Science 2021-08-05 Boon Peng Yap, Beng Koon Ng

Weakly-Supervised Neural Text Classification

Deep neural networks are gaining popularity for the classic text classification task, owing to their strong expressive power and reduced need for feature engineering. Despite such attractiveness, neural text classification…

Information Retrieval · Computer Science 2018-09-13 Yu Meng, Jiaming Shen, Chao Zhang, Jiawei Han