
Related papers: CodeLabeller: A Web-based Code Annotation Tool for…

200 papers

Labeled datasets are essential for supervised machine learning. Various data labeling tools have been built to collect labels in different usage scenarios. However, developing labeling tools is time-consuming, costly, and…

Human-Computer Interaction · Computer Science · 2022-03-29 · Yu Zhang, Yun Wang, Haidong Zhang, Bin Zhu, Siming Chen, Dongmei Zhang

Despite rapid developments in the field of machine learning research, collecting high-quality labels for supervised learning remains a bottleneck for many applications. This difficulty is exacerbated by the fact that state-of-the-art models…

Computation and Language · Computer Science · 2021-06-25 · Dongjin Choi, Sara Evensen, Çağatay Demiralp, Estevam Hruschka

Code review is considered a key process in the software industry for minimizing bugs and improving code quality. Inspection of review process effectiveness and continuous improvement can boost development productivity. Such inspection is a…

Software Engineering · Computer Science · 2023-07-11 · Saifullah Mahbub, Md. Easin Arafat, Chowdhury Rafeed Rahman, Zannatul Ferdows, Masum Hasan

Comments are an integral part of software development; they are natural language descriptions associated with source code elements. Understanding explicit associations can be useful in improving code comprehensibility and maintaining the…

Computation and Language · Computer Science · 2019-12-17 · Sheena Panthaplackel, Milos Gligoric, Raymond J. Mooney, Junyi Jessy Li

Source code summarization is the task of writing short, natural language descriptions of source code. The main use for these descriptions is in software documentation, e.g., the one-sentence Java method descriptions in JavaDocs. Code…

Computation and Language · Computer Science · 2019-04-05 · Alexander LeClair, Collin McMillan

Real-world data for classification is often labeled by multiple annotators. For analyzing such data, we introduce CROWDLAB, a straightforward approach to utilize any trained classifier to estimate: (1) A consensus label for each example…

Machine Learning · Computer Science · 2023-01-30 · Hui Wen Goh, Ulyana Tkachenko, Jonas Mueller

With the rapid accumulation of text data produced by data-driven techniques, the task of extracting "data annotations" (concise, high-quality data summaries from unstructured raw text) has become increasingly important. The recent advances…

Human-Computer Interaction · Computer Science · 2023-04-03 · Xiaoyu Zhang, Xiwei Xuan, Alden Dima, Thurston Sexton, Kwan-Liu Ma

This paper describes a new modelling language for the effective design of Java annotations. Since their inclusion in the 5th edition of Java, annotations have grown from a useful tool for the addition of meta-data to play a central role in…

Programming Languages · Computer Science · 2019-10-02 · Irene Córdoba, Juan de Lara

The paper presents the Source Code Analysis and Lexical Annotation Runtime (SCALAR), a tool specialized for mapping (annotating) source code identifier names to their corresponding part-of-speech tag sequence (grammar pattern). SCALAR's…

We propose Corder, a self-supervised contrastive learning framework for source code models. Corder is designed to alleviate the need for labeled data for code retrieval and code summarization tasks. The pre-trained model of Corder can be used…

Software Engineering · Computer Science · 2021-05-25 · Nghi D. Q. Bui, Yijun Yu, Lingxiao Jiang

In real-world data labeling applications, annotators often provide imperfect labels. It is thus common to employ multiple annotators to label data with some overlap between their examples. We study active learning in such settings, aiming…

Machine Learning · Computer Science · 2024-07-29 · Hui Wen Goh, Jonas Mueller

Foundation models (e.g., CodeBERT, GraphCodeBERT, CodeT5) work well for many software engineering tasks. These models are pre-trained (using self-supervision) with billions of code tokens, and then fine-tuned with hundreds of thousands of…

Software Engineering · Computer Science · 2022-06-03 · Toufique Ahmed, Premkumar Devanbu

Translation between natural language and source code can help software development by enabling developers to comprehend, ideate, search, and write computer programs in natural language. Despite growing interest from the industry and the…

Applying machine learning to domains like Earth Sciences is impeded by the lack of labeled data, despite the large corpus of raw data available in such domains. For instance, training a wildfire classifier on satellite imagery requires…

Computer Vision and Pattern Recognition · Computer Science · 2023-01-02 · Tarun Narayanan, Ajay Krishnan, Anirudh Koul, Siddha Ganju

Representing source code in a generic input format is crucial to automate software engineering tasks, e.g., applying machine learning algorithms to extract information. Visualizing code representations can further enable human experts to…

Software Engineering · Computer Science · 2023-07-28 · Yuejun Guo, Seifeddine Bettaieb, Qiang Hu, Yves Le Traon, Qiang Tang

In the realm of document engineering and Natural Language Processing (NLP), the integration of digitally born catalogs into product design processes presents a novel avenue for enhancing information extraction and interoperability. This…

Systems and Control · Electrical Eng. & Systems · 2024-08-16 · Hasan Sinan Bank, Daniel R. Herber

Recent research in the field of computer vision strongly focuses on deep learning architectures to tackle image processing problems. Deep neural networks are often considered in complex image processing scenarios since traditional computer…

Computer Vision and Pattern Recognition · Computer Science · 2021-11-30 · Marcel P. Schilling, Luca Rettenberger, Friedrich Münke, Haijun Cui, Anna A. Popova, Pavel A. Levkin, Ralf Mikut, Markus Reischl

We introduce EduCoder, a domain-specialized tool designed to support utterance-level annotation of educational dialogue. While general-purpose text annotation tools for NLP and qualitative research abound, few address the complexities of…

Computation and Language · Computer Science · 2026-05-06 · Saad Ashraf, James Malamut, Vishal Kumar, Guanzhong Pan, Hyunji Nam, Mei Tan, Lucía Langlois, Liliana Deonizio, Helen Higgins, Dorottya Demszky

Curating high-quality, domain-specific datasets is a major bottleneck for deploying robust vision systems, requiring complex trade-offs between data quality, diversity, and cost when researching vast, unlabeled data lakes. We introduce…

Large scale analysis of source code, and in particular scientific source code, holds the promise of better understanding the data science process, identifying analytical best practices, and providing insights to the builders of scientific…

Machine Learning · Computer Science · 2020-09-01 · Ge Zhang, Mike A. Merrill, Yang Liu, Jeffrey Heer, Tim Althoff