Related papers: Extracting Label-specific Key Input Features for N…

Syntax-Guided Program Reduction for Understanding Neural Code Intelligence Models

Neural code intelligence (CI) models are opaque black-boxes and offer little insight on the features they use in making predictions. This opacity may lead to distrust in their prediction and hamper their wider adoption in safety-critical…

Software Engineering · Computer Science 2022-06-15 Md Rafiqul Islam Rabin, Aftab Hussain, Mohammad Amin Alipour

Understanding Neural Code Intelligence Through Program Simplification

A wide range of code intelligence (CI) tools, powered by deep neural networks, have been developed recently to improve programming productivity and perform program analysis. To reliably use such tools, developers often need to reason about…

Software Engineering · Computer Science 2021-09-10 Md Rafiqul Islam Rabin, Vincent J. Hellendoorn, Mohammad Amin Alipour

Fantastic Features and Where to Find Them: Detecting Cognitive Impairment with a Subsequence Classification Guided Approach

Despite the widely reported success of embedding-based machine learning methods on natural language processing tasks, the use of more easily interpreted engineered features remains common in fields such as cognitive impairment (CI)…

Machine Learning · Computer Science 2020-10-14 Benjamin Eyre, Aparna Balagopalan, Jekaterina Novikova

Study of Distractors in Neural Models of Code

Finding important features that contribute to the prediction of neural models is an active area of research in explainable AI. Neural models are opaque and finding such features sheds light on a better understanding of their predictions. In…

Machine Learning · Computer Science 2023-08-15 Md Rafiqul Islam Rabin, Aftab Hussain, Sahil Suneja, Mohammad Amin Alipour

Machine Unlearning of Features and Labels

Removing information from a machine learning model is a non-trivial task that requires to partially revert the training process. This task is unavoidable when sensitive data, such as credit card numbers or passwords, accidentally enter the…

Machine Learning · Computer Science 2023-08-08 Alexander Warnecke, Lukas Pirch, Christian Wressnegger, Konrad Rieck

DeepCodeProbe: Towards Understanding What Models Trained on Code Learn

Machine learning models trained on code and related artifacts offer valuable support for software maintenance but suffer from interpretability issues due to their complex internal variables. These concerns are particularly significant in…

Software Engineering · Computer Science 2024-07-15 Vahid Majdinasab, Amin Nikanjam, Foutse Khomh

Providing Information About Implemented Algorithms Improves Program Comprehension: A Controlled Experiment

Context: Various approaches aim to support program comprehension by automatically detecting algorithms in source code. However, no empirical evaluations of their helpfulness have been performed. Objective: To empirically evaluate how…

Software Engineering · Computer Science 2025-04-29 Denis Neumüller, Alexander Raschke, Matthias Tichy

Pathologies of Neural Models Make Interpretations Difficult

One way to interpret neural model predictions is to highlight the most important input features (for example, a heatmap visualization over the words in an input sentence). In existing interpretation methods for NLP, a word's importance is…

Computation and Language · Computer Science 2022-09-07 Shi Feng, Eric Wallace, Alvin Grissom, Mohit Iyyer, Pedro Rodriguez, Jordan Boyd-Graber

Unveiling Project-Specific Bias in Neural Code Models

Deep learning has introduced significant improvements in many software analysis tasks. Although the Large Language Models (LLMs) based neural code models demonstrate commendable performance when trained and tested within the intra-project…

Artificial Intelligence · Computer Science 2024-03-12 Zhiming Li, Yanzhou Li, Tianlin Li, Mengnan Du, Bozhi Wu, Yushi Cao, Junzhe Jiang, Yang Liu

ReDef: Do Code Language Models Truly Understand Code Changes for Just-in-Time Software Defect Prediction?

Just-in-Time software defect prediction (JIT-SDP) plays a critical role in prioritizing risky code changes during code review and continuous integration. However, existing datasets often suffer from noisy labels and low precision in…

Software Engineering · Computer Science 2026-04-06 Doha Nam, Taehyoun Kim, Duksan Ryu, Jongmoon Baik

Improving Deep Learning Interpretability by Saliency Guided Training

Saliency methods have been widely used to highlight important input features in model predictions. Most existing methods use backpropagation on a modified gradient function to generate saliency maps. Thus, noisy gradients can result in…

Computer Vision and Pattern Recognition · Computer Science 2021-11-30 Aya Abdelsalam Ismail, Héctor Corrada Bravo, Soheil Feizi

SLIP: Soft Label Mechanism and Key-Extraction-Guided CoT-based Defense Against Instruction Backdoor in APIs

Customized Large Language Model (LLM) agents face a critical security threat from black-box instruction backdoors, where malicious behaviors are covertly injected through hidden system instructions. Although existing prompt-based defenses…

Cryptography and Security · Computer Science 2026-04-17 Zhengxian Wu, Juan Wen, Wanli Peng, Haowei Chang, Yinghan Zhou, Yiming Xue

Redundancy and Concept Analysis for Code-trained Language Models

Code-trained language models have proven to be highly effective for various code intelligence tasks. However, they can be challenging to train and deploy for many software engineering applications due to computational bottlenecks and memory…

Software Engineering · Computer Science 2024-02-19 Arushi Sharma, Zefu Hu, Christopher Quinn, Ali Jannesari

An Empirical Study on Noisy Label Learning for Program Understanding

Recently, deep learning models have been widely applied in program understanding tasks, and these models achieve state-of-the-art results on many benchmark datasets. A major challenge of deep learning for program understanding is that the…

Software Engineering · Computer Science 2024-01-02 Wenhan Wang, Yanzhou Li, Anran Li, Jian Zhang, Wei Ma, Yang Liu

Inferring Input Grammars from Dynamic Control Flow

A program is characterized by its input model, and a formal input model can be of use in diverse areas including vulnerability analysis, reverse engineering, fuzzing and software testing, clone detection and refactoring. Unfortunately,…

Software Engineering · Computer Science 2019-12-13 Rahul Gopinath, Björn Mathis, Andreas Zeller

Programs as Black-Box Explanations

Recent work in model-agnostic explanations of black-box machine learning has demonstrated that interpretability of complex models does not have to come at the cost of accuracy or model flexibility. However, it is not clear what kind of…

Machine Learning · Statistics 2016-11-24 Sameer Singh, Marco Tulio Ribeiro, Carlos Guestrin

Learning Model Agnostic Explanations via Constraint Programming

Interpretable Machine Learning faces a recurring challenge of explaining the predictions made by opaque classifiers such as ensemble models, kernel methods, or neural networks in terms that are understandable to humans. When the model is…

Machine Learning · Computer Science 2024-11-14 Frederic Koriche, Jean-Marie Lagniez, Stefan Mengel, Chi Tran

CodeImprove: Program Adaptation for Deep Code Models

Leveraging deep learning (DL)-based code analysis tools to solve software engineering tasks is becoming increasingly popular. Code models often suffer performance degradation due to various reasons (e.g., code data shifts). Retraining is…

Software Engineering · Computer Science 2025-06-18 Ravishka Rathnasuriya, Zijie Zhao, Wei Yang

Beyond Label Attention: Transparency in Language Models for Automated Medical Coding via Dictionary Learning

Medical coding, the translation of unstructured clinical text into standardized medical codes, is a crucial but time-consuming healthcare practice. Though large language models (LLM) could automate the coding process and improve the…

Computation and Language · Computer Science 2025-03-25 John Wu, David Wu, Jimeng Sun

Learning to Retrieve with Weakened Labels: Robust Training under Label Noise

Neural Encoders are frequently used in the NLP domain to perform dense retrieval tasks, for instance, to generate the candidate documents for a given query in question-answering tasks. However, sparse annotation and label noise in the…

Machine Learning · Computer Science 2025-12-16 Arnab Sharma