Related papers: A Deep Dive into Function Inlining and its Securit…

1-to-1 or 1-to-n? Investigating the effect of function inlining on binary similarity analysis

Binary similarity analysis is critical to many code-reuse-related issues, and the "1-to-1" mechanism is widely applied, where one function in a binary file is matched against one function in a source file or binary file. However, we discover…

Software Engineering · Computer Science · 2022-05-06 · Ang Jia, Ming Fan, Wuxia Jin, Xi Xu, Zhaohui Zhou, Qiyi Tang, Sen Nie, Shi Wu, Ting Liu

Practical Inlining of Functions with Free Variables

A long-standing practical challenge in the optimization of higher-order languages is inlining functions with free variables. Inlining code statically at a function call site is safe if the compiler can guarantee that the free variables have…

Programming Languages · Computer Science · 2013-06-11 · Lars Bergstrom, Matthew Fluet, John Reppy, Nora Sandler

Cross-Inlining Binary Function Similarity Detection

Binary function similarity detection plays an important role in a wide range of security applications. Existing works usually assume that the query function and target function share equal semantics and compare their full semantics to…

Software Engineering · Computer Science · 2024-01-12 · Ang Jia, Ming Fan, Xi Xu, Wuxia Jin, Haijun Wang, Ting Liu

What Makes and Breaks Safety Fine-tuning? A Mechanistic Study

Safety fine-tuning helps align Large Language Models (LLMs) with human preferences for their safe deployment. To better understand the underlying factors that make models safe via safety fine-tuning, we design a synthetic data generation…

Machine Learning · Computer Science · 2024-08-22 · Samyak Jain, Ekdeep Singh Lubana, Kemal Oksuz, Tom Joy, Philip H. S. Torr, Amartya Sanyal, Puneet K. Dokania

Profile-Guided, Multi-Version Binary Rewriting

The static instrumentation of machine code, also known as binary rewriting, is a powerful technique, but suffers from high runtime overhead compared to compiler-level instrumentation. Recent research has shown that tools can achieve…

Cryptography and Security · Computer Science · 2021-05-11 · Xiaozhu Meng, Buddhika Chamith, Ryan Newton

Binary Diff Summarization using Large Language Models

Security of software supply chains is necessary to ensure that software updates do not contain maliciously injected code or introduce vulnerabilities that may compromise the integrity of critical infrastructure. Verifying the integrity of…

Cryptography and Security · Computer Science · 2025-09-30 · Meet Udeshi, Venkata Sai Charan Putrevu, Prashanth Krishnamurthy, Prashant Anantharaman, Sean Carrick, Ramesh Karri, Farshad Khorrami

How Different Tokenization Algorithms Impact LLMs and Transformer Models for Binary Code Analysis

Tokenization is fundamental in assembly code analysis, impacting intrinsic characteristics like vocabulary size, semantic coverage, and extrinsic performance in downstream tasks. Despite its significance, tokenization in the context of…

Artificial Intelligence · Computer Science · 2025-11-07 · Ahmed Mostafa, Raisul Arefin Nahid, Samuel Mulder

Improving Robustness of ML Classifiers against Realizable Evasion Attacks Using Conserved Features

Machine learning (ML) techniques are increasingly common in security applications, such as malware and intrusion detection. However, ML models are often susceptible to evasion attacks, in which an adversary makes changes to the input (such…

Cryptography and Security · Computer Science · 2019-05-14 · Liang Tong, Bo Li, Chen Hajaj, Chaowei Xiao, Ning Zhang, Yevgeniy Vorobeychik

An Empirical Study on the Effectiveness of Large Language Models for Binary Code Understanding

Binary code analysis plays a pivotal role in the field of software security and is widely used in tasks such as software maintenance, malware detection, software vulnerability discovery, patch analysis, etc. However, unlike source code,…

Software Engineering · Computer Science · 2025-05-01 · Xiuwei Shang, Zhenkan Fu, Shaoyin Cheng, Guoqiang Chen, Gangyang Li, Li Hu, Weiming Zhang, Nenghai Yu

Binary analysis is a core component of many critical security tasks, including reverse engineering, malware analysis, and vulnerability detection. Manual analysis is often time-consuming, but identifying commonly-used or previously-seen…

Machine Learning · Computer Science · 2024-10-31 · Rebecca Saul, Chang Liu, Noah Fleischmann, Richard Zak, Kristopher Micinski, Edward Raff, James Holt

MLGO: a Machine Learning Guided Compiler Optimizations Framework

Leveraging machine-learning (ML) techniques for compiler optimizations has been widely studied and explored in academia. However, the adoption of ML in general-purpose, industry strength compilers has yet to happen. We propose MLGO, a…

Programming Languages · Computer Science · 2021-01-14 · Mircea Trofin, Yundi Qian, Eugene Brevdo, Zinan Lin, Krzysztof Choromanski, David Li

Security for Machine Learning-based Systems: Attacks and Challenges during Training and Inference

The exponential increase in dependencies between the cyber and physical world leads to an enormous amount of data which must be efficiently processed and stored. Therefore, computing paradigms are evolving towards machine learning…

Machine Learning · Computer Science · 2019-04-09 · Faiq Khalid, Muhammad Abdullah Hanif, Semeen Rehman, Muhammad Shafique

MLGOPerf: An ML Guided Inliner to Optimize Performance

For the past 25 years, we have witnessed an extensive application of Machine Learning to the Compiler space; the selection and the phase-ordering problem. However, limited works have been upstreamed into the state-of-the-art compilers,…

Programming Languages · Computer Science · 2023-01-18 · Amir H. Ashouri, Mostafa Elhoushi, Yuzhe Hua, Xiang Wang, Muhammad Asif Manzoor, Bryan Chan, Yaoqing Gao

Trusted Weights, Treacherous Optimizations? Optimization-Triggered Backdoor Attacks on LLMs

Inference optimization is a vital technique for deploying LLMs at scale. Compilation is the most widely adopted optimization technique for LLMs. While it assumes semantic equivalence between the original and compiled graphs, we first…

Cryptography and Security · Computer Science · 2026-05-21 · Yifei Wang, Tianlin Li, Xiaohan Zhang, Yida Yang, Xiaoyu Zhang, Li Pan

Digging Into the Internal: Causality-Based Analysis of LLM Function Calling

Function calling (FC) has emerged as a powerful technique for facilitating large language models (LLMs) to interact with external systems and perform structured tasks. However, the mechanisms through which it influences model behavior…

Software Engineering · Computer Science · 2025-09-23 · Zhenlan Ji, Daoyuan Wu, Wenxuan Wang, Pingchuan Ma, Shuai Wang, Lei Ma

Optimizing Function Layout for Mobile Applications

Function layout, also referred to as function reordering or function placement, is one of the most effective profile-guided compiler optimizations. By reordering functions in a binary, compilers are able to greatly improve the performance…

Programming Languages · Computer Science · 2022-11-18 · Ellis Hoag, Kyungwoo Lee, Julián Mestre, Sergey Pupyrev

Fine-Tuning Lowers Safety and Disrupts Evaluation Consistency

Fine-tuning a general-purpose large language model (LLM) for a specific domain or task has become a routine procedure for ordinary users. However, fine-tuning is known to remove the safety alignment features of the model, even when the…

Computation and Language · Computer Science · 2025-06-23 · Kathleen C. Fraser, Hillary Dawkins, Isar Nejadgholi, Svetlana Kiritchenko

CyberLLMInstruct: A Pseudo-malicious Dataset Revealing Safety-performance Trade-offs in Cyber Security LLM Fine-tuning

The integration of large language models (LLMs) into cyber security applications presents both opportunities and critical safety risks. We introduce CyberLLMInstruct, a dataset of 54,928 pseudo-malicious instruction-response pairs spanning…

Cryptography and Security · Computer Science · 2025-09-18 · Adel ElZemity, Budi Arief, Shujun Li

MPC-Minimized Secure LLM Inference

Many inference services based on large language models (LLMs) pose a privacy concern, either revealing user prompts to the service or the proprietary weights to the user. Secure inference offers a solution to this problem through secure…

Cryptography and Security · Computer Science · 2024-08-08 · Deevashwer Rathee, Dacheng Li, Ion Stoica, Hao Zhang, Raluca Popa

Safety Monitoring of Machine Learning Perception Functions: a Survey

Machine Learning (ML) models, such as deep neural networks, are widely applied in autonomous systems to perform complex perception tasks. New dependability challenges arise when ML predictions are used in safety-critical applications, like…

Machine Learning · Computer Science · 2024-12-11 · Raul Sena Ferreira, Joris Guérin, Kevin Delmas, Jérémie Guiochet, Hélène Waeselynck