Related papers: CodeCipher: Learning to Obfuscate Source Code Agai…

Privacy in Cloud Computing through Immersion-based Coding

Cloud computing enables users to process and store data remotely on high-performance computers and servers by sharing data over the Internet. However, transferring data to clouds causes unavoidable privacy concerns. Here, we present a…

Cryptography and Security · Computer Science 2024-08-12 Haleh Hayati, Nathan van de Wouw, Carlos Murguia

Obfuscation using Encryption

Protecting source code against reverse engineering and theft is an important problem. The goal is to carry out computations using confidential algorithms on an untrusted party while ensuring confidentiality of algorithms. This problem has…

Cryptography and Security · Computer Science 2016-12-13 Johannes Schneider, Thomas Locher

Privacy-preserved LLM Cascade via CoT-enhanced Policy Learning

Large Language Models (LLMs) have gained significant attention in on-device applications due to their remarkable performance across real-world tasks. However, on-device LLMs often suffer from suboptimal performance due to hardware…

Computation and Language · Computer Science 2025-03-03 Kai Zhang, Congchao Wang, Liqian Peng, Alec Go, Xiaozhong Liu

Cross-Cloud Data Privacy Protection: Optimizing Collaborative Mechanisms of AI Systems by Integrating Federated Learning and LLMs

In the age of cloud computing, data privacy protection has become a major challenge, especially when sharing sensitive data across cloud environments. However, how to optimize collaboration across cloud environments remains an unresolved…

Cryptography and Security · Computer Science 2025-05-20 Huaiying Luo, Cheng Ji

CodeChameleon: Personalized Encryption Framework for Jailbreaking Large Language Models

Adversarial misuse, particularly through 'jailbreaking' that circumvents a model's safety and ethical protocols, poses a significant challenge for Large Language Models (LLMs). This paper delves into the mechanisms behind such successful…

Computation and Language · Computer Science 2024-02-27 Huijie Lv, Xiao Wang, Yuansen Zhang, Caishuang Huang, Shihan Dou, Junjie Ye, Tao Gui, Qi Zhang, Xuanjing Huang

Preserving Privacy in Large Language Models: A Survey on Current Threats and Solutions

Large Language Models (LLMs) represent a significant advancement in artificial intelligence, finding applications across various domains. However, their reliance on massive internet-sourced datasets for training brings notable privacy…

Cryptography and Security · Computer Science 2025-02-11 Michele Miranda, Elena Sofia Ruzzetti, Andrea Santilli, Fabio Massimo Zanzotto, Sébastien Bratières, Emanuele Rodolà

RL-Finetuned LLMs for Privacy-Preserving Synthetic Rewriting

The performance of modern machine learning systems depends on access to large, high-quality datasets, often sourced from user-generated content or proprietary, domain-specific corpora. However, these rich datasets inherently contain…

Cryptography and Security · Computer Science 2025-08-28 Zhan Shi, Yefeng Yuan, Yuhong Liu, Liang Cheng, Yi Fang

Benchmarking Large Language Models for IoC Recovery under Adversarial Code Obfuscation and Encryption

Software obfuscation and encryption present persistent challenges for program comprehension and security analysis, particularly when adversaries conceal Indicators of Compromise (IoCs) such as IP addresses within source code. While Large…

Cryptography and Security · Computer Science 2026-05-11 Jaime Morales, Sergio Pastrana, Juan Tapiador

A Practical and Privacy-Preserving Framework for Real-World Large Language Model Services

Large language models (LLMs) have demonstrated exceptional capabilities in text understanding and generation, and they are increasingly being utilized across various domains to enhance productivity. However, due to the high costs of…

Cryptography and Security · Computer Science 2024-11-05 Yu Mao, Xueping Liao, Wei Liu, Anjia Yang

An Effective Approach to Embedding Source Code by Combining Large Language and Sentence Embedding Models

The advent of large language models (LLMs) has significantly advanced artificial intelligence (AI) in software engineering (SE), with source code embeddings playing a crucial role in tasks such as source code clone detection and source code…

Software Engineering · Computer Science 2025-06-04 Zixiang Xian, Chenhui Cui, Rubing Huang, Chunrong Fang, Zhenyu Chen

Immersion and Invariance-based Coding for Privacy-Preserving Federated Learning

Federated learning (FL) has emerged as a method to preserve privacy in collaborative distributed learning. In FL, clients train AI models directly on their devices rather than sharing data with a centralized server, which can pose privacy…

Cryptography and Security · Computer Science 2024-11-26 Haleh Hayati, Carlos Murguia, Nathan van de Wouw

Understanding Privacy Risks of Embeddings Induced by Large Language Models

Large language models (LLMs) show early signs of artificial general intelligence but struggle with hallucinations. One promising solution to mitigate these hallucinations is to store external knowledge as embeddings, aiding LLMs in…

Computation and Language · Computer Science 2024-04-26 Zhihao Zhu, Ninglu Shao, Defu Lian, Chenwang Wu, Zheng Liu, Yi Yang, Enhong Chen

SoK: The Privacy Paradox of Large Language Models: Advancements, Privacy Risks, and Mitigation

Large language models (LLMs) are sophisticated artificial intelligence systems that enable machines to generate human-like text with remarkable precision. While LLMs offer significant technological progress, their development using vast…

Cryptography and Security · Computer Science 2025-06-23 Yashothara Shanmugarasa, Ming Ding, M. A. P Chamikara, Thierry Rakotoarivelo

Using AI/ML to Find and Remediate Enterprise Secrets in Code & Document Sharing Platforms

We introduce a new challenge to the software development community: 1) leveraging AI to accurately detect and flag up secrets in code and on popular document sharing platforms that are frequently used by developers, such as Confluence, and 2)…

Software Engineering · Computer Science 2024-01-04 Gregor Kerr, David Algorry, Senad Ibraimoski, Peter Maciver, Sean Moran

Codexity: Secure AI-assisted Code Generation

Despite the impressive performance of Large Language Models (LLMs) in software development activities, recent studies raise concerns about vulnerabilities being introduced into software codebases by AI programming assistants (e.g., Copilot,…

Software Engineering · Computer Science 2024-05-08 Sung Yong Kim, Zhiyu Fan, Yannic Noller, Abhik Roychoudhury

CodeCloak: A Method for Evaluating and Mitigating Code Leakage by LLM Code Assistants

LLM-based code assistants are becoming increasingly popular among developers. These tools help developers improve their coding efficiency and reduce errors by providing real-time suggestions based on the developer's codebase. While…

Cryptography and Security · Computer Science 2024-10-30 Amit Finkman Noah, Avishag Shapira, Eden Bar Kochva, Inbar Maimon, Dudu Mimran, Yuval Elovici, Asaf Shabtai

CodedPrivateML: A Fast and Privacy-Preserving Framework for Distributed Machine Learning

How to train a machine learning model while keeping the data private and secure? We present CodedPrivateML, a fast and scalable approach to this critical problem. CodedPrivateML keeps both the data and the model information-theoretically…

Machine Learning · Computer Science 2021-02-23 Jinhyun So, Basak Guler, A. Salman Avestimehr

CodeMorph: Mitigating Data Leakage in Large Language Model Assessment

Concerns about benchmark leakage in large language models for code (Code LLMs) have raised issues of data contamination and inflated evaluation metrics. The diversity and inaccessibility of many training datasets make it difficult to…

Software Engineering · Computer Science 2025-06-24 Hongzhou Rao, Yanjie Zhao, Wenjie Zhu, Ling Xiao, Meizhen Wang, Haoyu Wang

A General Pseudonymization Framework for Cloud-Based LLMs: Replacing Privacy Information in Controlled Text Generation

An increasing number of companies have begun providing services that leverage cloud-based large language models (LLMs), such as ChatGPT. However, this development raises substantial privacy concerns, as users' prompts are transmitted to and…

Cryptography and Security · Computer Science 2025-02-24 Shilong Hou, Ruilin Shang, Zi Long, Xianghua Fu, Yin Chen

EmojiPrompt: Generative Prompt Obfuscation for Privacy-Preserving Communication with Cloud-based LLMs

Cloud-based Large Language Models (LLMs) such as ChatGPT have become increasingly integral to daily operations. Nevertheless, they also introduce privacy concerns: firstly, numerous studies underscore the risks to user privacy posed by…

Computation and Language · Computer Science 2025-03-24 Sam Lin, Wenyue Hua, Zhenting Wang, Mingyu Jin, Lizhou Fan, Yongfeng Zhang