Related papers: Looking into Black Box Code Language Models

Predicting the Performance of Black-box LLMs through Follow-up Queries

Reliably predicting the behavior of language models -- such as whether their outputs are correct or have been adversarially manipulated -- is a fundamentally challenging task. This is often made even more difficult as frontier language…

Machine Learning · Computer Science 2025-12-02 Dylan Sam, Marc Finzi, J. Zico Kolter

Investigating Advanced Reasoning of Large Language Models via Black-Box Environment Interaction

Existing tasks fall short in evaluating reasoning ability of Large Language Models (LLMs) in an interactive, unknown environment. This deficiency leads to the isolated assessment of deductive, inductive, and abductive reasoning, neglecting…

Artificial Intelligence · Computer Science 2026-05-07 Congchi Yin, Tianyi Wu, Yankai Shu, Alex Gu, Yunhan Wang, Jun Shao, Xun Jiang, Piji Li

Talking Heads: Understanding Inter-layer Communication in Transformer Language Models

Although it is known that transformer language models (LMs) pass features from early layers to later layers, it is not well understood how this information is represented and routed by the model. We analyze a mechanism used in two LMs to…

Computation and Language · Computer Science 2025-05-12 Jack Merullo, Carsten Eickhoff, Ellie Pavlick

Beyond the Black Box: A Survey on the Theory and Mechanism of Large Language Models

The rapid emergence of Large Language Models (LLMs) has precipitated a profound paradigm shift in Artificial Intelligence, delivering monumental engineering successes that increasingly impact modern society. However, a critical paradox…

Computation and Language · Computer Science 2026-03-13 Zeyu Gan, Ruifeng Ren, Wei Yao, Xiaolin Hu, Gengze Xu, Chen Qian, Huayi Tang, Zixuan Gong, Xinhao Yao, Pengwei Tang, Zhenxing Dou, Yong Liu

Opening the Black Box: Analyzing Attention Weights and Hidden States in Pre-trained Language Models for Non-language Tasks

Investigating deep learning language models has always been a significant research area due to the "black box" nature of most advanced models. With the recent advancements in pre-trained language models based on transformers and their…

Computation and Language · Computer Science 2023-06-22 Mohamad Ballout, Ulf Krumnack, Gunther Heidemann, Kai-Uwe Kühnberger

Internal Planning in Language Models: Characterizing Horizon and Branch Awareness

The extent to which decoder-only language models (LMs) engage in planning, that is, organizing intermediate computations to support coherent long-range generation, remains an important question, with implications for interpretability,…

Artificial Intelligence · Computer Science 2026-02-17 Muhammed Ustaomeroglu, Baris Askin, Gauri Joshi, Carlee Joe-Wong, Guannan Qu

Large Language Models (LLMs) for Source Code Analysis: applications, models and datasets

Large language models (LLMs) and transformer-based architectures are increasingly utilized for source code analysis. As software systems grow in complexity, integrating LLMs into code analysis workflows becomes essential for enhancing…

Software Engineering · Computer Science 2025-03-25 Hamed Jelodar, Mohammad Meymani, Roozbeh Razavi-Far

Are Large Language Models Reliable AI Scientists? Assessing Reverse-Engineering of Black-Box Systems

Using AI to create autonomous researchers has the potential to accelerate scientific discovery. A prerequisite for this vision is understanding how well an AI model can identify the underlying structure of a black-box system from its…

Machine Learning · Computer Science 2025-05-26 Jiayi Geng, Howard Chen, Dilip Arumugam, Thomas L. Griffiths

Exploring the True Potential: Evaluating the Black-box Optimization Capability of Large Language Models

Large language models (LLMs) have demonstrated exceptional performance not only in natural language processing tasks but also in a great variety of non-linguistic domains. In diverse optimization scenarios, there is also a rising trend of…

Neural and Evolutionary Computing · Computer Science 2024-07-09 Beichen Huang, Xingyu Wu, Yu Zhou, Jibin Wu, Liang Feng, Ran Cheng, Kay Chen Tan

Position: Leverage Foundational Models for Black-Box Optimization

Undeniably, Large Language Models (LLMs) have stirred an extraordinary wave of innovation in the machine learning research domain, resulting in substantial impact across diverse fields such as reinforcement learning, robotics, and computer…

Machine Learning · Computer Science 2024-05-10 Xingyou Song, Yingtao Tian, Robert Tjarko Lange, Chansoo Lee, Yujin Tang, Yutian Chen

Pre-training Limited Memory Language Models with Internal and External Knowledge

Neural language models are black-boxes -- both linguistic patterns and factual knowledge are distributed across billions of opaque parameters. This entangled encoding makes it difficult to reliably inspect, verify, or update specific facts.…

Computation and Language · Computer Science 2025-10-06 Linxi Zhao, Sofian Zalouk, Christian K. Belardi, Justin Lovelace, Jin Peng Zhou, Ryan Thomas Noonan, Dongyoung Go, Kilian Q. Weinberger, Yoav Artzi, Jennifer J. Sun

Frozen Transformers in Language Models Are Effective Visual Encoder Layers

This paper reveals that large language models (LLMs), despite being trained solely on textual data, are surprisingly strong encoders for purely visual tasks in the absence of language. Even more intriguingly, this can be achieved by a…

Computer Vision and Pattern Recognition · Computer Science 2024-05-07 Ziqi Pang, Ziyang Xie, Yunze Man, Yu-Xiong Wang

Exploring Concept Depth: How Large Language Models Acquire Knowledge and Concept at Different Layers?

Large language models (LLMs) have shown remarkable performances across a wide range of tasks. However, the mechanisms by which these models encode tasks of varying complexities remain poorly understood. In this paper, we explore the…

Computation and Language · Computer Science 2025-02-06 Mingyu Jin, Qinkai Yu, Jingyuan Huang, Qingcheng Zeng, Zhenting Wang, Wenyue Hua, Haiyan Zhao, Kai Mei, Yanda Meng, Kaize Ding, Fan Yang, Mengnan Du, Yongfeng Zhang

A Closer Look into LLMs for Table Understanding

Despite the success of Large Language Models (LLMs) in table understanding, their internal mechanisms remain unclear. In this paper, we conduct an empirical study on 16 LLMs, covering general LLMs, specialist tabular LLMs, and…

Computation and Language · Computer Science 2026-03-17 Jia Wang, Chuanyu Qin, Mingyu Zheng, Qingyi Si, Peize Li, Zheng Lin

Language Model Meets Prototypes: Towards Interpretable Text Classification Models through Prototypical Networks

Pretrained transformer-based Language Models (LMs) are well-known for their ability to achieve significant improvement on NLP tasks, but their black-box nature, which leads to a lack of interpretability, has been a major concern. My…

Computation and Language · Computer Science 2024-12-06 Ximing Wen

Prior Knowledge or Search? A Study of LLM Agents in Hardware-Aware Code Optimization

LLM discovery and optimization systems are increasingly applied across domains, implementing a common propose-evaluate-revise loop. Such optimization or discovery progresses via context conditioning on received feedback from an environment.…

Artificial Intelligence · Computer Science 2026-05-20 Dmitry Redko, Albert Fazlyev, Konstantin Sozykin, Maria Ivanova, Evgeny Burnaev, Egor Shvetsov

Transformer Feed-Forward Layers Are Key-Value Memories

Feed-forward layers constitute two-thirds of a transformer model's parameters, yet their role in the network remains under-explored. We show that feed-forward layers in transformer-based language models operate as key-value memories, where…

Computation and Language · Computer Science 2021-09-07 Mor Geva, Roei Schuster, Jonathan Berant, Omer Levy

A Survey of Calibration Process for Black-Box LLMs

Large Language Models (LLMs) demonstrate remarkable performance in semantic understanding and generation, yet accurately assessing their output reliability remains a significant challenge. While numerous studies have explored calibration…

Artificial Intelligence · Computer Science 2024-12-18 Liangru Xie, Hui Liu, Jingying Zeng, Xianfeng Tang, Yan Han, Chen Luo, Jing Huang, Zhen Li, Suhang Wang, Qi He

Backward Lens: Projecting Language Model Gradients into the Vocabulary Space

Understanding how Transformer-based Language Models (LMs) learn and recall information is a key goal of the deep learning community. Recent interpretability methods project weights and hidden states obtained from the forward pass to the…

Computation and Language · Computer Science 2024-02-21 Shahar Katz, Yonatan Belinkov, Mor Geva, Lior Wolf

Linguistic Interpretability of Transformer-based Language Models: a systematic review

Language models based on the Transformer architecture achieve excellent results in many language-related tasks, such as text classification or sentiment analysis. However, despite the architecture of these models being well-defined, little…

Computation and Language · Computer Science 2025-04-14 Miguel López-Otal, Jorge Gracia, Jordi Bernad, Carlos Bobed, Lucía Pitarch-Ballesteros, Emma Anglés-Herrero