Related papers: Model Equality Testing: Which Model Is This API Se…

200 papers

As API access becomes a primary interface to large language models (LLMs), users often interact with black-box systems that offer little transparency into the deployed model. To reduce costs or maliciously alter model behaviors, API…

Cryptography and Security · Computer Science 2026-04-10 Xiaoyuan Zhu, Yaowen Ye, Tianyi Qiu, Hanlin Zhu, Sijun Tan, Ajraf Mannan, Jonathan Michala, Raluca Ada Popa, Willie Neiswanger

Large Language Models (LLMs) are often provided as a service via an API, making it challenging for developers to detect changes in their behavior. We present an approach to monitor LLMs for changes by comparing the distributions of…

Computation and Language · Computer Science 2025-04-18 Alden Dima, James Foulds, Shimei Pan, Philip Feldman

Large Language Model (LLM) inference systems present significant challenges in statistical performance characterization due to dynamic workload variations, diverse hardware architectures, and complex interactions between model size, batch…

Performance · Computer Science 2025-05-15 Kaustabha Ray, Nelson Mimura Gonzalez, Bruno Wassermann, Rachel Tzoref-Brill, Dean H. Lorenz

Machine learning (ML) prediction APIs are increasingly widely used. An ML API can change over time due to model updates or retraining. This presents a key challenge in the usage of the API because it is often not clear to the user if and…

Machine Learning · Statistics 2021-07-30 Lingjiao Chen, Tracy Cai, Matei Zaharia, James Zou

Large language models (LLMs) increasingly operate as autonomous agents that reason over external APIs to perform complex tasks. However, their reliability and agreement remain poorly characterized. We present a unified benchmarking…

Information Retrieval · Computer Science 2026-04-28 Eyhab Al-Masri

In the recent past, a popular way of evaluating natural language understanding (NLU) was to consider a model's ability to perform natural language inference (NLI) tasks. In this paper, we investigate if NLI tasks, that are rarely used for…

Computation and Language · Computer Science 2024-11-22 Lovish Madaan, David Esiobu, Pontus Stenetorp, Barbara Plank, Dieuwke Hupkes

The rapid advancement of large language models (LLMs) has raised concerns about reliably detecting AI-generated text. Stylometric metrics work well on autoregressive (AR) outputs, but their effectiveness on diffusion-based models is…

Computation and Language · Computer Science 2025-07-15 İsmail Tarım, Aytuğ Onan

Modern web services rely heavily on REST APIs, typically documented using the OpenAPI specification. The widespread adoption of this standard has resulted in the development of many black-box testing tools that generate tests based on…

Software Engineering · Computer Science 2025-04-07 Myeongsoo Kim, Saurabh Sinha, Alessandro Orso

Large Language Models (LLMs) exhibit systematic biases across demographic groups. Auditing is proposed as an accountability tool for black-box LLM applications, but suffers from resource-intensive query access. We conceptualise auditing as…

Machine Learning · Computer Science 2026-01-07 David Hartmann, Lena Pohlmann, Lelia Hanslik, Noah Gießing, Bettina Berendt, Pieter Delobelle

This paper presents solutions to the Machine Learning Model Attribution challenge (MLMAC) collectively organized by MITRE, Microsoft, Schmidt-Futures, Robust-Intelligence, Lincoln-Network, and Huggingface community. The challenge provides…

Computation and Language · Computer Science 2022-11-22 Farhan Dhanani, Muhammad Rafi

Large language models are becoming the go-to solution for the ever-growing number of tasks. However, with growing capacity, models are prone to rely on spurious correlations stemming from biases and stereotypes present in the training data.…

Computation and Language · Computer Science 2024-05-30 Tomasz Limisiewicz, David Mareček, Tomáš Musil

Large language models (LLMs) have progressed rapidly in complex reasoning and question answering, yet LLM hallucination remains a central bottleneck that hinders practical deployment, especially for commercial black-box LLMs accessible only…

Computation and Language · Computer Science 2026-05-08 Huizi Cui, Huan Ma, Qilin Wang, Yuhang Gao, Changqing Zhang

Large language models (LLMs) are useful in many NLP tasks and become more capable with size, with the best open-source models having over 50 billion parameters. However, using these 50B+ models requires high-end hardware, making them…

Machine Learning software systems are frequently used in our day-to-day lives. Some of these systems are used in various sensitive environments to make life-changing decisions. Therefore, it is crucial to ensure that these AI/ML systems do…

Machine Learning · Computer Science 2025-08-25 Ajoy Das, Gias Uddin, Shaiful Chowdhury, Mostafijur Rahman Akhond, Hadi Hemmati

As large language models (LLMs) are deployed widely, detecting and understanding bias in their outputs is critical. We present LLM BiasScope, a web application for side-by-side comparison of LLM outputs with real-time bias analysis. The…

Computation and Language · Computer Science 2026-03-30 Himel Ghosh, Nick Elias Werner

We present the findings of the Machine Learning Model Attribution Challenge. Fine-tuned machine learning models may derive from other trained models without obvious attribution characteristics. In this challenge, participants identify the…

Machine Learning · Computer Science 2023-02-20 Elizabeth Merkhofer, Deepesh Chaudhari, Hyrum S. Anderson, Keith Manville, Lily Wong, João Gante

As Language Model (LM) capabilities advance, evaluating and supervising them at scale is getting harder for humans. There is hope that other language models can automate both these tasks, which we refer to as "AI Oversight". We study how…

Large language models (LLMs) are trained on web-scale corpora that inevitably include contradictory factual information from sources of varying reliability. In this paper, we propose measuring an LLM property called trusted source alignment…

Computation and Language · Computer Science 2023-11-14 Vasilisa Bashlovkina, Zhaobin Kuang, Riley Matthews, Edward Clifford, Yennie Jun, William W. Cohen, Simon Baumgartner

The legal field already uses various large language models (LLMs) in actual applications, but their quantitative performance and reasons for it are underexplored. We evaluated several open-source and proprietary LLMs -- including…

Computers and Society · Computer Science 2025-09-12 Bhakti Khera, Rezvan Alamian, Pascal A. Scherz, Stephan M. Goetz

The auditing of financial documents, historically a labor-intensive process, stands on the precipice of transformation. AI-driven solutions have made inroads into streamlining this process by recommending pertinent text passages from…
