Related papers: PerfXplain: Debugging MapReduce Job Performance

PERFEX: Classifier Performance Explanations for Trustworthy AI Systems

Explainability of a classification model is crucial when deployed in real-world decision support systems. Explanations make predictions actionable to the user and should inform about the capabilities and limitations of the system. Existing…

Machine Learning · Computer Science 2022-12-13 Erwin Walraven, Ajaya Adhikari, Cor J. Veenman

Explaining Documents' Relevance to Search Queries

We present GenEx, a generative model to explain search results to users beyond just showing matches between query and document words. Adding GenEx explanations to search results greatly impacts user satisfaction and search performance.…

Information Retrieval · Computer Science 2021-11-03 Razieh Rahimi, Youngwoo Kim, Hamed Zamani, James Allan

Hadoop Performance Models

Hadoop MapReduce is now a popular choice for performing large-scale data analytics. This technical report describes a detailed set of mathematical performance models for describing the execution of a MapReduce job on Hadoop. The models…

Distributed, Parallel, and Cluster Computing · Computer Science 2011-06-07 Herodotos Herodotou

Benchmarking and Performance Modelling of MapReduce Communication Pattern

Understanding and predicting the performance of big data applications running in the cloud or on-premises could help minimise the overall cost of operations and provide opportunities in efforts to identify performance bottlenecks. The…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-05-26 Sheriffo Ceesay, Adam Barker, Yuhui Lin

RAGXplain: From Explainable Evaluation to Actionable Guidance of RAG Pipelines

Retrieval-Augmented Generation (RAG) systems couple large language models with external knowledge, yet most evaluation methods report aggregate scores that reveal whether a pipeline underperforms but not where or why. We introduce…

Information Retrieval · Computer Science 2026-03-19 Dvir Cohen, Tamir Houri, Lin Burg, Gilad Barkan

Query Performance Explanation through Large Language Model for HTAP Systems

In hybrid transactional and analytical processing (HTAP) systems, users often struggle to understand why query plans from one engine (OLAP or OLTP) perform significantly slower than those from another. Although optimizers provide plan…

Databases · Computer Science 2024-12-03 Haibo Xiu, Li Zhang, Tieying Zhang, Jun Yang, Jianjun Chen

PerfGuard: A Performance-Aware Agent for Visual Content Generation

The advancement of Large Language Model (LLM)-powered agents has enabled automated task processing through reasoning and tool invocation capabilities. However, existing frameworks often operate under the idealized assumption that tool…

Artificial Intelligence · Computer Science 2026-03-06 Zhipeng Chen, Zhongrui Zhang, Chao Zhang, Yifan Xu, Lan Yang, Jun Liu, Ke Li, Yi-Zhe Song

A Sim2Real Approach for Identifying Task-Relevant Properties in Interpretable Machine Learning

Explanations of an AI's function can assist human decision-makers, but the most useful explanation depends on the decision's context, referred to as the downstream task. User studies are necessary to determine the best explanations for each…

Human-Computer Interaction · Computer Science 2024-09-20 Eura Nofshin, Esther Brown, Brian Lim, Weiwei Pan, Finale Doshi-Velez

Explainable Data-Driven Optimization: From Context to Decision and Back Again

Data-driven optimization uses contextual information and machine learning algorithms to find solutions to decision problems with uncertain parameters. While a vast body of work is dedicated to interpreting machine learning models in the…

Machine Learning · Computer Science 2023-07-21 Alexandre Forel, Axel Parmentier, Thibaut Vidal

Resolvable Designs for Speeding up Distributed Computing

Distributed computing frameworks such as MapReduce are often used to process large computational jobs. They operate by partitioning each job into smaller tasks executed on different servers. The servers also need to exchange intermediate…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-20 Konstantinos Konstantinidis, Aditya Ramamoorthy

Improving Neural Model Performance through Natural Language Feedback on Their Explanations

A class of explainable NLP models for reasoning tasks support their decisions by generating free-form or structured explanations, but what happens when these supporting structures contain errors? Our goal is to allow users to interactively…

Computation and Language · Computer Science 2021-04-20 Aman Madaan, Niket Tandon, Dheeraj Rajagopal, Yiming Yang, Peter Clark, Keisuke Sakaguchi, Ed Hovy

PerfCodeBench: Benchmarking LLMs for System-Level High-Performance Code Optimization

Large language models (LLMs) can often generate functionally correct code, but their ability to produce efficient implementations for performance-critical systems tasks remains limited. Existing code benchmarks mainly emphasize correctness…

Software Engineering · Computer Science 2026-05-18 Huihao Jing, Wenbin Hu, Haochen Shi, Hanyu Yang, Sirui Zhang, Shaojin Chen, Haoran Li, Yangqiu Song

PerfCoder: Large Language Models for Interpretable Code Performance Optimization

Large language models (LLMs) have achieved remarkable progress in automatic code generation, yet their ability to produce high-performance code remains limited--a critical requirement in real-world software systems. We argue that current…

Software Engineering · Computer Science 2026-05-11 Jiuding Yang, Shengyao Lu, Hongxuan Liu, Shayan Shirahmad Gale Bagi, Zahra Fazel, Tomasz Czajkowski, Di Niu

Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models

In this work, we investigate whether small language models can determine high-quality subsets of large-scale text datasets that improve the performance of larger language models. While existing work has shown that pruning based on the…

Machine Learning · Computer Science 2024-06-03 Zachary Ankner, Cody Blakeney, Kartik Sreenivasan, Max Marion, Matthew L. Leavitt, Mansheej Paul

XAudit : A Theoretical Look at Auditing with Explanations

Responsible use of machine learning requires models to be audited for undesirable properties. While a body of work has proposed using explanations for auditing, how to do so and why has remained relatively ill-understood. This work…

Machine Learning · Computer Science 2023-06-06 Chhavi Yadav, Michal Moshkovitz, Kamalika Chaudhuri

GraphLab: A New Framework For Parallel Machine Learning

Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and…

Machine Learning · Computer Science 2014-08-12 Yucheng Low, Joseph E. Gonzalez, Aapo Kyrola, Danny Bickson, Carlos E. Guestrin, Joseph Hellerstein

Causality by Abstraction: Symbolic Rule Learning in Multivariate Timeseries with Large Language Models

Inferring causal relations in timeseries data with delayed effects is a fundamental challenge, especially when the underlying system exhibits complex dynamics that cannot be captured by simple functional mappings. Traditional approaches…

Machine Learning · Computer Science 2026-02-23 Preetom Biswas, Giulia Pedrielli, K. Selçuk Candan

GraphLab: A New Framework for Parallel Machine Learning

Designing and implementing efficient, provably correct parallel machine learning (ML) algorithms is challenging. Existing high-level parallel abstractions like MapReduce are insufficiently expressive while low-level tools like MPI and…

Machine Learning · Computer Science 2010-06-28 Yucheng Low, Joseph Gonzalez, Aapo Kyrola, Danny Bickson, Carlos Guestrin, Joseph M. Hellerstein

TSEXPLAIN: Explaining Aggregated Time Series by Surfacing Evolving Contributors

Aggregated time series are generated effortlessly everywhere, e.g., "total confirmed covid-19 cases since 2019" and "total liquor sales over time." Understanding "how" and "why" these key performance indicators (KPI) evolve over time is…

Databases · Computer Science 2022-11-22 Yiru Chen, Silu Huang

Multi-Level Explanations for Generative Language Models

Despite the increasing use of large language models (LLMs) for context-grounded tasks like summarization and question-answering, understanding what makes an LLM produce a certain response is challenging. We propose Multi-Level Explanations…

Computation and Language · Computer Science 2025-07-24 Lucas Monteiro Paes, Dennis Wei, Hyo Jin Do, Hendrik Strobelt, Ronny Luss, Amit Dhurandhar, Manish Nagireddy, Karthikeyan Natesan Ramamurthy, Prasanna Sattigeri, Werner Geyer, Soumya Ghosh