Related papers: AdaptMemBench: Application-Specific MemorySubsyste…

MemBench: Towards More Comprehensive Evaluation on the Memory of LLM-based Agents

Recent works have highlighted the significance of memory mechanisms in LLM-based agents, which enable them to store observed information and adapt to dynamic environments. However, evaluating their memory capabilities still remains…

Computation and Language · Computer Science 2025-06-30 Haoran Tan, Zeyu Zhang, Chen Ma, Xu Chen, Quanyu Dai, Zhenhua Dong

LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning

Labeled data are critical to modern machine learning applications, but obtaining labels can be expensive. To mitigate this cost, machine learning methods, such as transfer learning, semi-supervised learning and active learning, aim to be…

Machine Learning · Computer Science 2024-03-05 Jifan Zhang, Yifang Chen, Gregory Canal, Stephen Mussmann, Arnav M. Das, Gantavya Bhatt, Yinglun Zhu, Jeffrey Bilmes, Simon Shaolei Du, Kevin Jamieson, Robert D Nowak

RT-Bench: an Extensible Benchmark Framework for the Analysis and Management of Real-Time Applications

Benchmarking is crucial for testing and validating any system, even more so in real-time systems. Typical real-time applications adhere to well-understood abstractions: they exhibit a periodic behavior, operate on a well-defined working…

Software Engineering · Computer Science 2022-08-02 Mattia Nicolella, Shahin Roozkhosh, Denis Hoornaert, Andrea Bastoni, Renato Mancuso

Heterogeneous Memory Benchmarking Toolkit

This paper presents an open-source kernel-level heterogeneous memory characterization framework (MemScope) for embedded systems. MemScope enables precise characterization of the temporal behavior of available memory modules under…

Hardware Architecture · Computer Science 2025-07-08 Golsana Ghaemi, Gabriel Franco, Kazem Taram, Renato Mancuso

Measuring what matters: A scalable framework for application-level quantum benchmarking

As quantum computing systems continue to mature, there is an increasing need for benchmarking methodologies that capture performance in terms of meaningful, application-level metrics. In this work, we present a scalable framework for…

Quantum Physics · Physics 2026-04-14 Willie Aboumrad, Claudio Girotto, Joshua Goings, Luning Zhao, Miguel Angel Lopez-Ruiz, Daiwei Zhu, Ananth Kaushik, Sayonee Ray, Samwel Sekwao, Jason Iaconis, Andrew Arrasmith, Andrii Maksymov, Yvette de Sereville, Felix Tripier, Far McKon, Coleman Collins, Evgeny Epifanovsky, Masako Yamada, Martin Roetteler

MemoryBench: A Benchmark for Memory and Continual Learning in LLM Systems

Scaling up data, parameters, and test-time computation has been the mainstream methods to improve LLM systems (LLMsys), but their upper bounds are almost reached due to the gradual depletion of high-quality data and marginal gains obtained…

Machine Learning · Computer Science 2026-05-12 Qingyao Ai, Yichen Tang, Changyue Wang, Jianming Long, Weihang Su, Yiqun Liu

A Mess of Memory System Benchmarking, Simulation and Application Profiling

The Memory stress (Mess) framework provides a unified view of the memory system benchmarking, simulation and application profiling. The Mess benchmark provides a holistic and detailed memory system characterization. It is based on hundreds…

Hardware Architecture · Computer Science 2024-12-10 Pouya Esmaili-Dokht, Francesco Sgherzi, Valeria Soldera Girelli, Isaac Boixaderas, Mariana Carmin, Alireza Monemi, Adria Armejach, Estanislao Mercadal, German Llort, Petar Radojkovic, Miquel Moreto, Judit Gimenez, Xavier Martorell, Eduard Ayguade, Jesus Labarta, Emanuele Confalonieri, Rishabh Dubey, Jason Adlard

Application-Oriented Performance Benchmarks for Quantum Computing

In this work we introduce an open source suite of quantum application-oriented performance benchmarks that is designed to measure the effectiveness of quantum computing hardware at executing quantum applications. These benchmarks probe a…

Quantum Physics · Physics 2025-04-15 Thomas Lubinski, Sonika Johri, Paul Varosy, Jeremiah Coleman, Luning Zhao, Jason Necaise, Charles H. Baldwin, Karl Mayer, Timothy Proctor

SProBench: Stream Processing Benchmark for High Performance Computing Infrastructure

Recent advancements in data stream processing frameworks have improved real-time data handling, however, scalability remains a significant challenge affecting throughput and latency. While studies have explored this issue on local machines…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-04-04 Apurv Deepak Kulkarni, Siavash Ghiasvand

ConsumerBench: Benchmarking Generative AI Applications on End-User Devices

The recent shift in Generative AI (GenAI) applications from cloud-only environments to end-user devices introduces new challenges in resource management, system efficiency, and user experience. This paper presents ConsumerBench, a…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-06-24 Yile Gu, Rohan Kadekodi, Hoang Nguyen, Keisuke Kamahori, Yiyu Liu, Baris Kasikci

NeuroBench: A Framework for Benchmarking Neuromorphic Computing Algorithms and Systems

Neuromorphic computing shows promise for advancing computing efficiency and capabilities of AI applications using brain-inspired principles. However, the neuromorphic research field currently lacks standardized benchmarks, making it…

Artificial Intelligence · Computer Science 2025-01-16 Jason Yik, Korneel Van den Berghe, Douwe den Blanken, Younes Bouhadjar, Maxime Fabre, Paul Hueber, Weijie Ke, Mina A Khoei, Denis Kleyko, Noah Pacik-Nelson, Alessandro Pierro, Philipp Stratmann, Pao-Sheng Vincent Sun, Guangzhi Tang, Shenqi Wang, Biyan Zhou, Soikat Hasan Ahmed, George Vathakkattil Joseph, Benedetto Leto, Aurora Micheli, Anurag Kumar Mishra, Gregor Lenz, Tao Sun, Zergham Ahmed, Mahmoud Akl, Brian Anderson, Andreas G. Andreou, Chiara Bartolozzi, Arindam Basu, Petrut Bogdan, Sander Bohte, Sonia Buckley, Gert Cauwenberghs, Elisabetta Chicca, Federico Corradi, Guido de Croon, Andreea Danielescu, Anurag Daram, Mike Davies, Yigit Demirag, Jason Eshraghian, Tobias Fischer, Jeremy Forest, Vittorio Fra, Steve Furber, P. Michael Furlong, William Gilpin, Aditya Gilra, Hector A. Gonzalez, Giacomo Indiveri, Siddharth Joshi, Vedant Karia, Lyes Khacef, James C. Knight, Laura Kriener, Rajkumar Kubendran, Dhireesha Kudithipudi, Shih-Chii Liu, Yao-Hong Liu, Haoyuan Ma, Rajit Manohar, Josep Maria Margarit-Taulé, Christian Mayr, Konstantinos Michmizos, Dylan R. Muir, Emre Neftci, Thomas Nowotny, Fabrizio Ottati, Ayca Ozcelikkale, Priyadarshini Panda, Jongkil Park, Melika Payvand, Christian Pehle, Mihai A. Petrovici, Christoph Posch, Alpha Renner, Yulia Sandamirskaya, Clemens JS Schaefer, André van Schaik, Johannes Schemmel, Samuel Schmidgall, Catherine Schuman, Jae-sun Seo, Sadique Sheik, Sumit Bam Shrestha, Manolis Sifalakis, Amos Sironi, Kenneth Stewart, Matthew Stewart, Terrence C. Stewart, Jonathan Timcheck, Nergis Tömen, Gianvito Urgese, Marian Verhelst, Craig M. Vineyard, Bernhard Vogginger, Amirreza Yousefzadeh, Fatima Tuz Zohora, Charlotte Frenkel, Vijay Janapa Reddi

AMES: Asymmetric and Memory-Efficient Similarity Estimation for Instance-level Retrieval

This work investigates the problem of instance-level image retrieval re-ranking with the constraint of memory efficiency, ultimately aiming to limit memory usage to 1KB per image. Departing from the prevalent focus on performance…

Computer Vision and Pattern Recognition · Computer Science 2024-08-07 Pavel Suma, Giorgos Kordopatis-Zilos, Ahmet Iscen, Giorgos Tolias

nanoBench: A Low-Overhead Tool for Running Microbenchmarks on x86 Systems

We present nanoBench, a tool for evaluating small microbenchmarks using hardware performance counters on Intel and AMD x86 systems. Most existing tools and libraries are intended to either benchmark entire programs, or program segments in…

Performance · Computer Science 2020-11-04 Andreas Abel, Jan Reineke

ImplicitMemBench: Measuring Unconscious Behavioral Adaptation in Large Language Models

Existing memory benchmarks for LLM agents evaluate explicit recall of facts, yet overlook implicit memory where experience becomes automated behavior without conscious retrieval. This gap is critical: effective assistants must automatically…

Artificial Intelligence · Computer Science 2026-04-16 Chonghan Qin, Xiachong Feng, Weitao Ma, Xiaocheng Feng, Lingpeng Kong

ScALPEL: A Scalable Adaptive Lightweight Performance Evaluation Library for application performance monitoring

As supercomputers continue to grow in scale and capabilities, it is becoming increasingly difficult to isolate processor and system level causes of performance degradation. Over the last several years, a significant number of performance…

Distributed, Parallel, and Cluster Computing · Computer Science 2009-03-03 Hari K. Pyla, Bharath Ramesh, Calvin J. Ribbens, Srinidhi Varadarajan

LLMeBench: A Flexible Framework for Accelerating LLMs Benchmarking

The recent development and success of Large Language Models (LLMs) necessitate an evaluation of their performance across diverse NLP tasks in different languages. Although several frameworks have been developed and made publicly available,…

Computation and Language · Computer Science 2024-02-27 Fahim Dalvi, Maram Hasanain, Sabri Boughorbel, Basel Mousi, Samir Abdaljalil, Nizi Nazar, Ahmed Abdelali, Shammur Absar Chowdhury, Hamdy Mubarak, Ahmed Ali, Majd Hawasly, Nadir Durrani, Firoj Alam

RMBench: Memory-Dependent Robotic Manipulation Benchmark with Insights into Policy Design

Robotic manipulation policies have made rapid progress in recent years, yet most existing approaches give limited consideration to memory capabilities. Consequently, they struggle to solve tasks that require reasoning over historical…

Robotics · Computer Science 2026-03-17 Tianxing Chen, Yuran Wang, Mingleyang Li, Yan Qin, Hao Shi, Zixuan Li, Yifan Hu, Yingsheng Zhang, Kaixuan Wang, Yue Chen, Hongcheng Wang, Renjing Xu, Ruihai Wu, Yao Mu, Yaodong Yang, Hao Dong, Ping Luo

MedMemoryBench: Benchmarking Agent Memory in Personalized Healthcare

The large-scale deployment of personalized healthcare agents demands memory mechanisms that are exceptionally precise, safe, and capable of long-term clinical tracking. However, existing benchmarks primarily focus on daily open-domain…

Artificial Intelligence · Computer Science 2026-05-13 Yihao Wang, Haoran Xu, Renjie Gu, Yixuan Ye, Xinyi Chen, Xinyu Mu, Yuan Gao, Chunxiao Guo, Peng Wei, Jinjie Gu, Huan Li, Ke Chen, Lidan Shou

Adaptive Matching of Kernel Means

As a promising step, the performance of data analysis and feature learning are able to be improved if certain pattern matching mechanism is available. One of the feasible solutions can refer to the importance estimation of instances, and…

Machine Learning · Computer Science 2020-11-17 Miao Cheng, Xinge You

VehicleMemBench: An Executable Benchmark for Multi-User Long-Term Memory in In-Vehicle Agents

With the growing demand for intelligent in-vehicle experiences, vehicle-based agents are evolving from simple assistants to long-term companions. This evolution requires agents to continuously model multi-user preferences and make reliable…

Artificial Intelligence · Computer Science 2026-03-26 Yuhao Chen, Yi Xu, Xinyun Ding, Xiang Fang, Shuochen Liu, Luxi Lin, Qingyu Zhang, Ya Li, Quan Liu, Tong Xu