Related papers: Talaria: Interactively Optimizing Machine Learning…

Your Inference Request Will Become a Black Box: Confidential Inference for Cloud-based Large Language Models

The increasing reliance on cloud-hosted Large Language Models (LLMs) exposes sensitive client data, such as prompts and responses, to potential privacy breaches by service providers. Existing approaches fail to ensure privacy, maintain…

Cryptography and Security · Computer Science 2026-03-03 Chung-ju Huang, Huiqiang Zhao, Yuanpeng He, Lijian Li, Wenpin Jiao, Zhi Jin, Peixuan Chen, Leye Wang

Model Compression in Practice: Lessons Learned from Practitioners Creating On-device Machine Learning Experiences

On-device machine learning (ML) promises to improve the privacy, responsiveness, and proliferation of new, intelligent user experiences by moving ML computation onto everyday personal devices. However, today's large ML models must be…

Human-Computer Interaction · Computer Science 2024-04-05 Fred Hohman, Mary Beth Kery, Donghao Ren, Dominik Moritz

SOTERIA: In Search of Efficient Neural Networks for Private Inference

ML-as-a-service is gaining popularity where a cloud server hosts a trained model and offers prediction (inference) service to users. In this setting, our objective is to protect the confidentiality of both the users' input queries as well…

Cryptography and Security · Computer Science 2020-07-28 Anshul Aggarwal, Trevor E. Carlson, Reza Shokri, Shruti Tople

Towards Designing a Self-Managed Machine Learning Inference Serving System in Public Cloud

We are witnessing an increasing trend towards using Machine Learning (ML) based prediction systems, spanning across different application domains, including product recommendation systems, personal assistant devices, facial recognition, etc.…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-08-24 Jashwant Raj Gunasekaran, Prashanth Thinakaran, Cyan Subhra Mishra, Mahmut Taylan Kandemir, Chita R. Das

Towards an Efficient ML System: Unveiling a Trade-off between Task Accuracy and Engineering Efficiency in a Large-scale Car Sharing Platform

Upon the significant performance of the supervised deep neural networks, conventional procedures of developing ML system are task-centric, which aims to maximize the task accuracy. However, we scrutinized this task-centric…

Computer Vision and Pattern Recognition · Computer Science 2022-10-14 Kyung Ho Park, Hyunhee Chung, Soonwoo Kwon

Efficient Multi-stage Inference on Tabular Data

Many ML applications and products train on medium amounts of input data but get bottlenecked in real-time inference. When implementing ML systems, conventional wisdom favors segregating ML code into services queried by product code via…

Machine Learning · Computer Science 2023-07-25 Daniel S Johnson, Igor L Markov

Deep learning and machine learning for Malaria detection: overview, challenges and future directions

To have the greatest impact, public health initiatives must be made using evidence-based decision-making. Machine learning algorithms are created to gather, store, process, and analyse data to provide knowledge and guide decisions. A…

Machine Learning · Computer Science 2022-09-28 Imen Jdey, Ghazala Hcini, Hela Ltifi

Simulation-Based Optimization of User Interfaces for Quality-Assuring Machine Learning Model Predictions

Quality-sensitive applications of machine learning (ML) require quality assurance (QA) by humans before the predictions of an ML model can be deployed. QA for ML (QA4ML) interfaces require users to view a large amount of data and perform…

Human-Computer Interaction · Computer Science 2023-09-01 Yu Zhang, Martijn Tennekes, Tim de Jong, Lyana Curier, Bob Coecke, Min Chen

More Than Accuracy: Towards Trustworthy Machine Learning Interfaces for Object Recognition

This paper investigates the user experience of visualizations of a machine learning (ML) system that recognizes objects in images. This is important since even good systems can fail in unexpected ways as misclassifications on photo-sharing…

Human-Computer Interaction · Computer Science 2020-08-06 Hendrik Heuer, Andreas Breiter

On Using Information Retrieval to Recommend Machine Learning Good Practices for Software Engineers

Machine learning (ML) is nowadays widely used for different purposes and in several disciplines. From self-driving cars to automated medical diagnosis, machine learning models extensively support users' daily activities, and software…

Software Engineering · Computer Science 2023-08-28 Laura Cabra-Acela, Anamaria Mojica-Hanke, Mario Linares-Vásquez, Steffen Herbold

SneakPeek: Data-Aware Model Selection and Scheduling for Inference Serving on the Edge

Modern applications increasingly rely on inference serving systems to provide low-latency insights with a diverse set of machine learning models. Existing systems often utilize resource elasticity to scale with demand. However, many…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-05-13 Joel Wolfrath, Daniel Frink, Abhishek Chandra

Exploring the Boundaries of On-Device Inference: When Tiny Falls Short, Go Hierarchical

On-device inference holds great potential for increased energy efficiency, responsiveness, and privacy in edge ML systems. However, due to less capable ML models that can be embedded in resource-limited devices, use cases are limited to…

Machine Learning · Computer Science 2025-04-18 Adarsh Prasad Behera, Paulius Daubaris, Iñaki Bravo, José Gallego, Roberto Morabito, Joerg Widmer, Jaya Prakash Varma Champati

Guiding Optimizations with Meliora: A Deep Walk down Memory Lane

Performance models can be very useful for understanding the behavior of applications and hence can help guide design and optimization decisions. Unfortunately, performance modeling of nontrivial computations typically requires significant…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-06-18 Kewen Meng, Boyana Norris

Monitoring and Adapting ML Models on Mobile Devices

ML models are increasingly being pushed to mobile devices, for low-latency inference and offline operation. However, once the models are deployed, it is hard for ML operators to track their accuracy, which can degrade unpredictably (e.g.,…

Machine Learning · Computer Science 2023-05-18 Wei Hao, Zixi Wang, Lauren Hong, Lingxiao Li, Nader Karayanni, Chengzhi Mao, Junfeng Yang, Asaf Cidon

A Holistic Assessment of the Reliability of Machine Learning Systems

As machine learning (ML) systems increasingly permeate high-stakes settings such as healthcare, transportation, military, and national security, concerns regarding their reliability have emerged. Despite notable progress, the performance of…

Machine Learning · Computer Science 2023-08-01 Anthony Corso, David Karamadian, Romeo Valentin, Mary Cooper, Mykel J. Kochenderfer

An Interactive Machine Learning Framework

Machine learning (ML) is believed to be an effective and efficient tool to build reliable prediction model or extract useful structure from an avalanche of data. However, ML is also criticized by its difficulty in interpretation and…

Human-Computer Interaction · Computer Science 2016-10-19 Teng Lee, James Johnson, Steve Cheng

Define-ML: An Approach to Ideate Machine Learning-Enabled Systems

[Context] The increasing adoption of machine learning (ML) in software systems demands specialized ideation approaches that address ML-specific challenges, including data dependencies, technical feasibility, and alignment between business…

Software Engineering · Computer Science 2025-06-26 Silvio Alonso, Antonio Pedro Santos Alves, Lucas Romao, Hélio Lopes, Marcos Kalinowski

MLModelCI: An Automatic Cloud Platform for Efficient MLaaS

MLModelCI provides multimedia researchers and developers with a one-stop platform for efficient machine learning (ML) services. The system leverages DevOps techniques to optimize, test, and manage models. It also containerizes and deploys…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-12-16 Huaizheng Zhang, Yuanming Li, Yizheng Huang, Yonggang Wen, Jianxiong Yin, Kyle Guan

Deriva-ML: A Continuous FAIRness Approach to Reproducible Machine Learning Models

Increasingly, artificial intelligence (AI) and machine learning (ML) are used in eScience applications [9]. While these approaches have great potential, the literature has shown that ML-based approaches frequently suffer from results that…

Machine Learning · Computer Science 2024-07-03 Zhiwei Li, Carl Kesselman, Mike D'Arch, Michael Pazzani, Benjamin Yizing Xu

Simulating Performance of ML Systems with Offline Profiling

We advocate that simulation based on offline profiling is a promising approach to better understand and improve the complex ML systems. Our approach uses operation-level profiling and dataflow based simulation to ensure it offers a unified…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-02-18 Hongming Huang, Peng Cheng, Hong Xu, Yongqiang Xiong