Related papers: Continuous Prefetch for Interactive Data Applicati…

200 papers

Achieving faster execution with shorter compilation time can foster further diversity and innovation in neural networks. However, the current paradigm of executing neural networks either relies on hand-optimized libraries, traditional…

Machine Learning · Computer Science 2020-01-27 Byung Hoon Ahn, Prannoy Pilligundla, Amir Yazdanbakhsh, Hadi Esmaeilzadeh

The widespread adoption of LLMs has driven an exponential rise in their deployment, imposing substantial demands on inference clusters. These clusters must handle numerous concurrent queries for different LLM downstream tasks. To handle…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-11-14 Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas

Multimodal Artificial Intelligence (AI) systems, particularly Vision-Language Models (VLMs), have become integral to critical applications ranging from autonomous decision-making to automated document processing. As these systems scale,…

Artificial Intelligence · Computer Science 2025-12-05 M Zeeshan, Saud Satti

We present Chameleon, a novel hybrid (mixed-protocol) framework for secure function evaluation (SFE) which enables two parties to jointly compute a function without disclosing their private inputs. Chameleon combines the best aspects of…

Cryptography and Security · Computer Science 2018-01-11 M. Sadegh Riazi, Christian Weinert, Oleksandr Tkachenko, Ebrahim M. Songhori, Thomas Schneider, Farinaz Koushanfar

The increasing size of large language models (LLMs) has led to a surge in memory requirements during training, often exceeding the capacity of high-bandwidth memory (HBM). Swap-based memory optimization incurs neither accuracy loss nor…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-16 Zibo Wang, Yuhang Zhou, Zhibin Wang, Shipeng Li, Xinjing Huang, Chendong Cai, Bingxu Mu, Yuqing Sun, Zhiheng Hu, Bin She, Shu You, Guanghuan Fang, Rong Gu, Wanchun Dou, Guihai Chen, Chen Tian

This paper proposes an efficient neural network (NN) architecture design methodology called Chameleon that honors given resource constraints. Instead of developing new building blocks or using computationally-intensive reinforcement…

Computer Vision and Pattern Recognition · Computer Science 2018-12-24 Xiaoliang Dai, Peizhao Zhang, Bichen Wu, Hongxu Yin, Fei Sun, Yanghan Wang, Marat Dukhan, Yunqing Hu, Yiming Wu, Yangqing Jia, Peter Vajda, Matt Uyttendaele, Niraj K. Jha

This paper presents PALPATINE, the first in-memory application-level cache for Distributed Key-Value (DKV) data stores, capable of prefetching data that is likely to be accessed in the immediate future. To predict data accesses, PALPATINE…

Distributed, Parallel, and Cluster Computing · Computer Science 2020-04-24 Sergio Esteves, Joao Nuno Silva, Luis Veiga

A Retrieval-Augmented Language Model (RALM) combines a large language model (LLM) with a vector database to retrieve context-specific knowledge during text generation. This strategy facilitates impressive generation quality even with…

Machine Learning · Computer Science 2025-03-26 Wenqi Jiang, Marco Zeller, Roger Waleffe, Torsten Hoefler, Gustavo Alonso

Fully homomorphic encryption (FHE) enables direct computation on encrypted data, making it a crucial technology for privacy protection. However, FHE suffers from significant performance bottlenecks. In this context, GPU acceleration offers…

Cryptography and Security · Computer Science 2024-10-10 Zhiwei Wang, Haoqi He, Lutan Zhao, Peinan Li, Zhihao Li, Dan Meng, Rui Hou

On-device learning at the edge enables low-latency, private personalization with improved long-term robustness and reduced maintenance costs. Yet, achieving scalable, low-power end-to-end on-chip learning, especially from real-world…

Hardware Architecture · Computer Science 2026-03-31 Douwe den Blanken, Charlotte Frenkel

Dark mode has gained widespread adoption across mobile platforms due to its benefits in reducing eye strain and conserving battery life. However, while the mobile system switches to dark mode, most visualizations remain designed for light…

Human-Computer Interaction · Computer Science 2026-01-06 Manusha Karunathilaka, Songheng Zhang, Anthony Tang, Kotaro Hara, Jiannan Li, Yong Wang

Embedding is a useful technique to project a high-dimensional feature into a low-dimensional space, and it has many successful applications including link prediction, node classification and natural language processing. Current approaches…

Information Retrieval · Computer Science 2020-09-21 Meimei Liu, Hongxia Yang

Large language models have evolved data-efficient generalists, benefiting from the universal language interface and large-scale pre-training. However, constructing a data-efficient generalist for dense visual prediction presents a distinct…

Computer Vision and Pattern Recognition · Computer Science 2024-12-20 Donggyun Kim, Seongwoong Cho, Semin Kim, Chong Luo, Seunghoon Hong

Unified Virtual Memory (UVM) relieves the developers from the onus of maintaining complex data structures and explicit data migration by enabling on-demand data movement between CPU memory and GPU memory. However, on-demand paging soon…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-11 Xinjian Long, Xiangyang Gong, Huiyang Zhou

Developing, scaling, and deploying modern Machine Learning solutions remains challenging for small- and middle-sized enterprises (SMEs). This is due to a high entry barrier of building and maintaining a dedicated IT team as well as the…

Software Engineering · Computer Science 2021-05-11 Johannes Otterbach, Thomas Wollmann

Personalizing large language models (LLMs) is essential for delivering tailored interactions that improve user experience. Many existing personalization methods require fine-tuning LLMs for each user, rendering them prohibitively expensive…

Machine Learning · Computer Science 2025-03-06 Yijing Zhang, Dyah Adila, Changho Shin, Frederic Sala

Prefetching web pages is a well-studied solution to reduce network latency by predicting users' future actions based on their past behaviors. However, such techniques are largely unexplored on mobile platforms. Today's privacy regulations…

Software Engineering · Computer Science 2021-03-25 Yixue Zhao, Siwei Yin, Adriana Sejfia, Marcelo Schmitt Laser, Haoyu Wang, Nenad Medvidovic

Prefetching is a crucial technique employed in traditional databases to enhance interactivity, particularly in the context of data exploration. Data exploration is a query processing paradigm in which users search for insights buried in…

Databases · Computer Science 2025-02-24 Farzaneh Zirak, Farhana Choudhury, Renata Borovica-Gajic

The recent proposal of learned index structures opens up a new perspective on how traditional range indexes can be optimized. However, the current learned indexes assume the data distribution is relatively static and the access pattern is…

Machine Learning · Computer Science 2019-02-05 Chuzhe Tang, Zhiyuan Dong, Minjie Wang, Zhaoguo Wang, Haibo Chen

Large language models (LLMs) power a new generation of interactive AI applications exemplified by ChatGPT. The interactive nature of these applications demands low latency for LLM inference. Existing LLM serving systems use…

Machine Learning · Computer Science 2024-09-26 Bingyang Wu, Yinmin Zhong, Zili Zhang, Shengyu Liu, Fangyue Liu, Yuanhang Sun, Gang Huang, Xuanzhe Liu, Xin Jin