English
Related papers

Related papers: RowClone: Accelerating Data Movement and Initializ…

200 papers

In existing systems, the off-chip memory interface allows the memory controller to perform only read or write operations. Therefore, to perform any operation, the processor must first read the source data and then write the result back to…

Hardware Architecture · Computer Science 2016-11-01 Vivek Seshadri , Onur Mutlu

In most modern systems, the memory subsystem is managed and accessed at multiple different granularities at various resources. We observe that such multi-granularity management results in significant inefficiency in the memory subsystem.…

Hardware Architecture · Computer Science 2016-05-23 Vivek Seshadri

Data copy is a widely-used memory operation in many programs and operating system services. In conventional computers, data copy is often carried out by two separate read and write transactions that pass data back and forth between the DRAM…

Analytical database systems are typically designed to use a column-first data layout to access only the desired fields. On the other hand, storing data row-first works great for accessing, inserting, or updating entire rows. Transforming…

Deploying large language models (LLMs) on edge devices is crucial for delivering fast responses and ensuring data privacy. However, the limited storage, weight, and power of edge devices make it difficult to deploy LLM-powered applications.…

Hardware Architecture · Computer Science 2025-06-04 Chunlin Tian , Xinpeng Qin , Kahou Tam , Li Li , Zijian Wang , Yuanzhe Zhao , Minglei Zhang , Chengzhong Xu

Spawning duplicate requests, called cloning, is a powerful technique to reduce tail latency by masking service-time variability. However, traditional client-based cloning is static and harmful to performance under high load, while a recent…

Networking and Internet Architecture · Computer Science 2023-07-26 Gyuyeong Kim

Persistent key value stores are an important component of many distributed data serving solutions with innovations targeted at taking advantage of growing flash speeds. Unfortunately their performance is hampered by the need to maintain and…

Distributed, Parallel, and Cluster Computing · Computer Science 2017-11-21 Amitabha Roy , Subramanya R. Dulloor

Modern HBM-based memory systems have evolved over generations while retaining cache line granularity accesses. Preserving this fine granularity necessitated the introduction of bank groups and pseudo channels. These structures expand timing…

Hardware Architecture · Computer Science 2025-12-02 Hwayong Nam , Seungmin Baek , Jumin Kim , Michael Jaemin Kim , Jung Ho Ahn

Read-optimized columnar databases use differential updates to handle writes by maintaining a separate write-optimized delta partition which is periodically merged with the read-optimized and compressed main partition. This merge process…

DRAM-based main memory is used in nearly all computing systems as a major component. One way of overcoming the main memory bottleneck is to move computation near memory, a paradigm known as processing-in-memory (PiM). Recent PiM techniques…

Hardware Architecture · Computer Science 2022-06-02 Ataberk Olgun , Juan Gomez Luna , Konstantinos Kanellopoulos , Behzad Salami , Hasan Hassan , Oguz Ergin , Onur Mutlu

We demonstrate that general-purpose memory allocation involving many threads on many cores can be done with high performance, multicore scalability, and low memory consumption. For this purpose, we have designed and implemented scalloc, a…

Programming Languages · Computer Science 2015-08-26 Martin Aigner , Christoph M. Kirsch , Michael Lippautz , Ana Sokolova

Data movement between the processor and the main memory is a first-order obstacle against improving performance and energy efficiency in modern systems. To address this obstacle, Processing-using-Memory (PuM) is a promising approach where…

Linearizable datastores are desirable because they provide users with the illusion that the datastore is run on a single machine that performs client operations one at a time. To reduce the performance cost of providing this illusion, many…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-04-09 Myles Thiessen , Aleksey Panas , Guy Khazma , Eyal de Lara

This paper summarizes the idea of ChargeCache, which was published in HPCA 2016 [51], and examines the work's significance and future potential. DRAM latency continues to be a critical bottleneck for system performance. In this work, we…

Hardware Architecture · Computer Science 2018-05-11 Hasan Hassan , Gennady Pekhimenko , Nandita Vijaykumar , Vivek Seshadri , Donghyuk Lee , Oguz Ergin , Onur Mutlu

Memory-centric computing aims to enable computation capability in and near all places where data is generated and stored. As such, it can greatly reduce the large negative performance and energy impact of data access and data movement, by…

Hardware Architecture · Computer Science 2024-12-30 Onur Mutlu , Ataberk Olgun , Geraldo F. Oliveira , Ismail Emir Yuksel

Resistive Random-Access Memory (RRAM) is well-suited to accelerate neural network (NN) workloads as RRAM-based Processing-in-Memory (PIM) architectures natively support highly-parallel multiply-accumulate (MAC) operations that form the…

Hardware Architecture · Computer Science 2022-11-11 Aditya Manglik , Minesh Patel , Haiyu Mao , Behzad Salami , Jisung Park , Lois Orosa , Onur Mutlu

The data engineering and data science community has embraced the idea of using Python & R dataframes for regular applications. Driven by the big data revolution and artificial intelligence, these applications are now essential in order to…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-01-20 Niranda Perera , Kaiying Shan , Supun Kamburugamuwe , Thejaka Amila Kanewela , Chathura Widanage , Arup Sarker , Mills Staylor , Tianle Zhong , Vibhatha Abeykoon , Geoffrey Fox

Kernel methods provide a principled way to perform non linear, nonparametric learning. They rely on solid functional analytic foundations and enjoy optimal statistical properties. However, at least in their basic form, they have limited…

Machine Learning · Statistics 2018-02-01 Alessandro Rudi , Luigi Carratino , Lorenzo Rosasco

Computing-in-memory (CiM) is a promising technique to achieve high energy efficiency in data-intensive matrix-vector multiplication (MVM) by relieving the memory bottleneck. Unfortunately, due to the limited SRAM capacity, existing…

Hardware Architecture · Computer Science 2022-08-18 Yiming Chen , Guodong Yin , Zhanhong Tan , Mingyen Lee , Zekun Yang , Yongpan Liu , Huazhong Yang , Kaisheng Ma , Xueqing Li

The data science community today has embraced the concept of Dataframes as the de facto standard for data representation and manipulation. Ease of use, massive operator coverage, and popularization of R and Python languages have heavily…

Distributed, Parallel, and Cluster Computing · Computer Science 2023-07-06 Niranda Perera , Supun Kamburugamuve , Chathura Widanage , Vibhatha Abeykoon , Ahmet Uyar , Kaiying Shan , Hasara Maithree , Damitha Lenadora , Thejaka Amila Kanewala , Geoffrey Fox
‹ Prev 1 2 3 10 Next ›