Related papers: Mitigating the Memory Bottleneck with Machine Lear…
Computing is bottlenecked by data. Large amounts of application data overwhelm storage capability, communication capability, and computation capability of the modern machines we design today. As a result, many key applications' performance,…
The memory subsystem has always been a bottleneck in performance as well as significant power contributor in memory intensive applications. Many researchers have presented multi-layered memory hierarchies as a means to design energy and…
Today's computing systems require moving data back-and-forth between computing resources (e.g., CPUs, GPUs, accelerators) and off-chip main memory so that computation can take place on the data. Unfortunately, this data movement is a major…
Computing is bottlenecked by data. Large amounts of application data overwhelm storage capability, communication capability, and computation capability of the modern machines we design today. We argue that an intelligent architecture should…
Computing has a huge memory problem. The memory system, consisting of multiple technologies at different levels, is responsible for most of the energy consumption, performance bottlenecks, robustness problems, monetary cost, and hardware…
Major advancements in the capabilities of computer vision models have been primarily fueled by rapid expansion of datasets, model parameters, and computational budgets, leading to ever-increasing demands on computational infrastructure.…
The explosion in workload complexity and the recent slow-down in Moore's law scaling call for new approaches towards efficient computing. Researchers are now beginning to use recent advances in machine learning in software optimizations,…
Machine learning tasks are generally formulated as optimization problems, where one searches for an optimal function within a certain functional space. In practice, parameterized functional spaces are considered, in order to be able to…
Today's systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in systems that cause performance, scalability and energy bottlenecks: (1) data access from memory…
High capacity and scalable memory systems play a vital role in enabling our desktops, smartphones, and pervasive technologies like Internet of Things (IoT). Unfortunately, memory systems are becoming increasingly prone to faults. This is…
Understanding micro-architectural behavior is profound in efficiently using hardware resources. Recent work has shown that, despite being aggressively optimized for modern hardware, in-memory online transaction processing (OLTP) systems…
The rapid advancement of embedded multicore and many-core systems has revolutionized computing, enabling the development of high-performance, energy-efficient solutions for a wide range of applications. As models scale up in size, data…
Recent work has shown that out-of-order and speculative execution mechanisms used to increase performance in the majority of processors expose the processors to critical attacks. These attacks, called Meltdown and Spectre, exploit the side…
Despite the data-rich environment in which memory systems of modern computing platforms operate, many state-of-the-art architectural policies employed in the memory system rely on static, human-designed heuristics that fail to truly adapt…
Transformer models have emerged as potent solutions to a wide array of multidisciplinary challenges. The deployment of Transformer architectures is significantly hindered by their extensive computational and memory requirements,…
Microservice architectures and design patterns enhance the development of large-scale applications by promoting flexibility. Industrial practitioners perceive the importance of applying architectural patterns but they struggle to quantify…
A large body of research in continual learning is devoted to overcoming the catastrophic forgetting of neural networks by designing new algorithms that are robust to the distribution shifts. However, the majority of these works are strictly…
Memory-centric computing aims to enable computation capability in and near all places where data is generated and stored. As such, it can greatly reduce the large negative performance and energy impact of data access and data movement, by…
The development of large-scale foundation models, particularly Large Language Models (LLMs), is constrained by significant computational and memory bottlenecks. These challenges elevate throughput optimization from a mere engineering task…
Machine learning has enabled significant benefits in diverse fields, but, with a few exceptions, has had limited impact on computer architecture. Recent work, however, has explored broader applicability for design, optimization, and…