Related papers: Queue Management in Network Processors

Data Path Processing in Fast Programmable Routers

Internet is growing at a fast pace. The link speeds are surging toward 40 Gbps with the emergence of faster link technologies. New applications are coming up which require intelligent processing at the intermediate routers. Switches and…

Networking and Internet Architecture · Computer Science 2007-05-23 Pradipta De

Techniques for Shared Resource Management in Systems with Throughput Processors

The continued growth of the computational capability of throughput processors has made throughput processors the platform of choice for a wide variety of high performance computing applications. Graphics Processing Units (GPUs) are a prime…

Hardware Architecture · Computer Science 2018-05-01 Rachata Ausavarungnirun

A Quantum Network Processor Unit for Distributed Quantum Computing

As quantum computing progresses, the need for scalable solutions to address large-scale computational problems has become critical. Quantum supercomputers are the next upcoming frontier by enabling multiple quantum processors to collaborate…

Quantum Physics · Physics 2025-11-20 Peiyi Li , Chenxu Liu , Ji Liu , Huiyang Zhou , Ang Li

High-Speed Query Processing over High-Speed Networks

Modern database clusters entail two levels of networks: connecting CPUs and NUMA regions inside a single server in the small and multiple servers in the large. The huge performance gap between these two types of networks used to slow down…

Databases · Computer Science 2015-11-03 Wolf Roediger , Tobias Muehlbauer , Alfons Kemper , Thomas Neumann

Estimate The Efficiency Of Multiprocessor's Cash Memory Work Algorithms

Many computer systems for calculating the proper organization of memory are among the most critical issues. Using a tier cache memory (along with branching prediction) is an effective means of increasing modern multi-core processors'…

Networking and Internet Architecture · Computer Science 2021-05-21 Mohamed A. Hamada , Abdelrahman Abdallah

Processor Based Active Queue Management for providing QoS in Multimedia Application

The objective of this paper is to implement the Active Network based Active Queue Management Technique for providing Quality of Service (QoS) using Network Processor(NP) based router to enhance multimedia applications. The performance is…

Networking and Internet Architecture · Computer Science 2010-04-13 N. Saravana Selvam , S. Radhakrishnan

Processing Data Where It Makes Sense: Enabling In-Memory Computation

Today's systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in systems that cause performance, scalability and energy bottlenecks: (1) data access from memory…

Hardware Architecture · Computer Science 2019-03-12 Onur Mutlu , Saugata Ghose , Juan Gómez-Luna , Rachata Ausavarungnirun

High-Performance Computing with Quantum Processing Units

The prospects of quantum computing have driven efforts to realize fully functional quantum processing units (QPUs). Recent success in developing proof-of-principle QPUs has prompted the question of how to integrate these emerging processors…

Emerging Technologies · Computer Science 2015-12-10 Keith A. Britt , Travis S. Humble

A Design of Endurance Queue for Co-Existing Systems in Multi-Programmed Environments

These days enterprise applications try to integrate online processing and batch jobs into a common software stack for seamless monitoring and driverless operations. Continuous integration of these systems results in choking of the poorly…

Performance · Computer Science 2015-12-16 Subrata Ashe

Design of a Timer Queue Supporting Dynamic Update Operations

Large-scale timers are ubiquitous in network processing, including flow table entry expiration control in software defined network (SDN) switches, MAC address aging in Ethernet bridges, and retransmission timeout management in TCP/IP…

Networking and Internet Architecture · Computer Science 2025-08-15 Zekun Wang , Binghao Yue , Weitao Pan , Jiangyi Shi , Yue Hao

In-Datacenter Performance Analysis of a Tensor Processing Unit

Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC---called a Tensor Processing Unit (TPU)---deployed in datacenters since 2015 that…

Hardware Architecture · Computer Science 2017-04-18 Norman P. Jouppi , Cliff Young , Nishant Patil , David Patterson , Gaurav Agrawal , Raminder Bajwa , Sarah Bates , Suresh Bhatia , Nan Boden , Al Borchers , Rick Boyle , Pierre-luc Cantin , Clifford Chao , Chris Clark , Jeremy Coriell , Mike Daley , Matt Dau , Jeffrey Dean , Ben Gelb , Tara Vazir Ghaemmaghami , Rajendra Gottipati , William Gulland , Robert Hagmann , C. Richard Ho , Doug Hogberg , John Hu , Robert Hundt , Dan Hurt , Julian Ibarz , Aaron Jaffey , Alek Jaworski , Alexander Kaplan , Harshit Khaitan , Andy Koch , Naveen Kumar , Steve Lacy , James Laudon , James Law , Diemthu Le , Chris Leary , Zhuyuan Liu , Kyle Lucke , Alan Lundin , Gordon MacKean , Adriana Maggiore , Maire Mahony , Kieran Miller , Rahul Nagarajan , Ravi Narayanaswami , Ray Ni , Kathy Nix , Thomas Norrie , Mark Omernick , Narayana Penukonda , Andy Phelps , Jonathan Ross , Matt Ross , Amir Salek , Emad Samadiani , Chris Severn , Gregory Sizikov , Matthew Snelham , Jed Souter , Dan Steinberg , Andy Swing , Mercedes Tan , Gregory Thorson , Bo Tian , Horia Toma , Erick Tuttle , Vijay Vasudevan , Richard Walter , Walter Wang , Eric Wilcox , Doe Hyun Yoon

ALPHA-PIM: Analysis of Linear Algebraic Processing for High-Performance Graph Applications on a Real Processing-In-Memory System

Processing large-scale graph datasets is computationally intensive and time-consuming. Processor-centric CPU and GPU architectures, commonly used for graph applications, often face bottlenecks caused by extensive data movement between the…

Distributed, Parallel, and Cluster Computing · Computer Science 2026-02-11 Marzieh Barkhordar , Alireza Tabatabaeian , Mohammad Sadrosadati , Christina Giannoula , Juan Gomez Luna , Izzat El Hajj , Onur Mutlu , Alaa R. Alameldeen

Memory Bounds for Concurrent Bounded Queues

Concurrent data structures often require additional memory for handling synchronization issues in addition to memory for storing elements. Depending on the amount of this additional memory, implementations can be more or less…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-01-17 Vitaly Aksenov , Nikita Koval , Petr Kuznetsov , Anton Paramonov

A Priority-Aware Multiqueue NIC Design

Low-level embedded systems are used to control cyber-phyiscal systems in industrial and autonomous applications. They need to meet hard real-time requirements as unanticipated controller delays on moving machines can have devastating…

Networking and Internet Architecture · Computer Science 2022-01-04 Ilja Behnke , Philipp Wiesner , Robert Danicki , Lauritz Thamsen

Data access optimizations for highly threaded multi-core CPUs with multiple memory controllers

Processor and system architectures that feature multiple memory controllers are prone to show bottlenecks and erratic performance numbers on codes with regular access patterns. Although such effects are well known in the form of cache…

Distributed, Parallel, and Cluster Computing · Computer Science 2008-01-28 Georg Hager , Thomas Zeiser , Gerhard Wellein

Memos: Revisiting Hybrid Memory Management in Modern Operating System

The emerging hybrid DRAM-NVM architecture is challenging the existing memory management mechanism in operating system. In this paper, we introduce memos, which can schedule memory resources over the entire memory hierarchy including cache,…

Operating Systems · Computer Science 2017-03-23 Lei Liu , Mengyao Xie , Hao Yang

A Grouped Sorting Queue Supporting Dynamic Updates for Timer Management in High-Speed Network Interface Cards

With the hardware offloading of network functions, network interface cards (NICs) undertake massive stateful, high-precision, and high-throughput tasks, where timers serve as a critical enabling component. However, existing timer management…

Data Structures and Algorithms · Computer Science 2026-04-14 Zekun Wang , Binghao Yue , Weitao Pan , Jianyi Shi , Yue Hao

Programmable FPGA-based Memory Controller

Even with generational improvements in DRAM technology, memory access latency still remains the major bottleneck for application accelerators, primarily due to limitations in memory interface IPs which cannot fully account for variations in…

Hardware Architecture · Computer Science 2021-08-24 Sasindu Wijeratne , Sanket Pattnaik , Zhiyu Chen , Rajgopal Kannan , Viktor Prasanna

Instruction Set Architectures for Quantum Processing Units

Progress in quantum computing hardware raises questions about how these devices can be controlled, programmed, and integrated with existing computational workflows. We briefly describe several prominent quantum computational models, their…

Emerging Technologies · Computer Science 2017-07-20 Keith A. Britt , Travis S. Humble

SmartPQ: An Adaptive Concurrent Priority Queue for NUMA Architectures

Concurrent priority queues are widely used in important workloads, such as graph applications and discrete event simulations. However, designing scalable concurrent priority queues for NUMA architectures is challenging. Even though several…

Distributed, Parallel, and Cluster Computing · Computer Science 2024-06-12 Christina Giannoula , Foteini Strati , Dimitrios Siakavaras , Georgios Goumas , Nectarios Koziris