Related papers: Queue Management in Network Processors
The Internet is growing rapidly. Link speeds are surging toward 40 Gbps with the emergence of faster link technologies. New applications are emerging that require intelligent processing at intermediate routers. Switches and…
The continued growth in the computational capability of throughput processors has made them the platform of choice for a wide variety of high-performance computing applications. Graphics Processing Units (GPUs) are a prime…
As quantum computing progresses, the need for scalable solutions to large-scale computational problems has become critical. Quantum supercomputers are the next frontier, enabling multiple quantum processors to collaborate…
Modern database clusters entail two levels of networks: connecting CPUs and NUMA regions inside a single server in the small and multiple servers in the large. The huge performance gap between these two types of networks used to slow down…
In many computer systems, the proper organization of memory is among the most critical design issues. Using a tiered cache memory (along with branch prediction) is an effective means of increasing modern multi-core processors'…
The objective of this paper is to implement an Active Network-based Active Queue Management technique for providing Quality of Service (QoS) using a Network Processor (NP) based router to enhance multimedia applications. The performance is…
Today's systems are overwhelmingly designed to move data to computation. This design choice goes directly against at least three key trends in systems that cause performance, scalability and energy bottlenecks: (1) data access from memory…
The prospects of quantum computing have driven efforts to realize fully functional quantum processing units (QPUs). Recent success in developing proof-of-principle QPUs has prompted the question of how to integrate these emerging processors…
Today, enterprise applications try to integrate online processing and batch jobs into a common software stack for seamless monitoring and driverless operations. Continuous integration of these systems results in choking of the poorly…
Large-scale timers are ubiquitous in network processing, including flow table entry expiration control in software defined network (SDN) switches, MAC address aging in Ethernet bridges, and retransmission timeout management in TCP/IP…
Many architects believe that major improvements in cost-energy-performance must now come from domain-specific hardware. This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that…
Processing large-scale graph datasets is computationally intensive and time-consuming. Processor-centric CPU and GPU architectures, commonly used for graph applications, often face bottlenecks caused by extensive data movement between the…
Concurrent data structures often require additional memory for handling synchronization issues in addition to memory for storing elements. Depending on the amount of this additional memory, implementations can be more or less…
Low-level embedded systems are used to control cyber-physical systems in industrial and autonomous applications. They need to meet hard real-time requirements, as unanticipated controller delays on moving machines can have devastating…
Processor and system architectures that feature multiple memory controllers are prone to show bottlenecks and erratic performance numbers on codes with regular access patterns. Although such effects are well known in the form of cache…
The emerging hybrid DRAM-NVM architecture is challenging the existing memory management mechanism in the operating system. In this paper, we introduce memos, which can schedule memory resources over the entire memory hierarchy including cache,…
With the hardware offloading of network functions, network interface cards (NICs) undertake massive stateful, high-precision, and high-throughput tasks, where timers serve as a critical enabling component. However, existing timer management…
Even with generational improvements in DRAM technology, memory access latency still remains the major bottleneck for application accelerators, primarily due to limitations in memory interface IPs which cannot fully account for variations in…
Progress in quantum computing hardware raises questions about how these devices can be controlled, programmed, and integrated with existing computational workflows. We briefly describe several prominent quantum computational models, their…
Concurrent priority queues are widely used in important workloads, such as graph applications and discrete event simulations. However, designing scalable concurrent priority queues for NUMA architectures is challenging. Even though several…