Related papers: A dynamic memory assignment strategy for dilation-…

Accelerating Exact and Approximate Inference for (Distributed) Discrete Optimization with GPUs

Discrete optimization is a central problem in artificial intelligence. The optimization of the aggregated cost of a network of cost functions arises in a variety of problems including (W)CSP, DCOP, as well as optimization in stochastic…

Artificial Intelligence · Computer Science 2018-01-12 Ferdinando Fioretto, Enrico Pontelli, William Yeoh, Rina Dechter

A Hierarchical Distributed Processing Framework for Big Image Data

This paper introduces an effective processing framework nominated ICP (Image Cloud Processing) to powerfully cope with the data explosion in image processing field. While most previous researches focus on optimizing the image processing…

Computer Vision and Pattern Recognition · Computer Science 2016-07-05 Le Dong, Zhiyu Lin, Yan Liang, Ling He, Ning Zhang, Qi Chen, Xiaochun Cao, Ebroul Izquierdo

Towards a High-performance and Secure Memory System and Architecture for Emerging Applications

In this dissertation, we propose a memory and computing coordinated methodology to thoroughly exploit the characteristics and capabilities of the GPU-based heterogeneous system to effectively optimize applications' performance and privacy.…

Cryptography and Security · Computer Science 2022-09-07 Zhendong Wang, Yang Hu

Generalized Ping-Pong: Off-Chip Memory Bandwidth Centric Pipelining Strategy for Processing-In-Memory Accelerators

Processing-in-memory (PIM) is a promising choice for accelerating deep neural networks (DNNs) featuring high efficiency and low power. However, the rapid upscaling of neural network model sizes poses a crucial challenge for the limited…

Hardware Architecture · Computer Science 2024-11-21 Ruibao Wang, Bonan Yan

Enhanced VIP Algorithms for Forwarding, Caching, and Congestion Control in Named Data Networks

Emerging Information-Centric Networking (ICN) architectures seek to optimally utilize both bandwidth and storage for efficient content distribution over the network. The Virtual Interest Packet (VIP) framework has been proposed to enable…

Information Theory · Computer Science 2016-07-13 Ying Cui, Fan Lai, Edmund Yeh, Ran Liu

A Collaborative PIM Computing Optimization Framework for Multi-Tenant DNN

Modern Artificial Intelligence (AI) applications are increasingly utilizing multi-tenant deep neural networks (DNNs), which lead to a significant rise in computing complexity and the need for computing parallelism. ReRAM-based…

Emerging Technologies · Computer Science 2024-08-12 Bojing Li, Duo Zhong, Xiang Chen, Chenchen Liu

DeepICP: An End-to-End Deep Neural Network for 3D Point Cloud Registration

We present DeepICP - a novel end-to-end learning-based 3D point cloud registration framework that achieves comparable registration accuracy to prior state-of-the-art geometric methods. Different from other keypoint based methods where a…

Computer Vision and Pattern Recognition · Computer Science 2020-07-14 Weixin Lu, Guowei Wan, Yao Zhou, Xiangyu Fu, Pengfei Yuan, Shiyu Song

ScaleGANN: Accelerate Large-Scale ANN Indexing by Cost-effective Cloud GPUs

Graph-based ANNS algorithms have gained increasing research interest and market adoption due to their efficiency and accuracy in retrieval. Existing approaches primarily rely on CPUs for graph index construction and retrieval, but this…

Databases · Computer Science 2026-05-12 Lan Lu, Peiqi Yin, Isaac Yang, Tao Luo, Hua Fan, Wenchao Zhou, Feifei Li, Boon Thau Loo

DICP: Doppler Iterative Closest Point Algorithm

In this paper, we present a novel algorithm for point cloud registration for range sensors capable of measuring per-return instantaneous radial velocity: Doppler ICP. Existing variants of ICP that solely rely on geometry or other features…

Robotics · Computer Science 2022-06-01 Bruno Hexsel, Heethesh Vhavle, Yi Chen

Demand Layering for Real-Time DNN Inference with Minimized Memory Usage

When executing a deep neural network (DNN), its model parameters are loaded into GPU memory before execution, incurring a significant GPU memory burden. There are studies that reduce GPU memory usage by exploiting CPU memory as a swap…

Machine Learning · Computer Science 2022-10-11 Mingoo Ji, Saehanseul Yi, Changjin Koo, Sol Ahn, Dongjoo Seo, Nikil Dutt, Jong-Chan Kim

DYNAMAP: Dynamic Algorithm Mapping Framework for Low Latency CNN Inference

Most of the existing work on FPGA acceleration of Convolutional Neural Network (CNN) focus on employing a single strategy (algorithm, dataflow, etc.) across all the layers. Such an approach does not achieve optimal latency on complex and…

Distributed, Parallel, and Cluster Computing · Computer Science 2021-03-16 Yuan Meng, Sanmukh Kuppannagari, Rajgopal Kannan, Viktor Prasanna

LP-ICP: General Localizability-Aware Point Cloud Registration for Robust Localization in Extreme Unstructured Environments

The Iterative Closest Point (ICP) algorithm is a crucial component of LiDAR-based SLAM algorithms. However, its performance can be negatively affected in unstructured environments that lack features and geometric structures, leading to low…

Robotics · Computer Science 2025-06-03 Haosong Yue, Qingyuan Xu, Fei Chen, Jia Pan, Weihai Chen

Efficient Deployment of CNN Models on Multiple In-Memory Computing Units

In-Memory Computing (IMC) represents a paradigm shift in deep learning acceleration by mitigating data movement bottlenecks and leveraging the inherent parallelism of memory-based computations. The efficient deployment of Convolutional…

Hardware Architecture · Computer Science 2025-11-10 Eleni Bougioukou, Theodore Antonakopoulos

Scaled VIP Algorithms for Joint Dynamic Forwarding and Caching in Named Data Networks

Emerging Information-Centric Networking (ICN) architectures seek to optimally utilize both bandwidth and storage for efficient content distribution over the network. The Virtual Interest Packet (VIP) framework has been proposed to enable…

Information Theory · Computer Science 2016-08-16 Fan Lai, Feng Qiu, Wenjie Bian, Ying Cui, Edmund Yeh

A GPU-Accelerated Distributed Algorithm for Optimal Power Flow in Distribution Systems

We propose a GPU-accelerated distributed optimization algorithm for controlling multi-phase optimal power flow in active distribution systems with dynamically changing topologies. To handle varying network configurations and enable…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-01-15 Minseok Ryu, Geunyeong Byeon, Kibaek Kim

Continual Learning Approach for Improving the Data and Computation Mapping in Near-Memory Processing System

The resurgence of near-memory processing (NMP) with the advent of big data has shifted the computation paradigm from processor-centric to memory-centric computing. To meet the bandwidth and capacity demands of memory-centric computing, 3D…

Hardware Architecture · Computer Science 2021-04-29 Pritam Majumder, Jiayi Huang, Sungkeun Kim, Abdullah Muzahid, Dylan Siegers, Chia-Che Tsai, Eun Jung Kim

WGICP: Differentiable Weighted GICP-Based Lidar Odometry

We present a novel differentiable weighted generalized iterative closest point (WGICP) method applicable to general 3D point cloud data, including that from Lidar. Our method builds on differentiable generalized ICP (GICP), and we propose…

Robotics · Computer Science 2022-10-05 Sanghyun Son, Jing Liang, Ming Lin, Dinesh Manocha

DynIMS: A Dynamic Memory Controller for In-memory Storage on HPC Systems

In order to boost the performance of data-intensive computing on HPC systems, in-memory computing frameworks, such as Apache Spark and Flink, use local DRAM for data storage. Optimizing the memory allocation to data storage is critical to…

Performance · Computer Science 2016-09-30 Pengfei Xuan, Feng Luo, Rong Ge, Pradip K Srimani

Fast-OverlaPIM: A Fast Overlap-driven Mapping Framework for Processing In-Memory Neural Network Acceleration

Processing in-memory (PIM) is promising to accelerate neural networks (NNs) because it minimizes data movement and provides large computational parallelism. Similar to machine learning accelerators, application mapping, which determines the…

Hardware Architecture · Computer Science 2024-07-02 Xuan Wang, Minxuan Zhou, Tajana Rosing

DVFS-Aware DNN Inference on GPUs: Latency Modeling and Performance Analysis

The rapid development of deep neural networks (DNNs) is inherently accompanied by the problem of high computational costs. To tackle this challenge, dynamic voltage frequency scaling (DVFS) is emerging as a promising technology for…

Machine Learning · Computer Science 2025-06-23 Yunchu Han, Zhaojun Nan, Sheng Zhou, Zhisheng Niu