Heming Sun
Learned image compression methods have shown impressive performance but are often highly specialized for either human perception or specific machine vision tasks. This specialization limits their versatility and requires costly retraining…
Aligning Large Language Models (LLMs) with human preferences is critical, yet traditional fine-tuning methods are computationally expensive and inflexible. While test-time alignment offers a promising alternative, existing approaches often…
Multi-view videos are becoming widely used in different fields, but their high resolution and multi-camera shooting raise significant challenges for storage and transmission. In this paper, we propose MV-MGINR, a multi-grid implicit neural…
Transmission latency significantly affects users' quality of experience in real-time interaction and actuation. As latency is principally inevitable, video prediction can be utilized to mitigate the latency and ultimately enable…
Accurate video prediction by deep neural networks, especially for dynamic regions, is a challenging task in computer vision for critical applications such as autonomous driving, remote working, and telemedicine. Due to inherent…
Supported by powerful generative models, low-bitrate learned image compression (LIC) models utilizing perceptual metrics have become feasible. Some of the most advanced models achieve high compression rates and superior perceptual quality…
This study presents the first implementation of multilayer neural networks on a memristor/CMOS integrated system on chip (SoC) to simultaneously detect multiple diseases. To overcome limitations in medical data, generative AI techniques are…
We present a new image compression paradigm to achieve ``intelligently coding for machine'' by cleverly leveraging the common sense of Large Multimodal Models (LMMs). We are motivated by the evidence that large language/multimodal models…
In recent years, end-to-end learnt video codecs have demonstrated their potential to compete with conventional coding algorithms in term of compression efficiency. However, most learning-based video compression models are associated with…
This paper provides a survey of the latest developments in visual signal coding and processing with generative models. Specifically, our focus is on presenting the advancement of generative models and their influence on research in the…
Learned image compression (LIC) is becoming more and more popular these years with its high efficiency and outstanding compression quality. Still, the practicality against modified inputs added with specific noise could not be ignored.…
In recent years, the task of learned point cloud compression has gained prominence. An important type of point cloud, the spinning LiDAR point cloud, is generated by spinning LiDAR on vehicles. This process results in numerous circular…
Entropy coding is essential to data compression, image and video coding, etc. The Range variant of Asymmetric Numeral Systems (rANS) is a modern entropy coder, featuring superior speed and compression rate. As rANS is not designed for…
Image coding for machines (ICM) aims to compress images to support downstream AI analysis instead of human perception. For ICM, developing a unified codec to reduce information redundancy while empowering the compressed features to support…
Learned image compression (LIC) methods have exhibited promising progress and superior rate-distortion performance compared with classical image compression standards. Most existing LIC methods are Convolutional Neural Networks-based…
Recent state-of-the-art Learned Image Compression methods feature spatial context models, achieving great rate-distortion improvements over hyperprior methods. However, the autoregressive context model requires serial decoding, limiting…
Spectral Normalization is one of the best methods for stabilizing the training of Generative Adversarial Network. Spectral Normalization limits the gradient of discriminator between the distribution between real data and fake data. However,…
Most machine vision tasks (e.g., semantic segmentation) are based on images encoded and decoded by image compression algorithms (e.g., JPEG). However, these decoded images in the pixel domain introduce distortion, and they are optimized for…
Lossless image compression is an essential research field in image compression. Recently, learning-based image compression methods achieved impressive performance compared with traditional lossless methods, such as WebP, JPEG2000, and FLIF.…
Learned image compression allows achieving state-of-the-art accuracy and compression ratios, but their relatively slow runtime performance limits their usage. While previous attempts on optimizing learned image codecs focused more on the…