Heming Sun — Scifaro

Semantics Disentanglement and Composition for Universal Image Coding with Efficiently LLM Reasoning and Generative Diffusion

Learned image compression methods have shown impressive performance but are often highly specialized for either human perception or specific machine vision tasks. This specialization limits their versatility and requires costly retraining…

Computer Vision and Pattern Recognition · Computer Science · 2026-05-19 · Jinming Liu, Yuntao Wei, Junyan Lin, Shengyang Zhao, Heming Sun, Zhibo Chen, Wenjun Zeng, Xin Jin

LLMdoctor: Token-Level Flow-Guided Preference Optimization for Efficient Test-Time Alignment of Large Language Models

Aligning Large Language Models (LLMs) with human preferences is critical, yet traditional fine-tuning methods are computationally expensive and inflexible. While test-time alignment offers a promising alternative, existing approaches often…

Artificial Intelligence · Computer Science · 2026-01-16 · Tiesunlong Shen, Rui Mao, Jin Wang, Heming Sun, Jian Zhang, Xuejie Zhang, Erik Cambria

A Multi-Grid Implicit Neural Representation for Multi-View Videos

Multi-view videos are becoming widely used in different fields, but their high resolution and multi-camera shooting raise significant challenges for storage and transmission. In this paper, we propose MV-MGINR, a multi-grid implicit neural…

Image and Video Processing · Electrical Eng. & Systems · 2025-09-23 · Qingyue Ling, Zhengxue Cheng, Donghui Feng, Shen Wang, Chen Zhu, Guo Lu, Heming Sun, Jiro Katto, Li Song

Real-time Video Prediction With Fast Video Interpolation Model and Prediction Training

Transmission latency significantly affects users' quality of experience in real-time interaction and actuation. As latency is in principle unavoidable, video prediction can be utilized to mitigate the latency and ultimately enable…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-07 · Shota Hirose, Kazuki Kotoyori, Kasidis Arunruangsirilert, Fangzheng Lin, Heming Sun, Jiro Katto

Lightweight Stochastic Video Prediction via Hybrid Warping

Accurate video prediction by deep neural networks, especially for dynamic regions, is a challenging task in computer vision for critical applications such as autonomous driving, remote working, and telemedicine. Due to inherent…

Computer Vision and Pattern Recognition · Computer Science · 2024-12-05 · Kazuki Kotoyori, Shota Hirose, Heming Sun, Jiro Katto

LMM-driven Semantic Image-Text Coding for Ultra Low-bitrate Learned Image Compression

Supported by powerful generative models, low-bitrate learned image compression (LIC) models utilizing perceptual metrics have become feasible. Some of the most advanced models achieve high compression rates and superior perceptual quality…

Image and Video Processing · Electrical Eng. & Systems · 2024-11-21 · Shimon Murai, Heming Sun, Jiro Katto

Multi-diseases detection with memristive system on chip

This study presents the first implementation of multilayer neural networks on a memristor/CMOS integrated system on chip (SoC) to simultaneously detect multiple diseases. To overcome limitations in medical data, generative AI techniques are…

Hardware Architecture · Computer Science · 2024-10-22 · Zihan Wang, Daniel W. Yang, Zerui Liu, Evan Yan, Heming Sun, Ning Ge, Miao Hu, Wei Wu

Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs

We present a new image compression paradigm to achieve "intelligently coding for machine" by cleverly leveraging the common sense of Large Multimodal Models (LMMs). We are motivated by the evidence that large language/multimodal models…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-19 · Jinming Liu, Yuntao Wei, Junyan Lin, Shengyang Zhao, Heming Sun, Zhibo Chen, Wenjun Zeng, Xin Jin

Accelerating Learnt Video Codecs with Gradient Decay and Layer-wise Distillation

In recent years, end-to-end learnt video codecs have demonstrated their potential to compete with conventional coding algorithms in terms of compression efficiency. However, most learning-based video compression models are associated with…

Image and Video Processing · Electrical Eng. & Systems · 2024-07-02 · Tianhao Peng, Ge Gao, Heming Sun, Fan Zhang, David Bull

Survey on Visual Signal Coding and Processing with Generative Models: Technologies, Standards and Optimization

This paper provides a survey of the latest developments in visual signal coding and processing with generative models. Specifically, our focus is on presenting the advancement of generative models and their influence on research in the…

Image and Video Processing · Electrical Eng. & Systems · 2024-05-24 · Zhibo Chen, Heming Sun, Li Zhang, Fan Zhang

Attack and Defense Analysis of Learned Image Compression

Learned image compression (LIC) has become increasingly popular in recent years thanks to its high efficiency and outstanding compression quality. Still, its practicality against inputs modified with specific added noise cannot be ignored.…

Image and Video Processing · Electrical Eng. & Systems · 2024-03-28 · Tianyu Zhu, Heming Sun, Xiankui Xiong, Xuanpeng Zhu, Yong Gong, Minge Jing, Yibo Fan

SCP: Spherical-Coordinate-based Learned Point Cloud Compression

In recent years, the task of learned point cloud compression has gained prominence. An important type of point cloud, the spinning LiDAR point cloud, is generated by spinning LiDAR on vehicles. This process results in numerous circular…

Computer Vision and Pattern Recognition · Computer Science · 2024-02-09 · Ao Luo, Linxin Song, Keisuke Nonaka, Kyohei Unno, Heming Sun, Masayuki Goto, Jiro Katto

Recoil: Parallel rANS Decoding with Decoder-Adaptive Scalability

Entropy coding is essential to data compression, image and video coding, etc. The Range variant of Asymmetric Numeral Systems (rANS) is a modern entropy coder, featuring superior speed and compression rate. As rANS is not designed for…

Distributed, Parallel, and Cluster Computing · Computer Science · 2023-06-27 · Fangzheng Lin, Kasidis Arunruangsirilert, Heming Sun, Jiro Katto

Prompt-ICM: A Unified Framework towards Image Coding for Machines with Task-driven Prompts

Image coding for machines (ICM) aims to compress images to support downstream AI analysis instead of human perception. For ICM, developing a unified codec to reduce information redundancy while empowering the compressed features to support…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-05 · Ruoyu Feng, Jinming Liu, Xin Jin, Xiaohan Pan, Heming Sun, Zhibo Chen

Learned Image Compression with Mixed Transformer-CNN Architectures

Learned image compression (LIC) methods have exhibited promising progress and superior rate-distortion performance compared with classical image compression standards. Most existing LIC methods are Convolutional Neural Networks-based…

Image and Video Processing · Electrical Eng. & Systems · 2023-03-28 · Jinming Liu, Heming Sun, Jiro Katto

Multistage Spatial Context Models for Learned Image Compression

Recent state-of-the-art Learned Image Compression methods feature spatial context models, achieving great rate-distortion improvements over hyperprior methods. However, the autoregressive context model requires serial decoding, limiting…

Computer Vision and Pattern Recognition · Computer Science · 2023-02-21 · Fangzheng Lin, Heming Sun, Jinming Liu, Jiro Katto

ABCAS: Adaptive Bound Control of spectral norm as Automatic Stabilizer

Spectral Normalization is one of the best methods for stabilizing the training of Generative Adversarial Networks. It limits the gradient of the discriminator between the distributions of real data and fake data. However,…

Computer Vision and Pattern Recognition · Computer Science · 2022-11-15 · Shota Hirose, Shiori Maki, Naoki Wada, Heming Sun, Jiro Katto

Semantic Segmentation in Learned Compressed Domain

Most machine vision tasks (e.g., semantic segmentation) are based on images encoded and decoded by image compression algorithms (e.g., JPEG). However, these decoded images in the pixel domain introduce distortion, and they are optimized for…

Computer Vision and Pattern Recognition · Computer Science · 2022-09-07 · Jinming Liu, Heming Sun, Jiro Katto

Learned Lossless Image Compression With Combined Autoregressive Models And Attention Modules

Lossless image compression is an essential research field in image compression. Recently, learning-based image compression methods achieved impressive performance compared with traditional lossless methods, such as WebP, JPEG2000, and FLIF.…

Image and Video Processing · Electrical Eng. & Systems · 2022-08-31 · Ran Wang, Jinming Liu, Heming Sun, Jiro Katto

Streaming-capable High-performance Architecture of Learned Image Compression Codecs

Learned image compression codecs achieve state-of-the-art accuracy and compression ratios, but their relatively slow runtime performance limits their usage. While previous attempts at optimizing learned image codecs focused more on the…

Image and Video Processing · Electrical Eng. & Systems · 2022-08-04 · Fangzheng Lin, Heming Sun, Jiro Katto