Related papers: Dv2v: A Dynamic Variable-to-Variable Compressor

200 papers

Conventional video compression approaches use the predictive coding architecture and encode the corresponding motion information and residual information. In this paper, taking advantage of both classical architecture in the conventional…

Image and Video Processing · Electrical Eng. & Systems 2019-04-09 Guo Lu, Wanli Ouyang, Dong Xu, Xiaoyun Zhang, Chunlei Cai, Zhiyong Gao

We propose a new speech discrete token vocoder, vec2wav 2.0, which advances voice conversion (VC). We use discrete tokens from speech self-supervised models as the content features of source speech, and treat VC as a prompted vocoding task.…

Audio and Speech Processing · Electrical Eng. & Systems 2025-05-27 Yiwei Guo, Zhihan Li, Junjie Li, Chenpeng Du, Hankun Wang, Shuai Wang, Xie Chen, Kai Yu

Video-to-video translation aims to generate video frames of a target domain from an input video. Despite its usefulness, the existing networks require enormous computations, necessitating their model compression for wide use. While there…

Computer Vision and Pattern Recognition · Computer Science 2023-10-05 Chaeyeon Chung, Yeojeong Park, Seunghwan Choi, Munkhsoyol Ganbat, Jaegul Choo

In recent years, multiple sensor-based devices and systems have been deployed in smart agriculture, industrial automation, E-Health, etc. The diversity of sensor data types and the amount of data pose critical challenges for data…

Signal Processing · Electrical Eng. & Systems 2024-10-21 Gajraj Kuldeep, Qi Zhang

Dynamic mode decomposition has emerged as a leading technique to identify spatiotemporal coherent structures from high-dimensional data, benefiting from a strong connection to nonlinear dynamical systems via the Koopman operator. In this…

Systems and Control · Computer Science 2017-12-01 Zhe Bai, Eurika Kaiser, Joshua L. Proctor, J. Nathan Kutz, Steven L. Brunton

The current increasing need for privacy-preserving voice communications is leading to new ideas for securing voice transmission. This paper refers to a relatively new concept of sending encrypted data or speech as pseudo-speech in the audio…

Cryptography and Security · Computer Science 2022-02-02 Piotr Krasnowski, Jerome Lebrun, Bruno Martin

Learned video compression methods have demonstrated great promise in catching up with traditional video codecs in their rate-distortion (R-D) performance. However, existing learned video compression schemes are limited by the binding of the…

Image and Video Processing · Electrical Eng. & Systems 2022-01-06 Runsen Feng, Zongyu Guo, Zhizheng Zhang, Zhibo Chen

The proliferation of deep learning-based machine vision applications has given rise to a new type of compression, so-called video coding for machine (VCM). VCM differs from traditional video coding in that it is optimized for machine vision…

Computer Vision and Pattern Recognition · Computer Science 2023-08-09 Yeongwoong Kim, Hyewon Jeong, Janghyun Yu, Younhee Kim, Jooyoung Lee, Se Yoon Jeong, Hui Yong Kim

Recent Multimodal Large Language Models (MLLMs) have demonstrated strong performance on vision-language understanding tasks, yet their inference efficiency is often hampered by the large number of visual tokens, particularly in…

Computer Vision and Pattern Recognition · Computer Science 2026-04-21 Jiafei Song, Fengwei Zhou, Jin Qu, Wenjin Jason Li, Tong Wu, Gengjian Xue, Zhikang Zhao, Daomin Wei, Yichao Lu, Bailin Na

Recently, the deep learning technology has been successfully applied in the field of image compression, leading to superior rate-distortion performance. However, a challenge of many learning-based approaches is that they often achieve…

Image and Video Processing · Electrical Eng. & Systems 2023-08-24 Yongqiang Wang, Feng Liang, Haisheng Fu, Jie Liang, Haipeng Qin, Junzhe Liang

We present Step-Video-T2V, a state-of-the-art text-to-video pre-trained model with 30B parameters and the ability to generate videos up to 204 frames in length. A deep compression Variational Autoencoder, Video-VAE, is designed for video…

Computer Vision and Pattern Recognition · Computer Science 2025-02-25 Guoqing Ma, Haoyang Huang, Kun Yan, Liangyu Chen, Nan Duan, Shengming Yin, Changyi Wan, Ranchen Ming, Xiaoniu Song, Xing Chen, Yu Zhou, Deshan Sun, Deyu Zhou, Jian Zhou, Kaijun Tan, Kang An, Mei Chen, Wei Ji, Qiling Wu, Wen Sun, Xin Han, Yanan Wei, Zheng Ge, Aojie Li, Bin Wang, Bizhu Huang, Bo Wang, Brian Li, Changxing Miao, Chen Xu, Chenfei Wu, Chenguang Yu, Dapeng Shi, Dingyuan Hu, Enle Liu, Gang Yu, Ge Yang, Guanzhe Huang, Gulin Yan, Haiyang Feng, Hao Nie, Haonan Jia, Hanpeng Hu, Hanqi Chen, Haolong Yan, Heng Wang, Hongcheng Guo, Huilin Xiong, Huixin Xiong, Jiahao Gong, Jianchang Wu, Jiaoren Wu, Jie Wu, Jie Yang, Jiashuai Liu, Jiashuo Li, Jingyang Zhang, Junjing Guo, Junzhe Lin, Kaixiang Li, Lei Liu, Lei Xia, Liang Zhao, Liguo Tan, Liwen Huang, Liying Shi, Ming Li, Mingliang Li, Muhua Cheng, Na Wang, Qiaohui Chen, Qinglin He, Qiuyan Liang, Quan Sun, Ran Sun, Rui Wang, Shaoliang Pang, Shiliang Yang, Sitong Liu, Siqi Liu, Shuli Gao, Tiancheng Cao, Tianyu Wang, Weipeng Ming, Wenqing He, Xu Zhao, Xuelin Zhang, Xianfang Zeng, Xiaojia Liu, Xuan Yang, Yaqi Dai, Yanbo Yu, Yang Li, Yineng Deng, Yingming Wang, Yilei Wang, Yuanwei Lu, Yu Chen, Yu Luo, Yuchu Luo, Yuhe Yin, Yuheng Feng, Yuxiang Yang, Zecheng Tang, Zekai Zhang, Zidong Yang, Binxing Jiao, Jiansheng Chen, Jing Li, Shuchang Zhou, Xiangyu Zhang, Xinhao Zhang, Yibo Zhu, Heung-Yeung Shum, Daxin Jiang

Learning whole-body mobile manipulation via imitation is essential for generalizing robotic skills to diverse environments and complex tasks. However, this goal is hindered by significant challenges, particularly in effectively processing…

Robotics · Computer Science 2025-09-29 Yue Su, Chubin Zhang, Sijin Chen, Liufan Tan, Yansong Tang, Jianan Wang, Xihui Liu

Visual sensors serve as a critical component of the Internet of Things (IoT). There is an ever-increasing demand for broad applications and higher resolutions of videos and cameras in smart homes and smart cities, such as in security…

Image and Video Processing · Electrical Eng. & Systems 2021-03-30 Amir Fotovvat, Khan A. Wahid

The application of Large Vision-Language Models (LVLMs) for analyzing images and videos is an exciting and rapidly evolving field. In recent years, we've seen significant growth in high-quality image-text datasets for fine-tuning image…

Computer Vision and Pattern Recognition · Computer Science 2024-12-13 Han Wang, Yuxiang Nie, Yongjie Ye, Deng GuanYu, Yanjie Wang, Shuai Li, Haiyang Yu, Jinghui Lu, Can Huang

Variational autoencoders (VAEs) are powerful deep generative models widely used to represent high-dimensional complex data through a low-dimensional latent space learned in an unsupervised manner. In the original VAE model, the input data…

Machine Learning · Computer Science 2022-07-05 Laurent Girin, Simon Leglaive, Xiaoyu Bie, Julien Diard, Thomas Hueber, Xavier Alameda-Pineda

In this paper, we propose a novel variable rate deep compression architecture that operates on raw 3D point cloud data. The majority of learning-based point cloud compression methods work on a downsampled representation of the data.…

Computer Vision and Pattern Recognition · Computer Science 2022-05-17 Md Ahmed Al Muzaddid, William J. Beksi

Large Vision-Language Models (VLMs) have been extended to understand both images and videos. Visual token compression is leveraged to reduce the considerable token length of visual inputs. To meet the needs of different tasks, existing…

Computer Vision and Pattern Recognition · Computer Science 2024-12-13 Chenyu Yang, Xuan Dong, Xizhou Zhu, Weijie Su, Jiahao Wang, Hao Tian, Zhe Chen, Wenhai Wang, Lewei Lu, Jifeng Dai

Dynamism is common in AI computation, e.g., the dynamic tensor shapes and the dynamic control flows in models. Due to the long compilation time, existing runtime compilation damages the model efficiency, while the offline compilers either…

Programming Languages · Computer Science 2026-04-03 Jingzhi Fang, Xiong Gao, Renwei Zhang, Zichun Ye, Lei Chen, Jie Zhao, Chengnuo Huang, Hui Xu, Xuefeng Jin

We introduce Motion-I2V, a novel framework for consistent and controllable image-to-video generation (I2V). In contrast to previous methods that directly learn the complicated image-to-video mapping, Motion-I2V factorizes I2V into two…

Computer Vision and Pattern Recognition · Computer Science 2024-02-01 Xiaoyu Shi, Zhaoyang Huang, Fu-Yun Wang, Weikang Bian, Dasong Li, Yi Zhang, Manyuan Zhang, Ka Chun Cheung, Simon See, Hongwei Qin, Jifeng Dai, Hongsheng Li

Recent advancements in image-to-video (I2V) generation have shown promising performance in conventional scenarios. However, these methods still encounter significant challenges when dealing with complex scenes that require a deep…

Computer Vision and Pattern Recognition · Computer Science 2025-06-04 Peng Liu, Xiaoming Ren, Fengkai Liu, Qingsong Xie, Quanlong Zheng, Yanhao Zhang, Haonan Lu, Yujiu Yang