
Related papers: DiffusionRet: Generative Text-Video Retrieval with…

200 papers

Existing audio-text retrieval (ATR) methods are essentially discriminative models that aim to maximize the conditional likelihood, represented as p(candidates|query). Nevertheless, this methodology fails to consider the intrinsic data…

Sound · Computer Science · 2024-10-18 · Yifei Xin, Xuxin Cheng, Zhihong Zhu, Xusheng Yang, Yuexian Zou

Video moment retrieval and highlight detection have received attention in the current era of video content proliferation, aiming to localize moments and estimate clip relevances based on user-specific queries. Given that the video content…

Computer Vision and Pattern Recognition · Computer Science · 2024-03-05 · Henghao Zhao, Kevin Qinghong Lin, Rui Yan, Zechao Li

The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a staggering variety of prompts and exhibit impressive…

Machine Learning · Computer Science · 2023-09-14 · Alexander C. Li, Mihir Prabhudesai, Shivam Duggal, Ellis Brown, Deepak Pathak

Text-conditioned image generation models have recently shown immense qualitative success using denoising diffusion processes. However, unlike discriminative vision-and-language models, it is a non-trivial task to subject these…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-06 · Benno Krojer, Elinor Poole-Dayan, Vikram Voleti, Christopher Pal, Siva Reddy

Diffusion models have demonstrated significant potential in achieving state-of-the-art performance across various text generation tasks. In this systematic study, we investigate their application to the table-to-text problem by adapting the…

Computation and Language · Computer Science · 2024-09-24 · Aleksei S. Krylov, Oleg D. Somov

Recently, diffusion models have emerged as a new paradigm for generative models. Despite the success in domains using continuous signals such as vision and audio, adapting diffusion models to natural language is under-explored due to the…

Computation and Language · Computer Science · 2023-02-15 · Shansan Gong, Mukai Li, Jiangtao Feng, Zhiyong Wu, Lingpeng Kong

We present DiffusionBERT, a new generative masked language model based on discrete diffusion models. Diffusion models and many pre-trained language models have a shared training objective, i.e., denoising, making it possible to combine the…

Computation and Language · Computer Science · 2022-12-02 · Zhengfu He, Tianxiang Sun, Kuanning Wang, Xuanjing Huang, Xipeng Qiu

Video diffusion models are able to generate high-quality videos by learning strong spatial-temporal priors on large-scale datasets. In this paper, we aim to investigate whether such priors derived from a generative process are suitable for…

Computer Vision and Pattern Recognition · Computer Science · 2024-11-13 · Zejia Weng, Xitong Yang, Zhen Xing, Zuxuan Wu, Yu-Gang Jiang

Diffusion-based generative models have had a high impact on the computer vision and speech processing communities these past years. Besides data generation tasks, they have also been employed for data restoration tasks like speech…

Audio and Speech Processing · Electrical Eng. & Systems · 2023-03-17 · Jean-Marie Lemercier, Julius Richter, Simon Welker, Timo Gerkmann

Multimodal contrastive models have achieved strong performance in text-audio retrieval and zero-shot settings, but improving joint embedding spaces remains an active research area. Less attention has been given to making these systems…

Sound · Computer Science · 2025-06-25 · Julien Guinot, Elio Quinton, György Fazekas

Partially Relevant Video Retrieval (PRVR) aims to retrieve untrimmed videos based on text queries that describe only partial events. Existing methods suffer from incomplete global contextual perception, struggling with query ambiguity and…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-07 · Jun Li, Xuhang Lou, Jinpeng Wang, Yuting Wang, Yaowei Wang, Shu-Tao Xia, Bin Chen

Generative models such as Generative Adversarial Networks (GANs) and Variational Auto-Encoders (VAEs) are widely utilized to model the generative process of user interactions. However, these generative models suffer from intrinsic…

Information Retrieval · Computer Science · 2025-06-26 · Wenjie Wang, Yiyan Xu, Fuli Feng, Xinyu Lin, Xiangnan He, Tat-Seng Chua

Generative diffusion models, famous for their performance in image generation, are popular in various cross-domain applications. However, their use in the communication community has been mostly limited to auxiliary tasks like data modeling…

Networking and Internet Architecture · Computer Science · 2025-03-11 · Ruihuai Liang, Bo Yang, Zhiwen Yu, Bin Guo, Xuelin Cao, Mérouane Debbah, H. Vincent Poor, Chau Yuen

Generating temporally coherent high fidelity video is an important milestone in generative modeling research. We make progress towards this milestone by proposing a diffusion model for video generation that shows very promising initial…

Computer Vision and Pattern Recognition · Computer Science · 2022-06-24 · Jonathan Ho, Tim Salimans, Alexey Gritsenko, William Chan, Mohammad Norouzi, David J. Fleet

Scene text detection techniques have garnered significant attention due to their wide-ranging applications. However, existing methods have a high demand for training data, and obtaining accurate human annotations is labor-intensive and…

Computer Vision and Pattern Recognition · Computer Science · 2023-11-29 · Ling Fu, Zijie Wu, Yingying Zhu, Yuliang Liu, Xiang Bai

In information retrieval (IR), learning-to-rank (LTR) methods have traditionally limited themselves to discriminative machine learning approaches that model the probability of the document being relevant to the query given some feature…

Information Retrieval · Computer Science · 2026-02-13 · Sajad Ebrahimi, Bhaskar Mitra, Negar Arabzadeh, Ye Yuan, Haolun Wu, Fattane Zarrinkalam, Ebrahim Bagheri

Learning from a large corpus of data, pre-trained models have achieved impressive progress nowadays. As popular generative pre-training, diffusion models capture both low-level visual knowledge and high-level semantic relations. In this…

Computer Vision and Pattern Recognition · Computer Science · 2023-03-20 · Chaofan Ma, Yuhuan Yang, Chen Ju, Fei Zhang, Jinxiang Liu, Yu Wang, Ya Zhang, Yanfeng Wang

While many unsupervised learning models focus on one family of tasks, either generative or discriminative, we explore the possibility of a unified representation learner: a model which addresses both families of tasks simultaneously. We…

Computer Vision and Pattern Recognition · Computer Science · 2024-09-25 · Soumik Mukhopadhyay, Matthew Gwilliam, Yosuke Yamaguchi, Vatsal Agarwal, Namitha Padmanabhan, Archana Swaminathan, Tianyi Zhou, Jun Ohya, Abhinav Shrivastava

Learning rewards from expert videos offers an affordable and effective solution to specify the intended behaviors for reinforcement learning (RL) tasks. In this work, we propose Diffusion Reward, a novel framework that learns rewards from…

Machine Learning · Computer Science · 2024-08-12 · Tao Huang, Guangqi Jiang, Yanjie Ze, Huazhe Xu

In this paper, we present VideoGen, a text-to-video generation approach, which can generate a high-definition video with high frame fidelity and strong temporal consistency using reference-guided latent diffusion. We leverage an…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-08 · Xin Li, Wenqing Chu, Ye Wu, Weihang Yuan, Fanglong Liu, Qi Zhang, Fu Li, Haocheng Feng, Errui Ding, Jingdong Wang