Related papers: A new approach for encoding code and assisting cod…

200 papers

As generative technologies advance, visual content has evolved into a complex mix of natural and AI-generated images, driving the need for more efficient coding techniques that prioritize perceptual quality. Traditional codecs and learned…

Computer Vision and Pattern Recognition · Computer Science · 2025-09-18 · Jianhui Chang

Current text-to-image generation models often struggle to follow textual instructions, especially those requiring spatial reasoning. On the other hand, Large Language Models (LLMs), such as GPT-4, have shown remarkable precision in…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-31 · Tianjun Zhang, Yi Zhang, Vibhav Vineet, Neel Joshi, Xin Wang

Recent progress in text-to-image (T2I) diffusion models (DMs) has enabled high-quality visual synthesis from diverse textual prompts. Yet, most existing T2I DMs, even those equipped with large language model (LLM)-based text encoders,…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-16 · Siqi Kou, Jiachun Jin, Zetong Zhou, Ye Ma, Yugang Wang, Quan Chen, Peng Jiang, Xiao Yang, Jun Zhu, Kai Yu, Zhijie Deng

Vision-language models such as CLIP have shown impressive capabilities in encoding texts and images into aligned embeddings, enabling the retrieval of multimodal data in a shared embedding space. However, these embedding-based models still…

Computer Vision and Pattern Recognition · Computer Science · 2024-01-23 · Timothy Ossowski, Ming Jiang, Junjie Hu

The "Thinking with Text" and "Thinking with Images" paradigms significantly improve the reasoning abilities of large language models (LLMs) and Vision-Language Models (VLMs). However, these paradigms have inherent limitations. (1) Images…

Computer Vision and Pattern Recognition · Computer Science · 2026-04-08 · Jingqi Tong, Yurong Mou, Hangcheng Li, Mingzhe Li, Yongzhuo Yang, Ming Zhang, Qiguang Chen, Tianyi Liang, Xiaomeng Hu, Yining Zheng, Xinchi Chen, Jun Zhao, Xuanjing Huang, Xipeng Qiu

The excellent generative capabilities of text-to-image diffusion models suggest they learn informative representations of image-text data. However, what knowledge their representations capture is not fully understood, and they have not been…

Computer Vision and Pattern Recognition · Computer Science · 2023-09-07 · Kevin Clark, Priyank Jaini

Autoregressive language models like GPT aim to predict next tokens, while autoencoding models such as BERT are trained on tasks such as predicting masked tokens. We train a decoder-only architecture for predicting the second to last token…

Computation and Language · Computer Science · 2025-02-17 · Johannes Schneider

LLMs have become the mainstream approach to code generation. Existing LLMs mainly employ autoregressive generation, i.e., generating code token by token from left to right. However, the underlying autoregressive generation has two…

Software Engineering · Computer Science · 2025-11-04 · Chengze Li, Yitong Zhang, Jia Li, Liyi Cai, Ge Li

Text-to-image diffusion models enable high-quality image generation but are computationally expensive. While prior work optimizes per-inference efficiency, we explore an orthogonal approach: reducing redundancy across correlated prompts.…

Computer Vision and Pattern Recognition · Computer Science · 2025-08-29 · Dale Decatur, Thibault Groueix, Wang Yifan, Rana Hanocka, Vladimir Kim, Matheus Gadelha

We observe that the mapping from an image's representation in one model to its representation in another can be learned surprisingly well with just a linear layer, even across diverse models. Building on this observation, we propose…

Computer Vision and Pattern Recognition · Computer Science · 2023-05-12 · Mazda Moayeri, Keivan Rezaei, Maziar Sanjabi, Soheil Feizi

Despite impressive recent advances in text-to-image diffusion models, obtaining high-quality images often requires prompt engineering by humans who have developed expertise in using them. In this work, we present NeuroPrompts, an adaptive…

Artificial Intelligence · Computer Science · 2024-04-09 · Shachar Rosenman, Vasudev Lal, Phillip Howard

Personalized text-to-image models allow users to generate varied styles of images (specified with a sentence) for an object (specified with a set of reference images). While remarkable results have been achieved using diffusion-based…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-19 · Fanyue Wei, Wei Zeng, Zhenyang Li, Dawei Yin, Lixin Duan, Wen Li

Autoregressive language models dominate modern text generation, yet their sequential nature introduces fundamental limitations: decoding is slow, and maintaining global coherence remains challenging. Diffusion models offer a promising…

Computation and Language · Computer Science · 2026-01-06 · Viacheslav Meshchaninov, Egor Chimbulatov, Alexander Shabalin, Aleksandr Abramov, Dmitry Vetrov

The quality of the prompts provided to text-to-image diffusion models determines how faithful the generated content is to the user's intent, often requiring "prompt engineering". To harness visual concepts from target images without prompt…

Computer Vision and Pattern Recognition · Computer Science · 2023-12-20 · Shweta Mahajan, Tanzila Rahman, Kwang Moo Yi, Leonid Sigal

Contrastively pretrained large Vision-Language Models (VLMs) like CLIP have revolutionized visual representation learning by providing good performance on downstream datasets. VLMs are adapted zero-shot to a downstream dataset by designing…

Computer Vision and Pattern Recognition · Computer Science · 2023-08-09 · Mayug Maniparambil, Chris Vorster, Derek Molloy, Noel Murphy, Kevin McGuinness, Noel E. O'Connor

Zero-shot, training-free, image-based text-to-video generation is an emerging area that aims to generate videos using existing image-based diffusion models. Current methods in this space require specific architectural changes to image…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-10 · Diljeet Jagpal, Xi Chen, Vinay P. Namboodiri

GPT has shown remarkable success in natural language processing. However, a language sequence is not sufficient to describe spatial-temporal details in the visual world. Alternatively, the video sequence is good at capturing such…

Computer Vision and Pattern Recognition · Computer Science · 2025-05-22 · Shaobin Zhuang, Zhipeng Huang, Ying Zhang, Fangyikang Wang, Canmiao Fu, Binxin Yang, Chong Sun, Chen Li, Yali Wang

Diffusion models have achieved tremendous success in text-to-image generation, yet still lag behind in visual understanding tasks, an area dominated by autoregressive vision-language models. We propose a large-scale and fully end-to-end…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-03 · Zijie Li, Henry Li, Yichun Shi, Amir Barati Farimani, Yuval Kluger, Linjie Yang, Peng Wang

Text-to-image generation models have progressed considerably in recent years and can now generate impressively realistic images from arbitrary text. Most such models are trained on web-scale image-text paired datasets, which may not be…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-26 · Yufan Zhou, Chunyuan Li, Changyou Chen, Jianfeng Gao, Jinhui Xu

This paper enhances image-GPT (iGPT), one of the pioneering works that introduce autoregressive pretraining to predict the next pixels for visual representation learning. Two simple yet essential changes are made. First, we shift the…

Computer Vision and Pattern Recognition · Computer Science · 2024-07-08 · Sucheng Ren, Zeyu Wang, Hongru Zhu, Junfei Xiao, Alan Yuille, Cihang Xie