Related papers: Behavior Optimized Image Generation

From Pixels to Feelings: Aligning MLLMs with Human Cognitive Perception of Images

While Multimodal Large Language Models (MLLMs) are adept at answering what is in an image (identifying objects and describing scenes), they often lack the ability to understand how an image feels to a human observer. This gap is most evident…

Computer Vision and Pattern Recognition · Computer Science 2025-12-01 Yiming Chen, Junlin Han, Tianyi Bai, Shengbang Tong, Filippos Kokkinos, Philip Torr

CTR-Driven Advertising Image Generation with Multimodal Large Language Models

In web data, advertising images are crucial for capturing user attention and improving advertising effectiveness. Most existing methods that generate backgrounds for products focus primarily on aesthetic quality, which may fail to achieve…

Machine Learning · Computer Science 2025-02-13 Xingye Chen, Wei Feng, Zhenbang Du, Weizhen Wang, Yanyin Chen, Haohan Wang, Linkai Liu, Yaoyu Li, Jinyuan Zhao, Yu Li, Zheng Zhang, Jingjing Lv, Junjie Shen, Zhangang Lin, Jingping Shao, Yuanjie Shao, Xinge You, Changxin Gao, Nong Sang

Explainable AI-Generated Image Detection RewardBench

Conventional, classification-based AI-generated image detection methods cannot explain why an image is considered real or AI-generated in a way a human expert would, which reduces the trustworthiness and persuasiveness of these detection…

Computer Vision and Pattern Recognition · Computer Science 2025-11-18 Michael Yang, Shijian Deng, William T. Doan, Kai Wang, Tianyu Yang, Harsh Singh, Yapeng Tian

MCGM: Mask Conditional Text-to-Image Generative Model

Recent advancements in generative models have revolutionized the field of artificial intelligence, enabling the creation of highly realistic and detailed images. In this study, we propose a novel Mask Conditional Text-to-Image Generative…

Computer Vision and Pattern Recognition · Computer Science 2024-10-02 Rami Skaik, Leonardo Rossi, Tomaso Fontanini, Andrea Prati

HumanLLM: Towards Personalized Understanding and Simulation of Human Nature

Motivated by the remarkable progress of large language models (LLMs) in objective tasks like mathematics and coding, there is growing interest in their potential to simulate human behavior, a capability with profound implications for…

Computation and Language · Computer Science 2026-01-23 Yuxuan Lei, Tianfu Wang, Jianxun Lian, Zhengyu Hu, Defu Lian, Xing Xie

LMM4LMM: Benchmarking and Evaluating Large-multimodal Image Generation with LMMs

Recent breakthroughs in large multimodal models (LMMs) have significantly advanced both text-to-image (T2I) generation and image-to-text (I2T) interpretation. However, many generated images still suffer from issues related to perceptual…

Computer Vision and Pattern Recognition · Computer Science 2025-04-14 Jiarui Wang, Huiyu Duan, Yu Zhao, Juntong Wang, Guangtao Zhai, Xiongkuo Min

Generating Fine Details of Entity Interactions

Recent text-to-image models excel at generating high-quality object-centric images from instructions. However, images should also encapsulate rich interactions between objects, where existing models often fall short, likely due to limited…

Computer Vision and Pattern Recognition · Computer Science 2026-03-05 Xinyi Gu, Jiayuan Mao

GOBench: Benchmarking Geometric Optics Generation and Understanding of MLLMs

The rapid evolution of Multi-modality Large Language Models (MLLMs) is driving significant advancements in visual understanding and generation. Nevertheless, a comprehensive assessment of their capabilities, concerning the fine-grained…

Computer Vision and Pattern Recognition · Computer Science 2025-08-06 Xiaorong Zhu, Ziheng Jia, Jiarui Wang, Xiangyu Zhao, Haodong Duan, Xiongkuo Min, Jia Wang, Zicheng Zhang, Guangtao Zhai

ImageGem: In-the-wild Generative Image Interaction Dataset for Generative Model Personalization

We introduce ImageGem, a dataset for studying generative models that understand fine-grained individual preferences. We posit that a key challenge hindering the development of such a generative model is the lack of in-the-wild and…

Computer Vision and Pattern Recognition · Computer Science 2025-10-22 Yuanhe Guo, Linxi Xie, Zhuoran Chen, Kangrui Yu, Ryan Po, Guandao Yang, Gordon Wetzstein, Hongyi Wen

LEGO: Learning EGOcentric Action Frame Generation via Visual Instruction Tuning

Generating instructional images of human daily actions from an egocentric viewpoint serves as a key step towards efficient skill transfer. In this paper, we introduce a novel problem: egocentric action frame generation. The goal is to…

Computer Vision and Pattern Recognition · Computer Science 2024-03-25 Bolin Lai, Xiaoliang Dai, Lawrence Chen, Guan Pang, James M. Rehg, Miao Liu

Q-REAL: Towards Realism and Plausibility Evaluation for AI-Generated Content

Quality assessment of AI-generated content is crucial for evaluating model capability and guiding model optimization. However, most existing quality assessment datasets and models provide only a single quality score, which is too coarse to…

Computer Vision and Pattern Recognition · Computer Science 2026-04-02 Shushi Wang, Zicheng Zhang, Chunyi Li, Wei Wang, Liya Ma, Fengjiao Chen, Xiaoyu Li, Xuezhi Cao, Guangtao Zhai, Xiaohong Liu

ComfyGI: Automatic Improvement of Image Generation Workflows

Automatic image generation is no longer just of interest to researchers, but also to practitioners. However, current models are sensitive to the settings used and automatic optimization methods often require human involvement. To bridge…

Computer Vision and Pattern Recognition · Computer Science 2024-11-22 Dominik Sobania, Martin Briesch, Franz Rothlauf

Chain-of-Image Generation: Toward Monitorable and Controllable Image Generation

While state-of-the-art image generation models achieve remarkable visual quality, their internal generative processes remain a "black box." This opacity limits human observation and intervention, and poses a barrier to ensuring model…

Computer Vision and Pattern Recognition · Computer Science 2025-12-10 Young Kyung Kim, Oded Schlesinger, Yuzhou Zhao, J. Matias Di Martino, Guillermo Sapiro

Interactive Fashion Content Generation Using LLMs and Latent Diffusion Models

Fashionable image generation aims to synthesize images of diverse fashion prevalent around the globe, helping fashion designers in real-time visualization by giving them a basic customized structure of how a specific design preference would…

Computer Vision and Pattern Recognition · Computer Science 2023-06-14 Krishna Sri Ipsit Mantri, Nevasini Sasikumar

To See or To Read: User Behavior Reasoning in Multimodal LLMs

Multimodal Large Language Models (MLLMs) are reshaping how modern agentic systems reason over sequential user-behavior data. However, whether textual or image representations of user behavior data are more effective for maximizing MLLM…

Artificial Intelligence · Computer Science 2025-11-07 Tianning Dong, Luyi Ma, Varun Vasudevan, Jason Cho, Sushant Kumar, Kannan Achan

Prefill-Guided Thinking for zero-shot detection of AI-generated images

Traditional supervised methods for detecting AI-generated images depend on large, curated datasets for training and fail to generalize to novel, out-of-domain image generators. As an alternative, we explore pre-trained Vision-Language…

Machine Learning · Computer Science 2026-01-27 Zoher Kachwala, Danishjeet Singh, Danielle Yang, Filippo Menczer

BootPIG: Bootstrapping Zero-shot Personalized Image Generation Capabilities in Pretrained Diffusion Models

Recent text-to-image generation models have demonstrated incredible success in generating images that faithfully follow input prompts. However, the requirement of using words to describe a desired concept provides limited control over the…

Computer Vision and Pattern Recognition · Computer Science 2024-01-26 Senthil Purushwalkam, Akash Gokul, Shafiq Joty, Nikhil Naik

A Picture Tells a Thousand Words -- About You! User Interest Profiling from User Generated Visual Content

Inference of online social network users' attributes and interests has been an active research topic. Accurate identification of users' attributes and interests is crucial for improving the performance of personalization and recommender…

Social and Information Networks · Computer Science 2015-04-21 Quanzeng You, Sumit Bhatia, Jiebo Luo

LaPIG: Cross-Modal Generation of Paired Thermal and Visible Facial Images

The success of modern machine learning, particularly in facial translation networks, is highly dependent on the availability of high-quality, paired, large-scale datasets. However, acquiring sufficient data is often challenging and costly.…

Computer Vision and Pattern Recognition · Computer Science 2025-09-09 Leyang Wang, Joice Lin

From Pixels to Posts: Retrieval-Augmented Fashion Captioning and Hashtag Generation

This paper introduces a retrieval-augmented framework for automatic fashion caption and hashtag generation, combining multi-garment detection, attribute reasoning, and Large Language Model (LLM) prompting. The system aims to produce…

Computer Vision and Pattern Recognition · Computer Science 2025-11-25 Moazzam Umer Gondal, Hamad Ul Qudous, Daniya Siddiqui, Asma Ahmad Farhan