Related papers: GenClaw: Code-Driven Agentic Image Generation

200 papers

We introduce GenAgent, unifying visual understanding and generation through an agentic multimodal model. Unlike unified models that face expensive training costs and understanding-generation trade-offs, GenAgent decouples these capabilities…

Computer Vision and Pattern Recognition · Computer Science 2026-01-29 Kaixun Jiang, Yuzheng Wang, Junjie Zhou, Pandeng Li, Zhihang Liu, Chen-Wei Xie, Zhaoyu Chen, Yun Zheng, Wenqiang Zhang

Generative art unlocks boundless creative possibilities, yet its full potential remains untapped due to the technical expertise required for advanced architectural concepts and computational workflows. To bridge this gap, we present…

Computer Vision and Pattern Recognition · Computer Science 2025-09-11 Duc-Hung Nguyen, Huu-Phuc Huynh, Minh-Triet Tran, Trung-Nghia Le

Despite the success achieved by existing image generation and editing methods, current models still struggle with complex problems including intricate text prompts, and the absence of verification and self-correction mechanisms makes the…

Computer Vision and Pattern Recognition · Computer Science 2024-10-29 Zhenyu Wang, Aoxue Li, Zhenguo Li, Xihui Liu

Text-to-video generation models have shown significant progress in recent years. However, they still struggle with generating complex dynamic scenes based on compositional text prompts, such as attribute binding for multiple objects,…

Computer Vision and Pattern Recognition · Computer Science 2024-12-06 Kaiyi Huang, Yukun Huang, Xuefei Ning, Zinan Lin, Yu Wang, Xihui Liu

Existing multi-agent video generation systems use LLM agents to orchestrate neural video generators, producing visually impressive but semantically unreliable outputs with no ground truth annotations. We present an agentic system that…

Computer Vision and Pattern Recognition · Computer Science 2026-04-14 Nicolae Cudlenco, Mihai Masala, Marius Leordeanu

Generative Adversarial Networks (GANs) are currently an indispensable tool for visual editing, being a standard component of image-to-image translation and image restoration pipelines. Furthermore, GANs are especially useful for…

Machine Learning · Computer Science 2021-04-22 Anton Cherepkov, Andrey Voynov, Artem Babenko

Recent visual generation models have made major progress in photorealism, typography, instruction following, and interactive editing, yet they still struggle with spatial reasoning, persistent state, long-horizon consistency, and causal…

Text-to-image generation is conducted through Generative Adversarial Networks (GANs) or transformer models. However, the current challenge lies in accurately generating images based on textual descriptions, especially in scenarios where the…

Human-Computer Interaction · Computer Science 2024-01-10 Yang Li, Huaqiang Jiang, Yangkai Wu

Photo-realistic and controllable 3D avatars are crucial for various applications such as virtual and mixed reality (VR/MR), telepresence, gaming, and film production. Traditional methods for avatar creation often involve time-consuming…

Computer Vision and Pattern Recognition · Computer Science 2024-08-27 Keqiang Sun, Amin Jourabloo, Riddhish Bhalodia, Moustafa Meshry, Yu Rong, Zhengyu Yang, Thu Nguyen-Phuoc, Christian Haene, Jiu Xu, Sam Johnson, Hongsheng Li, Sofien Bouaziz

Existing LLM agents for computational materials science are constrained by pipeline-bounded architectures tied to specific simulation codes and by dependence on manually written tool functions that grow with task scope. We present MatClaw,…

Materials Science · Physics 2026-05-25 Chenmu Zhang, Boris I. Yakobson

Large language model (LLM)-based agents that reason, plan, and act through tools, memory, and structured interaction are emerging as a promising paradigm for automating complex workflows. Recent systems such as OpenClaw and Claude Code…

Information Retrieval · Computer Science 2026-05-27 Yingli Zhou, Wang Shu, Yaodong Su, Wenchuan Du, Yixiang Fang, Xuemin Lin

Autonomous agents powered by Large Language Models are transforming AI, creating an imperative for the visualization field to embrace agentic frameworks. However, our field's focus on a human in the sensemaking loop raises critical…

Human-Computer Interaction · Computer Science 2025-09-17 Vaishali Dhanoa, Anton Wolter, Gabriela Molina León, Hans-Jörg Schulz, Niklas Elmqvist

The field of advanced text-to-image generation is witnessing the emergence of unified frameworks that integrate powerful text encoders, such as CLIP and T5, with Diffusion Transformer backbones. Although there have been efforts to control…

Computer Vision and Pattern Recognition · Computer Science 2025-02-28 Liang Chen, Shuai Bai, Wenhao Chai, Weichu Xie, Haozhe Zhao, Leon Vinci, Junyang Lin, Baobao Chang

Computer-Aided Design (CAD) is widely used for conceptual design and parametric 3D modeling, but typically requires a high level of expertise from designers. To lower the entry barrier and facilitate early-stage CAD modeling, we present…

Artificial Intelligence · Computer Science 2026-05-20 Fengxiao Fan, Jingzhe Ni, Xiaolong Yin, Sirui Wang, Xingyu Lu, Qiang Zou, Ruofeng Tong, Min Tang, Peng Du

While text-to-image generation has achieved unprecedented fidelity, the vast majority of existing models function fundamentally as static text-to-pixel decoders. Consequently, they often fail to grasp implicit user intentions. Although…

Computer Vision and Pattern Recognition · Computer Science 2026-02-03 Jun He, Junyan Ye, Zilong Huang, Dongzhi Jiang, Chenjue Zhang, Leqi Zhu, Renrui Zhang, Xiang Zhang, Weijia Li

Recent deep generative models are able to provide photo-realistic images as well as visual or textual content embeddings useful to address various tasks of computer vision and natural language processing. Their usefulness is nevertheless…

Machine Learning · Computer Science 2020-01-29 Antoine Plumerault, Hervé Le Borgne, Céline Hudelot

Large Language Model (LLM) based agents are powerful yet fundamentally static after deployment, lacking the ability to autonomously expand capabilities, generate new tools, or evolve their reasoning. This work introduces a hierarchical…

Computation and Language · Computer Science 2026-01-21 Indrajit Kar, Sammy Zonunpuia, Zonunfeli Ralte

Automating the transformation of user interface (UI) designs into front-end code holds significant promise for accelerating software development and democratizing design workflows. While multimodal large language models (MLLMs) can…

Computer Vision and Pattern Recognition · Computer Science 2025-10-21 Yilei Jiang, Yaozhi Zheng, Yuxuan Wan, Jiaming Han, Qunzhong Wang, Michael R. Lyu, Xiangyu Yue

Generative AI is reshaping how software is designed, written, and maintained. Advances in large language models (LLMs) are enabling new development styles - from chat-oriented programming and 'vibe coding' to agentic programming - that can…

Software Engineering · Computer Science 2025-10-14 Vivek Acharya

Generative Adversarial Networks (GANs) are the driving force behind the state-of-the-art in image generation. Despite their ability to synthesize high-resolution photo-realistic images, generating content with on-demand conditioning of…

Computer Vision and Pattern Recognition · Computer Science 2021-12-28 Markos Georgopoulos, James Oldfield, Grigorios G Chrysos, Yannis Panagakis