Related papers: ReactGenie: A Development Framework for Complex Mu…

200 papers

Verbal and non-verbal human reaction generation is a challenging task, as different reactions could be appropriate for responding to the same behaviour. This paper proposes the first multiple and multimodal (verbal and nonverbal)…

Computer Vision and Pattern Recognition · Computer Science 2023-07-07 Jiaqi Xu , Cheng Luo , Weicheng Xie , Linlin Shen , Xiaofeng Liu , Lu Liu , Hatice Gunes , Siyang Song

Human behaviors in real-world environments are inherently interactive, with an individual's motion shaped by surrounding agents and the scene. Such capabilities are essential for applications in virtual avatars, interactive animation, and…

Computer Vision and Pattern Recognition · Computer Science 2026-04-02 Yaoqin Ye , Yiteng Xu , Qin Sun , Xinge Zhu , Yujing Sun , Yuexin Ma

AutoGen is an open-source framework that allows developers to build LLM applications via multiple agents that can converse with each other to accomplish tasks. AutoGen agents are customizable, conversable, and can operate in various modes…

Code generation models based on large language models (LLMs) have gained wide adoption, but challenges remain in ensuring safety, accuracy, and controllability, especially for complex tasks. Existing methods often lack dynamic integration…

Software Engineering · Computer Science 2025-10-13 Aofan Liu , Haoxuan Li , Bin Wang , Ao Yang , Hui Li

Large language models (LLMs) are increasingly seen as assistants, copilots, and consultants, capable of supporting a wide range of tasks through natural conversation. However, most systems remain constrained by a linear request-response…

Computation and Language · Computer Science 2026-05-05 Jiaqi Chen , Yanzhe Zhang , Yutong Zhang , Yijia Shao , Diyi Yang

Large language models (LLMs) are increasingly used for complex tasks that require multiple generation calls, advanced prompting techniques, control flow, and structured inputs/outputs. However, efficient systems are lacking for programming…

In this paper we present a framework for creating natural language interfaces to action-based applications. Our framework uses a number of reusable application-independent components, in order to reduce the effort of creating a natural…

Computation and Language · Computer Science 2007-05-23 Stephen Chong , Riccardo Pucella

Visual presentations are vital for effective communication. Early attempts to automate their creation using deep learning often faced issues such as poorly organized layouts, inaccurate text summarization, and a lack of image understanding,…

Machine Learning · Computer Science 2025-09-03 Xiaojie Xu , Xinli Xu , Sirui Chen , Haoyu Chen , Fan Zhang , Ying-Cong Chen

Large language model (LLM) agents often suffer from high reasoning overhead, excessive token consumption, unstable execution, and inability to reuse past experiences in complex tasks like business queries, tool use, and workflow…

Machine Learning · Computer Science 2026-04-23 Ruocan Wei , Shufeng Wang , Ziwei Shi

Conversational user interfaces powered by large language models (LLMs) have significantly lowered the technical barriers to database querying. However, existing tools still encounter several challenges, such as misinterpretation of user…

Human-Computer Interaction · Computer Science 2025-08-22 Longfei Chen , Shenghan Gao , Shiwei Wang , Ken Lin , Yun Wang , Quan Li

Large-scale, high-quality interaction trajectories are essential for advancing mobile Graphical User Interface (GUI) agents. While existing methods typically rely on labor-intensive human demonstrations or automated model exploration to…

Artificial Intelligence · Computer Science 2026-02-02 Linjia Kang , Zhimin Wang , Yongkang Zhang , Duo Wu , Jinghe Wang , Ming Ma , Haopeng Yan , Zhi Wang

Supporting voice commands in applications presents significant benefits to users. However, adding such support to existing GUI-based web apps is effort-consuming with a high learning barrier, as shown in our formative study, due to the lack…

Human-Computer Interaction · Computer Science 2020-07-21 Ritam Jyoti Sarmah , Yunpeng Ding , Di Wang , Cheuk Yin Phipson Lee , Toby Jia-Jun Li , Xiang 'Anthony' Chen

With the advancement of Multimodal Large Language Models (MLLM), LLM-driven visual agents are increasingly impacting software interfaces, particularly those with graphical user interfaces. This work introduces a novel LLM-based multimodal…

Human-Computer Interaction · Computer Science 2025-09-18 Yanda Li , Chi Zhang , Wenjia Jiang , Wanqi Yang , Bin Fu , Pei Cheng , Xin Chen , Ling Chen , Yunchao Wei

Existing multimodal generative models fall short as qualified design copilots, as they often struggle to generate imaginative outputs once instructions are less detailed or lack the ability to maintain consistency with the provided…

Computer Vision and Pattern Recognition · Computer Science 2025-03-11 Zhipeng Huang , Shaobin Zhuang , Canmiao Fu , Binxin Yang , Ying Zhang , Chong Sun , Zhizheng Zhang , Yali Wang , Chen Li , Zheng-Jun Zha

Large language models (LLMs) have taken the scientific world by storm, changing the landscape of natural language processing and human-computer interaction. These powerful tools can answer complex questions and, surprisingly, perform…

Artificial Intelligence · Computer Science 2023-11-14 Pier Luca Lanzi , Daniele Loiacono

An interactive robot framework accomplishes long-horizon task planning and can easily generalize to new goals and distinct tasks, even during execution. However, most traditional methods require predefined module design, making it hard to…

Robotics · Computer Science 2025-02-11 Boyi Li , Philipp Wu , Pieter Abbeel , Jitendra Malik

Simulation is an invaluable tool for developing and evaluating controllers for self-driving cars. Current simulation frameworks are driven by highly-specialist domain specific languages, and so a natural language interface would greatly…

Artificial Intelligence · Computer Science 2023-10-27 Antonio Valerio Miceli-Barone , Alex Lascarides , Craig Innes

Humans solve problems by executing targeted plans, yet large language models (LLMs) remain unreliable for structured workflow execution. We propose RunAgent, a multi-agent plan execution platform that interprets natural-language plans while…

Machine Learning · Computer Science 2026-05-04 Arunabh Srivastava , Mohammad A. Khojastepour , Srimat Chakradhar , Sennur Ulukus

Tool-augmented large language models (LLMs) have achieved remarkable progress in tackling a broad range of tasks. However, existing methods are mainly restricted to specifically designed tools and fail to fulfill complex instructions,…

Computation and Language · Computer Science 2023-08-29 Yifan Song , Weimin Xiong , Dawei Zhu , Wenhao Wu , Han Qian , Mingbo Song , Hailiang Huang , Cheng Li , Ke Wang , Rong Yao , Ye Tian , Sujian Li

In dyadic interaction, predicting the listener's facial reactions is challenging as different reactions could be appropriate in response to the same speaker's behaviour. Previous approaches predominantly treated this task as an…

Computer Vision and Pattern Recognition · Computer Science 2024-11-05 Cheng Luo , Siyang Song , Weicheng Xie , Micol Spitale , Zongyuan Ge , Linlin Shen , Hatice Gunes