Yichen Han — Scifaro

MAPGD: Multi-Agent Prompt Gradient Descent for Collaborative Prompt Optimization

Prompt engineering is crucial for fully leveraging large language models (LLMs), yet most existing optimization methods follow a single trajectory, resulting in limited adaptability, gradient conflicts, and high computational overhead. We…

Artificial Intelligence · Computer Science · 2026-02-04 · Yichen Han, Yuhang Han, Siteng Huang, Guanyu Liu, Zhengpeng Zhou, Bojun Liu, Yujia Zhang, Isaac N Shi, Lewei He, Tianyu Shi

Comprehend and Talk: Text to Speech Synthesis via Dual Language Modeling

Existing Large Language Model (LLM) based autoregressive (AR) text-to-speech (TTS) systems, while achieving state-of-the-art quality, still face critical challenges. The foundation of this LLM-based paradigm is the discretization of the…

Sound · Computer Science · 2025-09-29 · Junjie Cao, Yichen Han, Ruonan Zhang, Xiaoyang Hao, Hongxiang Li, Shuaijiang Zhao, Yue Liu, Xiao-Ping Zhng

MBCodec:Thorough disentangle for high-fidelity audio compression

High-fidelity neural audio codecs in Text-to-speech (TTS) aim to compress speech signals into discrete representations for faithful reconstruction. However, prior approaches faced challenges in effectively disentangling acoustic and…

Sound · Computer Science · 2025-09-23 · Ruonan Zhang, Xiaoyang Hao, Yichen Han, Junjie Cao, Yue Liu, Kai Zhang

Quantize More, Lose Less: Autoregressive Generation from Residually Quantized Speech Representations

Text-to-speech (TTS) synthesis has seen renewed progress under the discrete modeling paradigm. Existing autoregressive approaches often rely on single-codebook representations, which suffer from significant information loss. Even with…

Sound · Computer Science · 2025-07-17 · Yichen Han, Xiaoyang Hao, Keming Chen, Weibo Xiong, Jun He, Ruonan Zhang, Junjie Cao, Yue Liu, Bowen Li, Dongrui Zhang, Hui Xia, Huilei Fu, Kai Jia, Kaixuan Guo, Mingli Jin, Qingyun Meng, Ruidong Ma, Ruiqian Fang, Shaotong Guo, Xuhui Li, Yang Xiang, Ying Zhang, Yulong Liu, Yunfeng Li, Yuyi Zhang, Yuze Zhou, Zhen Wang, Zhaowen Chen

How do Older Adults Set Up Voice Assistants? Lessons Learned from a Deployment Experience for Older Adults to Set Up Standalone Voice Assistants

While standalone Voice Assistants (VAs) are promising to support older adults' daily routine and wellbeing management, onboarding and setting up these devices can be challenging. Although some older adults choose to seek assistance from…

Human-Computer Interaction · Computer Science · 2024-03-15 · Chen Chen, Ella T. Lifset, Yichen Han, Arkajyoti Roy, Michael Hogarth, Alison A. Moore, Emilia Farcas, Nadir Weibel

Frame-level emotional state alignment method for speech emotion recognition

Speech emotion recognition (SER) systems aim to recognize human emotional state during human-computer interaction. Most existing SER systems are trained based on utterance-level labels. However, not all frames in an audio have affective…

Sound · Computer Science · 2023-12-29 · Qifei Li, Yingming Gao, Cong Wang, Yayue Deng, Jinlong Xue, Yichen Han, Ya Li

CONCSS: Contrastive-based Context Comprehension for Dialogue-appropriate Prosody in Conversational Speech Synthesis

Conversational speech synthesis (CSS) incorporates historical dialogue as supplementary information with the aim of generating speech that has dialogue-appropriate prosody. While previous methods have already delved into enhancing context…

Computation and Language · Computer Science · 2023-12-19 · Yayue Deng, Jinlong Xue, Yukang Jia, Qifei Li, Yichen Han, Fengping Wang, Yingming Gao, Dengfeng Ke, Ya Li

Screen or No Screen? Lessons Learnt from a Real-World Deployment Study of Using Voice Assistants With and Without Touchscreen for Older Adults

While voice user interfaces offer increased accessibility due to hands-free and eyes-free interactions, older adults often have challenges such as constructing structured requests and perceiving how such devices operate. Voice-first user…

Human-Computer Interaction · Computer Science · 2023-07-18 · Chen Chen, Ella T. Lifset, Yichen Han, Arkajyoti Roy, Michael Hogarth, Alison A. Moore, Emilia Farcas, Nadir Weibel

A Keypoint Based Enhancement Method for Audio Driven Free View Talking Head Synthesis

Audio driven talking head synthesis is a challenging task that has attracted increasing attention in recent years. Although existing methods based on 2D landmarks or 3D face models can synthesize accurate lip synchronization and rhythmic head…

Computer Vision and Pattern Recognition · Computer Science · 2022-10-10 · Yichen Han, Ya Li, Yingming Gao, Jinlong Xue, Songpo Wang, Lei Yang

Towards Visualization of Time-Series Ecological Momentary Assessment (EMA) Data on Standalone Voice-First Virtual Assistants

Population aging is an increasingly important consideration for health care in the 21st century, and continuing to have access to and interact with digital health information is a key challenge for aging populations. Voice-based Intelligent…

Human-Computer Interaction · Computer Science · 2022-08-02 · Yichen Han, Christopher Bo Han, Chen Chen, Peng Wei Lee, Michael Hogarth, Alison A. Moore, Nadir Weibel, Emilia Farcas

ECAPA-TDNN for Multi-speaker Text-to-speech Synthesis

In recent years, neural network based methods for multi-speaker text-to-speech synthesis (TTS) have made significant progress. However, the current speaker encoder models used in these methods still cannot capture enough speaker…

Sound · Computer Science · 2022-03-29 · Jinlong Xue, Yayue Deng, Yichen Han, Ya Li, Jianqing Sun, Jiaen Liang