Junchen Fu — Scifaro

The 2nd EReL@MIR Workshop on Efficient Representation Learning for Multimodal Information Retrieval

Multimodal representation learning has attracted increasing attention in AI, driven by the strong performance of large, pretrained multimodal foundation models such as Qwen, LLaVA, and CLIP. These models deliver impressive performance on a…

Information Retrieval · Computer Science · 2026-05-27 · Junchen Fu, Xuri Ge, Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, Xi Wang, Qijiong Liu, Qian Li, Joemon M. Jose

Differentiable Semantic ID for Generative Recommendation

Generative recommendation provides a novel paradigm in which each item is represented by a discrete semantic ID (SID) learned from rich content. Most existing methods treat SIDs as predefined and train recommenders under static indexing. In…

Information Retrieval · Computer Science · 2026-04-15 · Junchen Fu, Xuri Ge, Alexandros Karatzoglou, Ioannis Arapakis, Suzan Verberne, Joemon M. Jose, Zhaochun Ren

Benchmarking Multimodal Large Language Models for Missing Modality Completion in Product Catalogues

Missing-modality information on e-commerce platforms, such as absent product images or textual descriptions, often arises from annotation errors or incomplete metadata, impairing both product presentation and downstream applications such as…

Multimedia · Computer Science · 2026-01-29 · Junchen Fu, Wenhao Deng, Kaiwen Zheng, Ioannis Arapakis, Yu Ye, Yongxin Ni, Joemon M. Jose, Xuri Ge

LLMPopcorn: Exploring LLMs as Assistants for Popular Micro-video Generation

In an era where micro-videos dominate platforms like TikTok and YouTube, AI-generated content is nearing cinematic quality. The next frontier is using large language models (LLMs) to autonomously create viral micro-videos, a largely…

Computation and Language · Computer Science · 2026-01-27 · Junchen Fu, Xuri Ge, Kaiwen Zheng, Alexandros Karatzoglou, Ioannis Arapakis, Xin Xin, Yongxin Ni, Joemon M. Jose

Are Multimodal Embeddings Truly Beneficial for Recommendation? A Deep Dive into Whole vs. Individual Modalities

Multimodal recommendation has emerged as a mainstream paradigm, typically leveraging text and visual embeddings extracted from pre-trained models such as Sentence-BERT, Vision Transformers, and ResNet. This approach is founded on the…

Information Retrieval · Computer Science · 2026-01-19 · Yu Ye, Junchen Fu, Yu Song, Kaiwen Zheng, Joemon M. Jose

Focal-RegionFace: Generating Fine-Grained Multi-attribute Descriptions for Arbitrarily Selected Face Focal Regions

In this paper, we introduce an underexplored problem in facial analysis: generating and recognizing multi-attribute natural language descriptions, containing facial action units (AUs), emotional states, and age estimation, for arbitrarily…

Computer Vision and Pattern Recognition · Computer Science · 2026-01-05 · Kaiwen Zheng, Junchen Fu, Songpei Xu, Yaoqing He, Joemon M. Jose, Han Hu, Xuri Ge

CROSSAN: Towards Efficient and Effective Adaptation of Multiple Multimodal Foundation Models for Sequential Recommendation

In this paper, we explore a less-studied yet practically important problem: how to efficiently and effectively adapt multiple (>2) multimodal foundation models (MFMs) for the sequential recommendation task. To this end, we propose a…

Information Retrieval · Computer Science · 2025-09-16 · Junchen Fu, Yongxin Ni, Joemon M. Jose, Ioannis Arapakis, Kaiwen Zheng, Youhua Li, Xuri Ge

Efficient and Effective Adaptation of Multimodal Foundation Models in Sequential Recommendation

Multimodal foundation models (MFMs) have revolutionized sequential recommender systems through advanced representation learning. While Parameter-efficient Fine-tuning (PEFT) is commonly used to adapt these models, studies often prioritize…

Information Retrieval · Computer Science · 2025-09-15 · Junchen Fu, Xuri Ge, Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, Kaiwen Zheng, Yongxin Ni, Joemon M. Jose

Video-Bench: Human-Aligned Video Generation Benchmark

Video generation assessment is essential for ensuring that generative models produce visually realistic, high-quality videos while aligning with human expectations. Current video generation benchmarks fall into two main categories:…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-30 · Hui Han, Siyuan Li, Jiaqi Chen, Yiwen Yuan, Yuling Wu, Chak Tou Leong, Hanwen Du, Junchen Fu, Youhua Li, Jie Zhang, Chi Zhang, Li-jia Li, Yongxin Ni

The 1st EReL@MIR Workshop on Efficient Representation Learning for Multimodal Information Retrieval

Multimodal representation learning has garnered significant attention in the AI community, largely due to the success of large pre-trained multimodal foundation models like LLaMA, GPT, Mistral, and CLIP. These models have achieved…

Information Retrieval · Computer Science · 2025-04-22 · Junchen Fu, Xuri Ge, Xin Xin, Haitao Yu, Yue Feng, Alexandros Karatzoglou, Ioannis Arapakis, Joemon M. Jose

Teach Me How to Denoise: A Universal Framework for Denoising Multi-modal Recommender Systems via Guided Calibration

The surge in multimedia content has led to the development of Multi-Modal Recommender Systems (MMRecs), which use diverse modalities such as text, images, videos, and audio for more personalized recommendations. However, MMRecs struggle…

Information Retrieval · Computer Science · 2025-04-22 · Hongji Li, Hanwen Du, Youhua Li, Junchen Fu, Chunxiao Li, Ziyi Zhuang, Jiakang Li, Yongxin Ni

Multimodal Representation Learning Techniques for Comprehensive Facial State Analysis

Multimodal foundation models have significantly improved feature representation by integrating information from multiple modalities, making them highly suitable for a broader set of applications. However, the exploration of multimodal…

Computer Vision and Pattern Recognition · Computer Science · 2025-04-15 · Kaiwen Zheng, Xuri Ge, Junchen Fu, Jun Peng, Joemon M. Jose

An Empirical Study of Training ID-Agnostic Multi-modal Sequential Recommenders

Sequential Recommendation (SR) aims to predict future user-item interactions based on historical interactions. While many SR approaches concentrate on user IDs and item IDs, the human perception of the world through multi-modal signals,…

Information Retrieval · Computer Science · 2024-10-08 · Youhua Li, Hanwen Du, Yongxin Ni, Yuanqi He, Junchen Fu, Xiangyan Liu, Qi Guo

Towards End-to-End Explainable Facial Action Unit Recognition via Vision-Language Joint Learning

Facial action units (AUs), as defined in the Facial Action Coding System (FACS), have received significant research interest owing to their diverse range of applications in facial state analysis. Current mainstream FAU recognition models…

Computer Vision and Pattern Recognition · Computer Science · 2024-08-02 · Xuri Ge, Junchen Fu, Fuhai Chen, Shan An, Nicu Sebe, Joemon M. Jose

IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT

Multimodal foundation models are transformative in sequential recommender systems, leveraging powerful representation learning capabilities. While Parameter-efficient Fine-tuning (PEFT) is commonly used to adapt foundation models for…

Information Retrieval · Computer Science · 2024-07-23 · Junchen Fu, Xuri Ge, Xin Xin, Alexandros Karatzoglou, Ioannis Arapakis, Jie Wang, Joemon M. Jose

NineRec: A Benchmark Dataset Suite for Evaluating Transferable Recommendation

Large foundational models, through upstream pre-training and downstream fine-tuning, have achieved immense success in the broad AI community due to improved model performance and significant reductions in repetitive engineering. By…

Information Retrieval · Computer Science · 2024-03-19 · Jiaqi Zhang, Yu Cheng, Yongxin Ni, Yunzhu Pan, Zheng Yuan, Junchen Fu, Youhua Li, Jie Wang, Fajie Yuan

Exploring Adapter-based Transfer Learning for Recommender Systems: Empirical Studies and Practical Insights

Adapters, a plug-in neural network module with some tunable parameters, have emerged as a parameter-efficient transfer learning technique for adapting pre-trained models to downstream tasks, especially for natural language processing (NLP)…

Information Retrieval · Computer Science · 2023-12-11 · Junchen Fu, Fajie Yuan, Yu Song, Zheng Yuan, Mingyue Cheng, Shenghui Cheng, Jiaqi Zhang, Jie Wang, Yunzhu Pan

A Content-Driven Micro-Video Recommendation Dataset at Scale

Micro-videos have recently gained immense popularity, sparking critical research in micro-video recommendation with significant implications for the entertainment, advertising, and e-commerce industries. However, the lack of large-scale…

Information Retrieval · Computer Science · 2023-09-28 · Yongxin Ni, Yu Cheng, Xiangyan Liu, Junchen Fu, Youhua Li, Xiangnan He, Yongfeng Zhang, Fajie Yuan

Where to Go Next for Recommender Systems? ID- vs. Modality-based Recommender Models Revisited

Recommendation models that utilize unique identities (IDs) to represent distinct users and items have been state-of-the-art (SOTA) and dominated the recommender systems (RS) literature for over a decade. Meanwhile, the pre-trained modality…

Information Retrieval · Computer Science · 2023-05-04 · Zheng Yuan, Fajie Yuan, Yu Song, Youhua Li, Junchen Fu, Fei Yang, Yunzhu Pan, Yongxin Ni