Cong Lin
We introduce GVGAI-LLM, a video game benchmark for evaluating the reasoning and problem-solving capabilities of large language models (LLMs). Built on the General Video Game AI framework, it features a diverse collection of arcade-style…
We show that both temporal and spatial symmetry breaking in canonical K-type transition arise as organized hydrodynamic structures rather than stochastic fluctuations. Before the skin-friction maximum, the flow is fully described by a…
As visual misinformation becomes increasingly prevalent, platform algorithms act as intermediaries that curate information for users' verification practices. Yet, it remains unclear how algorithmic gatekeeping tools, such as reverse image…
The (re)creation and distribution of cultural products such as music are increasingly shaped by digital platforms. This study explores how TikTok and Spotify, situated in different governance and user contexts, could influence digital music…
Most model reduction methods reduce the state dimension and then temporally evolve a set of coefficients that encode the state in the reduced representation. In this paper, we instead employ an efficient representation of the entire…
Vision-Language Models (VLMs) based on Mixture-of-Experts (MoE) architectures have emerged as a pivotal paradigm in multimodal understanding, offering a powerful framework for integrating visual and linguistic information. However, the…
In neural networks, there are various methods of feature fusion. Different strategies can significantly affect the effectiveness of feature representation, consequently influencing the ability of the model to extract representative and…
Computing power has evolved into a foundational and indispensable resource in the area of deep learning, particularly in tasks such as Face Recognition (FR) model training on large-scale datasets, where multiple GPUs are often a necessity.…
Temporal knowledge graph (TKG) reasoning predicts future events based on historical data, but it's challenging due to the complex semantic and hierarchical information involved. Existing Euclidean models excel at capturing semantics but…
Large Vision Language Models (LVLMs) have demonstrated impressive zero-shot capabilities in various vision-language dialogue scenarios. However, the absence of fine-grained visual object detection hinders the model from understanding the…
Cortical surface registration plays a crucial role in aligning cortical functional and anatomical features across individuals. However, conventional registration algorithms are computationally inefficient. Recently, learning-based…
Time-series classification approaches based on deep neural networks are prone to overfitting on UCR datasets, which is caused by the few-shot problem of those datasets. Therefore, in order to alleviate the overfitting phenomenon for…
Overparametrized neural networks trained by gradient descent (GD) can provably overfit any training data. However, the generalization guarantee may not hold for noisy data. From a nonparametric perspective, this paper studies how well…
Living neural networks in our brains autonomously self-organize into large, complex architectures during early development to result in an organized and functional organic computational device. A key mechanism that enables the formation of…
This document introduces the background and the usage of the Dunhuang Grottoes Dataset and the benchmark. The documentation starts with the background of the Dunhuang Grotto, which is widely recognised as a priceless heritage. Given…