Zebin Yang
Memory-augmented Large Language Models (LLMs) have demonstrated remarkable capability for complex and long-horizon embodied planning. By keeping track of past experiences and environmental states, memory enables LLMs to maintain a global…
Vision-Language-Action (VLA) models have shown remarkable success in robotic tasks like manipulation by fusing a language model's reasoning with a vision model's 3D understanding. However, their high computational cost remains a major…
Object-goal navigation (ObjNav) tasks an agent with navigating to the location of a specific object in an unseen environment. Embodied agents equipped with large language models (LLMs) and online constructed navigation maps can perform…
State space models (SSMs) like Mamba have recently attracted much attention. Compared to Transformer-based large language models (LLMs), Mamba achieves computational complexity that is linear in the sequence length and demonstrates superior…
Tree ensemble models like random forests and gradient boosting machines are widely used in machine learning due to their excellent predictive performance. However, a high-performance ensemble consisting of a large number of decision trees…
In this paper, we propose MCUBERT to enable language models like BERT on tiny microcontroller units (MCUs) through network and scheduling co-optimization. We observe that the embedding table contributes to the major storage bottleneck for tiny…
With the fast evolution of large language models (LLMs), privacy concerns with user queries arise as they may contain sensitive information. Private inference based on homomorphic encryption (HE) has been proposed to protect user query…
Recent advancements in generative large language models (LLMs) have significantly boosted the performance in natural language processing tasks. However, their efficiency is hampered by the inherent limitations in autoregressive token…
Large language models (LLMs) with Transformer architectures have become phenomenal in natural language processing, multimodal generative artificial intelligence, and agent-oriented artificial intelligence. The self-attention module is the…
PiML (read $\pi$-ML, /`pai`em`el/) is an integrated and open-access Python toolbox for interpretable machine learning model development and model diagnostics. It is designed with machine learning workflows in both low-code and high-code…
Hyperparameter optimization (HPO) plays a central role in automated machine learning (AutoML). It is a challenging task because the response surfaces of hyperparameters are generally unknown, making it essentially a global optimization problem.…
The lack of interpretability is an inevitable problem when using neural network models in real applications. In this paper, an explainable neural network based on generalized additive models with structured interactions (GAMI-Net) is…
In recent years, the field of recommendation systems has attracted increasing attention to developing predictive models that provide explanations of why an item is recommended to a user. The explanations can be either obtained by post-hoc…
Deep neural networks (DNNs) have achieved great success in learning complex patterns with strong predictive power, but they are often thought of as "black box" models without a sufficient level of transparency and interpretability. It…
Network initialization is the first and critical step for training neural networks. In this paper, we propose a novel network initialization scheme based on the celebrated Stein's identity. By viewing multi-layer feedforward neural networks…
Prediction accuracy and model explainability are the two most important objectives when developing machine learning algorithms to solve real-world problems. Neural networks are known to possess good prediction performance, but lack of…
A regularized artificial neural network (RANN) is proposed for interval-valued data prediction. The ANN model is selected due to its powerful capability in fitting linear and nonlinear functions. To meet mathematical coherence requirement…