Related papers: LFM2 Technical Report

200 papers

We present F2LLM-v2, a new family of general-purpose, multilingual embedding models in 8 distinct sizes ranging from 80M to 14B. Trained on a newly curated composite of 60 million publicly available high-quality data samples, F2LLM-v2…

Computation and Language · Computer Science 2026-03-20 Ziyin Zhang, Zihan Liao, Hang Yu, Peng Di, Rui Wang

Real-time AI experiences call for on-device large language models (OD-LLMs) optimized for efficient deployment on resource-constrained hardware. The most useful OD-LLMs produce near-real-time responses and exhibit broad hardware…

While scaling laws have been continuously validated in large language models (LLMs) with increasing model parameters, the inherent tension between the inference demands of LLMs and the limited resources of edge devices poses a critical…

On-device inference for Large Language Models (LLMs), driven by increasing privacy concerns and advancements of mobile-sized models, has gained significant interest. However, even mobile-sized LLMs (e.g., Gemma-2B) encounter unacceptably…

Artificial Intelligence · Computer Science 2024-12-17 Daliang Xu, Hao Zhang, Liming Yang, Ruiqi Liu, Gang Huang, Mengwei Xu, Xuanzhe Liu

We introduce F2LLM - Foundation to Feature Large Language Models, a suite of state-of-the-art embedding models in three sizes: 0.6B, 1.7B, and 4B. Unlike previous top-ranking embedding models that require massive contrastive pretraining,…

Computation and Language · Computer Science 2025-10-03 Ziyin Zhang, Zihan Liao, Hang Yu, Peng Di, Rui Wang

This paper introduces EdgeProfiler, a fast profiling framework designed for evaluating lightweight Large Language Models (LLMs) on edge systems. While LLMs offer remarkable capabilities in natural language understanding and generation,…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-09-18 Alyssa Pinnock, Shakya Jayakody, Kawsher A Roxy, Md Rubel Ahmed

Large Language Models (LLMs) are increasingly integrated into everyday applications, but their prevalent cloud-based deployment raises growing concerns around data privacy and long-term sustainability. Running LLMs locally on mobile and…

Machine Learning · Computer Science 2025-10-08 Haoxin Wang, Xiaolong Tu, Hongyu Ke, Huirong Chai, Dawei Chen, Kyungtae Han

With the emergence of wearable devices and other embedded systems, deploying large language models (LLMs) on edge platforms has become an urgent need. However, this is challenging because of their high computational and memory demands.…

Hardware Architecture · Computer Science 2025-10-22 Ye Qiao, Zhiheng Chen, Yifan Zhang, Yian Wang, Sitao Huang

Deploying large language models (LLMs) on edge platforms is challenged by their high computational and memory demands. Although recent low-bit quantization methods (e.g., BitNet, DeepSeek) compress weights to as little as 1.58 bits with…

Hardware Architecture · Computer Science 2025-04-28 Ye Qiao, Zhiheng Chen, Yifan Zhang, Yian Wang, Sitao Huang

While frontier large language models (LLMs) continue to push capability boundaries, their deployment remains confined to GPU-powered cloud infrastructure. We challenge this paradigm with SmallThinker, a family of LLMs natively designed -…

Efficient on-device language models around 1 billion parameters are essential for powering low-latency AI applications on mobile and wearable devices. However, achieving strong performance in this model class, while supporting long context…

Sub-billion-parameter Transformer language models are increasingly deployed on edge devices, where the privacy, latency, and operating-cost advantages of on-device inference are constrained by tight memory-bandwidth, energy, and thermal…

Machine Learning · Computer Science 2026-05-19 Xinting Jiang, Junyi Luo, Ruichen Qi, Kauna Lei, Ben Laurie, Gregory Kielian, Mehdi Saligane

Large language models (LLMs) have showcased profound capabilities in language understanding and generation, facilitating a wide array of applications. However, there is a notable paucity of detailed, open-sourced methodologies on…

Transformer-based Large Language Models (LLMs) have been widely used in many fields, and the efficiency of LLM inference has become a hot topic in real applications. However, LLMs are usually complicatedly designed in model structure with…

Hardware Architecture · Computer Science 2024-06-25 Hui Wu, Yi Gan, Feng Yuan, Jing Ma, Wei Zhu, Yutao Xu, Hong Zhu, Yuhua Zhu, Xiaoli Liu, Jinghui Gu, Peng Zhao

The Large Language Model (LLM) is widely employed for tasks such as intelligent assistants, text summarization, translation, and multi-modality on mobile phones. However, the current methods for on-device LLM deployment maintain slow…

Computation and Language · Computer Science 2024-07-08 Luchang Li, Sheng Qian, Jie Lu, Lunxi Yuan, Rui Wang, Qin Xie

Large language models (LLMs) have emerged as a powerful foundation for intelligent reasoning and decision-making, demonstrating substantial impact across a wide range of domains and applications. However, their massive parameter scales and…

Distributed, Parallel, and Cluster Computing · Computer Science 2025-12-29 Mingyu Sun, Xiao Zhang, Shen Qu, Yan Li, Mengbai Xiao, Yuan Yuan, Dongxiao Yu

We present OLMo 2, the next generation of our fully open language models. OLMo 2 includes a family of dense autoregressive language models at 7B, 13B and 32B scales with fully released artifacts -- model weights, full training data,…

Extremely low-bit quantization is critical for efficiently deploying Large Language Models (LLMs), yet it often leads to severe performance degradation at 2 bits and even at 4 bits (e.g., MXFP4). We present SignRoundV2, a post-training…

Computation and Language · Computer Science 2026-05-19 Wenhua Cheng, Weiwei Zhang, Heng Guo, Haihao Shen, Zaner Ma

The deployment of transformer-based models on resource-constrained edge devices represents a critical challenge in enabling real-time artificial intelligence applications. This comprehensive survey examines lightweight transformer…

Machine Learning · Computer Science 2026-01-08 Hema Hariharan Samson