Related papers: Granite Embedding Models

200 papers

We introduce the Granite Embedding R2 models, a comprehensive family of high-performance English encoder-based embedding models engineered for enterprise-scale dense retrieval applications. Building upon our first-generation release, these…

We introduce the multilingual Granite Embedding R2 models, a family of encoder-based embedding models for enterprise-scale dense retrieval across 200+ languages. Extending our English-focused R2 release, these models add enhanced support…

In this paper, we introduce a new embedding model called M3-Embedding, which is distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Multi-Granularity. It provides a uniform support for…

Computation and Language · Computer Science 2025-12-15 Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, Zheng Liu

We introduce EmbeddingGemma, a new lightweight, open text embedding model based on the Gemma 3 language model family. Our innovative training recipe strategically captures knowledge from larger models via encoder-decoder initialization and…

With hundreds of thousands of language models available on Huggingface today, efficiently evaluating and utilizing these models across various downstream tasks has become increasingly critical. Many existing methods repeatedly learn…

Computation and Language · Computer Science 2024-10-18 Richard Zhuang, Tianhao Wu, Zhaojin Wen, Andrew Li, Jiantao Jiao, Kannan Ramchandran

Enterprises grapple with the significant challenge of managing proprietary unstructured data, hindering efficient information retrieval. This has led to the emergence of AI-driven information retrieval solutions, designed to adeptly extract…

Embeddings extracted by pre-trained Large Language Models (LLMs) have significant potential to improve information retrieval and search. Beyond the zero-shot setup in which they are being conventionally used, being able to take advantage of…

Machine Learning · Computer Science 2024-08-26 Jinsung Yoon, Sercan O Arik, Yanfei Chen, Tomas Pfister

Representation learning is a fundamental building block for analyzing entities in a database. While the existing embedding learning methods are effective in various data mining problems, their applicability is often limited because these…

Machine Learning · Computer Science 2020-09-24 Chin-Chia Michael Yeh, Dhruv Gelda, Zhongfang Zhuang, Yan Zheng, Liang Gou, Wei Zhang

Foundation models possess strong capabilities in reasoning and memorizing across modalities. To further unleash the power of foundation models, we present FIND, a generalized interface for aligning foundation models' embeddings with unified…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Xueyan Zou, Linjie Li, Jianfeng Wang, Jianwei Yang, Mingyu Ding, Junyi Wei, Zhengyuan Yang, Feng Li, Hao Zhang, Shilong Liu, Arul Aravinthan, Yong Jae Lee, Lijuan Wang

In the large language model (LLM) revolution, embedding is a key component of various systems, such as retrieving knowledge or memories for LLMs or building content moderation filters. As such cases span from English to other natural or…

Computation and Language · Computer Science 2025-05-23 Xin Zhang, Zehan Li, Yanzhao Zhang, Dingkun Long, Pengjun Xie, Meishan Zhang, Min Zhang

Foundation models have established unified representations for natural language processing, yet this paradigm remains largely unexplored for tabular data. Existing methods face fundamental limitations: LLM-based approaches lack…

Computation and Language · Computer Science 2026-05-07 Minjie Qiang, Mingming Zhang, Xiaoyi Bao, Xing Fu, Yu Cheng, Weiqiang Wang, Zhongqing Wang, Ningtao Wang

Embeddings are a powerful way to enrich data-driven machine learning models with the world knowledge of large language models (LLMs). Yet, there is limited evidence on how to design effective LLM-based embedding pipelines for tabular…

Machine Learning · Computer Science 2026-03-19 Oksana Kolomenko, Ricardo Knauer, Erik Rodner

As retrieval-augmented generation prevails in large language models, embedding models are becoming increasingly crucial. Despite the growing number of general embedding models, prior work often overlooks the critical role of training data…

Computation and Language · Computer Science 2025-01-16 Xinshuo Hu, Zifei Shan, Xinping Zhao, Zetian Sun, Zhenyu Liu, Dongfang Li, Shaolin Ye, Xinyuan Wei, Qian Chen, Baotian Hu, Haofen Wang, Jun Yu, Min Zhang

We introduce two pre-trained, retrieval-focused multilingual sentence encoding models, based respectively on the Transformer and CNN model architectures. The models embed text from 16 languages into a single semantic space using a multi-task…

Embedding models have become essential tools in both natural language processing and computer vision, enabling efficient semantic search, recommendation, clustering, and more. However, the high memory and computational demands of…

Computation and Language · Computer Science 2024-11-26 Jiayi Chen, Chen Wu, Shaoqun Zhang, Nan Li, Liangjie Zhang, Qi Zhang

Large Language Models (LLMs) have achieved impressive progress in natural language processing, but their limited ability to retain long-term context constrains performance on document-level or multi-turn tasks. Retrieval-Augmented…

Computation and Language · Computer Science 2025-05-20 Zhangyu Wang, Siyuan Gao, Rong Zhou, Hao Wang, Li Ning

We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of a user…

Computation and Language · Computer Science 2026-05-13 Ariel Gera, Shir Ashury-Tahan, Gal Bloch, Ohad Eytan, Assaf Toledo

Recently, embedding-based retrieval, or dense retrieval, has shown state-of-the-art results compared with traditional sparse or bag-of-words-based approaches. This paper introduces a model-agnostic doc-level embedding framework through large…

Information Retrieval · Computer Science 2024-04-10 Mingrui Wu, Sheng Cao