Related papers: Granite Embedding Models

Granite Embedding R2 Models

We introduce the Granite Embedding R2 models, a comprehensive family of high-performance English encoder-based embedding models engineered for enterprise-scale dense retrieval applications. Building upon our first-generation release, these…

Computation and Language · Computer Science 2025-09-01 Parul Awasthy , Aashka Trivedi , Yulong Li , Meet Doshi , Riyaz Bhat , Vignesh P , Vishwajeet Kumar , Yushu Yang , Bhavani Iyer , Abraham Daniels , Rudra Murthy , Ken Barker , Martin Franz , Madison Lee , Todd Ward , Salim Roukos , David Cox , Luis Lastras , Jaydeep Sen , Radu Florian

Granite Embedding Multilingual R2 Models

We introduce the multilingual Granite Embedding R2 models, a family of encoder-based embedding models for enterprise-scale dense retrieval across 200+ languages. Extending our English-focused R2 release, these models add enhanced support…

Information Retrieval · Computer Science 2026-05-18 Parul Awasthy , Aashka Trivedi , Yushu Yang , Ken Barker , Yulong Li , Bhavani Iyer , Martin Franz , Juergen Bross , Meet Doshi , Vignesh P , Vishwajeet Kumar , Todd Ward , Abraham Daniels , Madison Lee , Luis Lastras , Jaydeep Sen , Radu Florian

Granite Vision: a lightweight, open-source multimodal model for enterprise Intelligence

We introduce Granite Vision, a lightweight large language model with vision capabilities, specifically designed to excel in enterprise use cases, particularly in visual document understanding. Our model is trained on a comprehensive…

Computer Vision and Pattern Recognition · Computer Science 2025-02-17 Granite Vision Team , Leonid Karlinsky , Assaf Arbelle , Abraham Daniels , Ahmed Nassar , Amit Alfassi , Bo Wu , Eli Schwartz , Dhiraj Joshi , Jovana Kondic , Nimrod Shabtay , Pengyuan Li , Roei Herzig , Shafiq Abedin , Shaked Perek , Sivan Harary , Udi Barzelay , Adi Raz Goldfarb , Aude Oliva , Ben Wieles , Bishwaranjan Bhattacharjee , Brandon Huang , Christoph Auer , Dan Gutfreund , David Beymer , David Wood , Hilde Kuehne , Jacob Hansen , Joseph Shtok , Ken Wong , Luis Angel Bathen , Mayank Mishra , Maksym Lysak , Michele Dolfi , Mikhail Yurochkin , Nikolaos Livathinos , Nimrod Harel , Ophir Azulai , Oshri Naparstek , Rafael Teixeira de Lima , Rameswar Panda , Sivan Doveh , Shubham Gupta , Subhro Das , Syed Zawad , Yusik Kim , Zexue He , Alexander Brooks , Gabe Goodhart , Anita Govindjee , Derek Leist , Ibrahim Ibrahim , Aya Soffer , David Cox , Kate Soule , Luis Lastras , Nirmit Desai , Shila Ofek-koifman , Sriram Raghavan , Tanveer Syeda-Mahmood , Peter Staar , Tal Drory , Rogerio Feris

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

In this paper, we introduce a new embedding model called M3-Embedding, which is distinguished for its versatility in Multi-Linguality, Multi-Functionality, and Multi-Granularity. It provides a uniform support for…

Computation and Language · Computer Science 2025-12-15 Jianlv Chen , Shitao Xiao , Peitian Zhang , Kun Luo , Defu Lian , Zheng Liu

EmbeddingGemma: Powerful and Lightweight Text Representations

We introduce EmbeddingGemma, a new lightweight, open text embedding model based on the Gemma 3 language model family. Our innovative training recipe strategically captures knowledge from larger models via encoder-decoder initialization and…

Computation and Language · Computer Science 2025-11-04 Henrique Schechter Vera , Sahil Dua , Biao Zhang , Daniel Salz , Ryan Mullins , Sindhu Raghuram Panyam , Sara Smoot , Iftekhar Naim , Joe Zou , Feiyang Chen , Daniel Cer , Alice Lisak , Min Choi , Lucas Gonzalez , Omar Sanseviero , Glenn Cameron , Ian Ballantyne , Kat Black , Kaifeng Chen , Weiyi Wang , Zhe Li , Gus Martins , Jinhyuk Lee , Mark Sherwood , Juyeong Ji , Renjie Wu , Jingxiao Zheng , Jyotinder Singh , Abheesht Sharma , Divyashree Sreepathihalli , Aashi Jain , Adham Elarabawy , AJ Co , Andreas Doumanoglou , Babak Samari , Ben Hora , Brian Potetz , Dahun Kim , Enrique Alfonseca , Fedor Moiseev , Feng Han , Frank Palma Gomez , Gustavo Hernández Ábrego , Hesen Zhang , Hui Hui , Jay Han , Karan Gill , Ke Chen , Koert Chen , Madhuri Shanbhogue , Michael Boratko , Paul Suganthan , Sai Meher Karthik Duddu , Sandeep Mariserla , Setareh Ariafar , Shanfeng Zhang , Shijie Zhang , Simon Baumgartner , Sonam Goenka , Steve Qiu , Tanmaya Dabral , Trevor Walker , Vikram Rao , Waleed Khawaja , Wenlei Zhou , Xiaoqi Ren , Ye Xia , Yichang Chen , Yi-Ting Chen , Zhe Dong , Zhongli Ding , Francesco Visin , Gaël Liu , Jiageng Zhang , Kathleen Kenealy , Michelle Casbon , Ravin Kumar , Thomas Mesnard , Zach Gleicher , Cormac Brick , Olivier Lacombe , Adam Roberts , Qin Yin , Yunhsuan Sung , Raphael Hoffmann , Tris Warkentin , Armand Joulin , Tom Duerig , Mojtaba Seyedhosseini

EmbedLLM: Learning Compact Representations of Large Language Models

With hundreds of thousands of language models available on Huggingface today, efficiently evaluating and utilizing these models across various downstream tasks has become increasingly critical. Many existing methods repeatedly learn…

Computation and Language · Computer Science 2024-10-18 Richard Zhuang , Tianhao Wu , Zhaojin Wen , Andrew Li , Jiantao Jiao , Kannan Ramchandran

EnterpriseEM: Fine-tuned Embeddings for Enterprise Semantic Search

Enterprises grapple with the significant challenge of managing proprietary unstructured data, hindering efficient information retrieval. This has led to the emergence of AI-driven information retrieval solutions, designed to adeptly extract…

Information Retrieval · Computer Science 2025-12-08 Kamalkumar Rathinasamy , Jayarama Nettar , Amit Kumar , Vishal Manchanda , Arun Vijayakumar , Ayush Kataria , Venkateshprasanna Manjunath , Chidambaram GS , Jaskirat Singh Sodhi , Shoeb Shaikh , Wasim Akhtar Khan , Prashant Singh , Tanishq Dattatray Ige , Vipin Tiwari , Rajab Ali Mondal , Harshini K , S Reka , Chetana Amancharla , Faiz ur Rahman , Harikrishnan P A , Indraneel Saha , Bhavya Tiwary , Navin Shankar Patel , Pradeep T S , Balaji A J , Priyapravas , Mohammed Rafee Tarafdar

Search-Adaptor: Embedding Customization for Information Retrieval

Embeddings extracted by pre-trained Large Language Models (LLMs) have significant potential to improve information retrieval and search. Beyond the zero-shot setup in which they are being conventionally used, being able to take advantage of…

Machine Learning · Computer Science 2024-08-26 Jinsung Yoon , Sercan O Arik , Yanfei Chen , Tomas Pfister

Towards a Flexible Embedding Learning Framework

Representation learning is a fundamental building block for analyzing entities in a database. While the existing embedding learning methods are effective in various data mining problems, their applicability is often limited because these…

Machine Learning · Computer Science 2020-09-24 Chin-Chia Michael Yeh , Dhruv Gelda , Zhongfang Zhuang , Yan Zheng , Liang Gou , Wei Zhang

Interfacing Foundation Models' Embeddings

Foundation models possess strong capabilities in reasoning and memorizing across modalities. To further unleash the power of foundation models, we present FIND, a generalized interface for aligning foundation models' embeddings with unified…

Computer Vision and Pattern Recognition · Computer Science 2024-07-16 Xueyan Zou , Linjie Li , Jianfeng Wang , Jianwei Yang , Mingyu Ding , Junyi Wei , Zhengyuan Yang , Feng Li , Hao Zhang , Shilong Liu , Arul Aravinthan , Yong Jae Lee , Lijuan Wang

Gemini Embedding: Generalizable Embeddings from Gemini

In this report, we introduce Gemini Embedding, a state-of-the-art embedding model leveraging the power of Gemini, Google's most capable large language model. Capitalizing on Gemini's inherent multilingual and code understanding…

Computation and Language · Computer Science 2025-03-12 Jinhyuk Lee , Feiyang Chen , Sahil Dua , Daniel Cer , Madhuri Shanbhogue , Iftekhar Naim , Gustavo Hernández Ábrego , Zhe Li , Kaifeng Chen , Henrique Schechter Vera , Xiaoqi Ren , Shanfeng Zhang , Daniel Salz , Michael Boratko , Jay Han , Blair Chen , Shuo Huang , Vikram Rao , Paul Suganthan , Feng Han , Andreas Doumanoglou , Nithi Gupta , Fedor Moiseev , Cathy Yip , Aashi Jain , Simon Baumgartner , Shahrokh Shahi , Frank Palma Gomez , Sandeep Mariserla , Min Choi , Parashar Shah , Sonam Goenka , Ke Chen , Ye Xia , Koert Chen , Sai Meher Karthik Duddu , Yichang Chen , Trevor Walker , Wenlei Zhou , Rakesh Ghiya , Zach Gleicher , Karan Gill , Zhe Dong , Mojtaba Seyedhosseini , Yunhsuan Sung , Raphael Hoffmann , Tom Duerig

Language Models are Universal Embedders

In the large language model (LLM) revolution, embedding is a key component of various systems, such as retrieving knowledge or memories for LLMs or building content moderation filters. As such cases span from English to other natural or…

Computation and Language · Computer Science 2025-05-23 Xin Zhang , Zehan Li , Yanzhao Zhang , Dingkun Long , Pengjun Xie , Meishan Zhang , Min Zhang

TabEmbed: Benchmarking and Learning Generalist Embeddings for Tabular Understanding

Foundation models have established unified representations for natural language processing, yet this paradigm remains largely unexplored for tabular data. Existing methods face fundamental limitations: LLM-based approaches lack…

Computation and Language · Computer Science 2026-05-07 Minjie Qiang , Mingming Zhang , Xiaoyi Bao , Xing Fu , Yu Cheng , Weiqiang Wang , Zhongqing Wang , Ningtao Wang

Embedding World Knowledge into Tabular Models: Towards Best Practices for Embedding Pipeline Design

Embeddings are a powerful way to enrich data-driven machine learning models with the world knowledge of large language models (LLMs). Yet, there is limited evidence on how to design effective LLM-based embedding pipelines for tabular…

Machine Learning · Computer Science 2026-03-19 Oksana Kolomenko , Ricardo Knauer , Erik Rodner

KaLM-Embedding: Superior Training Data Brings A Stronger Embedding Model

As retrieval-augmented generation prevails in large language models, embedding models are becoming increasingly crucial. Despite the growing number of general embedding models, prior work often overlooks the critical role of training data…

Computation and Language · Computer Science 2025-01-16 Xinshuo Hu , Zifei Shan , Xinping Zhao , Zetian Sun , Zhenyu Liu , Dongfang Li , Shaolin Ye , Xinyuan Wei , Qian Chen , Baotian Hu , Haofen Wang , Jun Yu , Min Zhang

Multilingual Universal Sentence Encoder for Semantic Retrieval

We introduce two pre-trained retrieval focused multilingual sentence encoding models, respectively based on the Transformer and CNN model architectures. The models embed text from 16 languages into a single semantic space using a multi-task…

Computation and Language · Computer Science 2019-07-10 Yinfei Yang , Daniel Cer , Amin Ahmad , Mandy Guo , Jax Law , Noah Constant , Gustavo Hernandez Abrego , Steve Yuan , Chris Tar , Yun-Hsuan Sung , Brian Strope , Ray Kurzweil

Efficient Ternary Weight Embedding Model: Bridging Scalability and Performance

Embedding models have become essential tools in both natural language processing and computer vision, enabling efficient semantic search, recommendation, clustering, and more. However, the high memory and computational demands of…

Computation and Language · Computer Science 2024-11-26 Jiayi Chen , Chen Wu , Shaoqun Zhang , Nan Li , Liangjie Zhang , Qi Zhang

ELITE: Embedding-Less retrieval with Iterative Text Exploration

Large Language Models (LLMs) have achieved impressive progress in natural language processing, but their limited ability to retain long-term context constrains performance on document-level or multi-turn tasks. Retrieval-Augmented…

Computation and Language · Computer Science 2025-05-20 Zhangyu Wang , Siyuan Gao , Rong Zhou , Hao Wang , Li Ning

Task-Adaptive Embedding Refinement via Test-time LLM Guidance

We explore the effectiveness of an LLM-guided query refinement paradigm for extending the usability of embedding models to challenging zero-shot search and classification tasks. Our approach refines the embedding representation of a user…

Computation and Language · Computer Science 2026-05-13 Ariel Gera , Shir Ashury-Tahan , Gal Bloch , Ohad Eytan , Assaf Toledo

LLM-Augmented Retrieval: Enhancing Retrieval Models Through Language Models and Doc-Level Embedding

Recently, embedding-based retrieval, or dense retrieval, has shown state-of-the-art results compared with traditional sparse or bag-of-words-based approaches. This paper introduces a model-agnostic doc-level embedding framework through large…

Information Retrieval · Computer Science 2024-04-10 Mingrui Wu , Sheng Cao