Related papers: DOCENT: Learning Self-Supervised Entity Representa…

Self-Supervised Visual Representations for Cross-Modal Retrieval

Cross-modal retrieval methods have improved significantly in recent years with the use of deep neural networks and large-scale annotated datasets such as ImageNet and Places. However, collecting and annotating such datasets requires a…

Computer Vision and Pattern Recognition · Computer Science 2019-02-04 Yash Patel, Lluis Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

Representation Learning of Entities and Documents from Knowledge Base Descriptions

In this paper, we describe TextEnt, a neural network model that learns distributed representations of entities and documents directly from a knowledge base (KB). Given a document in a KB consisting of words and entity annotations, we train…

Computation and Language · Computer Science 2018-06-11 Ikuya Yamada, Hiroyuki Shindo, Yoshiyasu Takefuji

Leveraging Natural Supervision for Language Representation Learning and Generation

Recent breakthroughs in Natural Language Processing (NLP) have been driven by language models trained on a massive amount of plain text. While powerful, deriving supervision from textual resources is still an open question. For example,…

Computation and Language · Computer Science 2022-07-22 Mingda Chen

ProgGen: Generating Named Entity Recognition Datasets Step-by-step with Self-Reflexive Large Language Models

Although Large Language Models (LLMs) exhibit remarkable adaptability across domains, these models often fall short in structured knowledge extraction tasks such as named entity recognition (NER). This paper explores an innovative,…

Computation and Language · Computer Science 2024-06-11 Yuzhao Heng, Chunyuan Deng, Yitong Li, Yue Yu, Yinghao Li, Rongzhi Zhang, Chao Zhang

LUKE: Deep Contextualized Entity Representations with Entity-aware Self-attention

Entity representations are useful in natural language tasks involving entities. In this paper, we propose new pretrained contextualized representations of words and entities based on the bidirectional transformer. The proposed model treats…

Computation and Language · Computer Science 2020-10-05 Ikuya Yamada, Akari Asai, Hiroyuki Shindo, Hideaki Takeda, Yuji Matsumoto

Jointly Learning to Label Sentences and Tokens

Learning to construct text representations in end-to-end systems can be difficult, as natural languages are highly compositional and task-specific annotated datasets are often limited in size. Methods for directly supervising language…

Computation and Language · Computer Science 2018-11-15 Marek Rei, Anders Søgaard

In-context Pretraining: Language Modeling Beyond Document Boundaries

Large language models (LMs) are currently trained to predict tokens given document prefixes, enabling them to directly perform long-form generation and prompting-style tasks which can be reduced to document completion. Existing pretraining…

Computation and Language · Computer Science 2024-06-25 Weijia Shi, Sewon Min, Maria Lomeli, Chunting Zhou, Margaret Li, Gergely Szilvasy, Rich James, Xi Victoria Lin, Noah A. Smith, Luke Zettlemoyer, Scott Yih, Mike Lewis

TextTopicNet - Self-Supervised Learning of Visual Features Through Embedding Images on Semantic Text Spaces

The immense success of deep learning based methods in computer vision heavily relies on large scale training datasets. These richly annotated datasets help the network learn discriminative visual features. Collecting and annotating such…

Computer Vision and Pattern Recognition · Computer Science 2018-07-09 Yash Patel, Lluis Gomez, Raul Gomez, Marçal Rusiñol, Dimosthenis Karatzas, C. V. Jawahar

Learning Representations by Predicting Bags of Visual Words

Self-supervised representation learning aims to learn convnet-based image representations from unlabeled data. Inspired by the success of NLP methods in this area, in this work we propose a self-supervised approach based on spatially…

Computer Vision and Pattern Recognition · Computer Science 2020-02-28 Spyros Gidaris, Andrei Bursuc, Nikos Komodakis, Patrick Pérez, Matthieu Cord

EntEval: A Holistic Evaluation Benchmark for Entity Representations

Rich entity representations are useful for a wide class of problems involving entities. Despite their importance, there is no standardized benchmark that evaluates the overall quality of entity representations. In this work, we propose…

Computation and Language · Computer Science 2019-11-12 Mingda Chen, Zewei Chu, Yang Chen, Karl Stratos, Kevin Gimpel

LMEnt: A Suite for Analyzing Knowledge in Language Models from Pretraining Data to Representations

Language models (LMs) increasingly drive real-world applications that require world knowledge. However, the internal processes through which models turn data into representations of knowledge and beliefs about the world, are poorly…

Computation and Language · Computer Science 2025-09-04 Daniela Gottesman, Alon Gilae-Dotan, Ido Cohen, Yoav Gur-Arieh, Marius Mosbach, Ori Yoran, Mor Geva

Semantic Sort: A Supervised Approach to Personalized Semantic Relatedness

We propose and study a novel supervised approach to learning statistical semantic relatedness models from subjectively annotated training examples. The proposed semantic model consists of parameterized co-occurrence statistics associated…

Computation and Language · Computer Science 2013-11-12 Ran El-Yaniv, David Yanay

Leveraging Contextual Information for Effective Entity Salience Detection

In text documents such as news articles, the content and key events usually revolve around a subset of all the entities mentioned in a document. These entities, often deemed as salient entities, provide useful cues of the aboutness of a…

Computation and Language · Computer Science 2024-04-04 Rajarshi Bhowmik, Marco Ponza, Atharva Tendle, Anant Gupta, Rebecca Jiang, Xingyu Lu, Qian Zhao, Daniel Preotiuc-Pietro

SESA: Supervised Explicit Semantic Analysis

In recent years supervised representation learning has provided state of the art or close to the state of the art results in semantic analysis tasks including ranking and information retrieval. The core idea is to learn how to embed items…

Computation and Language · Computer Science 2017-08-11 Dasha Bogdanova, Majid Yazdani

A Large-Scale Analysis on Self-Supervised Video Representation Learning

Self-supervised learning is an effective way for label-free model pre-training, especially in the video domain where labeling is expensive. Existing self-supervised works in the video domain use varying experimental setups to demonstrate…

Computer Vision and Pattern Recognition · Computer Science 2023-11-22 Akash Kumar, Ashlesha Kumar, Vibhav Vineet, Yogesh Singh Rawat

SelfDocSeg: A Self-Supervised vision-based Approach towards Document Segmentation

Document layout analysis is a well-known problem in the document research community and has been extensively explored, yielding a multitude of solutions ranging from text mining and recognition to graph-based representation, visual feature…

Computer Vision and Pattern Recognition · Computer Science 2023-08-22 Subhajit Maity, Sanket Biswas, Siladittya Manna, Ayan Banerjee, Josep Lladós, Saumik Bhattacharya, Umapada Pal

SelfDoc: Self-Supervised Document Representation Learning

We propose SelfDoc, a task-agnostic pre-training framework for document image understanding. Because documents are multimodal and are intended for sequential reading, our framework exploits the positional, textual, and visual information of…

Computer Vision and Pattern Recognition · Computer Science 2021-06-08 Peizhao Li, Jiuxiang Gu, Jason Kuen, Vlad I. Morariu, Handong Zhao, Rajiv Jain, Varun Manjunatha, Hongfu Liu

Self-Supervised Visual Representation Learning Using Lightweight Architectures

In self-supervised learning, a model is trained to solve a pretext task, using a data set whose annotations are created by a machine. The objective is to transfer the trained weights to perform a downstream task in the target domain. We…

Machine Learning · Computer Science 2021-10-22 Prathamesh Sonawane, Sparsh Drolia, Saqib Shamsi, Bhargav Jain

Self-Supervision by Prediction for Object Discovery in Videos

Despite their irresistible success, deep learning algorithms still heavily rely on annotated data. On the other hand, unsupervised settings pose many challenges, especially about determining the right inductive bias in diverse scenarios.…

Computer Vision and Pattern Recognition · Computer Science 2021-03-11 Beril Besbinar, Pascal Frossard

Bring Your Own Data! Self-Supervised Evaluation for Large Language Models

With the rise of Large Language Models (LLMs) and their ubiquitous deployment in diverse domains, measuring language model behavior on realistic data is imperative. For example, a company deploying a client-facing chatbot must ensure that…

Computation and Language · Computer Science 2023-06-30 Neel Jain, Khalid Saifullah, Yuxin Wen, John Kirchenbauer, Manli Shu, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein