Related papers: An improved parser for data-oriented lexical-funct…

Aspects of Pattern-Matching in Data-Oriented Parsing

Data-Oriented Parsing (DOP) ranks among the best parsing schemes, pairing state-of-the-art parsing accuracy to the psycholinguistic insight that larger chunks of syntactic structures are relevant grammatical and probabilistic units. Parsing…

Computation and Language · Computer Science 2007-05-23 Guy De Pauw

A Data-Oriented Approach to Semantic Interpretation

In Data-Oriented Parsing (DOP), an annotated language corpus is used as a stochastic grammar. The most probable analysis of a new input sentence is constructed by combining sub-analyses from the corpus in the most probable way. This…

cmp-lg · Computer Science 2008-02-03 Rens Bod , Remko Bonnema , Remko Scha
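The idea of using an annotated corpus as a stochastic grammar can be illustrated with a minimal sketch. It assumes the DOP1 estimator, in which a fragment's probability is its corpus count divided by the total count of all fragments sharing its root label, and a derivation's probability is the product of its fragments' probabilities. The nested-tuple tree encoding, the toy one-tree corpus, and the function names are hypothetical. The sketch does not check that the fragments actually compose by leftmost substitution, and it does not sum over derivations to obtain a parse probability.

```python
from collections import Counter
from fractions import Fraction
from itertools import product
from math import prod

# Hypothetical encoding: a node is (label, *children); a terminal is a str;
# a cut frontier nonterminal is a 1-tuple such as ("NP",).

def fragments(node):
    """All fragments rooted at `node` (each nonterminal child is either cut or expanded)."""
    label, *children = node
    choices = []
    for ch in children:
        if isinstance(ch, str):
            choices.append([ch])                      # terminals are always kept
        else:
            choices.append([(ch[0],)] + fragments(ch))  # cut here, or expand further
    return [(label, *combo) for combo in product(*choices)]

def all_fragments(tree):
    """Fragments rooted at every nonterminal node of `tree`."""
    result = fragments(tree)
    for ch in tree[1:]:
        if not isinstance(ch, str):
            result += all_fragments(ch)
    return result

def fragment_prob(frag, counts):
    """DOP1 estimate: count(frag) / total count of fragments with the same root label."""
    total = sum(c for f, c in counts.items() if f[0] == frag[0])
    return Fraction(counts[frag], total)

def derivation_prob(frags, counts):
    """Product of fragment probabilities (composition is assumed, not checked)."""
    return prod((fragment_prob(f, counts) for f in frags), start=Fraction(1))

# Toy corpus of a single tree.
tree = ("S", ("NP", "John"), ("VP", ("V", "likes"), ("NP", "Mary")))
counts = Counter(all_fragments(tree))

derivation = [
    ("S", ("NP",), ("VP",)),
    ("NP", "John"),
    ("VP", ("V", "likes"), ("NP", "Mary")),
]
print(derivation_prob(derivation, counts))  # 1/10 * 1/2 * 1/4
```

With this single tree there are 17 fragments (10 rooted at S, 4 at VP, 2 at NP, 1 at V), so the derivation above has probability 1/80. Exact `Fraction` arithmetic is used only to keep the toy numbers readable.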

Efficient Algorithms for Parsing the DOP Model

Excellent results have been reported for Data-Oriented Parsing (DOP) of natural language texts (Bod, 1993). Unfortunately, existing algorithms are both computationally intensive and difficult to implement. Previous algorithms are expensive…

cmp-lg · Computer Science 2008-02-03 Joshua Goodman

Two Questions about Data-Oriented Parsing

In this paper I present ongoing work on the data-oriented parsing (DOP) model. In previous work, DOP was tested on a cleaned-up set of analyzed part-of-speech strings from the Penn Treebank, achieving excellent test results. This left,…

cmp-lg · Computer Science 2008-02-03 Rens Bod

Data-Oriented Language Processing. An Overview

During the last few years, a new approach to language processing has started to emerge, which has become known under various labels such as "data-oriented parsing", "corpus-based interpretation", and "tree-bank grammar" (cf. van den Berg et…

cmp-lg · Computer Science 2008-02-03 Rens Bod , Remko Scha

Learning Efficient Disambiguation

This dissertation analyses the computational properties of current performance-models of natural language parsing, in particular Data Oriented Parsing (DOP), points out some of their major shortcomings and suggests suitable solutions. It…

Computation and Language · Computer Science 2007-05-23 Khalil Sima'an

LiteSearch: Efficacious Tree Search for LLM

Recent research suggests that tree search algorithms (e.g. Monte Carlo Tree Search) can dramatically boost LLM performance on complex mathematical reasoning tasks. However, they often require more than 10 times the computational resources…

Computation and Language · Computer Science 2024-07-02 Ante Wang , Linfeng Song , Ye Tian , Baolin Peng , Dian Yu , Haitao Mi , Jinsong Su , Dong Yu

RAPTOR: Recursive Abstractive Processing for Tree-Organized Retrieval

Retrieval-augmented language models can better adapt to changes in world state and incorporate long-tail knowledge. However, most existing methods retrieve only short contiguous chunks from a retrieval corpus, limiting holistic…

Computation and Language · Computer Science 2024-02-01 Parth Sarthi , Salman Abdullah , Aditi Tuli , Shubh Khanna , Anna Goldie , Christopher D. Manning

Can Subcategorisation Probabilities Help a Statistical Parser?

Research into the automatic acquisition of lexical information from corpora is starting to produce large-scale computational lexicons containing data on the relative frequencies of subcategorisation alternatives for individual verbal…

cmp-lg · Computer Science 2007-05-23 John Carroll , Guido Minnen , Ted Briscoe

Enhancing deep neural networks with morphological information

Deep learning approaches are superior in NLP due to their ability to extract informative features and patterns from languages. The two most successful neural architectures are LSTM and transformers, used in large pretrained language models…

Computation and Language · Computer Science 2022-03-03 Matej Klemen , Luka Krsnik , Marko Robnik-Šikonja

Monte Carlo Tree Search Boosts Reasoning via Iterative Preference Learning

We introduce an approach aimed at enhancing the reasoning capabilities of Large Language Models (LLMs) through an iterative preference learning process inspired by the successful strategy employed by AlphaZero. Our work leverages Monte…

Artificial Intelligence · Computer Science 2024-06-19 Yuxi Xie , Anirudh Goyal , Wenyue Zheng , Min-Yen Kan , Timothy P. Lillicrap , Kenji Kawaguchi , Michael Shieh

Improving Data Driven Wordclass Tagging by System Combination

In this paper we examine how the differences in modelling between different data driven systems performing the same NLP task can be exploited to yield a higher accuracy than the best individual system. We do this by means of an experiment…

cmp-lg · Computer Science 2007-05-23 Hans van Halteren , Jakub Zavrel , Walter Daelemans

ROGRAG: A Robustly Optimized GraphRAG Framework

Large language models (LLMs) commonly struggle with specialized or emerging topics which are rarely seen in the training corpus. Graph-based retrieval-augmented generation (GraphRAG) addresses this by structuring domain knowledge as a graph…

Information Retrieval · Computer Science 2025-06-05 Zhefan Wang , Huanjun Kong , Jie Ying , Wanli Ouyang , Nanqing Dong

Efficient probabilistic top-down and left-corner parsing

This paper examines efficient predictive broad-coverage parsing without dynamic programming. In contrast to bottom-up methods, depth-first top-down parsing produces partial parses that are fully connected trees spanning the entire left…

Computation and Language · Computer Science 2007-05-23 Brian Roark , Mark Johnson

ListOps: A Diagnostic Dataset for Latent Tree Learning

Latent tree learning models learn to parse a sentence without syntactic supervision, and use that parse to build the sentence representation. Existing work on such models has shown that, while they perform well on tasks like sentence…

Computation and Language · Computer Science 2018-04-18 Nikita Nangia , Samuel R. Bowman

Confidence-Calibrated Ensemble Dense Phrase Retrieval

In this paper, we consider the extent to which the transformer-based Dense Passage Retrieval (DPR) algorithm, developed by (Karpukhin et al., 2020), can be optimized without further pre-training. Our method involves two particular insights:…

Computation and Language · Computer Science 2023-06-29 William Yang , Noah Bergam , Arnav Jain , Nima Sheikhoslami

Chinese Lexical Analysis with Deep Bi-GRU-CRF Network

Lexical analysis is believed to be a crucial step towards natural language understanding and has been widely studied. In recent years, end-to-end lexical analysis models with recurrent neural networks have gained increasing attention. In this…

Computation and Language · Computer Science 2018-07-06 Zhenyu Jiao , Shuqi Sun , Ke Sun

Estimating Lexical Priors for Low-Frequency Syncretic Forms

Given a previously unseen form that is morphologically n-ways ambiguous, what is the best estimator for the lexical prior probabilities for the various functions of the form? We argue that the best estimator is provided by computing the…

cmp-lg · Computer Science 2008-02-03 Harald Baayen , Richard Sproat

Part-of-Speech Tagging with Minimal Lexicalization

We use a Dynamic Bayesian Network to represent compactly a variety of sublexical and contextual features relevant to Part-of-Speech (PoS) tagging. The outcome is a flexible tagger (LegoTag) with state-of-the-art performance (3.6% error on a…

Computation and Language · Computer Science 2009-09-29 Virginia Savova , Leonid Peshkin

Loop Neural Networks for Parameter Sharing

The success of large-scale language models like GPT can be attributed to their ability to efficiently predict the next token in a sequence. However, these models rely on constant computational effort regardless of the complexity of the…

Artificial Intelligence · Computer Science 2024-11-11 Kei-Sing Ng , Qingchen Wang