Related papers: A Flexible Structured-based Representation for XML…

200 papers

This paper presents some experiments in clustering homogeneous XML documents to validate an existing classification or more generally an organisational structure. Our approach integrates techniques for extracting knowledge from documents with…

Information Retrieval · Computer Science 2007-05-23 Thierry Despeyroux , Yves Lechevallier , Brigitte Trousse , Anne-Marie Vercoustre

This paper describes the approach taken to the XML Mining track at INEX 2008 by a group at the Queensland University of Technology. We introduce the K-tree clustering algorithm in an Information Retrieval context by adapting it for document…

Information Retrieval · Computer Science 2010-01-07 Christopher M. De Vries , Shlomo Geva

The need for discovering knowledge from XML documents according to both structure and content features has become challenging, due to the increase in application contexts for which handling both structure and content information in XML data…

Databases · Computer Science 2015-04-17 Olfa Arfaoui , Minyar Sassi Hidri

We propose specific data structures designed for the indexing and retrieval of information elements in heterogeneous XML databases. The indexing scheme is well suited to the management of various contextual searches, expressed either at a…

Information Retrieval · Computer Science 2008-12-18 Eugen Popovici , Gilbas Ménier , Pierre-François Marteau

Document clustering is an unsupervised approach used extensively to navigate, filter, summarize and manage large collections of documents such as the World Wide Web (WWW). Recently, the focus in this domain has shifted from traditional…

Information Retrieval · Computer Science 2012-01-11 Muhammad Rafi , M. Maujood , M. M. Fazal , S. M. Ali

The importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collections of documents such as the World Wide Web (WWW). The next…

Information Retrieval · Computer Science 2011-12-30 Muhammad Rafi , M. Shahid Shaikh , Amir Farooq

With the rising quantity of textual data available in electronic format, organizing it has become a highly challenging task. In the present paper, we explore a document organization framework that exploits an intelligent hierarchical…

Information Retrieval · Computer Science 2015-04-02 Rajendra Kumar Roul , Shubham Rohan Asthana , Sanjay Kumar Sahay

Analytical processing on XML repositories is usually enabled by designing complex data transformations that shred the documents into a common data warehousing schema. This can be very time-consuming and costly, especially if the underlying…

Databases · Computer Science 2009-09-15 Andrey Balmin , Latha Colby , Emiran Curtmola , Quanzhong Li , Fatma Ozcan

The eXtensible Markup Language (XML) provides a powerful and flexible means of encoding and exchanging data. As it turns out, its main advantage as an encoding format (namely, its requirement that all open and close markup tags are present…

Databases · Computer Science 2015-05-13 Gregory Leighton , Denilson Barbosa

XML is based on two essential aspects: the modeling of data in a tree-like structure and the separation between the information itself and the way it is displayed. XML structures are easily serializable. The separation between an…

Software Engineering · Computer Science 2009-02-19 Claude Pasquier , Laurent Théry

Document parsing (DP) transforms unstructured or semi-structured documents into structured, machine-readable representations, enabling downstream applications such as knowledge base construction and retrieval-augmented generation (RAG).…

We report about the current state of development of a document suite and its applications. This collection of tools for the flexible and robust processing of documents in German is based on the use of XML as unifying formalism for encoding…

Computation and Language · Computer Science 2007-05-23 Dietmar Roesner , Manuela Kunze

XML document markup is highly repetitive and therefore well compressible using dictionary-based methods such as DAGs or grammars. In the context of selectivity estimation, grammar-compressed trees were used before as synopsis for structural…

Databases · Computer Science 2010-12-30 Sebastian Maneth , Tom Sebastian

With the advancement of technology and reduced storage costs, individuals and organizations are tending towards the usage of electronic media for storing textual information and documents. It is time consuming for readers to retrieve…

Information Retrieval · Computer Science 2010-07-27 Yasir Safeer , Atika Mustafa , Anis Noor Ali

To date, most flexible querying systems for native XML databases (DB) are based on exploiting the tree structure of their semi-structured data (SSD). However, it becomes important to test the efficiency of Formal Concept Analysis (FCA)…

Information Retrieval · Computer Science 2013-12-09 Olfa Arfaoui , Minyar Sassi-Hidri

Dictionaries are often developed using tools that save to Extensible Markup Language (XML)-based standards. These standards often allow high-level repeating elements to represent lexical entries, and utilize descendants of these repeating…

Computation and Language · Computer Science 2016-02-18 Paul Rodrigues , David Zajic , David Doermann , Michael Bloodgood , Peng Ye

The growing amount of XML encoded data exchanged over the Internet increases the importance of XML based publish-subscribe (pub-sub) and content based routing systems. The input in such systems typically consists of a stream of XML…

Hardware Architecture · Computer Science 2009-09-15 Abhishek Mitra , Marcos Vieira , Petko Bakalov , Walid Najjar , Vassilis Tsotras

Document clustering is an unsupervised approach in which a large collection of documents (corpus) is subdivided into smaller, meaningful, identifiable, and verifiable sub-groups (clusters). Meaningful representation of documents and…

Information Retrieval · Computer Science 2014-12-08 Muhammad Rafi , Farnaz Amin , Mohammad Shahid Shaikh

Nowadays, document clustering is considered a data-intensive task due to the dramatic, fast increase in the number of available documents. Moreover, the feature sets that represent those documents are also very large. The most common…

Databases · Computer Science 2015-05-13 Abdelrahman Elsayed , Hoda M. O. Mokhtar , Osama Ismail