Related papers: Mining Semi-structured Data

200 papers

We propose specific data structures designed for the indexing and retrieval of information elements in heterogeneous XML databases. The indexing scheme is well suited to the management of various contextual searches, expressed either at a…

Information Retrieval · Computer Science 2008-12-18 Eugen Popovici, Gilbas Ménier, Pierre-François Marteau

The continuous growth in XML information repositories has been matched by increasing efforts in the development of XML retrieval systems, in large part aiming at supporting content-oriented XML retrieval. These systems exploit the…

Information Retrieval · Computer Science 2011-11-29 Awny Sayed

The information available on web pages mostly contains semi-structured text documents which are represented either in XML, or HTML, or XHTML format that lacks formatted document structure. The document does not discriminate between the text…

Information Retrieval · Computer Science 2014-03-11 Sandeep Sirsat

With the emergence of XML as the de facto format for storing and exchanging information over the Internet, the search for ever more innovative and effective techniques for their querying is a major and current concern of the XML database…

Databases · Computer Science 2019-06-20 Maurice Tchoupé Tchendji, Adolphe Gaius Nkuefone, Thomas Tébougang Tchendji

The abundant semi-structured data on the Web, such as HTML-based tables and lists, provide commercial search engines a rich information source for question answering (QA). Different from plain text passages in Web documents, Web tables and…

Computation and Language · Computer Science 2020-10-15 Xingyao Zhang, Linjun Shou, Jian Pei, Ming Gong, Lijie Wen, Daxin Jiang

Analytical processing on XML repositories is usually enabled by designing complex data transformations that shred the documents into a common data warehousing schema. This can be very time-consuming and costly, especially if the underlying…

Databases · Computer Science 2009-09-15 Andrey Balmin, Latha Colby, Emiran Curtmola, Quanzhong Li, Fatma Ozcan

With XML becoming a standard for business information representation and exchange, storing, indexing, and querying XML documents have rapidly become major issues in database research. In this context, query processing and optimization are…

Databases · Computer Science 2017-01-30 Hadj Mahboubi, Jérôme Darmont

This paper reports on the INRIA group's approach to XML mining while participating in the INEX XML Mining track 2005. We use a flexible representation of XML documents that allows taking into account the structure only or both the structure…

Information Retrieval · Computer Science 2007-05-23 Anne-Marie Vercoustre, Mounir Fegas, Saba Gul, Yves Lechevallier

Tables are common and important in scientific documents, yet most text-based document search systems do not capture structures and semantics specific to tables. How to bridge different types of mismatch between keywords queries and…

Information Retrieval · Computer Science 2017-07-13 Kyle Yingkai Gao, Jamie Callan

In this paper, we investigate the problem of mining numerical data in the framework of Formal Concept Analysis. The usual way is to use a scaling procedure --transforming numerical attributes into binary ones-- leading either to a loss of…

Artificial Intelligence · Computer Science 2011-11-28 Mehdi Kaytoue, Sergei O. Kuznetsov, Amedeo Napoli

In this paper, we present the guidelines for an XML-based approach for the sociological study of Web data such as the analysis of mailing lists or databases available online. The use of an XML warehouse is a flexible solution for storing…

Today's database is associated with interoperability between different domains and applications. This consequently results in the importance of data portability in database. XML format fits the requirements and it has been increasingly used…

Databases · Computer Science 2010-10-07 Mikael Fernandus Simalango

We show that a general model of lexical information conforms to an abstract model that reflects the hierarchy of information found in a typical dictionary entry. We show that this model can be mapped into a well-formed XML document, and how…

Computation and Language · Computer Science 2007-07-24 Laurent Romary, Nancy Ide, Adam Kilgarriff

Querying over XML elements using keyword search is steadily gaining popularity. The traditional similarity measure is widely employed in order to effectively retrieve various XML documents. A number of authors have already proposed…

Information Retrieval · Computer Science 2010-12-20 Yang Wang, Zhikui Chen, Xiaodi Huang

Understanding large, structured documents like scholarly articles, requests for proposals or business reports is a complex and difficult task. It involves discovering a document's overall purpose and subject(s), understanding the function…

Computation and Language · Computer Science 2018-07-27 Muhammad Mahbubur Rahman, Tim Finin

Understanding and extracting information from large documents, such as business opportunities, academic articles, medical documents and technical reports, poses challenges not present in short documents. Such large documents may be…

Computation and Language · Computer Science 2019-10-10 Muhammad Mahbubur Rahman, Tim Finin

E-commerce search and recommendation usually operate on structured data such as product catalogs and taxonomies. However, creating better search and recommendation systems often requires a large variety of unstructured data including…

Information Retrieval · Computer Science 2023-12-07 Haixun Wang, Taesik Na

XML has become the de-facto standard for data representation and exchange, resulting in large scale repositories and warehouses of XML data. In order for users to understand and explore these large collections, a summarized, bird's eye view…

Information Retrieval · Computer Science 2009-10-14 Maya Ramanath, Kondreddi Sarath Kumar, Georgiana Ifrim

This short paper gives an introduction to a research project to analyze how digital documents are structured and described. Using a phenomenological approach, this research will reveal common patterns that are used in data, independent from…

Digital Libraries · Computer Science 2014-08-12 Jakob Voß

This paper addresses the challenge of improving information retrieval from semi-structured eXtensible Markup Language (XML) documents. Traditional information retrieval systems (IRS) often overlook user-specific needs and return identical…

Information Retrieval · Computer Science 2026-03-24 Ounnaci Iddir, Ahmed-ouamer Rachid, Tai Dinh