Related papers: Triangular clustering in document networks

200 papers

Many complex networks from the World-Wide-Web to biological networks are growing taking into account the heterogeneous features of the nodes. The feature of a node might be a discrete quantity such as a classification of a URL document as…

Physics and Society · Physics 2013-05-30 Luca Ferretti, Michele Cortelezzi, Bin Yang, Giacomo Marmorini, Ginestra Bianconi

Importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collection of documents like World Wide Web (WWW). The next…

Information Retrieval · Computer Science 2011-12-30 Muhammad Rafi, M. Shahid Shaikh, Amir Farooq

In this paper, we explore a set of novel features for authorship attribution of documents. These features are derived from a word network representation of natural language text. As has been noted in previous studies, natural language tends…

Computation and Language · Computer Science 2013-11-14 Shibamouli Lahiri, Rada Mihalcea

Triangles are an important building block and distinguishing feature of real-world networks, but their structure is still poorly understood. Despite numerous reports on the abundance of triangles, there is very little information on what…

Social and Information Networks · Computer Science 2013-03-06 Nurcan Durak, Ali Pinar, Tamara G. Kolda, C. Seshadhri

There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts mean those within the same semantic…

Computation and Language · Computer Science 2013-03-05 Diego R. Amancio, Osvaldo N. Oliveira, Luciano da F. Costa

A fundamental property of complex networks is the tendency for edges to cluster. The extent of the clustering is typically quantified by the clustering coefficient, which is the probability that a length-2 path is closed, i.e., induces a…

Social and Information Networks · Computer Science 2018-05-23 Hao Yin, Austin R. Benson, Jure Leskovec

This work is pertaining to the diversified ranking of web-resources and interconnected documents that rely on a network-like structure, e.g. web-pages. A practical example of this would be a query for the k most relevant web-pages that are…

Information Retrieval · Computer Science 2016-07-27 George Tsatsanifos

Document clustering is an unsupervised approach in which a large collection of documents (corpus) is subdivided into smaller, meaningful, identifiable, and verifiable sub-groups (clusters). Meaningful representation of documents and…

Information Retrieval · Computer Science 2014-12-08 Muhammad Rafi, Farnaz Amin, Mohammad Shahid Shaikh

High triangle density -- the graph property stating that a constant fraction of two-hop paths belong to a triangle -- is a common signature of social networks. This paper studies triangle-dense graphs from a structural perspective. We prove…

Data Structures and Algorithms · Computer Science 2014-02-10 Rishi Gupta, Tim Roughgarden, C. Seshadhri

Text clustering holds significant value across various domains due to its ability to identify patterns and group related information. Current approaches which rely heavily on a computed similarity measure between documents are often limited…

Information Retrieval · Computer Science 2025-04-09 Laurence Hirsch, Robin Hirsch, Bayode Ogunleye

A text network refers to a data type that each vertex is associated with a text document and the relationship between documents is represented by edges. The proliferation of text networks such as hyperlinked webpages and academic citation…

Social and Information Networks · Computer Science 2016-10-04 Junxian He, Ying Huang, Changfeng Liu, Jiaming Shen, Yuting Jia, Xinbing Wang

Coherence is an important aspect of text quality, and various approaches have been applied to coherence modeling. However, existing methods solely focus on a single document's coherence patterns, ignoring the underlying correlation between…

Computation and Language · Computer Science 2023-06-13 Wei Liu, Xiyan Fu, Michael Strube

Keyword-based information processing has limitations due to simple treatment of words. In this paper, we introduce named entities as objectives into document clustering, which are the key elements defining document semantics and in many…

Information Retrieval · Computer Science 2018-07-23 Tru H. Cao, Vuong M. Ngo, Dung T. Hong, Tho T. Quan

This paper explores intellectual and social proximity among scholarly journals by using network fusion techniques. Similarities among journals are initially represented by means of a three-layer network based on co-citations, common authors…

Digital Libraries · Computer Science 2021-11-22 Federica Baccini, Lucio Barabesi, Alberto Baccini, Mahdi Khelfaoui, Yves Gingras

A recurrent neural network that has been trained to separately model the language of several documents by unknown authors is used to measure similarity between the documents. It is able to find clues of common authorship even when the…

Computation and Language · Computer Science 2016-08-17 Douglas Bagnall

We develop a full theoretical approach to clustering in complex networks. A key concept is introduced, the edge multiplicity, that measures the number of triangles passing through an edge. This quantity extends the clustering coefficient in…

Disordered Systems and Neural Networks · Physics 2009-11-11 M. Angeles Serrano, Marian Boguna

We are interested in the widespread problem of clustering documents and finding topics in large collections of written documents in the presence of metadata and hyperlinks. To tackle the challenge of accounting for these different types of…

Social and Information Networks · Computer Science 2021-07-01 Charles C. Hyland, Yuanming Tao, Lamiae Azizi, Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann

Text documents are structured on multiple levels of detail: individual words are related by syntax, but larger units of text are related by discourse structure. Existing language models generally fail to account for discourse structure, but…

Computation and Language · Computer Science 2016-02-23 Yangfeng Ji, Trevor Cohn, Lingpeng Kong, Chris Dyer, Jacob Eisenstein

For the study of citation networks, a challenging problem is modeling the high clustering. Existing studies indicate that the promising way to model the high clustering is a copying strategy, i.e., a paper copies the references of its…

Physics and Society · Physics 2015-03-19 Fu-Xin Ren, Xue-Qi Cheng, Hua-Wei Shen

Real-world networks often exhibit strong transitivity with nontrivial local clustering spectra and degree correlations. Such features are not easily modeled in tractable network models, creating an obstacle to the theoretical understanding…

Physics and Society · Physics 2026-05-26 Lorenzo Cirigliano, Gareth J. Baxter, Gábor Timár