Related papers: Triangular clustering in document networks

Features and heterogeneities in growing network models

Many complex networks from the World-Wide-Web to biological networks are growing taking into account the heterogeneous features of the nodes. The feature of a node might be a discrete quantity such as a classification of a URL document as…

Physics and Society · Physics 2013-05-30 Luca Ferretti, Michele Cortelezzi, Bin Yang, Giacomo Marmorini, Ginestra Bianconi

Document Clustering based on Topic Maps

Importance of document clustering is now widely acknowledged by researchers for better management, smart navigation, efficient filtering, and concise summarization of large collection of documents like World Wide Web (WWW). The next…

Information Retrieval · Computer Science 2011-12-30 Muhammad Rafi, M. Shahid Shaikh, Amir Farooq

Authorship Attribution Using Word Network Features

In this paper, we explore a set of novel features for authorship attribution of documents. These features are derived from a word network representation of natural language text. As has been noted in previous studies, natural language tends…

Computation and Language · Computer Science 2013-11-14 Shibamouli Lahiri, Rada Mihalcea

Degree Relations of Triangles in Real-world Networks and Models

Triangles are an important building block and distinguishing feature of real-world networks, but their structure is still poorly understood. Despite numerous reports on the abundance of triangles, there is very little information on what…

Social and Information Networks · Computer Science 2013-03-06 Nurcan Durak, Ali Pinar, Tamara G. Kolda, C. Seshadhri

Structure-semantics interplay in complex networks and its effects on the predictability of similarity in texts

There are different ways to define similarity for grouping similar texts into clusters, as the concept of similarity may depend on the purpose of the task. For instance, in topic extraction similar texts mean those within the same semantic…

Computation and Language · Computer Science 2013-03-05 Diego R. Amancio, Osvaldo N. Oliveira, Luciano da F. Costa

Higher-order clustering in networks

A fundamental property of complex networks is the tendency for edges to cluster. The extent of the clustering is typically quantified by the clustering coefficient, which is the probability that a length-2 path is closed, i.e., induces a…

Social and Information Networks · Computer Science 2018-05-23 Hao Yin, Austin R. Benson, Jure Leskovec

Verso folio: Diversified Ranking for Large Graphs with Context-Aware Considerations

This work is pertaining to the diversified ranking of web-resources and interconnected documents that rely on a network-like structure, e.g. web-pages. A practical example of this would be a query for the k most relevant web-pages that are…

Information Retrieval · Computer Science 2016-07-27 George Tsatsanifos

Document clustering using graph based document representation with constraints

Document clustering is an unsupervised approach in which a large collection of documents (corpus) is subdivided into smaller, meaningful, identifiable, and verifiable sub-groups (clusters). Meaningful representation of documents and…

Information Retrieval · Computer Science 2014-12-08 Muhammad Rafi, Farnaz Amin, Mohammad Shahid Shaikh

Decompositions of Triangle-Dense Graphs

High triangle density -- the graph property stating that a constant fraction of two-hop paths belong to a triangle -- is a common signature of social networks. This paper studies triangle-dense graphs from a structural perspective. We prove…

Data Structures and Algorithms · Computer Science 2014-02-10 Rishi Gupta, Tim Roughgarden, C. Seshadhri

Document clustering with evolved multiword search queries

Text clustering holds significant value across various domains due to its ability to identify patterns and group related information. Current approaches which rely heavily on a computed similarity measure between documents are often limited…

Information Retrieval · Computer Science 2025-04-09 Laurence Hirsch, Robin Hirsch, Bayode Ogunleye

Text Network Exploration via Heterogeneous Web of Topics

A text network refers to a data type that each vertex is associated with a text document and the relationship between documents is represented by edges. The proliferation of text networks such as hyperlinked webpages and academic citation…

Social and Information Networks · Computer Science 2016-10-04 Junxian He, Ying Huang, Changfeng Liu, Jiaming Shen, Yuting Jia, Xinbing Wang

Modeling Structural Similarities between Documents for Coherence Assessment with Graph Convolutional Networks

Coherence is an important aspect of text quality, and various approaches have been applied to coherence modeling. However, existing methods solely focus on a single document's coherence patterns, ignoring the underlying correlation between…

Computation and Language · Computer Science 2023-06-13 Wei Liu, Xiyan Fu, Michael Strube

Semantic Document Clustering on Named Entity Features

Keyword-based information processing has limitations due to simple treatment of words. In this paper, we introduce named entities as objectives into document clustering, which are the key elements defining document semantics and in many…

Information Retrieval · Computer Science 2018-07-23 Tru H. Cao, Vuong M. Ngo, Dung T. Hong, Tho T. Quan

This paper explores intellectual and social proximity among scholarly journals by using network fusion techniques. Similarities among journals are initially represented by means of a three-layer network based on co-citations, common authors…

Digital Libraries · Computer Science 2021-11-22 Federica Baccini, Lucio Barabesi, Alberto Baccini, Mahdi Khelfaoui, Yves Gingras

Authorship clustering using multi-headed recurrent neural networks

A recurrent neural network that has been trained to separately model the language of several documents by unknown authors is used to measure similarity between the documents. It is able to find clues of common authorship even when the…

Computation and Language · Computer Science 2016-08-17 Douglas Bagnall

Clustering in complex networks. I. General formalism

We develop a full theoretical approach to clustering in complex networks. A key concept is introduced, the edge multiplicity, that measures the number of triangles passing through an edge. This quantity extends the clustering coefficient in…

Disordered Systems and Neural Networks · Physics 2009-11-11 M. Angeles Serrano, Marian Boguna

Multilayer Networks for Text Analysis with Multiple Data Types

We are interested in the widespread problem of clustering documents and finding topics in large collections of written documents in the presence of metadata and hyperlinks. To tackle the challenge of accounting for these different types of…

Social and Information Networks · Computer Science 2021-07-01 Charles C. Hyland, Yuanming Tao, Lamiae Azizi, Martin Gerlach, Tiago P. Peixoto, Eduardo G. Altmann

Document Context Language Models

Text documents are structured on multiple levels of detail: individual words are related by syntax, but larger units of text are related by discourse structure. Existing language models generally fail to account for discourse structure, but…

Computation and Language · Computer Science 2016-02-23 Yangfeng Ji, Trevor Cohn, Lingpeng Kong, Chris Dyer, Jacob Eisenstein

Modeling the clustering in citation networks

For the study of citation networks, a challenging problem is modeling the high clustering. Existing studies indicate that the promising way to model the high clustering is a copying strategy, i.e., a paper copies the references of its…

Physics and Society · Physics 2015-03-19 Fu-Xin Ren, Xue-Qi Cheng, Hua-Wei Shen

Strongly clustered random graphs via triadic closure: Degree correlations and clustering spectrum

Real-world networks often exhibit strong transitivity with nontrivial local clustering spectra and degree correlations. Such features are not easily modeled in tractable network models, creating an obstacle to the theoretical understanding…

Physics and Society · Physics 2026-05-26 Lorenzo Cirigliano, Gareth J. Baxter, Gábor Timár