Multi-label Node Classification On Graph-Structured Data

Machine Learning 2024-03-01 v4

Abstract

Graph Neural Networks (GNNs) have shown state-of-the-art improvements in node classification tasks on graphs. While these improvements have been largely demonstrated in a multi-class classification scenario, a more general and realistic scenario, in which each node can have multiple labels, has so far received little attention. The first challenge in conducting focused studies on multi-label node classification is the limited number of publicly available multi-label graph datasets. Therefore, as our first contribution, we collect and release three real-world biological datasets and develop a multi-label graph generator to produce datasets with tunable properties. While high label similarity (high homophily) is usually credited for the success of GNNs, we argue that a multi-label scenario does not follow the usual semantics of homophily and heterophily defined so far for the multi-class scenario. As our second contribution, we define homophily and Cross-Class Neighborhood Similarity for the multi-label scenario and provide a thorough analysis of the 9 collected multi-label datasets. Finally, we perform a large-scale comparative study with 8 methods and 9 datasets and analyse the performance of the methods to assess the progress made by the current state of the art in the multi-label node classification scenario. We release our benchmark at https://github.com/Tianqi-py/MLGNC.
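To make the idea of multi-label homophily concrete: with single labels, an edge is either homophilous or not, but with label sets two neighbours can share only some of their labels. The sketch below assumes homophily is measured per edge as the Jaccard similarity of the two endpoints' label sets, averaged over all edges. This is an illustrative assumption, not necessarily the exact definition used in the paper, and the function name `multilabel_homophily` is hypothetical.

```python
from typing import Iterable, Sequence, Set, Tuple


def multilabel_homophily(
    edges: Iterable[Tuple[int, int]], labels: Sequence[Set[int]]
) -> float:
    """Average per-edge Jaccard similarity of endpoint label sets.

    Assumption: homophily is taken as the mean Jaccard similarity over
    edges; this is a sketch, not the paper's reference implementation.
    """
    scores = []
    for u, v in edges:
        union = labels[u] | labels[v]
        if not union:
            # Both endpoints are unlabeled; the similarity is undefined, so skip.
            continue
        scores.append(len(labels[u] & labels[v]) / len(union))
    return sum(scores) / len(scores) if scores else 0.0


# Toy graph: node 0 has labels {0, 1}, node 1 has {1}, node 2 has {2}, node 3 has {0, 1}.
labels = [{0, 1}, {1}, {2}, {0, 1}]
edges = [(0, 1), (1, 2), (0, 3)]
print(multilabel_homophily(edges, labels))  # per-edge scores 0.5, 0.0, 1.0 -> 0.5
```

The edge (0, 1) counts as partially homophilous (score 0.5), which a multi-class "same label or not" indicator cannot express.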


Cite

@article{arxiv.2304.10398,
  title  = {Multi-label Node Classification On Graph-Structured Data},
  author = {Tianqi Zhao and Ngan Thi Dong and Alan Hanjalic and Megha Khosla},
  journal= {arXiv preprint arXiv:2304.10398},
  year   = {2024}
}

Comments

Published in TMLR 2023. Link: https://openreview.net/forum?id=EZhkV2BjDP