Information-Theoretic Requirements for Gradient-Based Task Affinity Estimation in Multi-Task Learning

Machine Learning 2026-04-10 v1 Molecular Networks

Abstract

Multi-task learning (MTL) shows strikingly inconsistent results: sometimes joint training helps substantially, sometimes it actively harms performance. Yet the field lacks a principled framework for predicting these outcomes. We identify a fundamental but unstated assumption underlying gradient-based task analysis: tasks must share training instances for gradient conflicts to reveal genuine relationships. When tasks are measured on the same inputs, gradient alignment reflects shared mechanistic structure; when they are measured on disjoint inputs, any apparent signal conflates task relationships with distributional shift. We find that this sample-overlap requirement exhibits a sharp phase transition: below 30% overlap, gradient-task correlations are statistically indistinguishable from noise; above 40%, they reliably recover known biological structure. Validation across multiple datasets yields strong correlations and recovers biological pathway organization. Standard benchmarks systematically violate this requirement (MoleculeNet operates at <5% overlap, TDC at 8-14%), placing them far below the threshold at which gradient analysis becomes meaningful. This provides the first principled explanation for seven years of inconsistent MTL results.
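The two quantities the abstract relates, sample overlap between tasks and alignment of per-task gradients, can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not define overlap, so the Jaccard-style definition below is an assumption, as are the shared linear model, the squared-error loss, the NaN-for-missing label convention, and every function name.

```python
import numpy as np

def sample_overlap(labels, i, j):
    """Instances labelled for both tasks divided by instances labelled for
    either task (Jaccard overlap; an assumed definition, not the paper's)."""
    has_i = ~np.isnan(labels[:, i])
    has_j = ~np.isnan(labels[:, j])
    union = np.sum(has_i | has_j)
    return float(np.sum(has_i & has_j) / union) if union else 0.0

def task_gradient(X, y, w):
    """Gradient of the mean squared error of a shared linear model w,
    computed only on the instances that carry a label for this task."""
    mask = ~np.isnan(y)
    residual = X[mask] @ w - y[mask]
    return 2.0 * X[mask].T @ residual / mask.sum()

def gradient_cosine(g_a, g_b):
    """Cosine similarity between two task gradients (0.0 if either is zero)."""
    denom = np.linalg.norm(g_a) * np.linalg.norm(g_b)
    return float(g_a @ g_b / denom) if denom > 0 else 0.0

# Hypothetical toy data: 4 instances, 2 tasks, NaN marks a missing label.
labels = np.array([
    [1.0, 2.0],
    [0.5, np.nan],
    [np.nan, 1.0],
    [2.0, 3.0],
])
X = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]])
w = np.zeros(2)

print("overlap:", sample_overlap(labels, 0, 1))  # 2 shared of 4 labelled -> 0.5
g0 = task_gradient(X, labels[:, 0], w)
g1 = task_gradient(X, labels[:, 1], w)
print("gradient cosine:", gradient_cosine(g0, g1))
```

In this toy setup each task's gradient is computed on a different subset of instances, so the cosine mixes task similarity with differences in the inputs each task happens to cover. That is the confound the abstract attributes to low-overlap benchmarks.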

Cite

@article{arxiv.2604.07848,
  title  = {Information-Theoretic Requirements for Gradient-Based Task Affinity Estimation in Multi-Task Learning},
  author = {Jasper Zhang and Bryan Cheng},
  journal= {arXiv preprint arXiv:2604.07848},
  year   = {2026}
}

Comments

8 pages, 4 figures. Accepted at three ICLR 2026 workshops: AI for Accelerated Materials Design; Foundation Models for Science: Real-World Impact and Science-First Design; and Generative and Experimental Perspectives for Biomolecular Design.