Casual as an Anchor: Resolving Supervision Misalignment in Formality Transfer Dataset

Authors: Hyojeong Yu, Hyukhun Koh, Minsung Kim, Kyomin Jung

Computation & Language · 2026-05 · v1

Abstract

Formality transfer is commonly framed as a symmetric bidirectional task between informal and formal registers. We argue that this framing conceals a supervision design flaw in existing benchmarks such as GYAFC: binary human rewrites encode relative stylistic shifts rather than absolute human notions of formality. Consequently, models learn to generate pseudo-formal outputs that satisfy benchmark labels while failing to produce genuinely formal language. We quantify this misalignment by re-evaluating benchmark formal labels under a human-aligned definition of formality, revealing substantial discrepancies that propagate to consistent informal-to-formal failures across model families. To address this issue, we reconceptualize formality transfer as a graded dimension rather than a binary attribute. We introduce a three-level spectrum: informal, casual, and formal, where casual serves as an explicit intermediate state that clarifies supervision signals. Based on this framework, we introduce 3LF, a dataset providing parallel supervision across all three levels. Training on 3LF substantially reduces informal-to-formal failures and improves alignment with human perception. For example, GPT-4.1-nano improves from 0.06 to 0.88 F1 in the informal-to-formal direction despite 3LF being significantly smaller than GYAFC. We further demonstrate that these gains cannot be reproduced through in-context learning alone and provide qualitative analyses of ambiguity-driven errors and meaning distortions. Overall, our findings demonstrate how supervision design shapes stylistic alignment and highlight the importance of alignment-aware benchmark construction in controllable text generation.

Comments: HEAL@CHI 2026 Workshop Paper

Cite

@article{arxiv.2605.29365,
  title  = {Casual as an Anchor: Resolving Supervision Misalignment in Formality Transfer Dataset},
  author = {Hyojeong Yu and Hyukhun Koh and Minsung Kim and Kyomin Jung},
  journal= {arXiv preprint arXiv:2605.29365},
  year   = {2026}
}
