Author

Conor Fallon


1 paper

Same Meaning, Different Scores: Lexical and Syntactic Sensitivity in LLM Evaluation

The rapid advancement of Large Language Models (LLMs) has established standardized evaluation benchmarks as the primary instrument for model comparison. Yet, their reliability is increasingly questioned due to sensitivity to shallow…

Computation and Language · Computer Science · 2026-02-20 · Bogdan Kostić, Conor Fallon, Julian Risch, Alexander Löser