Thomas Butler
Large language models may encode sensitive information or outdated knowledge that needs to be removed to ensure responsible and compliant model responses. Unlearning has emerged as an efficient alternative to full retraining, aiming to…
More than 80% of the 1.6B English speakers do not use Standard American English (SAE), yet LLMs often fail to correctly identify non-SAE dialects and generate stereotyped responses for their speakers. We introduce DialectLLM, the first…
Despite the widespread multilingual deployment of large language models, post-training pipelines remain predominantly English-centric, contributing to performance disparities across languages. We present a systematic, controlled study of…
Language identification is a crucial first step in multilingual systems such as chatbots and virtual assistants, enabling linguistically and culturally accurate user experiences. Errors at this stage can cascade into downstream failures,…
Small molecules in biological samples are studied to provide information about disease states, environmental toxins, natural product drug discovery, and many other applications. The primary window into the composition of small molecule…
Identifying a small molecule from its mass spectrum is the primary open problem in computational metabolomics. This is typically cast as information retrieval: an unknown spectrum is matched against spectra predicted computationally from a…
Automatic unreliable news detection is a research problem with great potential impact. Recently, several papers have shown promising results on large-scale news datasets with models that only use the article itself without resorting to any…
We demonstrate that demographic noise can induce persistent spatial pattern formation and temporal oscillations in the Levin-Segel predator-prey model for plankton-herbivore population dynamics. Although the model exhibits a Turing…
Models of diffusion-driven pattern formation that rely on the Turing mechanism are utilized in many areas of science. However, many such models suffer from the defect of requiring fine-tuning of parameters or an unrealistic separation of…
We calculate the optimality of a doublet precursor to the canonical genetic code with respect to mitigating the effects of point mutations and compare our results to corresponding ones for the canonical genetic code. We find that the…
The existence of beyond-mean-field quasi-cycle oscillations in a simple spatial model of predator-prey interactions is derived from a path integral formalism. The results agree substantially with those obtained from analysis of similar…
A molecular dynamics calculation of the amino acid polar requirement is presented and used to score the canonical genetic code. Monte Carlo simulation shows that this computational polar requirement has been optimized by the canonical…