Related papers: Does Preference Always Help? A Holistic Study on P…
In practical multi-criterion decision-making, it is cumbersome if a decision maker (DM) is asked to choose among a set of trade-off alternatives covering the whole Pareto-optimal front. This is a paradox in conventional evolutionary…
Decomposition has become an increasingly popular technique for evolutionary multi-objective optimization (EMO). A decomposition-based EMO algorithm is usually designed to approximate a whole Pareto-optimal front (PF). However, in practice,…
Evolutionary Multi-Objective Optimization Algorithms (EMOAs) are widely employed to tackle problems with multiple conflicting objectives. Recent research indicates that not all objectives are equally important to the decision-maker (DM). In…
The research area of evolutionary multiobjective optimization (EMO) is reaching better understandings of the properties and capabilities of EMO algorithms, and accumulating much evidence of their worth in practical scenarios. An urgent…
There are a lot of real-world black-box optimization problems that need to optimize multiple criteria simultaneously. However, in a multi-objective optimization (MOO) problem, identifying the whole Pareto front requires the prohibitive…
We consider a multi-objective optimization problem with objective functions that are expensive to evaluate. The decision maker (DM) has unknown preferences, and so the standard approach is to generate an approximation of the Pareto front…
Most existing studies on evolutionary multi-objective optimization focus on approximating the whole Pareto-optimal front. Nevertheless, rather than the whole front, which demands for too many points (especially in a high-dimensional space),…
Post-training of LLMs with RLHF, and subsequently preference optimization algorithms such as DPO, IPO, etc., made a big difference in improving human alignment. However, all such techniques can only work with a single (human) objective. In…
One drawback of evolutionary multiobjective optimization algorithms (EMOA) has traditionally been high computational cost to create an approximation of the Pareto front: number of required objective function evaluations usually grows high.…
Preference optimization (PO) is indispensable for large language models (LLMs), with methods such as direct preference optimization (DPO) and proximal policy optimization (PPO) achieving great success. A common belief is that DPO is…
The alignment of language models~(LMs) with human preferences is critical for building reliable AI systems. The problem is typically framed as optimizing an LM policy to maximize the expected reward that reflects human preferences.…
Some quality indicators have been proposed for benchmarking preference-based evolutionary multi-objective optimization algorithms using a reference point. Although a systematic review and analysis of the quality indicators are helpful for…
Most multi-objective optimisation algorithms maintain an archive explicitly or implicitly during their search. Such an archive can be solely used to store high-quality solutions presented to the decision maker, but in many cases may…
Normalization of objectives plays a crucial role in evolutionary multi-objective optimization (EMO) to handle objective functions with different scales, which can be found in real-world problems. Although the effect of normalization methods…
What is motivation and how does it work? Where do goals come from and how do they vary within and between species and individuals? Why do we prefer some things over others? MEDO is a theoretical framework for understanding these questions…
Direct Preference Optimization (DPO) has emerged as a simple and effective approach for aligning large language models (LLMs) with human preferences, bypassing the need for a learned reward model. Despite its growing adoption, a fundamental…
Multi-Objective Alignment (MOA) aims to align LLMs' responses with multiple human preference objectives, with Direct Preference Optimization (DPO) emerging as a prominent approach. However, we find that DPO-based MOA approaches suffer from…
Aligning the output of Large Language Models (LLMs) with human preferences (e.g., by means of reinforcement learning with human feedback, or RLHF) is essential for ensuring their effectiveness in real-world scenarios. Despite significant…
We consider black-box global optimization of time-consuming-to-evaluate functions on behalf of a decision-maker (DM) whose preferences must be learned. Each feasible design is associated with a time-consuming-to-evaluate vector of…
Optimization problems find widespread use in both single-objective and multi-objective scenarios. In practical applications, users aspire for solutions that converge to the region of interest (ROI) along the Pareto front (PF). While the…