Related papers: Learned Collusion

Algorithmic collusion under competitive design

We study a simple model of algorithmic collusion in which Q-learning algorithms are designed in a strategic fashion. We let players (designers) choose their exploration policy simultaneously prior to letting their algorithms…

Theoretical Economics · Economics 2024-09-13 Ivan Conjeaud

The Bounds of Algorithmic Collusion; $Q$-learning, Gradient Learning, and the Folk Theorem

We explore the behaviour emerging from learning agents repeatedly interacting strategically for a wide range of learning dynamics, including $Q$-learning, projected gradient, replicator and log-barrier dynamics. Going beyond the better…

Computer Science and Game Theory · Computer Science 2026-03-04 Galit Askenazi-Golan, Domenico Mergoni Cecchelli, Edward Plumb, Clemens Possnig

Two-Step Q-Learning

Q-learning is a stochastic approximation version of the classic value iteration. The literature has established that Q-learning suffers from both maximization bias and slower convergence. Recently, multi-step algorithms have shown practical…

Machine Learning · Computer Science 2024-07-03 Antony Vijesh, Shreyas S R

Artificial Intelligence and Algorithmic Price Collusion in Two-sided Markets

Algorithmic price collusion facilitated by artificial intelligence (AI) algorithms raises significant concerns. We examine how AI agents using Q-learning engage in tacit collusion in two-sided markets. Our experiments reveal that AI-driven…

General Economics · Economics 2024-07-08 Cristian Chica, Yinglong Guo, Gilad Lerman

Learning Negotiating Behavior Between Cars in Intersections using Deep Q-Learning

This paper concerns automated vehicles negotiating with other vehicles, typically human driven, in crossings with the goal to find a decision algorithm by learning typical behaviors of other vehicles. The vehicle observes distance and speed…

Machine Learning · Computer Science 2018-10-25 Tommy Tram, Anton Jansson, Robin Grönberg, Mohammad Ali, Jonas Sjöberg

Self-Play Q-learners Can Provably Collude in the Iterated Prisoner's Dilemma

A growing body of computational studies shows that simple machine learning agents converge to cooperative behaviors in social dilemmas, such as collusive price-setting in oligopoly markets, raising questions about what drives this outcome.…

Computer Science and Game Theory · Computer Science 2025-12-23 Quentin Bertrand, Juan Duque, Emilio Calvano, Gauthier Gidel

Balanced Q-learning: Combining the Influence of Optimistic and Pessimistic Targets

The optimistic nature of the Q-learning target leads to an overestimation bias, which is an inherent problem associated with standard $Q$-learning. Such a bias fails to account for the possibility of low returns, particularly in risky…

Machine Learning · Computer Science 2021-11-05 Thommen George Karimpanal, Hung Le, Majid Abdolshah, Santu Rana, Sunil Gupta, Truyen Tran, Svetha Venkatesh

Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

Q-learning suffers from overestimation bias, because it approximates the maximum action value using the maximum estimated action value. Algorithms have been proposed to reduce overestimation bias, but we lack an understanding of how bias…

Machine Learning · Computer Science 2021-08-10 Qingfeng Lan, Yangchen Pan, Alona Fyshe, Martha White

A reinforcement learning algorithm for building collaboration in multi-agent systems

This paper presents a proof-of-concept study demonstrating the viability of building collaboration among multiple agents through a standard Q-learning algorithm embedded in particle swarm optimisation. Collaboration is formulated to be…

Artificial Intelligence · Computer Science 2018-04-06 Mehmet Emin Aydin, Ryan Fellows

Beyond Human Intervention: Algorithmic Collusion through Multi-Agent Learning Strategies

Collusion in market pricing is a concept associated with human actions to raise market prices through artificially limited supply. Recently, the idea of algorithmic collusion was put forward, where the human action in the pricing process is…

Theoretical Economics · Economics 2025-01-29 Suzie Grondin, Arthur Charpentier, Philipp Ratz

Smoothed Q-learning

In Reinforcement Learning the Q-learning algorithm provably converges to the optimal solution. However, as others have demonstrated, Q-learning can also overestimate the values and thereby spend too long exploring unhelpful states. Double…

Machine Learning · Computer Science 2023-03-16 David Barber

Artificial Intelligence and Spontaneous Collusion

We develop a tractable model for studying strategic interactions between learning algorithms. We uncover a mechanism responsible for the emergence of algorithmic collusion. We observe that algorithms periodically coordinate on actions that…

Theoretical Economics · Economics 2023-09-20 Martino Banchio, Giacomo Mantegazza

Learning to Charge More: A Theoretical Study of Collusion by Q-Learning Agents

There is growing experimental evidence that $Q$-learning agents may learn to charge supracompetitive prices. We provide the first theoretical explanation for this behavior in infinite repeated games. Firms update their pricing policies…

General Economics · Economics 2025-05-30 Cristian Chica, Yinglong Guo, Gilad Lerman

Q-Learning in enormous action spaces via amortized approximate maximization

Applying Q-learning to high-dimensional or continuous action spaces can be difficult due to the required maximization over the set of possible actions. Motivated by techniques from amortized inference, we replace the expensive maximization…

Machine Learning · Computer Science 2020-01-23 Tom Van de Wiele, David Warde-Farley, Andriy Mnih, Volodymyr Mnih

Stability of Multi-Agent Learning in Competitive Networks: Delaying the Onset of Chaos

The behaviour of multi-agent learning in competitive network games is often studied within the context of zero-sum games, in which convergence guarantees may be obtained. However, outside of this class the behaviour of learning is known to…

Computer Science and Game Theory · Computer Science 2023-12-20 Aamal Hussain, Francesco Belardinelli

Shaping the learning signal in a combined Q-learning rule to improve structured cooperation

Q-learning provides a standard reinforcement learning framework for studying cooperation by specifying how agents update action values from the outcomes of repeated local interactions. Although previous work has shown that reputation can promote…

Physics and Society · Physics 2026-02-03 Chunpeng Du, Zongyang Li, Yali Zhang, Yikang Lu, Attila Szolnoki

Learning Quantitative Automata Modulo Theories

Quantitative automata are useful representations for numerous applications, including modeling probability distributions over sequences to Markov chains and reward machines. Actively learning such automata typically occurs using explicitly…

Formal Languages and Automata Theory · Computer Science 2024-11-19 Eric Hsiung, Swarat Chaudhuri, Joydeep Biswas

Quinoa: a Q-function You Infer Normalized Over Actions

We present an algorithm for learning an approximate action-value soft Q-function in the relative entropy regularised reinforcement learning setting, for which an optimal improved policy can be recovered in closed form. We use recent…

Machine Learning · Computer Science 2019-11-06 Jonas Degrave, Abbas Abdolmaleki, Jost Tobias Springenberg, Nicolas Heess, Martin Riedmiller

Taming the Noise in Reinforcement Learning via Soft Updates

Model-free reinforcement learning algorithms, such as Q-learning, perform poorly in the early stages of learning in noisy environments, because much effort is spent unlearning biased estimates of the state-action value function. The bias…

Machine Learning · Computer Science 2018-02-01 Roy Fox, Ari Pakman, Naftali Tishby

On Information Asymmetry in Competitive Multi-Agent Reinforcement Learning: Convergence and Optimality

In this work, we study a system of two interacting non-cooperative Q-learning agents, where one agent has the privilege of observing the other's actions. We show that this information asymmetry can lead to a stable outcome of population…

Machine Learning · Computer Science 2021-01-26 Ezra Tampubolon, Haris Ceribasic, Holger Boche