Related papers: Some Experiments with Real-Time Decision Algorithm…
We present an anytime algorithm which computes policies for decision problems represented as multi-stage influence diagrams. Our algorithm constructs policies incrementally, starting from a policy which makes no use of the available…
We present a new algorithm for exactly solving decision making problems represented as influence diagrams. We do not require the usual assumptions of no forgetting and regularity; this allows us to solve problems with simultaneous decisions…
We present on-line policy gradient algorithms for computing the locally optimal policy of a constrained, average cost, finite state Markov Decision Process. The stochastic approximation algorithms require estimation of the gradient of the…
Deliberating on large or continuous state spaces have been long standing challenges in reinforcement learning. Temporal Abstraction have somewhat made this possible, but efficiently planing using temporal abstraction still remains an issue.…
Active inference has emerged as an alternative approach to control problems given its intuitive (probabilistic) formalism. However, despite its theoretical utility, computational implementations have largely been restricted to…
With the rapid growth of Internet services, recommendation systems play a central role in delivering personalized content. Faced with massive user requests and complex model architectures, the key challenge for real-time recommendation…
The best algorithm for a computational problem generally depends on the "relevant inputs," a concept that depends on the application domain and often defies formal articulation. While there is a large literature on empirical approaches to…
Although a number of related algorithms have been developed to evaluate influence diagrams, exploiting the conditional independence in the diagram, the exact solution has remained intractable for many important problems. In this paper we…
Suppose an online platform wants to compare a treatment and control policy, e.g., two different matching algorithms in a ridesharing system, or two different inventory management algorithms in an online retail site. Standard randomized…
We study the use of Temporal-Difference learning for estimating the structural parameters in dynamic discrete choice models. Our algorithms are based on the conditional choice probability approach but use functional approximations to…
The analysis of practical probabilistic models on the computer demands a convenient representation for the available knowledge and an efficient algorithm to perform inference. An appealing representation is the influence diagram, a network…
Anytime inference is inference performed incrementally, with the accuracy of the inference being controlled by a tunable parameter, usually time. Such anytime inference algorithms are also usually interruptible, gradually converging to the…
We extend the standard reinforcement learning framework to random time horizons. While the classical setting typically assumes finite and deterministic or infinite runtimes of trajectories, we argue that multiple real-world applications…
We present a physics-inspired method for inferring dynamic rankings in directed temporal networks - networks in which each directed and timestamped edge reflects the outcome and timing of a pairwise interaction. The inferred ranking of each…
We present new deterministic algorithms for several cases of the maximum rank matrix completion problem (for short matrix completion), i.e. the problem of assigning values to the variables in a given symbolic matrix as to maximize the…
In this article, we discuss two algorithms tailored to discrete-time deterministic finite-horizon nonlinear optimal control problems or so-called deterministic trajectory optimization problems. Both algorithms can be derived from an…
Probabilistic timed automata are classical timed automata extended with discrete probability distributions over edges. We introduce clock-dependent probabilistic timed automata, a variant of probabilistic timed automata in which transition…
We empirically evaluate the finite-time performance of several simulation-optimization algorithms on a testbed of problems with the goal of motivating further development of algorithms with strong finite-time performance. We investigate if…
In previous work (Fertig and Breese, 1989; Fertig and Breese, 1990) we defined a mechanism for performing probabilistic reasoning in influence diagrams using interval rather than point-valued probabilities. In this paper we extend these…
Deciding the best future execution time is a critical task in many business activities while evolving time series forecasting, and optimal timing strategy provides such a solution, which is driven by observed data. This solution has plenty…