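Dimitri P. Bertsekas is a tenured professor at MIT, a member of the U.S. National Academy of Engineering, and a visiting professor at the Research Center for Complex and Networked Systems at Tsinghua University. An internationally known author in electrical engineering and computer science, he has written more than a dozen best-selling textbooks and monographs, including "Nonlinear Programming", "Network Optimization", "Dynamic Programming", "Convex Optimization", and "Reinforcement Learning and Optimal Control".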
Contents
1 Exact and Approximate Dynamic Programming Principles
1.1 AlphaZero, Off-Line Training, and On-Line Play
1.2 Deterministic Dynamic Programming
1.2.1 Finite Horizon Problem Formulation
1.2.2 The Dynamic Programming Algorithm
1.2.3 Approximation in Value Space
1.3 Stochastic Dynamic Programming
1.3.1 Finite Horizon Problems
1.3.2 Approximation in Value Space for Stochastic DP
1.3.3 Infinite Horizon Problems - An Overview
1.3.4 Infinite Horizon - Approximation in Value Space
1.3.5 Infinite Horizon - Policy Iteration, Rollout, and Newton's Method
1.4 Examples, Variations, and Simplifications
1.4.1 A Few Words About Modeling
1.4.2 Problems with a Termination State
1.4.3 State Augmentation, Time Delays, Forecasts, and Uncontrollable State Components
1.4.4 Partial State Information and Belief States
1.4.5 Multiagent Problems and Multiagent Rollout
1.4.6 Problems with Unknown Parameters - Adaptive Control
1.4.7 Adaptive Control by Rollout and On-Line Replanning
1.5 Reinforcement Learning and Optimal Control - Some Terminology
1.6 Notes and Sources
2 General Principles of Approximation in Value Space
2.1 Approximation in Value and Policy Space
2.1.1 Approximation in Value Space - One-Step and Multistep Lookahead
2.1.2 Approximation in Policy Space
2.1.3 Combined Approximation in Value and Policy Space
2.2 Approaches for Value Space Approximation
2.2.1 Off-Line and On-Line Implementations
2.2.2 Model-Based and Model-Free Implementations
2.2.3 Methods for Cost-to-Go Approximation
2.2.4 Methods for Expediting the Lookahead Minimization
2.3 Deterministic Rollout and the Policy Improvement Principle
2.3.1 On-Line Rollout for Deterministic Discrete Optimization
2.3.2 Using Multiple Base Heuristics - Parallel Rollout
2.3.3 The Simplified Rollout Algorithm
2.3.4 The Fortified Rollout Algorithm
2.3.5 Rollout with Multistep Lookahead
2.3.6 Rollout with an Expert
2.3.7 Rollout with Small Stage Costs and Long Horizon - Continuous-Time Rollout
2.4 Stochastic Rollout and Monte Carlo Tree Search
2.4.1 Simulation-Based Implementation of the Rollout Algorithm
2.4.2 Monte Carlo Tree Search
2.4.3 Randomized Policy Improvement by Monte Carlo Tree Search
2.4.4 The Effect of Errors in Rollout - Variance Reduction
2.4.5 Rollout Parallelization
2.5 Rollout for Infinite-Spaces Problems - Optimization Heuristics
2.5.1 Rollout for Infinite-Spaces Deterministic Problems
2.5.2 Rollout Based on Stochastic Programming
2.6 Notes and Sources