

  • Web definition: Markov decision processes
  1. Parallel Q-learning Algorithms for MDPs Based on Performance Potentials


  2. S (λ): A reinforcement learning algorithm based on average-payoff MDPs


  3. Discounted and Undiscounted MDPs: A Case Study Based on SARSA(λ) Algorithms


  4. We discuss reinforcement-learning-based optimization methods for Markov decision processes (MDPs) using Markov performance potentials.


  5. On-Policy Modeless Reinforcement Learning Algorithms for Average-Payoff MDPs


  6. The concept of Markov performance potentials, introduced by Cao, offers a new framework and approach for the optimization of MDPs.


  7. Drawing on the ideas and methods of GIS and on the characteristics of mining-area disasters, this article designs the Mining Area Disaster Production System (MDPS).


  8. Motivated by the needs of practical large-scale Markov systems, this paper considers learning-based optimization problems for Markov decision processes (MDPs).


  9. Through these solutions, we can integrate the theory and implementation of message transmission based on MDPs, making it a safe, stable, and maintainable message transmission system.


  10. The Effect of Synthetic Muramyl Dipeptide (MDP) and Its Derivatives (MDPs) on Immune Function in Mice


  11. Many sequential decision problems, such as flexible manufacturing systems, traffic command systems, and queuing systems, can be modeled as Markov decision processes (MDPs).


  12. This paper examines the low learning efficiency in reinforcement learning caused by improper generalization and random exploration policies under deterministic MDPs, and proposes a hierarchical reinforcement learning algorithm based on a system model.


  13. In this paper, a neuro-dynamic programming (NDP) method is discussed via an actor-critic algorithm for Markov decision processes (MDPs) based on the learning of performance potentials.


  14. Moderate deviation principles (MDPs) are proved for the occupation time process of a super-Brownian motion with immigration, where the immigration is governed by the Lebesgue measure.


  15. The extension of reinforcement learning to MDPs with large state and action spaces and high complexity inevitably runs into the curse of dimensionality, which results in slow convergence and long training times.


  16. The problems of discounted reinforcement learning are analyzed. Several experiments compare the influence of different discount factors on the SARSA(λ) algorithm for MDPs. The role of the average-reward scalar in the undiscounted SARSA(λ) algorithm is also discussed.
