Strict greedy design paradigm applied to the stochastic multi-armed bandit problem

doi:10.3969/j.issn.1001-3881.2015.06.001

Abstract:

The process of making decisions is something humans do inherently and routinely, to the extent that it appears commonplace. However, in order to achieve good overall performance, decisions must take into account both the outcomes of past decisions and the opportunities of future ones. Reinforcement learning, which is fundamental to sequential decision-making, consists of the following components: ① a set of decision epochs; ② a set of environment states; ③ a set of available actions to transition between states; ④ state-action dependent immediate rewards for each action.

At each decision epoch, the environment state provides the decision maker with a set of available actions from which to choose. As a result of selecting a particular action in that state, the environment generates an immediate reward for the decision maker and shifts to a different state and decision epoch. The ultimate goal for the decision maker is to maximize the total reward after a sequence of time steps.

This paper focuses on an archetypal example of reinforcement learning, the stochastic multi-armed bandit problem. After introducing the dilemma, I will briefly cover the most common methods used to solve it, namely the UCB and ε_n-greedy algorithms. I will also introduce my own greedy implementation, the strict-greedy algorithm, which more tightly follows the greedy pattern in algorithm design, and show that it performs comparably to the two accepted algorithms.
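To make the two baseline methods named in the abstract concrete, the following is a minimal sketch of a stochastic bandit loop with the commonly stated UCB1 index and ε_n-greedy exploration schedule. It is an illustration under stated assumptions, not the paper's implementation: rewards are assumed to be Bernoulli, the names `run_bandit`, `ucb1`, and `make_eps_n_greedy` are hypothetical, and the parameters `c` and `d` are arbitrary illustrative values. The strict-greedy algorithm is not sketched because the abstract does not describe its rule.

```python
import math
import random


def run_bandit(means, horizon, choose, seed=0):
    """Play an (assumed) Bernoulli bandit; return (total reward, pulls per arm)."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k    # number of times each arm has been pulled
    sums = [0.0] * k    # cumulative reward collected from each arm
    total = 0.0
    for t in range(1, horizon + 1):
        arm = choose(t, counts, sums, rng)
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        total += reward
    return total, counts


def ucb1(t, counts, sums, rng):
    """UCB1: pull every arm once, then maximize mean + sqrt(2 ln t / n_i)."""
    for arm, n in enumerate(counts):
        if n == 0:
            return arm
    k = len(counts)
    scores = [
        sums[i] / counts[i] + math.sqrt(2.0 * math.log(t) / counts[i])
        for i in range(k)
    ]
    return max(range(k), key=scores.__getitem__)


def make_eps_n_greedy(c=1.0, d=0.4):
    """eps_n-greedy: explore with probability eps_n = min(1, c*K / (d^2 * n))."""
    def choose(t, counts, sums, rng):
        k = len(counts)
        eps = min(1.0, c * k / (d * d * t))
        if rng.random() < eps:
            return rng.randrange(k)  # explore: uniformly random arm
        # exploit: arm with the highest empirical mean (unpulled arms count as 0)
        means = [sums[i] / counts[i] if counts[i] else 0.0 for i in range(k)]
        return max(range(k), key=means.__getitem__)
    return choose


if __name__ == "__main__":
    arm_means = [0.1, 0.5, 0.9]  # hypothetical arm success probabilities
    for name, policy in [("UCB1", ucb1), ("eps_n-greedy", make_eps_n_greedy())]:
        total, counts = run_bandit(arm_means, 2000, policy)
        print(name, "total reward:", total, "pulls per arm:", counts)
```

Both policies share the same interface, so a further rule such as a stricter greedy variant could be dropped into `run_bandit` and compared on identical arm settings.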

Cite this article

Joey Hong. Strict greedy design paradigm applied to the stochastic multi-armed bandit problem[J]. Machine Tool & Hydraulics (机床与液压), 2015, 43(6): 1-6.

Published online: 2016-08-11