JG_blog

(Haarnoja 2018 ICML) Soft Actor-Critic; Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Contents: Introduction Maximum Entropy Derivation of Soft Policy Iteration Soft Actor-Critic Algorithm Enforcing Action Bounds Reference. Introduction: There are two reasons why reinforcement learning is hard to apply in the real world. First...

(Lillicrap 2015 ICLR) Continuous Control With Deep Reinforcement Learning

Contents: Introduction Contribution Stochastic vs Deterministic policy Deterministic policy Additional Algorithm Reference. Introduction: This paper is an extension of DQN. As for DQN, low-dimensional action s...

(Wu 2017 arXiv) Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

Contents: Introduction Natural gradient using Kronecker-factored approximation Reference. Introduction: This paper is derived from TRPO. Its main contribution, in order to address TRPO's computation time and sample efficiency...

(Wang 2017 ICLR) Sample Efficient Actor-Critic with Experience Replay

Contents: Introduction Discrete Actor Critic with Experience Replay Multi-Step Estimation of the State-Action Value Function Importance Weight Truncation With Bias Correction E...

(Vezhnevets 2016 NIPS) Strategic Attentive Writer for Learning Macro-Actions

Contents: Introduction The state of the network Attentive planning Action-plan update Commitment-plan update Learning Experiment Reference. Introduction: Many RL papers have been published by now, and these low-level...

(Gregor 2015 ICML) DRAW; A Recurrent Neural Network For Image Generation

Contents: Introduction DRAW Architecture Read and Write operations Reference. Introduction: This is a generative-model paper that feels similar to a VAE. Unlike a VAE, however, it adds an RNN structure. Why was an RNN structure added? When people draw a picture...

16. (Schulman 2017 arxiv) Proximal Policy Optimization Algorithms

Contents: Introduction Clipped Surrogate Objective Adaptive KL Penalty Coefficient Algorithm Generalized Advantage Estimation Entropy Experiment Reference. Introduct...

15. (Schulman 2015 ICML) Trust Region Policy Optimization

Contents: Preliminaries Monotonic Improvement Guarantee for General Stochastic Policies Optimization of Parameterized Policies Sample-Based Estimation of the Objective and Constraint Training ...

14. (Mnih 2016 ICML) Asynchronous Methods for Deep Reinforcement Learning

Contents: Asynchronous Advantage Actor Critic N-step TD Entropy Algorithm Reference. Asynchronous Advantage Actor Critic: The problem that using a replay buffer requires memory and computation for every real interaction...

13. Advantage Actor-Critic (A2C)

Contents: Advantage Actor Critic Algorithm Reference. Advantage Actor Critic: [\nabla_\theta J_\theta \simeq \sum_{t=0}^{\infty} \int_{s_t,a_t} \nabla_\theta \ln P_\theta (...