(Haarnoja 2018 ICML) Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor

Contents: Introduction, Maximum Entropy, Derivation of Soft Policy Iteration, Soft Actor-Critic Algorithm, Enforcing Action Bounds, Reference. Introduction: There are two reasons why reinforcement learning is hard to apply in the real world. First...

Jun 26, 2021 · 4 min
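The SAC entry lists an "Enforcing Action Bounds" section. As a minimal sketch, assuming a diagonal Gaussian policy whose unbounded sample u is squashed by a = tanh(u) (the construction described in the SAC paper's appendix), the log-likelihood of the bounded action picks up a change-of-variables term. The function names and the small `eps` stabilizer are illustrative assumptions, not the post's code:

```python
import math


def diag_gaussian_log_prob(u, mu, sigma):
    # log N(u; mu, diag(sigma^2)), summed over the action dimensions
    return sum(
        -0.5 * ((ui - mi) / si) ** 2 - math.log(si) - 0.5 * math.log(2.0 * math.pi)
        for ui, mi, si in zip(u, mu, sigma)
    )


def squashed_action_and_log_prob(u, mu, sigma, eps=1e-6):
    # Squash the unbounded Gaussian sample into (-1, 1) with tanh.
    a = [math.tanh(ui) for ui in u]
    # Change of variables: log pi(a|s) = log mu(u|s) - sum_i log(1 - tanh(u_i)^2).
    # eps (assumed value) keeps the log finite when |a_i| approaches 1.
    correction = sum(math.log(1.0 - ai * ai + eps) for ai in a)
    return a, diag_gaussian_log_prob(u, mu, sigma) - correction


# Example: a 2-D pre-squash sample u under N(0, I)
action, log_pi = squashed_action_and_log_prob([0.5, -2.0], [0.0, 0.0], [1.0, 1.0])
```

Since 1 − tanh²(u) ≤ 1, the correction term is never positive, so the squashed log-probability is at least the Gaussian one; in an autograd framework the same formula would be applied to tensors so gradients flow through u.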

(Lillicrap 2016 ICLR) Continuous Control With Deep Reinforcement Learning

Contents: Introduction, Contribution, Stochastic vs Deterministic policy, Deterministic policy, Additional, Algorithm, Reference. Introduction: This paper is an extension of DQN. DQN works with low-dimensional action s...

Jun 20, 2021 · 6 min

(Wu 2017 arxiv) Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation

Contents: Introduction, Natural gradient using Kronecker-factored approximation, Reference. Introduction: This paper is derived from TRPO. Its main contribution, aimed at addressing TRPO's computation time and sample efficiency...

Jun 18, 2021 · 3 min

(Wang 2017 ICLR) Sample Efficient Actor-Critic with Experience Replay

Contents: Introduction, Discrete Actor Critic with Experience Replay, Multi-Step Estimation of the State-Action Value Function, Importance Weight Truncation With Bias Correction, E...

Jun 14, 2021 · 4 min
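The ACER entry lists importance weight truncation with bias correction. As a sketch, assuming the paper's form in which an importance weight ρ = π(a|x)/μ(a|x) is split into a truncated part min(c, ρ) for the sampled action and a correction weight [1 − c/ρ]_+ applied to actions drawn from the current policy π, with c = 10 as an assumed default and a hypothetical function name:

```python
def truncated_weights(rho, c=10.0):
    """Split an importance weight rho = pi(a|x) / mu(a|x) into two parts.

    Returns (truncated, correction):
      truncated  = min(c, rho), weighting the term for the sampled action
      correction = [1 - c / rho]_+, weighting the bias-correction term
    """
    truncated = min(c, rho)  # bounded by c, which caps the variance
    # Zero whenever rho <= c, so the correction only activates for large weights.
    correction = max(0.0, 1.0 - c / rho) if rho > 0.0 else 0.0
    return truncated, correction


# Example: a large weight gets clipped to c, the remainder moves to the correction
t, corr = truncated_weights(20.0)
```

The split is exact: min(c, ρ) + ρ·[1 − c/ρ]_+ = ρ, so nothing is discarded. Because the correction term is an expectation under π rather than under the behavior policy μ, the extra factor ρ is absorbed and only [1 − c/ρ]_+ appears there.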

(Vezhnevets 2016 NIPS) Strategic Attentive Writer for Learning Macro-Actions

Contents: Introduction, The state of the network, Attentive planning, Action-plan update, Commitment-plan update, Learning, Experiment, Reference. Introduction: Many RL papers have been published to date, and they operate on low-level...

Jun 6, 2021 · 6 min

(Gregor 2015 ICML) DRAW: A Recurrent Neural Network For Image Generation

Contents: Introduction, DRAW Architecture, Read and Write operations, Reference. Introduction: This is a generative-model paper, similar in spirit to the VAE. Unlike the VAE, however, it adds an RNN structure. Why add an RNN? When people draw a picture...

Jun 4, 2021 · 5 min

16. (Schulman 2017 arxiv) Proximal Policy Optimization Algorithms

Contents: Introduction, Clipped Surrogate Objective, Adaptive KL Penalty Coefficient, Algorithm, Generalized Advantage Estimation, Entropy, Experiment, Reference. Introduct...

May 23, 2021 · 4 min
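The PPO entry's contents name the clipped surrogate objective and generalized advantage estimation. A pure-Python sketch of both follows, assuming the probability ratios r_t = π_θ(a_t|s_t) / π_θold(a_t|s_t) are already computed, a single trajectory segment with no episode boundary inside it, and illustrative defaults (epsilon=0.2, gamma=0.99, lam=0.95); the function names are hypothetical:

```python
def ppo_clip_objective(ratios, advantages, epsilon=0.2):
    # L^CLIP = mean_t[ min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t) ]
    total = 0.0
    for r, adv in zip(ratios, advantages):
        clipped = max(1.0 - epsilon, min(r, 1.0 + epsilon))
        # The min keeps the more pessimistic of the two estimates.
        total += min(r * adv, clipped * adv)
    return total / len(ratios)


def gae(rewards, values, gamma=0.99, lam=0.95):
    # values has len(rewards) + 1 entries; the last one bootstraps V(s_T).
    # Assumes the segment contains no terminal state (no done-masking).
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual: delta_t = r_t + gamma * V(s_{t+1}) - V(s_t)
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # A_t = delta_t + gamma * lam * A_{t+1}
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

With a positive advantage the objective stops rewarding ratios above 1 + ε, and with a negative advantage it stops rewarding ratios below 1 − ε, which is what keeps each update close to the old policy.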

15. (Schulman 2015 ICML) Trust Region Policy Optimization

Contents: Preliminaries, Monotonic Improvement Guarantee for General Stochastic Policies, Optimization of Parameterized Policies, Sample-Based Estimation of the Objective and Constraint, Training ...

May 16, 2021 · 10 min

14. (Mnih 2016 ICML) Asynchronous Methods for Deep Reinforcement Learning

Contents: Asynchronous Advantage Actor Critic, N-step TD, Entropy, Algorithm, Reference. Asynchronous Advantage Actor Critic: Using a replay buffer creates the problem of requiring memory, plus computation for every real interaction...

May 8, 2021 · 2 min
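The A3C entry lists N-step TD and an entropy term. A minimal sketch, assuming a categorical policy, hypothetical helper names, and an assumed entropy weight `beta=0.01`, of how n-step returns are accumulated backwards over a rollout segment and combined with an entropy bonus in the actor's loss:

```python
import math


def n_step_returns(rewards, bootstrap_value, gamma=0.99):
    # Work backwards: R <- r_t + gamma * R, starting from R = V(s_{t_max}).
    # Pass bootstrap_value = 0.0 if the segment ended in a terminal state.
    R = bootstrap_value
    returns = []
    for r in reversed(rewards):
        R = r + gamma * R
        returns.append(R)
    returns.reverse()
    return returns


def entropy(probs):
    # H(pi(.|s)) = -sum_a pi(a|s) * log pi(a|s) for a categorical policy
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def actor_loss(log_prob, ret, value, probs, beta=0.01):
    # Minimizing this loss ascends log pi(a|s) * (R - V(s)) + beta * H,
    # where (R - V(s)) is the n-step advantage estimate.
    return -(log_prob * (ret - value)) - beta * entropy(probs)
```

Each step in the segment bootstraps from the same final value, so earlier steps use longer multi-step returns; the entropy bonus discourages the policy from collapsing prematurely onto a single action.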

13. Advantage Actor-Critic (A2C)

Contents: Advantage Actor Critic, Algorithm, Reference. Advantage Actor Critic: $\nabla_\theta J_\theta \simeq \sum_{t=0}^{\infty} \int_{s_t,a_t} \nabla_\theta \ln P_\theta (\dots$

May 7, 2021 · 2 min
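The gradient in the A2C entry weights ∇ ln π by an advantage. A minimal sketch of the per-transition actor and critic losses, assuming the common one-step TD error r + γV(s') − V(s) as the advantage estimate (n-step or GAE estimates would slot in the same way); the function name and arguments are hypothetical:

```python
import math


def a2c_losses(log_prob, reward, value, next_value, done, gamma=0.99):
    # One-step TD target r + gamma * V(s'); no bootstrap after a terminal state.
    target = reward + (0.0 if done else gamma * next_value)
    # TD error used as the advantage estimate A(s, a) ~ Q(s, a) - V(s)
    advantage = target - value
    # Actor: minimize -ln pi(a|s) * A (A is treated as a constant, no gradient)
    actor_loss = -log_prob * advantage
    # Critic: squared error pulling V(s) toward the TD target
    critic_loss = advantage ** 2
    return actor_loss, critic_loss, advantage


# Example: pi(a|s) = 0.5, r = 1, V(s) = 0.5, V(s') = 1.0, gamma = 0.9
actor, critic, adv = a2c_losses(math.log(0.5), 1.0, 0.5, 1.0, False, gamma=0.9)
```

Using the TD error lets a single learned critic V stand in for both Q and the baseline, which is why the advantage form needs no separate Q network.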
