AI/Reinforcement Learning 2025. 7. 19. 17:16

Reinforcement Learning Chapter 06) Value Function Approximation

This part seems to be the one most closely tied to how reinforcement learning is used in today's LLMs.

1. Tabular Methods

⇒ But if this table grows, i.e., if there are infinitely many states as in the real world, then both storing the table and learning each entry separately become problems. We need generalization!

2. Approximation - Linear function

⇒ "Instead of storing values in a table, we turn the value function into a function of a new parameter vector w. This is called **value function approximation (= parameterizing the value function)**."

⇒ Represent each state as a vector of features!
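A minimal sketch of a linear value function approximator, assuming a hypothetical hand-picked feature map x(s) = [1, s, s²] over integer-indexed states. The feature choice and the names `features`, `v_hat`, and `grad_v_hat` are illustrative, not from the original. The value is linear in w, so its gradient with respect to w is simply x(s), which keeps the gradient-descent update cheap.

```python
from typing import List

def features(s: int) -> List[float]:
    """Hypothetical feature vector x(s): a bias term, s, and s squared."""
    return [1.0, float(s), float(s) ** 2]

def v_hat(s: int, w: List[float]) -> float:
    """Linear approximation v_hat(s, w) = x(s)^T w."""
    return sum(xi * wi for xi, wi in zip(features(s), w))

def grad_v_hat(s: int, w: List[float]) -> List[float]:
    """Gradient of v_hat w.r.t. w; for a linear model it is just x(s) (w is unused)."""
    return features(s)

# Usage: three parameters cover every integer state, instead of one table entry per state.
w = [0.5, 1.0, -0.1]
print(v_hat(3, w), grad_v_hat(3, w))
```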

⇒ "When learning a value function approximated with parameter w, we update w by gradient descent."

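Two flavors of gradient descent

⇒ Plain (full-batch) gradient descent computes the gradient as an expectation over all states at once, whereas stochastic gradient descent (SGD) estimates it from a single sampled state per update. In RL, experience arrives one sample at a time, so the updates are done with SGD.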
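To make the distinction concrete, here is a toy comparison assuming a hypothetical 5-state problem whose true values v(s) = 2s + 1 are known (the supervised setting, before the RL target substitution). `batch_gd_step` averages the error over every state before each update, while `sgd_step` updates from one randomly sampled state. All names, features, and step sizes are illustrative choices.

```python
import random
from typing import Dict, List

# Hypothetical toy problem: 5 states whose true values are known (the "supervisor").
STATES = [0, 1, 2, 3, 4]
V_TRUE: Dict[int, float] = {s: 2.0 * s + 1.0 for s in STATES}

def features(s: int) -> List[float]:
    return [1.0, float(s)]  # bias + state index (assumed features)

def v_hat(s: int, w: List[float]) -> float:
    return sum(xi * wi for xi, wi in zip(features(s), w))

def batch_gd_step(w: List[float], alpha: float) -> List[float]:
    """Full-batch GD: average (v - v_hat) * x(s) over ALL states, then update once."""
    direction = [0.0] * len(w)
    for s in STATES:
        err = V_TRUE[s] - v_hat(s, w)
        for i, xi in enumerate(features(s)):
            direction[i] += err * xi / len(STATES)
    return [wi + alpha * di for wi, di in zip(w, direction)]

def sgd_step(w: List[float], alpha: float, rng: random.Random) -> List[float]:
    """SGD: sample ONE state and update from its error alone."""
    s = rng.choice(STATES)
    err = V_TRUE[s] - v_hat(s, w)
    return [wi + alpha * err * xi for wi, xi in zip(w, features(s))]

w_gd, w_sgd = [0.0, 0.0], [0.0, 0.0]
rng = random.Random(0)
for _ in range(2000):
    w_gd = batch_gd_step(w_gd, alpha=0.1)
for _ in range(5000):
    w_sgd = sgd_step(w_sgd, alpha=0.05, rng=rng)
print(w_gd, w_sgd)  # both approach [1.0, 2.0], the exact fit of V_TRUE
```

Because the true values here are exactly linear in the features, both variants converge to w = [1, 2]. SGD gets there with noisier but cheaper steps, which is the form RL updates take when samples arrive one at a time.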

⇒ Which error do we minimize? The (mean squared) error between the true value function v_π(s) (== the supervisor) and the approximation v_hat(s, w).
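Written out in the standard mean-squared-error form (the symbols v_π, v̂, x(S), and step size α are the usual textbook notation, assumed here rather than taken from the original), the objective, its gradient-descent update, the SGD version, and the linear special case are:

```latex
J(\mathbf{w}) = \mathbb{E}_\pi\!\left[\big(v_\pi(S) - \hat v(S,\mathbf{w})\big)^2\right]

\Delta\mathbf{w} = -\tfrac{1}{2}\,\alpha\,\nabla_{\mathbf{w}} J(\mathbf{w})
  = \alpha\,\mathbb{E}_\pi\!\left[\big(v_\pi(S) - \hat v(S,\mathbf{w})\big)\,\nabla_{\mathbf{w}}\hat v(S,\mathbf{w})\right]

% SGD: drop the expectation and use one sampled state S
\Delta\mathbf{w} = \alpha\,\big(v_\pi(S) - \hat v(S,\mathbf{w})\big)\,\nabla_{\mathbf{w}}\hat v(S,\mathbf{w})

% Linear case: \hat v(S,\mathbf{w}) = \mathbf{x}(S)^\top \mathbf{w}, so \nabla_{\mathbf{w}}\hat v(S,\mathbf{w}) = \mathbf{x}(S)
\Delta\mathbf{w} = \alpha\,\big(v_\pi(S) - \mathbf{x}(S)^\top\mathbf{w}\big)\,\mathbf{x}(S)
```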

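⇒ In RL there is no supervisor, i.e., no true value function v_π(s), so we substitute a target for it! (e.g., the return G_t in MC, or R_{t+1} + γ·v_hat(S_{t+1}, w) in TD(0))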
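A minimal sketch of the two common targets that stand in for v_π(S_t), plugged into the same linear update. Assumptions: `rewards[t]` holds R_{t+1}, features are one-hot (which makes the linear approximator equivalent to a table), and all names are hypothetical. With the MC return the target does not depend on w, so the update is plain SGD; the TD(0) update is called *semi-gradient* because its target depends on w but is treated as a constant.

```python
from typing import Callable, List, Sequence

FeatureFn = Callable[[int], List[float]]

def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))

def mc_target(rewards: Sequence[float], t: int, gamma: float) -> float:
    """MC target: the return G_t = R_{t+1} + gamma*R_{t+2} + ... (rewards[t] is R_{t+1})."""
    return sum((gamma ** k) * r for k, r in enumerate(rewards[t:]))

def td0_target(r_next: float, s_next: int, w: List[float], x: FeatureFn,
               gamma: float, terminal: bool) -> float:
    """TD(0) target: R_{t+1} + gamma * v_hat(S_{t+1}, w); no bootstrap after a terminal step."""
    return r_next if terminal else r_next + gamma * dot(x(s_next), w)

def linear_update(w: List[float], s: int, target: float, x: FeatureFn,
                  alpha: float) -> List[float]:
    """w += alpha * (target - v_hat(s, w)) * x(s); the target is held fixed (not differentiated)."""
    xs = x(s)
    err = target - dot(xs, w)
    return [wi + alpha * err * xi for wi, xi in zip(w, xs)]

def one_hot(s: int, n: int = 3) -> List[float]:
    """One-hot features: with these, the linear approximator reduces to a lookup table."""
    return [1.0 if i == s else 0.0 for i in range(n)]

# Usage on one hypothetical episode S0=0 -> S1=1 -> S2=2 -> terminal, rewards R1=0, R2=0, R3=1.
rewards, gamma, alpha = [0.0, 0.0, 1.0], 0.9, 0.5
w = [0.0, 0.0, 0.0]
g0 = mc_target(rewards, t=0, gamma=gamma)
w_mc = linear_update(w, s=0, target=g0, x=one_hot, alpha=alpha)
y0 = td0_target(rewards[0], s_next=1, w=w, x=one_hot, gamma=gamma, terminal=False)
w_td = linear_update(w, s=0, target=y0, x=one_hot, alpha=alpha)
print(g0, w_mc, y0, w_td)
```

MC waits for the full return and moves v_hat(S0) toward 0.81 right away; TD(0) bootstraps from the current (still zero) estimate of S1, so the reward only reaches S0 after later updates.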

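⇒ "To be model-free, we need the action-value function rather than the state-value function!" Greedy policy improvement over V(s) requires a model of rewards and transitions, while over Q(s, a) we simply take argmax_a Q(s, a).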
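A small sketch of why Q is the model-free choice: greedy improvement from V needs a one-step lookahead through a model, while greedy improvement from Q is a plain argmax. The dictionary-based `model` format (`(s, a) -> [(prob, reward, next_state), ...]`) and all names are hypothetical, and tables stand in for v_hat/q_hat for brevity.

```python
from typing import Dict, Hashable, List, Sequence, Tuple

State = Hashable
Action = Hashable
Model = Dict[Tuple[State, Action], List[Tuple[float, float, State]]]

def greedy_from_v(s: State, actions: Sequence[Action], model: Model,
                  v: Dict[State, float], gamma: float) -> Action:
    """Needs the model: expected [r + gamma * V(s')] under p(s', r | s, a) for each action."""
    def lookahead(a: Action) -> float:
        return sum(p * (r + gamma * v[s2]) for p, r, s2 in model[(s, a)])
    return max(actions, key=lookahead)

def greedy_from_q(s: State, actions: Sequence[Action],
                  q: Dict[Tuple[State, Action], float]) -> Action:
    """Model-free: compare Q(s, a) directly, no transition or reward model needed."""
    return max(actions, key=lambda a: q[(s, a)])

# Usage (hypothetical numbers): both pick "right", but only the V version needed `model`.
model: Model = {(0, "left"): [(1.0, 0.0, 1)], (0, "right"): [(1.0, 0.0, 2)]}
v = {1: 0.0, 2: 1.0}
q = {(0, "left"): 0.0, (0, "right"): 0.9}
print(greedy_from_v(0, ["left", "right"], model, v, 0.9), greedy_from_q(0, ["left", "right"], q))
```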

⇒ Which error do we minimize? The error between the true action-value function q_π(s, a) (== the supervisor) and the approximation q_hat(s, a, w).

⇒ In RL there is no supervisor, i.e., no true action-value function q_π(s, a), so we substitute a target for it! (e.g., R_{t+1} + γ·q_hat(S_{t+1}, A_{t+1}, w) in SARSA)
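A minimal sketch of one control step with a linear q_hat(s, a, w) = x(s, a)^T w, assuming a SARSA-style target R_{t+1} + γ·q_hat(S_{t+1}, A_{t+1}, w) and ε-greedy action selection. The feature map `x_onehot(s, a)` and every function name are hypothetical. Q-learning would instead use max_a q_hat(S_{t+1}, a, w) in the target.

```python
import random
from typing import Callable, List, Sequence

SAFeatureFn = Callable[[int, int], List[float]]

def q_hat(x_sa: Sequence[float], w: Sequence[float]) -> float:
    """Linear action-value approximation: q_hat(s, a, w) = x(s, a)^T w."""
    return sum(xi * wi for xi, wi in zip(x_sa, w))

def epsilon_greedy(s: int, actions: Sequence[int], w: List[float], x: SAFeatureFn,
                   eps: float, rng: random.Random) -> int:
    """Explore with probability eps, otherwise take argmax_a q_hat(s, a, w)."""
    if rng.random() < eps:
        return rng.choice(list(actions))
    return max(actions, key=lambda a: q_hat(x(s, a), w))

def sarsa_update(w: List[float], x: SAFeatureFn, s: int, a: int, r: float,
                 s2: int, a2: int, alpha: float, gamma: float, terminal: bool) -> List[float]:
    """Semi-gradient SARSA: w += alpha * (target - q_hat(s, a, w)) * x(s, a)."""
    target = r if terminal else r + gamma * q_hat(x(s2, a2), w)
    xs = x(s, a)
    err = target - q_hat(xs, w)
    return [wi + alpha * err * xi for wi, xi in zip(w, xs)]

def x_onehot(s: int, a: int, n_actions: int = 2, n_states: int = 2) -> List[float]:
    """Hypothetical one-hot (state, action) features of length n_states * n_actions."""
    v = [0.0] * (n_states * n_actions)
    v[s * n_actions + a] = 1.0
    return v

# Usage: one transition (s=0, a=1, r=1.0, s'=1, a'=0) starting from w = 0.
rng = random.Random(0)
w = sarsa_update([0.0] * 4, x_onehot, s=0, a=1, r=1.0, s2=1, a2=0,
                 alpha=0.5, gamma=0.9, terminal=False)
print(w, epsilon_greedy(0, [0, 1], w, x_onehot, eps=0.0, rng=rng))
```

Policy improvement here is just the argmax inside `epsilon_greedy`; no transition model appears anywhere, which is exactly why control switches from v_hat to q_hat.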
