From db5e62fc2ae26d6bef2ade8d9b3371df706cf9eb Mon Sep 17 00:00:00 2001
From: Haoyu Wang <32129905+why2011btv@users.noreply.github.com>
Date: Sun, 4 Jan 2026 17:56:01 -0500
Subject: [PATCH] fix typo in scripts/chat_rl.py

Fix a typo in the comments: "GAPO" should read "DAPO".
---
 scripts/chat_rl.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/scripts/chat_rl.py b/scripts/chat_rl.py
index 1a09962..2c1587d 100644
--- a/scripts/chat_rl.py
+++ b/scripts/chat_rl.py
@@ -6,7 +6,7 @@ simpler and more similar to just REINFORCE:
 
 1) Delete trust region, so there is no KL regularization to a reference model
 2) We are on policy, so there's no need for PPO ratio+clip.
-3) We use GAPO style normalization that is token-level, not sequence-level.
+3) We use DAPO style normalization that is token-level, not sequence-level.
 4) Instead of z-score normalization (r - mu)/sigma, only use (r - mu) as the advantage.
 
 1 GPU: