Large Language Models (LLMs): Reinforcement Learning Interview Questions, PDF Download
Main content:
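1. Briefly introduce reinforcement learning.
Reinforcement learning (RL): a machine learning method in which the learner corrects its learning direction using rewards obtained from its external environment, thereby acquiring an adaptive learning ability.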
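2. Briefly introduce RLHF.
Reinforcement Learning from Human Feedback (RLHF): build a dataset of human feedback, train a reward model that imitates human preferences by scoring outputs, and then use that reward model to optimize the language model with RL. This is the core technique behind post-GPT-3 large language models conversing more and more like humans.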
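The reward-model step is often trained on pairs of responses where a human marked one as preferred, using the pairwise loss $-\log \sigma(r(\text{chosen}) - r(\text{rejected}))$ (the form used, for example, in InstructGPT). The sketch below is a deliberately tiny assumption: a linear reward model over two hand-made features (a hypothetical "helpfulness" and "rudeness" score) trained with plain gradient descent. Real reward models are LLMs with a scalar output head, but the loss and update direction are the same idea.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def reward(w, features):
    # Hypothetical linear reward model: score = w . features
    return sum(wi * fi for wi, fi in zip(w, features))

def pairwise_loss(w, chosen, rejected):
    # Pairwise preference loss: -log sigmoid(r(chosen) - r(rejected))
    return -math.log(sigmoid(reward(w, chosen) - reward(w, rejected)))

def train_reward_model(pairs, dim, lr=0.5, epochs=200):
    w = [0.0] * dim
    for _ in range(epochs):
        for chosen, rejected in pairs:
            margin = reward(w, chosen) - reward(w, rejected)
            # Gradient of the loss w.r.t. w is -(1 - sigmoid(margin)) * (chosen - rejected),
            # so a descent step adds lr * (1 - sigmoid(margin)) * (chosen - rejected).
            g = 1.0 - sigmoid(margin)
            for i in range(dim):
                w[i] += lr * g * (chosen[i] - rejected[i])
    return w

# Toy preference pairs: (features of preferred response, features of rejected response),
# with features = [helpfulness, rudeness]; all values are made up for illustration.
pairs = [([0.9, 0.1], [0.2, 0.1]), ([0.5, 0.0], [0.5, 0.8]), ([0.7, 0.2], [0.3, 0.9])]
w = train_reward_model(pairs, dim=2)
print("learned weights:", w)
```

The learned weights reward helpfulness and penalize rudeness, i.e. the model has absorbed the preferences expressed in the pairs.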
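3. Does the reward model need to match the base model?
Different implementations seem to impose different constraints (to be confirmed in practice). In colossal-ai's coati, the models must share the same tokenizer, so the reward model can only be chosen from the same model family. As for how the PPO algorithm is implemented, trlx is said to follow the paper most closely.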
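One plausible reason a framework would require a shared tokenizer (an assumption; this sketch does not reproduce coati's code) is that token ids are passed directly from the policy to the reward model, and ids only mean the same words under the same vocabulary. The toy whitespace tokenizers and the `bridge_ids` helper below are hypothetical, built only to show the failure and the usual workaround of decoding to text and re-encoding.

```python
def make_tokenizer(vocab):
    """Toy whitespace tokenizer over a fixed word list (hypothetical, for illustration only)."""
    token_to_id = {tok: i for i, tok in enumerate(vocab)}

    def encode(text):
        return [token_to_id[word] for word in text.split()]

    def decode(ids):
        return " ".join(vocab[i] for i in ids)

    return encode, decode

# Two models whose tokenizers cover the same words but assign different ids.
policy_encode, policy_decode = make_tokenizer(["hello", "world", "good", "bad"])
reward_encode, reward_decode = make_tokenizer(["bad", "good", "world", "hello"])

def bridge_ids(policy_ids):
    """Re-tokenize via text so the reward model sees the same words the policy produced."""
    return reward_encode(policy_decode(policy_ids))

ids = policy_encode("hello world")
print(reward_decode(ids))              # reusing ids directly -> "bad good"
print(reward_decode(bridge_ids(ids)))  # decode then re-encode -> "hello world"
```

Re-tokenizing works but costs extra decode/encode passes on every PPO step, which is one reason a framework might simply require matching tokenizers instead.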
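4. What are the shortcomings of RLHF in practice?
1. Shortcoming 1: human-generated preference datasets are expensive and hard to produce at scale;
2. Shortcoming 2: the three-stage training pipeline (SFT -> RM -> PPO) is long, so updates and iteration are slow;
3. Shortcoming 3: PPO training keeps four models in play at the same time (two being trained, two used for inference only), which demands substantial compute.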
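In common PPO-based RLHF setups, the four models are the actor (policy) and the critic (value model), which are updated, plus a frozen reference copy of the SFT model and the frozen reward model. The sketch below is an illustration under those assumptions, not any specific framework's code. It shows where the two inference-only models enter: the reward model contributes one scalar score for the whole response, and the reference model's log-probabilities feed a per-token KL penalty weighted by a hypothetical coefficient `beta`. The critic, which would turn these rewards into advantages, is omitted.

```python
def kl_penalized_rewards(policy_logprobs, ref_logprobs, rm_score, beta=0.1):
    """Per-token rewards for one sampled response.

    policy_logprobs: log-probs of the sampled tokens under the actor (trainable).
    ref_logprobs:    log-probs of the same tokens under the frozen reference model.
    rm_score:        scalar score from the frozen reward model for the full response.
    beta:            KL penalty coefficient (illustrative default).
    """
    if not policy_logprobs or len(policy_logprobs) != len(ref_logprobs):
        raise ValueError("need two non-empty log-prob sequences of equal length")
    # Per-token KL estimate log pi(a|s) - log pi_ref(a|s), scaled by -beta,
    # discourages the actor from drifting too far from the reference model.
    rewards = [-beta * (lp - lr) for lp, lr in zip(policy_logprobs, ref_logprobs)]
    # The reward model's score is credited at the final token of the response.
    rewards[-1] += rm_score
    return rewards

print(kl_penalized_rewards([-1.0, -2.0], [-1.5, -2.0], rm_score=3.0, beta=0.1))
```

Every training step therefore needs forward passes through all four networks, which is the compute cost the shortcoming above refers to.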
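5. How can the problem that human-generated preference datasets are expensive and hard to produce at scale be addressed?
The core idea is to use AI models to supervise other AI models. In the SFT stage, sample from the initial model, have it generate self-critiques and revisions, and then fine-tune the original model on the revised responses. In the RL stage, sample from the fine-tuned model and use a model to evaluate the generated samples,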
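The SFT-stage loop described in question 5 (sample, self-critique, revise, fine-tune on the revisions) appears to match the supervised stage of Anthropic's Constitutional AI. Below is a minimal sketch under stated assumptions: `Model` is a hypothetical text-in, text-out callable, the prompt templates and "principles" are invented for illustration, and `toy_model` is a rule-based stand-in so the code runs without a real LLM. The actual fine-tuning step is not shown; the function only builds the (prompt, revised response) pairs it would consume.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: any text-in, text-out model (an LLM API, a local model, ...).
Model = Callable[[str], str]

def critique_and_revise(model: Model, prompt: str, principles: List[str]) -> Tuple[str, str]:
    """Produce one (prompt, revised response) pair for supervised fine-tuning."""
    response = model(prompt)  # sample an initial response from the model
    for principle in principles:
        # Ask the model to critique its own response against one principle.
        critique = model(
            f"Prompt: {prompt}\nResponse: {response}\n"
            f"Critique the response according to this principle: {principle}"
        )
        # Ask the model to revise the response in light of that critique.
        response = model(
            f"Prompt: {prompt}\nResponse: {response}\nCritique: {critique}\n"
            "Rewrite the response to address the critique."
        )
    return prompt, response

def build_sft_dataset(model: Model, prompts: List[str], principles: List[str]) -> List[Tuple[str, str]]:
    # The revised pairs become fine-tuning data for the original model.
    return [critique_and_revise(model, p, principles) for p in prompts]

def toy_model(prompt: str) -> str:
    # Rule-based stand-in so the sketch runs offline; not a real model.
    if "Rewrite the response" in prompt:
        return "revised answer"
    if "Critique the response" in prompt:
        return "the answer is too terse"
    return "draft answer"

print(build_sft_dataset(toy_model, ["Explain RLHF."], ["Be helpful and harmless."]))
```

Because no human writes the critiques or revisions, the same loop can be run over many prompts cheaply, which is the point of replacing human preference labeling with AI supervision.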