澶辨晥閾炬帴澶勭悊 |
寮哄寲瀛︿範(fàn)鍦ㄨ嚜鐒惰璦€澶勭悊涓嬬殑搴旂敤綃?PDF 涓嬭澆
杞澆鑷細(xì)http://www.python222.com/article/1207
鐩稿叧鎴浘錛?/strong>
![]() 涓昏鍐呭錛?/strong>
涓€銆佸己鍖栧涔?fàn)鍩杭媭闈?/strong>
1.1 浠嬬粛涓€涓嬪己鍖栧涔?fàn)锛?/strong>
寮哄寲瀛︿範(fàn)錛?/span>Reinforcement Learning錛夋槸涓€縐嶆椂搴忓喅絳栧涔?fàn)妗嗘炗灱岄€氳繃鏅鴻兘浣撳拰鐜浜や簰
1.2 浠嬬粛涓€涓嬪己鍖栧涔?鐨?鐘舵€侊紙States錛?鍜?瑙傛祴錛?/strong>Observations錛夛紵
• 鐘舵€侊紙States錛夛細(xì)瀵逛簬涓栫晫鐘舵€佺殑瀹屾暣鎻忚堪
• 瑙傛祴錛?/span>Observations錛夛細(xì)瀵逛簬涓€涓姸鎬佺殑閮ㄥ垎鎻忚堪錛屽彲鑳戒細(xì)緙哄け涓€浜涗俊鎭€傚綋O=S鏃訛紝縐?/span>O涓哄畬緹庝俊鎭?/span>/fully
observed錛?/span>O<S鏃訛紝縐?/span>O涓洪潪瀹岀編淇℃伅/partially observed銆?/span>
1.3 寮哄寲瀛︿範(fàn) 鏈夊摢浜?鍔ㄤ綔絀洪棿錛?/strong>Action Spaces錛夛紝浠栦滑涔嬮棿鐨勫尯鍒槸浠€涔堬紵
• 紱繪暎鍔ㄤ綔絀洪棿錛氬綋鏅鴻兘浣撳彧鑳介噰鍙栨湁闄愮殑鍔ㄤ綔錛屽涓嬫/鏂囨湰鐢熸垚
• 榪炵畫(huà)鍔ㄤ綔絀洪棿錛氬綋鏅鴻兘浣撶殑鍔ㄤ綔鏄疄鏁板悜閲忥紝濡傛満姊拌噦杞姩瑙掑害
鍏跺尯鍒細(xì)褰卞搷policy緗戠粶鐨勫疄鐜版柟寮忋€?/span>
1.4 寮哄寲瀛︿範(fàn) 鏈夊摢浜?/strong> Policy絳栫暐錛?/strong>
• 紜畾鎬х瓥鐣?/span>Deterministic Policy錛?/span> at = u(st)錛岃繛緇姩浣滅┖闂?/span>
• 闅忔満鎬х瓥鐣?/span>Stochastic Policy錛?/span> at ~ π(·|st) 錛岀鏁e姩浣滅┖闂?/span>
|