Zhuoran Yang
2020 – today
- 2023
- [j15] Han Zhong, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan: Can Reinforcement Learning Find Stackelberg-Nash Equilibria in General-Sum Markov Games with Myopically Rational Followers? J. Mach. Learn. Res. 24: 35:1-35:52 (2023)
- [j14] Qiaomin Xie, Yudong Chen, Zhaoran Wang, Zhuoran Yang: Learning Zero-Sum Simultaneous-Move Markov Games Using Function Approximation and Correlated Equilibrium. Math. Oper. Res. 48(1): 433-462 (2023)
- [j13] Chi Jin, Zhuoran Yang, Zhaoran Wang, Michael I. Jordan: Provably Efficient Reinforcement Learning with Linear Function Approximation. Math. Oper. Res. 48(3): 1496-1521 (2023)
- [j12] Nikola Banovic, Zhuoran Yang, Aditya Ramesh, Alice Liu: Being Trustworthy is Not Enough: How Untrustworthy Artificial Intelligence (AI) Can Deceive the End-Users and Gain Their Trust. Proc. ACM Hum. Comput. Interact. 7(CSCW1): 1-17 (2023)
- [j11] Mingyi Hong, Hoi-To Wai, Zhaoran Wang, Zhuoran Yang: A Two-Timescale Stochastic Algorithm Framework for Bilevel Optimization: Complexity Analysis and Application to Actor-Critic. SIAM J. Optim. 33(1): 147-180 (2023)
- [c102] Ruitu Xu, Yifei Min, Tianhao Wang, Michael I. Jordan, Zhaoran Wang, Zhuoran Yang: Finding Regularized Competitive Equilibria of Heterogeneous Agent Macroeconomic Models via Reinforcement Learning. AISTATS 2023: 375-407
- [c101] Yixuan Wang, Simon Sinong Zhan, Zhilu Wang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu: Joint Differentiable Optimization and Verification for Certified Reinforcement Learning. ICCPS 2023: 132-141
- [c100] Lingxiao Wang, Qi Cai, Zhuoran Yang, Zhaoran Wang: Represent to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency. ICLR 2023
- [c99] Miao Lu, Yifei Min, Zhaoran Wang, Zhuoran Yang: Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes. ICLR 2023
- [c98] Zhuoqing Song, Jason D. Lee, Zhuoran Yang: Can We Find Nash Equilibria at a Linear Rate in Markov Games? ICLR 2023
- [c97] Haoran Xu, Li Jiang, Jianxiong Li, Zhuoran Yang, Zhaoran Wang, Wai Kin Victor Chan, Xianyuan Zhan: Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization. ICLR 2023
- [c96] Wenhao Zhan, Jason D. Lee, Zhuoran Yang: Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games. ICLR 2023
- [c95] Sirui Zheng, Lingxiao Wang, Shuang Qiu, Zuyue Fu, Zhuoran Yang, Csaba Szepesvári, Zhaoran Wang: Optimistic Exploration with Learned Features Provably Solves Markov Decision Processes with Neural Dynamics. ICLR 2023
- [c94] Siyu Chen, Jibang Wu, Yifan Wu, Zhuoran Yang: Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model. ICML 2023: 5194-5218
- [c93] Jiacheng Guo, Zihao Li, Huazheng Wang, Mengdi Wang, Zhuoran Yang, Xuezhou Zhang: Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP. ICML 2023: 11967-11997
- [c92] Yixuan Wang, Simon Sinong Zhan, Ruochen Jiao, Zhilu Wang, Wanxin Jin, Zhuoran Yang, Zhaoran Wang, Chao Huang, Qi Zhu: Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments. ICML 2023: 36593-36604
- [c91] Yulai Zhao, Zhuoran Yang, Zhaoran Wang, Jason D. Lee: Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning. ICML 2023: 42200-42226
- [c90] Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanovic: Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning. L4DC 2023: 315-332
- [c89] Banghua Zhu, Stephen Bates, Zhuoran Yang, Yixin Wang, Jiantao Jiao, Michael I. Jordan: The Sample Complexity of Online Contract Design. EC 2023: 1188
- [i117] Zhuoqing Song, Jason D. Lee, Zhuoran Yang: Can We Find Nash Equilibria at a Linear Rate in Markov Games? CoRR abs/2303.03095 (2023)
- [i116] Ruitu Xu, Yifei Min, Tianhao Wang, Zhaoran Wang, Michael I. Jordan, Zhuoran Yang: Finding Regularized Competitive Equilibria of Heterogeneous Agent Macroeconomic Models with Reinforcement Learning. CoRR abs/2303.04833 (2023)
- [i115] Siyu Chen, Jibang Wu, Yifan Wu, Zhuoran Yang: Learning to Incentivize Information Acquisition: Proper Scoring Rules Meet Principal-Agent Model. CoRR abs/2303.08613 (2023)
- [i114] Siyu Chen, Yitan Wang, Zhaoran Wang, Zhuoran Yang: A Unified Framework of Policy Learning for Contextual Bandit with Confounding Bias and Missing Observations. CoRR abs/2303.11187 (2023)
- [i113] Haoran Xu, Li Jiang, Jianxiong Li, Zhuoran Yang, Zhaoran Wang, Wai Kin Victor Chan, Xianyuan Zhan: Offline RL with No OOD Actions: In-Sample Learning via Implicit Value Regularization. CoRR abs/2303.15810 (2023)
- [i112] Yulai Zhao, Zhuoran Yang, Zhaoran Wang, Jason D. Lee: Local Optimization Achieves Global Optimality in Multi-Agent Reinforcement Learning. CoRR abs/2305.04819 (2023)
- [i111] Zhihan Liu, Miao Lu, Wei Xiong, Han Zhong, Hao Hu, Shenao Zhang, Sirui Zheng, Zhuoran Yang, Zhaoran Wang: One Objective to Rule Them All: A Maximization Objective Fusing Estimation and Planning for Exploration. CoRR abs/2305.18258 (2023)
- [i110] Zihao Li, Zhuoran Yang, Mengdi Wang: Reinforcement Learning with Human Feedback: Learning Dynamic Choices via Pessimism. CoRR abs/2305.18438 (2023)
- [i109] Haoran He, Chenjia Bai, Kang Xu, Zhuoran Yang, Weinan Zhang, Dong Wang, Bin Zhao, Xuelong Li: Diffusion Model is an Effective Planner and Data Synthesizer for Multi-Task Reinforcement Learning. CoRR abs/2305.18459 (2023)
- [i108] Yufeng Zhang, Fengzhuo Zhang, Zhuoran Yang, Zhaoran Wang: What and How does In-Context Learning Learn? Bayesian Model Averaging, Parameterization, and Generalization. CoRR abs/2305.19420 (2023)
- [i107] Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanovic: Provably Efficient Generalized Lagrangian Policy Optimization for Safe Multi-Agent Reinforcement Learning. CoRR abs/2306.00212 (2023)
- [i106] Jiacheng Guo, Zihao Li, Huazheng Wang, Mengdi Wang, Zhuoran Yang, Xuezhou Zhang: Provably Efficient Representation Learning with Tractable Planning in Low-Rank POMDP. CoRR abs/2306.12356 (2023)
- [i105] Nuoya Xiong, Zhaoran Wang, Zhuoran Yang: A General Framework for Sequential Decision-Making under Adaptivity Constraints. CoRR abs/2306.14468 (2023)
- [i104] Pangpang Liu, Zhuoran Yang, Zhaoran Wang, Will Wei Sun: Contextual Dynamic Pricing with Strategic Buyers. CoRR abs/2307.04055 (2023)
- [i103] Siyu Chen, Mengdi Wang, Zhuoran Yang: Actions Speak What You Want: Provably Sample-Efficient Reinforcement Learning of the Quantal Stackelberg Equilibrium from Strategic Feedbacks. CoRR abs/2307.14085 (2023)
- [i102] Nuoya Xiong, Zhihan Liu, Zhaoran Wang, Zhuoran Yang: Sample-Efficient Multi-Agent RL: An Optimization Perspective. CoRR abs/2310.06243 (2023)
- [i101] Fengzhuo Zhang, Vincent Y. F. Tan, Zhaoran Wang, Zhuoran Yang: Learning Regularized Monotone Graphon Mean-Field Games. CoRR abs/2310.08089 (2023)
- [i100] Fengzhuo Zhang, Vincent Y. F. Tan, Zhaoran Wang, Zhuoran Yang: Learning Regularized Graphon Mean-Field Games with Unknown Graphons. CoRR abs/2310.17531 (2023)
- [i99] Shuang Qiu, Ziyu Dai, Han Zhong, Zhaoran Wang, Zhuoran Yang, Tong Zhang: Posterior Sampling for Competitive RL: Function Approximation and Partial Observation. CoRR abs/2310.19861 (2023)
- [i98] Jianqing Fan, Zhaoran Wang, Zhuoran Yang, Chenlu Ye: Provably Efficient High-Dimensional Bandit Learning with Batched Feedbacks. CoRR abs/2311.13180 (2023)
- [i97] Yixuan Wang, Ruochen Jiao, Chengtian Lang, Simon Sinong Zhan, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu: Empowering Autonomous Driving with Large Language Models: A Safety Perspective. CoRR abs/2312.00812 (2023)
- 2022
- [c88] Zehao Dou, Zhuoran Yang, Zhaoran Wang, Simon S. Du: Gap-Dependent Bounds for Two-Player Markov Games. AISTATS 2022: 432-455
- [c87] Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhi-Hong Deng, Animesh Garg, Peng Liu, Zhaoran Wang: Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning. ICLR 2022
- [c86] Baihe Huang, Jason D. Lee, Zhaoran Wang, Zhuoran Yang: Towards General Function Approximation in Zero-Sum Markov Games. ICLR 2022
- [c85] Zhi Zhang, Zhuoran Yang, Han Liu, Pratap Tokekar, Furong Huang: Reinforcement Learning under a Multi-agent Predictive State Representation Model: Method and Theory. ICLR 2022
- [c84] Qi Cai, Zhuoran Yang, Zhaoran Wang: Reinforcement Learning from Partial Observation: Linear Function Approximation with Provable Sample Efficiency. ICML 2022: 2485-2522
- [c83] Siyu Chen, Donglin Yang, Jiayang Li, Senmiao Wang, Zhuoran Yang, Zhaoran Wang: Adaptive Model Design for Markov Decision Process. ICML 2022: 3679-3700
- [c82] Xiaoyu Chen, Han Zhong, Zhuoran Yang, Zhaoran Wang, Liwei Wang: Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation. ICML 2022: 3773-3793
- [c81] Hongyi Guo, Qi Cai, Yufeng Zhang, Zhuoran Yang, Zhaoran Wang: Provably Efficient Offline Reinforcement Learning for Partially Observable Markov Decision Processes. ICML 2022: 8016-8038
- [c80] Zhihan Liu, Miao Lu, Zhaoran Wang, Michael I. Jordan, Zhuoran Yang: Welfare Maximization in Competitive Equilibrium: Reinforcement Learning for Markov Exchange Economy. ICML 2022: 13870-13911
- [c79] Zhihan Liu, Yufeng Zhang, Zuyue Fu, Zhuoran Yang, Zhaoran Wang: Learning from Demonstration: Provably Efficient Adversarial Policy Imitation with Linear Function Approximation. ICML 2022: 14094-14138
- [c78] Boxiang Lyu, Zhaoran Wang, Mladen Kolar, Zhuoran Yang: Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning. ICML 2022: 14601-14638
- [c77] Shuang Qiu, Lingxiao Wang, Chenjia Bai, Zhuoran Yang, Zhaoran Wang: Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning. ICML 2022: 18168-18210
- [c76] Han Zhong, Wei Xiong, Jiyuan Tan, Liwei Wang, Tong Zhang, Zhaoran Wang, Zhuoran Yang: Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets. ICML 2022: 27117-27142
- [c75] Gene Li, Junbo Li, Anmol Kabra, Nati Srebro, Zhaoran Wang, Zhuoran Yang: Exponential Family Model-Based Reinforcement Learning via Score Matching. NeurIPS 2022
- [c74] Boyi Liu, Jiayang Li, Zhuoran Yang, Hoi-To Wai, Mingyi Hong, Yu Nie, Zhaoran Wang: Inducing Equilibria via Incentives: Simultaneous Design-and-Play Ensures Global Convergence. NeurIPS 2022
- [c73] Yifei Min, Tianhao Wang, Ruitu Xu, Zhaoran Wang, Michael I. Jordan, Zhuoran Yang: Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets. NeurIPS 2022
- [c72] Grigoris Velegkas, Zhuoran Yang, Amin Karbasi: Reinforcement Learning with Logarithmic Regret and Policy Switches. NeurIPS 2022
- [c71] Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang: A Unifying Framework of Off-Policy General Value Function Evaluation. NeurIPS 2022
- [c70] Fengzhuo Zhang, Boyi Liu, Kaixin Wang, Vincent Y. F. Tan, Zhuoran Yang, Zhaoran Wang: Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL. NeurIPS 2022
- [c69] Shichao Xu, Yangyang Fu, Yixuan Wang, Zhuoran Yang, Zheng O'Neill, Zhaoran Wang, Qi Zhu: Accelerate online reinforcement learning for building HVAC control with heterogeneous expert guidances. BuildSys@SenSys 2022: 89-98
- [c68] Jibang Wu, Zixuan Zhang, Zhe Feng, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan, Haifeng Xu: Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning. EC 2022: 471-472
- [i96] Yixuan Wang, Chao Huang, Zhaoran Wang, Zhuoran Yang, Qi Zhu: Joint Differentiable Optimization and Verification for Certified Reinforcement Learning. CoRR abs/2201.12243 (2022)
- [i95] Han Zhong, Wei Xiong, Jiyuan Tan, Liwei Wang, Tong Zhang, Zhaoran Wang, Zhuoran Yang: Pessimistic Minimax Value Iteration: Provably Efficient Equilibrium Learning from Offline Datasets. CoRR abs/2202.07511 (2022)
- [i94] Jibang Wu, Zixuan Zhang, Zhe Feng, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan, Haifeng Xu: Sequential Information Design: Markov Persuasion Process and Its Efficient Reinforcement Learning. CoRR abs/2202.10678 (2022)
- [i93] Chenjia Bai, Lingxiao Wang, Zhuoran Yang, Zhihong Deng, Animesh Garg, Peng Liu, Zhaoran Wang: Pessimistic Bootstrapping for Uncertainty-Driven Offline Reinforcement Learning. CoRR abs/2202.11566 (2022)
- [i92] Boxiang Lyu, Qinglin Meng, Shuang Qiu, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan: Learning Dynamic Mechanisms in Unknown Environments: A Reinforcement Learning Approach. CoRR abs/2202.12797 (2022)
- [i91] Grigoris Velegkas, Zhuoran Yang, Amin Karbasi: The Best of Both Worlds: Reinforcement Learning with Logarithmic Regret and Policy Switches. CoRR abs/2203.01491 (2022)
- [i90] Yifei Min, Tianhao Wang, Ruitu Xu, Zhaoran Wang, Michael I. Jordan, Zhuoran Yang: Learn to Match with No Regret: Reinforcement Learning in Markov Matching Markets. CoRR abs/2203.03684 (2022)
- [i89] Qi Cai, Zhuoran Yang, Zhaoran Wang: Sample-Efficient Reinforcement Learning for POMDPs with Linear Function Approximations. CoRR abs/2204.09787 (2022)
- [i88] Boxiang Lyu, Zhaoran Wang, Mladen Kolar, Zhuoran Yang: Pessimism meets VCG: Learning Dynamic Mechanism Design via Offline Reinforcement Learning. CoRR abs/2205.02450 (2022)
- [i87] Xiaoyu Chen, Han Zhong, Zhuoran Yang, Zhaoran Wang, Liwei Wang: Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation. CoRR abs/2205.11140 (2022)
- [i86] Lingxiao Wang, Qi Cai, Zhuoran Yang, Zhaoran Wang: Embed to Control Partially Observed Systems: Representation Learning with Provable Sample Efficiency. CoRR abs/2205.13476 (2022)
- [i85] Miao Lu, Yifei Min, Zhaoran Wang, Zhuoran Yang: Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes. CoRR abs/2205.13589 (2022)
- [i84] Wenhao Zhan, Jason D. Lee, Zhuoran Yang: Decentralized Optimistic Hyperpolicy Mirror Descent: Provably No-Regret Learning in Markov Games. CoRR abs/2206.01588 (2022)
- [i83] Shuang Qiu, Xiaohan Wei, Jieping Ye, Zhaoran Wang, Zhuoran Yang: Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions. CoRR abs/2207.12463 (2022)
- [i82] Shuang Qiu, Lingxiao Wang, Chenjia Bai, Zhuoran Yang, Zhaoran Wang: Contrastive UCB: Provably Efficient Contrastive Self-Supervised Learning in Online Reinforcement Learning. CoRR abs/2207.14800 (2022)
- [i81] Mengxin Yu, Zhuoran Yang, Jianqing Fan: Strategic Decision-Making in the Presence of Information Asymmetry: Provably Efficient RL with Algorithmic Instruments. CoRR abs/2208.11040 (2022)
- [i80] Zuyue Fu, Zhengling Qi, Zhaoran Wang, Zhuoran Yang, Yanxun Xu, Michael R. Kosorok: Offline Reinforcement Learning with Instrumental Variables in Confounded Markov Decision Processes. CoRR abs/2209.08666 (2022)
- [i79] Fengzhuo Zhang, Boyi Liu, Kaixin Wang, Vincent Y. F. Tan, Zhuoran Yang, Zhaoran Wang: Relational Reasoning via Set Transformers: Provable Efficiency and Applications to MARL. CoRR abs/2209.09845 (2022)
- [i78] Yixuan Wang, Simon Sinong Zhan, Ruochen Jiao, Zhilu Wang, Wanxin Jin, Zhuoran Yang, Zhaoran Wang, Chao Huang, Qi Zhu: Enforcing Hard Constraints with Soft Barriers: Safe Reinforcement Learning in Unknown Stochastic Environments. CoRR abs/2209.15090 (2022)
- [i77] Rui Ai, Boxiang Lyu, Zhaoran Wang, Zhuoran Yang, Michael I. Jordan: A Reinforcement Learning Approach in Multi-Phase Second-Price Auction Design. CoRR abs/2210.10278 (2022)
- [i76] Han Zhong, Wei Xiong, Sirui Zheng, Liwei Wang, Zhaoran Wang, Zhuoran Yang, Tong Zhang: GEC: A Unified Framework for Interactive Decision Making in MDP, POMDP, and Beyond. CoRR abs/2211.01962 (2022)
- [i75] Banghua Zhu, Stephen Bates, Zhuoran Yang, Yixin Wang, Jiantao Jiao, Michael I. Jordan: The Sample Complexity of Online Contract Design. CoRR abs/2211.05732 (2022)
- [i74] Ying Jin, Zhimei Ren, Zhuoran Yang, Zhaoran Wang: Policy learning "without" overlap: Pessimism and generalized empirical Bernstein's inequality. CoRR abs/2212.09900 (2022)
- [i73] Zuyue Fu, Zhengling Qi, Zhuoran Yang, Zhaoran Wang, Lan Wang: Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information. CoRR abs/2212.12167 (2022)
- [i72] Riashat Islam, Samarth Sinha, Homanga Bharadhwaj, Samin Yeasar Arnob, Zhuoran Yang, Animesh Garg, Zhaoran Wang, Lihong Li, Doina Precup: Offline Policy Optimization in RL with Variance Regularizaton. CoRR abs/2212.14405 (2022)
- 2021
- [j10] Liya Fu, Zhuoran Yang, Jun Zhang, Anle Long, Yan Zhou: Generalized estimating equations for analyzing multivariate survival data. Commun. Stat. Simul. Comput. 50(10): 3060-3068 (2021)
- [j9] Liya Fu, Zhuoran Yang, Fengjing Cai, You-Gan Wang: Efficient and doubly-robust methods for variable selection and parameter estimation in longitudinal data analysis. Comput. Stat. 36(2): 781-804 (2021)
- [j8] Shuang Qiu, Zhuoran Yang, Jieping Ye, Zhaoran Wang: On Finite-Time Convergence of Actor-Critic Algorithm. IEEE J. Sel. Areas Inf. Theory 2(2): 652-664 (2021)
- [j7] Kaiqing Zhang, Zhuoran Yang, Tamer Basar: Decentralized multi-agent reinforcement learning with networked agents: recent advances. Frontiers Inf. Technol. Electron. Eng. 22(6): 802-814 (2021)
- [j6] Kaiqing Zhang, Zhuoran Yang, Han Liu, Tong Zhang, Tamer Basar: Finite-Sample Analysis for Decentralized Batch Multiagent Reinforcement Learning With Networked Agents. IEEE Trans. Autom. Control. 66(12): 5925-5940 (2021)
- [c67] Jiaheng Wei, Zuyue Fu, Yang Liu, Xingyu Li, Zhuoran Yang, Zhaoran Wang: Sample Elicitation. AISTATS 2021: 2692-2700
- [c66] Yufeng Zhang, Zhuoran Yang, Zhaoran Wang: Provably Efficient Actor-Critic for Risk-Sensitive and Robust Adversarial RL: A Linear-Quadratic Case. AISTATS 2021: 2764-2772
- [c65] Dongsheng Ding, Xiaohan Wei, Zhuoran Yang, Zhaoran Wang, Mihailo R. Jovanovic: Provably Efficient Safe Exploration via Primal-Dual Policy Optimization. AISTATS 2021: 3304-3312
- [c64] Zuyue Fu, Zhuoran Yang, Zhaoran Wang: Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy. ICLR 2021
- [c63] Yingjie Fei, Zhuoran Yang, Zhaoran Wang: Risk-Sensitive Reinforcement Learning with Function Approximation: A Debiasing Approach. ICML 2021: 3198-3207
- [c62] Hongyi Guo, Zuyue Fu, Zhuoran Yang, Zhaoran Wang: Decentralized Single-Timescale Actor-Critic on Zero-Sum Two-Player Stochastic Games. ICML 2021: 3899-3909
- [c61] Haque Ishfaq, Qiwen Cui, Viet Nguyen, Alex Ayoub, Zhuoran Yang, Zhaoran Wang, Doina Precup, Lin Yang: Randomized Exploration in Reinforcement Learning with General Value Function Approximation. ICML 2021: 4607-4616
- [c60] Ying Jin, Zhuoran Yang, Zhaoran Wang: Is Pessimism Provably Efficient for Offline RL? ICML 2021: 5084-5096
- [c59] Lewis Liu, Yufeng Zhang, Zhuoran Yang, Reza Babanezhad, Zhaoran Wang: Infinite-Dimensional Optimization for Zero-Sum Games via Variational Transport. ICML 2021: 7033-7044
- [c58] Shuang Qiu, Xiaohan Wei, Jieping Ye, Zhaoran Wang, Zhuoran Yang: Provably Efficient Fictitious Play Policy Optimization for Zero-Sum Markov Games with Structured Transitions. ICML 2021: 8715-8725
- [c57] Shuang Qiu, Jieping Ye, Zhaoran Wang, Zhuoran Yang: On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game. ICML 2021: 8737-8747
- [c56] Wesley Suttle, Kaiqing Zhang, Zhuoran Yang, Ji Liu, David N. Kraemer: Reinforcement Learning for Cost-Aware Markov Decision Processes. ICML 2021: 9989-9999
- [c55] Weichen Wang, Jiequn Han, Zhuoran Yang, Zhaoran Wang: Global Convergence of Policy Gradient for Linear-Quadratic Mean-Field Control/Game in Continuous Time. ICML 2021: 10772-10782
- [c54] Qiaomin Xie, Zhuoran Yang, Zhaoran Wang, Andreea Minca: Learning While Playing in Mean-Field Games: Convergence and Optimality. ICML 2021: 11436-11447
- [c53] Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang: Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality. ICML 2021: 11581-11591
- [c52] Jingwei Zhang, Zhuoran Yang, Zhengyuan Zhou, Zhaoran Wang: Provably Sample Efficient Reinforcement Learning in Competitive Linear Quadratic Systems. L4DC 2021: 597-598
- [c51] Boyi Liu, Qi Cai, Zhuoran Yang, Zhaoran Wang: BooVI: Provably Efficient Bootstrapped Value Iteration. NeurIPS 2021: 7041-7053
- [c50] Yufeng Zhang, Siyu Chen, Zhuoran Yang, Michael I. Jordan, Zhaoran Wang: Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic. NeurIPS 2021: 15993-16006
- [c49] Minshuo Chen, Yan Li, Ethan Wang, Zhuoran Yang, Zhaoran Wang, Tuo Zhao: Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL. NeurIPS 2021: 17913-17926
- [c48] Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang: Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning. NeurIPS 2021: 20436-20446
- [c47] Lingxiao Wang, Zhuoran Yang, Zhaoran Wang: Provably Efficient Causal Reinforcement Learning with Confounded Observational Data. NeurIPS 2021: 21164-21175
- [c46] Runzhe Wu, Yufeng Zhang, Zhuoran Yang, Zhaoran Wang: Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration. NeurIPS 2021: 25439-25451
- [c45] Prashant Khanduri, Siliang Zeng, Mingyi Hong, Hoi-To Wai, Zhaoran Wang, Zhuoran Yang: A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum. NeurIPS 2021: 30271-30283
- [i71] Prashant Khanduri, Siliang Zeng, Mingyi Hong, Hoi-To Wai, Zhaoran Wang, Zhuoran Yang: A Momentum-Assisted Single-Timescale Stochastic Approximation Algorithm for Bilevel Optimization. CoRR abs/2102.07367 (2021)
- [i70] Luofeng Liao, Zuyue Fu, Zhuoran Yang, Mladen Kolar, Zhaoran Wang: Instrumental Variable Value Iteration for Causal Offline Reinforcement Learning. CoRR abs/2102.09907 (2021)
- [i69] Tengyu Xu, Zhuoran Yang, Zhaoran Wang, Yingbin Liang: Doubly Robust Off-Policy Actor-Critic: Convergence and Optimality. CoRR abs/2102.11866 (2021)
- [i68]