


default search action
Yi Dong 0003
Person information
- affiliation: NVIDIA
Other persons with the same name
- Yi Dong — disambiguation page
- Yi Dong 0001
— Tongji University, College of Electronic and Information Engineering, Shanghai Research Institute for Intelligent Autonomous Systems, China (and 2 more) - Yi Dong 0002
— University of Liverpool, UK
Refine list

refinements active!
zoomed in on ?? of ?? records
view refined list in
export refined list as
2020 – today
- 2025
[c7]Zhilin Wang, Jiaqi Zeng, Olivier Delalleau, Daniel Egert, Ellie Evans, Hoo-Chang Shin, Felipe Soares, Yi Dong, Oleksii Kuchaiev:
HelpSteer3: Human-Annotated Feedback and Edit Data to Empower Inference-Time Scaling in Open-Ended General-Domain Tasks. ACL (1) 2025: 25640-25662
[c6]Zhilin Wang, Alexander Bukharin, Olivier Delalleau, Daniel Egert, Gerald Shen, Jiaqi Zeng, Oleksii Kuchaiev, Yi Dong:
HelpSteer2-Preference: Complementing Ratings with Preferences. ICLR 2025
[c5]Michael J. Q. Zhang, Zhilin Wang, Jena D. Hwang, Yi Dong, Olivier Delalleau, Yejin Choi, Eunsol Choi, Xiang Ren, Valentina Pyatkin:
Diverging Preferences: When do Annotators Disagree and do Models Know? ICML 2025
[i17]Shengyang Sun, Yian Zhang, Alexander Bukharin, David Mosallanezhad, Jiaqi Zeng, Soumye Singhal, Gerald Shen, Adithya Renduchintala, Tugrul Konuk, Yi Dong, Zhilin Wang, Dmitry Chichkov, Olivier Delalleau, Oleksii Kuchaiev:
Reward-aware Preference Optimization: A Unified Mathematical Framework for Model Alignment. CoRR abs/2502.00203 (2025)
[i16]Zhilin Wang, Jiaqi Zeng, Olivier Delalleau, Daniel Egert, Ellie Evans, Hoo-Chang Shin, Felipe Soares, Yi Dong, Oleksii Kuchaiev:
Dedicated Feedback and Edit Models Empower Inference-Time Scaling for Open-Ended General-Domain Tasks. CoRR abs/2503.04378 (2025)
[i15]Shaokun Zhang, Yi Dong, Jieyu Zhang, Jan Kautz, Bryan Catanzaro, Andrew Tao, Qingyun Wu, Zhiding Yu, Guilin Liu:
Nemotron-Research-Tool-N1: Exploring Tool-Using Language Models with Reinforced Reasoning. CoRR abs/2505.00024 (2025)
[i14]Zhilin Wang, Jiaqi Zeng, Olivier Delalleau, Hoo-Chang Shin, Felipe Soares, Alexander Bukharin, Ellie Evans, Yi Dong, Oleksii Kuchaiev:
HelpSteer3-Preference: Open Human-Annotated Preference Data across Diverse Tasks and Languages. CoRR abs/2505.11475 (2025)
[i13]Mingjie Liu, Shizhe Diao, Ximing Lu, Jian Hu, Xin Dong, Yejin Choi, Jan Kautz, Yi Dong:
ProRL: Prolonged Reinforcement Learning Expands Reasoning Boundaries in Large Language Models. CoRR abs/2505.24864 (2025)
[i12]Mingjie Liu, Shizhe Diao, Jian Hu, Ximing Lu, Xin Dong, Hao Zhang, Alexander Bukharin, Shaokun Zhang, Jiaqi Zeng, Makesh Narsimhan Sreedhar, Gerald Shen, David Mosallanezhad, Di Zhang, Jonas Yang, June Yang, Oleksii Kuchaiev, Guilin Liu, Zhiding Yu, Pavlo Molchanov, Yejin Choi, Jan Kautz, Yi Dong:
Scaling Up RL: Unlocking Diverse Reasoning in LLMs via Prolonged Training. CoRR abs/2507.12507 (2025)
[i11]Zhilin Wang, Jiaqi Zeng, Olivier Delalleau, Ellie Evans, Daniel Egert, Hoo-Chang Shin, Felipe Soares, Yi Dong, Oleksii Kuchaiev:
RLBFF: Binary Flexible Feedback to bridge between Human Feedback & Verifiable Rewards. CoRR abs/2509.21319 (2025)
[i10]Jian Hu, Mingjie Liu, Ximing Lu, Fang Wu, Zaïd Harchaoui, Shizhe Diao, Yejin Choi, Pavlo Molchanov, June Yang, Jan Kautz, Yi Dong:
BroRL: Scaling Reinforcement Learning via Broadened Exploration. CoRR abs/2510.01180 (2025)
[i9]Zhilin Wang, Jaehun Jung, Ximing Lu, Shizhe Diao, Ellie Evans, Jiaqi Zeng, Pavlo Molchanov, Yejin Choi, Jan Kautz, Yi Dong:
ProfBench: Multi-Domain Rubrics requiring Professional Knowledge to Answer and Judge. CoRR abs/2510.18941 (2025)
[i8]Hongjin Su, Shizhe Diao, Ximing Lu, Mingjie Liu, Jiacheng Xu, Xin Dong, Yonggan Fu, Peter Belcak, Hanrong Ye, Hongxu Yin, Yi Dong, Evelina Bakhturina, Tao Yu, Yejin Choi, Jan Kautz, Pavlo Molchanov:
ToolOrchestra: Elevating Intelligence via Efficient Model and Tool Orchestration. CoRR abs/2511.21689 (2025)- 2024
[c4]Zhilin Wang, Yi Dong, Jiaqi Zeng, Virginia Adams, Makesh Narsimhan Sreedhar, Daniel Egert, Olivier Delalleau, Jane Polak Scowcroft, Neel Kant, Aidan Swope, Oleksii Kuchaiev:
HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM. NAACL-HLT 2024: 3371-3384
[c3]Zhilin Wang, Yi Dong, Olivier Delalleau, Jiaqi Zeng, Gerald Shen, Daniel Egert, Jimmy Zhang, Makesh Narsimhan Sreedhar, Oleksii Kuchaiev:
HelpSteer 2: Open-source dataset for training top-performing reward models. NeurIPS 2024
[i7]Gerald Shen, Zhilin Wang, Olivier Delalleau, Jiaqi Zeng, Yi Dong, Daniel Egert, Shengyang Sun, Jimmy J. Zhang, Sahil Jain, Ali Taghibakhshi, Markel Sanz Ausin, Ashwath Aithal, Oleksii Kuchaiev:
NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment. CoRR abs/2405.01481 (2024)
[i6]Zhilin Wang, Yi Dong, Olivier Delalleau, Jiaqi Zeng, Gerald Shen, Daniel Egert, Jimmy J. Zhang, Makesh Narsimhan Sreedhar, Oleksii Kuchaiev:
HelpSteer2: Open-source dataset for training top-performing reward models. CoRR abs/2406.08673 (2024)
[i5]Zhilin Wang, Alexander Bukharin, Olivier Delalleau, Daniel Egert, Gerald Shen, Jiaqi Zeng, Oleksii Kuchaiev, Yi Dong:
HelpSteer2-Preference: Complementing Ratings with Preferences. CoRR abs/2410.01257 (2024)
[i4]Michael J. Q. Zhang, Zhilin Wang, Jena D. Hwang, Yi Dong, Olivier Delalleau, Yejin Choi, Eunsol Choi, Xiang Ren, Valentina Pyatkin:
Diverging Preferences: When do Annotators Disagree and do Models Know? CoRR abs/2410.14632 (2024)- 2023
[c2]Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, Anima Anandkumar, Bryan Catanzaro:
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study. EMNLP 2023: 7763-7786
[c1]Yi Dong, Zhilin Wang, Makesh Narsimhan Sreedhar, Xianchao Wu, Oleksii Kuchaiev:
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF. EMNLP (Findings) 2023: 11275-11288
[i3]Boxin Wang, Wei Ping, Peng Xu, Lawrence McAfee, Zihan Liu, Mohammad Shoeybi, Yi Dong, Oleksii Kuchaiev, Bo Li, Chaowei Xiao, Anima Anandkumar, Bryan Catanzaro:
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study. CoRR abs/2304.06762 (2023)
[i2]Yi Dong, Zhilin Wang, Makesh Narsimhan Sreedhar, Xianchao Wu, Oleksii Kuchaiev:
SteerLM: Attribute Conditioned SFT as an (User-Steerable) Alternative to RLHF. CoRR abs/2310.05344 (2023)
[i1]Zhilin Wang, Yi Dong, Jiaqi Zeng, Virginia Adams, Makesh Narsimhan Sreedhar, Daniel Egert, Olivier Delalleau, Jane Polak Scowcroft, Neel Kant, Aidan Swope, Oleksii Kuchaiev:
HelpSteer: Multi-attribute Helpfulness Dataset for SteerLM. CoRR abs/2311.09528 (2023)
Coauthor Index

manage site settings
To protect your privacy, all features that rely on external API calls from your browser are turned off by default. You need to opt-in for them to become active. All settings here will be stored as cookies with your web browser. For more information see our F.A.Q.
Unpaywalled article links
Add open access links from
to the list of external document links (if available).
Privacy notice: By enabling the option above, your browser will contact the API of unpaywall.org to load hyperlinks to open access articles. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Unpaywall privacy policy.
Archived links via Wayback Machine
For web page which are no longer available, try to retrieve content from the
of the Internet Archive (if available).
Privacy notice: By enabling the option above, your browser will contact the API of archive.org to check for archived content of web pages that are no longer available. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Internet Archive privacy policy.
Reference lists
Add a list of references from
,
, and
to record detail pages.
load references from crossref.org and opencitations.net
Privacy notice: By enabling the option above, your browser will contact the APIs of crossref.org, opencitations.net, and semanticscholar.org to load article reference information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the Crossref privacy policy and the OpenCitations privacy policy, as well as the AI2 Privacy Policy covering Semantic Scholar.
Citation data
Add a list of citing articles from
and
to record detail pages.
load citations from opencitations.net
Privacy notice: By enabling the option above, your browser will contact the API of opencitations.net and semanticscholar.org to load citation information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the OpenCitations privacy policy as well as the AI2 Privacy Policy covering Semantic Scholar.
OpenAlex data
Load additional information about publications from
.
Privacy notice: By enabling the option above, your browser will contact the API of openalex.org to load additional information. Although we do not have any reason to believe that your call will be tracked, we do not have any control over how the remote server uses your data. So please proceed with care and consider checking the information given by OpenAlex.
last updated on 2026-02-18 23:28 CET by the dblp team
all metadata released as open data under CC0 1.0 license
see also: Terms of Use | Privacy Policy | Imprint


Google
Google Scholar
Semantic Scholar
Internet Archive Scholar
CiteSeerX
ORCID







