


Hiteshi Sharma
2020 – today
- 2024
- [c12] Baihe Huang, Hiteshi Sharma, Yi Mao: Enhancing Language Model Alignment: A Confidence-Based Approach to Label Smoothing. EMNLP 2024: 21341-21352
- [c11] Jiazhan Feng, Ruochen Xu, Junheng Hao, Hiteshi Sharma, Yelong Shen, Dongyan Zhao, Weizhu Chen: Language Models can be Deductive Solvers. NAACL-HLT (Findings) 2024: 4026-4042
- [i11] Marah I Abdin, Sam Ade Jacobs, Ammar Ahmad Awan, Jyoti Aneja, Ahmed Awadallah, Hany Awadalla, Nguyen Bach, Amit Bahree, Arash Bakhtiari, Harkirat S. Behl, Alon Benhaim, Misha Bilenko, Johan Bjorck, Sébastien Bubeck, Martin Cai, Caio César Teodoro Mendes, Weizhu Chen, Vishrav Chaudhary, Parul Chopra, Allie Del Giorno, Gustavo de Rosa, Matthew Dixon, Ronen Eldan, Dan Iter, Amit Garg, Abhishek Goswami, Suriya Gunasekar, Emman Haider, Junheng Hao, Russell J. Hewett, Jamie Huynh, Mojan Javaheripi, Xin Jin, Piero Kauffmann, Nikos Karampatziakis, Dongwoo Kim, Mahoud Khademi, Lev Kurilenko, James R. Lee, Yin Tat Lee, Yuanzhi Li, Chen Liang, Weishung Liu, Eric Lin, Zeqi Lin, Piyush Madan, Arindam Mitra, Hardik Modi, Anh Nguyen, Brandon Norick, Barun Patra, Daniel Perez-Becker, Thomas Portet, Reid Pryzant, Heyang Qin, Marko Radmilac, Corby Rosset, Sambudha Roy, Olatunji Ruwase, Olli Saarikivi, Amin Saied, Adil Salim, Michael Santacroce, Shital Shah, Ning Shang, Hiteshi Sharma, Xia Song, Masahiro Tanaka, Xin Wang, Rachel Ward, Guanhua Wang, Philipp Witte, Michael Wyatt, Can Xu, Jiahang Xu, Sonali Yadav, Fan Yang, Ziyi Yang, Donghan Yu, Chengruidong Zhang, Cyril Zhang, Jianwen Zhang, Li Lyna Zhang, Yi Zhang, Yue Zhang, Yunan Zhang, Xiren Zhou: Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone. CoRR abs/2404.14219 (2024)
- [i10] Shenao Zhang, Donghan Yu, Hiteshi Sharma, Ziyi Yang, Shuohang Wang, Hany Hassan, Zhaoran Wang: Self-Exploring Language Models: Active Preference Elicitation for Online Alignment. CoRR abs/2405.19332 (2024)
- [i9] Yifang Chen, Shuohang Wang, Ziyi Yang, Hiteshi Sharma, Nikos Karampatziakis, Donghan Yu, Kevin G. Jamieson, Simon Shaolei Du, Yelong Shen: Cost-Effective Proxy Reward Model Construction with On-Policy and Active Learning. CoRR abs/2407.02119 (2024)
- [i8] Emman Haider, Daniel Perez-Becker, Thomas Portet, Piyush Madan, Amit Garg, David Majercak, Wen Wen, Dongwoo Kim, Ziyi Yang, Jianwen Zhang, Hiteshi Sharma, Blake Bullwinkel, Martin Pouliot, Amanda J. Minnich, Shiven Chawla, Solianna Herrera, Shahed Warreth, Maggie Engler, Gary Lopez, Nina Chikanov, Raja Sekhar Rao Dheekonda, Bolor-Erdene Jagdagdorj, Roman Lutz, Richard Lundeen, Tori Westerhoff, Pete Bryan, Christian Seifert, Ram Shankar Siva Kumar, Andrew Berkley, Alex Kessler: Phi-3 Safety Post-Training: Aligning Language Models with a "Break-Fix" Cycle. CoRR abs/2407.13833 (2024)
- [i7] Nabil Omi, Hosein Hasanbeig, Hiteshi Sharma, Sriram K. Rajamani, Siddhartha Sen: Progressive Safeguards for Safe and Model-Agnostic Reinforcement Learning. CoRR abs/2410.24096 (2024)
- 2023
- [c10] Ida Momennejad, Hosein Hasanbeig, Felipe Vieira Frujeri, Hiteshi Sharma, Nebojsa Jojic, Hamid Palangi, Robert Osazuwa Ness, Jonathan Larson: Evaluating Cognitive Maps and Planning in Large Language Models with CogEval. NeurIPS 2023
- [i6] Banghua Zhu, Hiteshi Sharma, Felipe Vieira Frujeri, Shi Dong, Chenguang Zhu, Michael I. Jordan, Jiantao Jiao: Fine-Tuning Language Models with Advantage-Induced Policy Alignment. CoRR abs/2306.02231 (2023)
- [i5] Hosein Hasanbeig, Hiteshi Sharma, Leo Betthauser, Felipe Vieira Frujeri, Ida Momennejad: ALLURE: Auditing and Improving LLM-based Evaluation of Text using Iterative In-Context-Learning. CoRR abs/2309.13701 (2023)
- [i4] Ida Momennejad, Hosein Hasanbeig, Felipe Vieira Frujeri, Hiteshi Sharma, Robert Osazuwa Ness, Nebojsa Jojic, Hamid Palangi, Jonathan Larson: Evaluating Cognitive Maps and Planning in Large Language Models with CogEval. CoRR abs/2309.15129 (2023)
- [i3] Jiazhan Feng, Ruochen Xu, Junheng Hao, Hiteshi Sharma, Yelong Shen, Dongyan Zhao, Weizhu Chen: Language Models can be Logical Solvers. CoRR abs/2311.06158 (2023)
- 2020
- [j1] William B. Haskell, Rahul Jain, Hiteshi Sharma, Pengqian Yu: A Universal Empirical Dynamic Programming Algorithm for Continuous State MDPs. IEEE Trans. Autom. Control. 65(1): 115-129 (2020)
- [c9] Hiteshi Sharma, Rahul Jain: Finite Time Guarantees for Continuous State MDPs with Generative Model. CDC 2020: 3617-3622
- [c8] Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Hiteshi Sharma, Rahul Jain: Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes. ICML 2020: 10170-10180
- [i2] Hiteshi Sharma, Rahul Jain: Randomized Policy Learning for Continuous State and Action MDPs. CoRR abs/2006.04331 (2020)
2010 – 2019
- 2019
- [c7] Hiteshi Sharma, Rahul Jain: An Approximately Optimal Relative Value Learning Algorithm for Averaged MDPs with Continuous States and Actions. Allerton 2019: 734-740
- [c6] Hiteshi Sharma, Rahul Jain, William B. Haskell: Empirical Algorithms for General Stochastic Systems with Continuous States and Actions. CDC 2019: 6344-6349
- [c5] Hiteshi Sharma, Rahul Jain, Abhishek K. Gupta: An Empirical Relative Value Learning Algorithm for Non-parametric MDPs with Continuous State Space. ECC 2019: 1368-1373
- [c4] Hiteshi Sharma, Mehdi Jafarnia-Jahromi, Rahul Jain: Approximate Relative Value Learning for Average-reward Continuous State MDPs. UAI 2019: 956-964
- [i1] Chen-Yu Wei, Mehdi Jafarnia-Jahromi, Haipeng Luo, Hiteshi Sharma, Rahul Jain: Model-free Reinforcement Learning in Infinite-horizon Average-reward Markov Decision Processes. CoRR abs/1910.07072 (2019)
- 2017
- [c3] William B. Haskell, Pengqian Yu, Hiteshi Sharma, Rahul Jain: Randomized function fitting-based empirical value iteration. CDC 2017: 2467-2472
- 2016
- [c2] William B. Haskell, Rahul Jain, Hiteshi Sharma: A dynamical systems framework for stochastic iterative optimization. CDC 2016: 4504-4509
- 2014
- [c1] Hiteshi Sharma, Aaqib Patel, S. N. Merchant, Uday B. Desai: Optimal Spectrum Sensing for Cognitive Radio with Imperfect Detector. VTC Spring 2014: 1-5
last updated on 2025-02-09 14:58 CET by the dblp team
all metadata released as open data under CC0 1.0 license