Peter Sunehag
2020 – today
- 2023
- [c34] Peter Sunehag, Alexander Sasha Vezhnevets, Edgar A. Duéñez-Guzmán, Igor Mordatch, Joel Z. Leibo:
Diversity Through Exclusion (DTE): Niche Identification for Reinforcement Learning through Value-Decomposition. AAMAS 2023: 2827-2829
- [i19] Peter Sunehag, Alexander Sasha Vezhnevets, Edgar A. Duéñez-Guzmán, Igor Mordatch, Joel Z. Leibo:
Diversity Through Exclusion (DTE): Niche Identification for Reinforcement Learning through Value-Decomposition. CoRR abs/2302.01180 (2023)
- [i18] Yali Du, Joel Z. Leibo, Usman Islam, Richard Willis, Peter Sunehag:
A Review of Cooperation in Multi-agent Learning. CoRR abs/2312.05162 (2023)
- 2022
- [i17] John P. Agapiou, Alexander Sasha Vezhnevets, Edgar A. Duéñez-Guzmán, Jayd Matyas, Yiran Mao, Peter Sunehag, Raphael Köster, Udari Madhushani, Kavya Kopparapu, Ramona Comanescu, DJ Strouse, Michael Bradley Johanson, Sukhdeep Singh, Julia Haas, Igor Mordatch, Dean Mobbs, Joel Z. Leibo:
Melting Pot 2.0. CoRR abs/2211.13746 (2022)
- 2021
- [c33] Joel Z. Leibo, Edgar A. Duéñez-Guzmán, Alexander Vezhnevets, John P. Agapiou, Peter Sunehag, Raphael Koster, Jayd Matyas, Charlie Beattie, Igor Mordatch, Thore Graepel:
Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot. ICML 2021: 6187-6199
- [i16] Joel Z. Leibo, Edgar A. Duéñez-Guzmán, Alexander Sasha Vezhnevets, John P. Agapiou, Peter Sunehag, Raphael Koster, Jayd Matyas, Charles Beattie, Igor Mordatch, Thore Graepel:
Scalable Evaluation of Multi-Agent Reinforcement Learning with Melting Pot. CoRR abs/2107.06857 (2021)
- 2020
- [c32] Jiachen Yang, Ang Li, Mehrdad Farajtabar, Peter Sunehag, Edward Hughes, Hongyuan Zha:
Learning to Incentivize Other Learning Agents. NeurIPS 2020
- [i15] Jiachen Yang, Ang Li, Mehrdad Farajtabar, Peter Sunehag, Edward Hughes, Hongyuan Zha:
Learning to Incentivize Other Learning Agents. CoRR abs/2006.06051 (2020)
2010 – 2019
- 2019
- [c31] Joel Z. Leibo, Julien Pérolat, Edward Hughes, Steven Wheelwright, Adam H. Marblestone, Edgar A. Duéñez-Guzmán, Peter Sunehag, Iain Dunning, Thore Graepel:
Malthusian Reinforcement Learning. AAMAS 2019: 1099-1107
- [c30] Peter Sunehag, Guy Lever, Siqi Liu, Josh Merel, Nicolas Heess, Joel Z. Leibo, Edward Hughes, Tom Eccles, Thore Graepel:
Reinforcement Learning Agents acquire Flocking and Symbiotic Behaviour in Simulated Ecosystems. ALIFE 2019: 103-110
- 2018
- [c29] Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinícius Flores Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel:
Value-Decomposition Networks For Cooperative Multi-Agent Learning Based On Team Reward. AAMAS 2018: 2085-2087
- [i14] Joel Z. Leibo, Julien Pérolat, Edward Hughes, Steven Wheelwright, Adam H. Marblestone, Edgar A. Duéñez-Guzmán, Peter Sunehag, Iain Dunning, Thore Graepel:
Malthusian Reinforcement Learning. CoRR abs/1812.07019 (2018)
- 2017
- [i13] Peter Sunehag, Guy Lever, Audrunas Gruslys, Wojciech Marian Czarnecki, Vinícius Flores Zambaldi, Max Jaderberg, Marc Lanctot, Nicolas Sonnerat, Joel Z. Leibo, Karl Tuyls, Thore Graepel:
Value-Decomposition Networks For Cooperative Multi-Agent Learning. CoRR abs/1706.05296 (2017)
- 2015
- [j3] Peter Sunehag, Marcus Hutter:
Rationality, optimism and guarantees in general reinforcement learning. J. Mach. Learn. Res. 16: 1345-1390 (2015)
- [c28] Peter Sunehag, Marcus Hutter:
Using Localization and Factorization to Reduce the Complexity of Reinforcement Learning. AGI 2015: 177-186
- [i12] Peter Sunehag, Richard Evans, Gabriel Dulac-Arnold, Yori Zwols, Daniel Visentin, Ben Coppin:
Deep Reinforcement Learning with Attention for Slate Markov Decision Processes with High-Dimensional States and Actions. CoRR abs/1512.01124 (2015)
- [i11] Gabriel Dulac-Arnold, Richard Evans, Peter Sunehag, Ben Coppin:
Reinforcement Learning in Large Discrete Action Spaces. CoRR abs/1512.07679 (2015)
- 2014
- [c27] Mayank Daswani, Peter Sunehag, Marcus Hutter:
Reinforcement learning with value advice. ACML 2014
- [c26] Peter Sunehag, Marcus Hutter:
Intelligence as Inference or Forcing Occam on the World. AGI 2014: 186-195
- [c25] Peter Sunehag, Marcus Hutter:
A Dual Process Theory of Optimistic Cognition. CogSci 2014
- 2013
- [c24] Mayank Daswani, Peter Sunehag, Marcus Hutter:
Q-learning for history-based reinforcement learning. ACML 2013: 213-228
- [c23] Peter Sunehag, Marcus Hutter:
Learning Agents with Evolving Hypothesis Classes. AGI 2013: 150-159
- [c22] Tor Lattimore, Marcus Hutter, Peter Sunehag:
Concentration and Confidence for Discrete Bayesian Sequence Predictors. ALT 2013: 324-338
- [c21] Gareth Oliver, Peter Sunehag, Tom Gedeon:
Online feature selection for Brain Computer Interfaces. CCMB 2013: 122-129
- [c20] Tor Lattimore, Marcus Hutter, Peter Sunehag:
The Sample-Complexity of General Reinforcement Learning. ICML (3) 2013: 28-36
- [i10] Tor Lattimore, Marcus Hutter, Peter Sunehag:
Concentration and Confidence for Discrete Bayesian Sequence Predictors. CoRR abs/1307.0127 (2013)
- [i9] Hadi Mohasel Afshar, Peter Sunehag:
On Nicod's Condition, Rules of Induction and the Raven Paradox. CoRR abs/1307.3435 (2013)
- [i8] Tor Lattimore, Marcus Hutter, Peter Sunehag:
The Sample-Complexity of General Reinforcement Learning. CoRR abs/1308.4828 (2013)
- 2012
- [c19] Phuong Minh Nguyen, Peter Sunehag, Marcus Hutter:
Context Tree Maximizing. AAAI 2012: 1075-1082
- [c18] Peter Sunehag, Marcus Hutter:
Optimistic AIXI. AGI 2012: 312-321
- [c17] Joel Veness, Peter Sunehag, Marcus Hutter:
On Ensemble Techniques for AIXI Approximation. AGI 2012: 341-351
- [c16] Peter Sunehag, Marcus Hutter:
Optimistic Agents Are Asymptotically Optimal. Australasian Conference on Artificial Intelligence 2012: 15-26
- [c15] Peter Sunehag, Wen Shao, Marcus Hutter:
Coding of Non-Stationary Sources as a Foundation for Detecting Change Points and Outliers in Binary Time-Series. AusDM 2012: 79-84
- [c14] Alexander O'Neill, Marcus Hutter, Wen Shao, Peter Sunehag:
Adaptive Context Tree Weighting. DCC 2012: 317-326
- [c13] Gareth Oliver, Peter Sunehag, Tom Gedeon:
Recursive channel selection techniques for brain computer interfaces. EMBC 2012: 1753-1756
- [c12] Gareth Oliver, Peter Sunehag, Tom Gedeon:
Asynchronous Brain Computer Interface using Hidden Semi-Markov Models. EMBC 2012: 2728-2731
- [c11] Mayank Daswani, Peter Sunehag, Marcus Hutter:
Feature Reinforcement Learning using Looping Suffix Trees. EWRL 2012: 11-24
- [i7] Alexander O'Neill, Marcus Hutter, Wen Shao, Peter Sunehag:
Adaptive Context Tree Weighting. CoRR abs/1201.2056 (2012)
- [i6] Peter Sunehag, Marcus Hutter:
Optimistic Agents are Asymptotically Optimal. CoRR abs/1210.0077 (2012)
- 2011
- [c10] Peter Sunehag, Marcus Hutter:
Axioms for Rational Reinforcement Learning. ALT 2011: 338-352
- [c9] Peter Sunehag, Marcus Hutter:
Principles of Solomonoff Induction and AIXI. Algorithmic Probability and Friends 2011: 386-398
- [c8] Ian Wood, Peter Sunehag, Marcus Hutter:
(Non-)Equivalence of Universal Priors. Algorithmic Probability and Friends 2011: 417-425
- [c7] Matthew W. Robards, Peter Sunehag:
Gradient Based Algorithms with Loss Functions and Kernels for Improved On-Policy Control. EWRL 2011: 30-41
- [c6] Phuong Minh Nguyen, Peter Sunehag, Marcus Hutter:
Feature Reinforcement Learning in Practice. EWRL 2011: 66-77
- [c5] Matthew W. Robards, Peter Sunehag, Scott Sanner, Bhaskara Marthi:
Sparse Kernel-SARSA(λ) with an Eligibility Trace. ECML/PKDD (3) 2011: 1-17
- [i5] Peter Sunehag, Marcus Hutter:
Axioms for Rational Reinforcement Learning. CoRR abs/1107.5520 (2011)
- [i4] Phuong Minh Nguyen, Peter Sunehag, Marcus Hutter:
Feature Reinforcement Learning In Practice. CoRR abs/1108.3614 (2011)
- [i3] Ian Wood, Peter Sunehag, Marcus Hutter:
(Non-)Equivalence of Universal Priors. CoRR abs/1111.3854 (2011)
- [i2] Peter Sunehag, Marcus Hutter:
Principles of Solomonoff Induction and AIXI. CoRR abs/1111.6117 (2011)
- 2010
- [j2] Owen Thomas, Peter Sunehag, Gideon Dror, Sungrack Yun, Sungwoong Kim, Matthew W. Robards, Alexander J. Smola, Daniel Green, Philo Saunders:
Wearable sensor activity analysis using semi-Markov models with a grammar. Pervasive Mob. Comput. 6(3): 342-350 (2010)
- [c4] Peter Sunehag, Marcus Hutter:
Consistency of Feature Markov Processes. ALT 2010: 360-374
- [i1] Peter Sunehag, Marcus Hutter:
Consistency of Feature Markov Processes. CoRR abs/1007.2075 (2010)
2000 – 2009
- 2009
- [c3] Matthew W. Robards, Peter Sunehag:
Semi-Markov kMeans Clustering and Activity Recognition from Body-Worn Sensors. ICDM 2009: 438-446
- [c2] Peter Sunehag, Jochen Trumpf, S. V. N. Vishwanathan, Nicol N. Schraudolph:
Variable Metric Stochastic Approximation Theory. AISTATS 2009: 560-566
- 2007
- [c1] Peter Sunehag:
Emerge and spread models and word burstiness. AISTATS 2007: 540-547
- 2004
- [j1] Peter Sunehag:
Subcouples of codimension one and interpolation of operators that almost agree. J. Approx. Theory 130(1): 78-98 (2004)
last updated on 2024-05-02 21:01 CEST by the dblp team
all metadata released as open data under CC0 1.0 license