Michal Valko
Person information
- affiliation: DeepMind
2020 – today
- 2024
- [c109]Mohammad Gheshlaghi Azar, Zhaohan Daniel Guo, Bilal Piot, Rémi Munos, Mark Rowland, Michal Valko, Daniele Calandriello:
A General Theoretical Paradigm to Understand Learning from Human Preferences. AISTATS 2024: 4447-4455 - [c108]Alaa Saade, Steven Kapturowski, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra, Oliver Groth, Michal Valko, Bilal Piot:
Unlocking the Power of Representations in Long-term Novelty-based Exploration. ICLR 2024 - [c107]Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Ménard:
Demonstration-Regularized RL. ICLR 2024 - [c106]Daniele Calandriello, Zhaohan Daniel Guo, Rémi Munos, Mark Rowland, Yunhao Tang, Bernardo Ávila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot:
Human Alignment of Large Language Models through Online Preference Optimisation. ICML 2024 - [c105]Tianlin Liu, Shangmin Guo, Leonardo Bianco, Daniele Calandriello, Quentin Berthet, Felipe Llinares-López, Jessica Hoffmann, Lucas Dixon, Michal Valko, Mathieu Blondel:
Decoding-time Realignment of Language Models. ICML 2024 - [c104]Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Côme Fiegel, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot:
Nash Learning from Human Feedback. ICML 2024 - [c103]Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot:
Generalized Preference Optimization: A Unified Approach to Offline Alignment. ICML 2024 - [i85]Tianlin Liu, Shangmin Guo, Leonardo Bianco, Daniele Calandriello, Quentin Berthet, Felipe Llinares, Jessica Hoffmann, Lucas Dixon, Michal Valko, Mathieu Blondel:
Decoding-time Realignment of Language Models. CoRR abs/2402.02992 (2024) - [i84]Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Rémi Munos, Mark Rowland, Pierre Harvey Richemond, Michal Valko, Bernardo Ávila Pires, Bilal Piot:
Generalized Preference Optimization: A Unified Approach to Offline Alignment. CoRR abs/2402.05749 (2024) - [i83]Daniele Calandriello, Daniel Guo, Rémi Munos, Mark Rowland, Yunhao Tang, Bernardo Ávila Pires, Pierre Harvey Richemond, Charline Le Lan, Michal Valko, Tianqi Liu, Rishabh Joshi, Zeyu Zheng, Bilal Piot:
Human Alignment of Large Language Models through Online Preference Optimisation. CoRR abs/2403.08635 (2024) - [i82]Yunhao Tang, Zhaohan Daniel Guo, Zeyu Zheng, Daniele Calandriello, Yuan Cao, Eugene Tarassov, Rémi Munos, Bernardo Ávila Pires, Michal Valko, Yong Cheng, Will Dabney:
Understanding the performance gap between online and offline alignment algorithms. CoRR abs/2405.08448 (2024) - [i81]Aniket Didolkar, Anirudh Goyal, Nan Rosemary Ke, Siyuan Guo, Michal Valko, Timothy P. Lillicrap, Danilo J. Rezende, Yoshua Bengio, Michael Mozer, Sanjeev Arora:
Metacognitive Capabilities of LLMs: An Exploration in Mathematical Problem Solving. CoRR abs/2405.12205 (2024) - [i80]Pierre Perrault, Denis Belomestny, Pierre Ménard, Éric Moulines, Alexey Naumov, Daniil Tiapkin, Michal Valko:
A New Bound on the Cumulant Generating Function of Dirichlet Processes. CoRR abs/2409.18621 (2024) - [i79]Chaoqi Wang, Zhuokai Zhao, Chen Zhu, Karthik Abinav Sankararaman, Michal Valko, Xuefei Cao, Zhaorun Chen, Madian Khabsa, Yuxin Chen, Hao Ma, Sinong Wang:
Preference Optimization with Multi-Sample Comparisons. CoRR abs/2410.12138 (2024) - [i78]Antoine Scheid, Etienne Boursier, Alain Durmus, Michael I. Jordan, Pierre Ménard, Eric Moulines, Michal Valko:
Optimal Design for Reward Modeling in RLHF. CoRR abs/2410.17055 (2024)
- 2023
- [c102]Mehdi Azabou, Venkataramana Ganesh, Shantanu Thakoor, Chi-Heng Lin, Lakshmi Sathidevi, Ran Liu, Michal Valko, Petar Velickovic, Eva L. Dyer:
Half-Hop: A graph upsampling approach for slowing down message passing. ICML 2023: 1341-1360 - [c101]Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko:
Adapting to game trees in zero-sum imperfect information games. ICML 2023: 10093-10135 - [c100]Daniel Jarrett, Corentin Tallec, Florent Altché, Thomas Mesnard, Rémi Munos, Michal Valko:
Curiosity in Hindsight: Intrinsic Exploration in Stochastic Environments. ICML 2023: 14780-14816 - [c99]Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo:
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice. ICML 2023: 17135-17175 - [c98]Thomas Mesnard, Wenqi Chen, Alaa Saade, Yunhao Tang, Mark Rowland, Theophane Weber, Clare Lyle, Audrunas Gruslys, Michal Valko, Will Dabney, Georg Ostrovski, Eric Moulines, Rémi Munos:
Quantile Credit Assignment. ICML 2023: 24517-24531 - [c97]Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko:
Understanding Self-Predictive Learning for Reinforcement Learning. ICML 2023: 33632-33656 - [c96]Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko:
DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm. ICML 2023: 33657-33673 - [c95]Yunhao Tang, Rémi Munos, Mark Rowland, Michal Valko:
VA-learning as a more efficient alternative to Q-learning. ICML 2023: 33739-33757 - [c94]Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Ménard:
Fast Rates for Maximum Entropy Exploration. ICML 2023: 34161-34221 - [c93]Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Ménard:
Model-free Posterior Sampling via Learning Rate Randomization. NeurIPS 2023 - [i77]Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Pierre Perrault, Yunhao Tang, Michal Valko, Pierre Ménard:
Fast Rates for Maximum Entropy Exploration. CoRR abs/2303.08059 (2023) - [i76]Alaa Saade, Steven Kapturowski, Daniele Calandriello, Charles Blundell, Pablo Sprechmann, Leopoldo Sarra, Oliver Groth, Michal Valko, Bilal Piot:
Unlocking the Power of Representations in Long-term Novelty-based Exploration. CoRR abs/2305.01521 (2023) - [i75]Toshinori Kitamura, Tadashi Kozuno, Yunhao Tang, Nino Vieillard, Michal Valko, Wenhao Yang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári, Wataru Kumagai, Yutaka Matsuo:
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice. CoRR abs/2305.13185 (2023) - [i74]Yunhao Tang, Rémi Munos, Mark Rowland, Michal Valko:
VA-learning as a more efficient alternative to Q-learning. CoRR abs/2305.18161 (2023) - [i73]Yunhao Tang, Tadashi Kozuno, Mark Rowland, Anna Harutyunyan, Rémi Munos, Bernardo Ávila Pires, Michal Valko:
DoMo-AC: Doubly Multi-step Off-policy Actor-Critic Algorithm. CoRR abs/2305.18501 (2023) - [i72]Mehdi Azabou, Venkataramana Ganesh, Shantanu Thakoor, Chi-Heng Lin, Lakshmi Sathidevi, Ran Liu, Michal Valko, Petar Velickovic, Eva L. Dyer:
Half-Hop: A graph upsampling approach for slowing down message passing. CoRR abs/2308.09198 (2023) - [i71]Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko:
Local and adaptive mirror descents in extensive-form games. CoRR abs/2309.00656 (2023) - [i70]Mohammad Gheshlaghi Azar, Mark Rowland, Bilal Piot, Daniel Guo, Daniele Calandriello, Michal Valko, Rémi Munos:
A General Theoretical Paradigm to Understand Learning from Human Preferences. CoRR abs/2310.12036 (2023) - [i69]Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Ménard:
Demonstration-Regularized RL. CoRR abs/2310.17303 (2023) - [i68]Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Pierre Perrault, Michal Valko, Pierre Ménard:
Model-free Posterior Sampling via Learning Rate Randomization. CoRR abs/2310.18186 (2023) - [i67]Rémi Munos, Michal Valko, Daniele Calandriello, Mohammad Gheshlaghi Azar, Mark Rowland, Zhaohan Daniel Guo, Yunhao Tang, Matthieu Geist, Thomas Mesnard, Andrea Michi, Marco Selvi, Sertan Girgin, Nikola Momchev, Olivier Bachem, Daniel J. Mankowitz, Doina Precup, Bilal Piot:
Nash Learning from Human Feedback. CoRR abs/2312.00886 (2023)
- 2022
- [c92]Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko:
Marginalized Operators for Off-policy Reinforcement Learning. AISTATS 2022: 655-679 - [c91]Jean Tarbouriech, Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Michal Valko, Alessandro Lazaric:
Adaptive Multi-Goal Exploration. AISTATS 2022: 7349-7383 - [c90]Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Mehdi Azabou, Eva L. Dyer, Rémi Munos, Petar Velickovic, Michal Valko:
Large-Scale Representation Learning on Graphs via Bootstrapping. ICLR 2022 - [c89]Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco:
Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times. ICML 2022: 2523-2541 - [c88]Anirudh Goyal, Abram L. Friesen, Andrea Banino, Theophane Weber, Nan Rosemary Ke, Adrià Puigdomènech Badia, Arthur Guez, Mehdi Mirza, Peter Conway Humphreys, Ksenia Konyushkova, Michal Valko, Simon Osindero, Timothy P. Lillicrap, Nicolas Heess, Charles Blundell:
Retrieval-Augmented Reinforcement Learning. ICML 2022: 7740-7765 - [c87]Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Ménard:
From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses. ICML 2022: 21380-21431 - [c86]Zhaohan Guo, Shantanu Thakoor, Miruna Pislar, Bernardo Ávila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot:
BYOL-Explore: Exploration by Bootstrapped Prediction. NeurIPS 2022 - [c85]Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Ménard:
Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees. NeurIPS 2022 - [i66]Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco:
Scaling Gaussian Process Optimization by Evaluating a Few Unique Candidates Multiple Times. CoRR abs/2201.12909 (2022) - [i65]Anirudh Goyal, Abram L. Friesen, Andrea Banino, Theophane Weber, Nan Rosemary Ke, Adrià Puigdomènech Badia, Arthur Guez, Mehdi Mirza, Ksenia Konyushkova, Michal Valko, Simon Osindero, Timothy P. Lillicrap, Nicolas Heess, Charles Blundell:
Retrieval-Augmented Reinforcement Learning. CoRR abs/2202.08417 (2022) - [i64]Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko:
Marginalized Operators for Off-policy Reinforcement Learning. CoRR abs/2203.16177 (2022) - [i63]Daniil Tiapkin, Denis Belomestny, Eric Moulines, Alexey Naumov, Sergey Samsonov, Yunhao Tang, Michal Valko, Pierre Ménard:
From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses. CoRR abs/2205.07704 (2022) - [i62]Tadashi Kozuno, Wenhao Yang, Nino Vieillard, Toshinori Kitamura, Yunhao Tang, Jincheng Mei, Pierre Ménard, Mohammad Gheshlaghi Azar, Michal Valko, Rémi Munos, Olivier Pietquin, Matthieu Geist, Csaba Szepesvári:
KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal. CoRR abs/2205.14211 (2022) - [i61]Zhaohan Daniel Guo, Shantanu Thakoor, Miruna Pislar, Bernardo Ávila Pires, Florent Altché, Corentin Tallec, Alaa Saade, Daniele Calandriello, Jean-Bastien Grill, Yunhao Tang, Michal Valko, Rémi Munos, Mohammad Gheshlaghi Azar, Bilal Piot:
BYOL-Explore: Exploration by Bootstrapped Prediction. CoRR abs/2206.08332 (2022) - [i60]Daniil Tiapkin, Denis Belomestny, Daniele Calandriello, Eric Moulines, Rémi Munos, Alexey Naumov, Mark Rowland, Michal Valko, Pierre Ménard:
Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees. CoRR abs/2209.14414 (2022) - [i59]Daniel Jarrett, Corentin Tallec, Florent Altché, Thomas Mesnard, Rémi Munos, Michal Valko:
Curiosity in hindsight. CoRR abs/2211.10515 (2022) - [i58]Yunhao Tang, Zhaohan Daniel Guo, Pierre Harvey Richemond, Bernardo Ávila Pires, Yash Chandak, Rémi Munos, Mark Rowland, Mohammad Gheshlaghi Azar, Charline Le Lan, Clare Lyle, András György, Shantanu Thakoor, Will Dabney, Bilal Piot, Daniele Calandriello, Michal Valko:
Understanding Self-Predictive Learning for Reinforcement Learning. CoRR abs/2212.03319 (2022) - [i57]Côme Fiegel, Pierre Ménard, Tadashi Kozuno, Rémi Munos, Vianney Perchet, Michal Valko:
Adapting to game trees in zero-sum imperfect information games. CoRR abs/2212.12567 (2022)
- 2021
- [j5]Karl Tuyls, Shayegan Omidshafiei, Paul Muller, Zhe Wang, Jerome T. Connor, Daniel Hennes, Ian Graham, William Spearman, Tim Waskett, Dafydd Steele, Pauline Luc, Adrià Recasens, Alexandre Galashov, Gregory Thornton, Romuald Elie, Pablo Sprechmann, Pol Moreno, Kris Cao, Marta Garnelo, Praneet Dutta, Michal Valko, Nicolas Heess, Alex Bridgland, Julien Pérolat, Bart De Vylder, S. M. Ali Eslami, Mark Rowland, Andrew Jaegle, Rémi Munos, Trevor Back, Razia Ahamed, Simon Bouton, Nathalie Beauguerlange, Jackson Broshear, Thore Graepel, Demis Hassabis:
Game Plan: What AI can do for Football, and What Football can do for AI. J. Artif. Intell. Res. 71: 41-88 (2021) - [j4]Guillaume Gautier, Rémi Bardenet, Michal Valko:
Fast sampling from β-ensembles. Stat. Comput. 31(1): 7 (2021) - [c84]Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko:
A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces. AISTATS 2021: 3538-3546 - [c83]Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, Michal Valko:
Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited. ALT 2021: 578-598 - [c82]Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Edouard Leurent, Michal Valko:
Adaptive Reward-Free Exploration. ALT 2021: 865-891 - [c81]Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric:
Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model. ALT 2021: 1157-1178 - [c80]Adrià Recasens, Pauline Luc, Jean-Baptiste Alayrac, Luyu Wang, Florian Strub, Corentin Tallec, Mateusz Malinowski, Viorica Patraucean, Florent Altché, Michal Valko, Jean-Bastien Grill, Aäron van den Oord, Andrew Zisserman:
Broaden Your Views for Self-Supervised Video Learning. ICCV 2021: 1235-1245 - [c79]Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko:
Kernel-Based Reinforcement Learning: A Finite-Time Analysis. ICML 2021: 2783-2792 - [c78]Xavier Fontaine, Pierre Perrault, Michal Valko, Vianney Perchet:
Online A-Optimal Design and Active Linear Regression. ICML 2021: 3374-3383 - [c77]Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel:
Revisiting Peng's Q(λ) for Modern Reinforcement Learning. ICML 2021: 5794-5804 - [c76]Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Emilie Kaufmann, Edouard Leurent, Michal Valko:
Fast active learning for pure exploration in reinforcement learning. ICML 2021: 7599-7608 - [c75]Pierre Ménard, Omar Darwiche Domingues, Xuedong Shang, Michal Valko:
UCB Momentum Q-learning: Correcting the bias without forgetting. ICML 2021: 7609-7618 - [c74]Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko:
Taylor Expansion of Discount Factors. ICML 2021: 10130-10140 - [c73]Yunhao Tang, Tadashi Kozuno, Mark Rowland, Rémi Munos, Michal Valko:
Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation. NeurIPS 2021: 5303-5315 - [c72]Jean Tarbouriech, Runlong Zhou, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric:
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret. NeurIPS 2021: 6843-6855 - [c71]Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric:
A Provably Efficient Sample Collection Strategy for Reinforcement Learning. NeurIPS 2021: 7611-7624 - [c70]Ran Liu, Mehdi Azabou, Max Dabagia, Chi-Heng Lin, Mohammad Gheshlaghi Azar, Keith B. Hengen, Michal Valko, Eva L. Dyer:
Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity. NeurIPS 2021: 10587-10599 - [c69]Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko:
Learning in two-player zero-sum partially observable Markov games with perfect recall. NeurIPS 2021: 11987-11998 - [i56]Pierre Perrault, Jennifer Healey, Zheng Wen, Michal Valko:
On the Approximation Relationship between Optimizing Ratio of Submodular (RS) and Difference of Submodular (DS) Functions. CoRR abs/2101.01631 (2021) - [i55]Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Alaa Saade, Shantanu Thakoor, Bilal Piot, Bernardo Ávila Pires, Michal Valko, Thomas Mesnard, Tor Lattimore, Rémi Munos:
Geometric Entropic Exploration. CoRR abs/2101.02055 (2021) - [i54]Shantanu Thakoor, Corentin Tallec, Mohammad Gheshlaghi Azar, Rémi Munos, Petar Velickovic, Michal Valko:
Bootstrapped Representation Learning on Graphs. CoRR abs/2102.06514 (2021) - [i53]Mehdi Azabou, Mohammad Gheshlaghi Azar, Ran Liu, Chi-Heng Lin, Erik C. Johnson, Kiran Bhaskaran-Nair, Max Dabagia, Keith B. Hengen, William R. Gray Roncal, Michal Valko, Eva L. Dyer:
Mine Your Own vieW: Self-Supervised Learning Through Across-Sample Prediction. CoRR abs/2102.10106 (2021) - [i52]Tadashi Kozuno, Yunhao Tang, Mark Rowland, Rémi Munos, Steven Kapturowski, Will Dabney, Michal Valko, David Abel:
Revisiting Peng's Q(λ) for Modern Reinforcement Learning. CoRR abs/2103.00107 (2021) - [i51]Pierre Ménard, Omar Darwiche Domingues, Xuedong Shang, Michal Valko:
UCB Momentum Q-learning: Correcting the bias without forgetting. CoRR abs/2103.01312 (2021) - [i50]Adrià Recasens, Pauline Luc, Jean-Baptiste Alayrac, Luyu Wang, Florian Strub, Corentin Tallec, Mateusz Malinowski, Viorica Patraucean, Florent Altché, Michal Valko, Jean-Bastien Grill, Aäron van den Oord, Andrew Zisserman:
Broaden Your Views for Self-Supervised Video Learning. CoRR abs/2103.16559 (2021) - [i49]Jean Tarbouriech, Runlong Zhou, Simon S. Du, Matteo Pirotta, Michal Valko, Alessandro Lazaric:
Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret. CoRR abs/2104.11186 (2021) - [i48]Yunhao Tang, Mark Rowland, Rémi Munos, Michal Valko:
Taylor Expansion of Discount Factors. CoRR abs/2106.06170 (2021) - [i47]Tadashi Kozuno, Pierre Ménard, Rémi Munos, Michal Valko:
Model-Free Learning for Two-Player Zero-Sum Partially Observable Markov Games with Perfect Recall. CoRR abs/2106.06279 (2021) - [i46]Yunhao Tang, Tadashi Kozuno, Mark Rowland, Rémi Munos, Michal Valko:
Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation. CoRR abs/2106.13125 (2021) - [i45]Ran Liu, Mehdi Azabou, Max Dabagia, Chi-Heng Lin, Mohammad Gheshlaghi Azar, Keith B. Hengen, Michal Valko, Eva L. Dyer:
Drop, Swap, and Generate: A Self-Supervised Approach for Generating Neural Activity. CoRR abs/2111.02338 (2021) - [i44]Jean Tarbouriech, Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Michal Valko, Alessandro Lazaric:
Adaptive Multi-Goal Exploration. CoRR abs/2111.12045 (2021)
- 2020
- [c68]Xuedong Shang, Rianne de Heide, Pierre Ménard, Emilie Kaufmann, Michal Valko:
Fixed-confidence guarantees for Bayesian best-arm identification. AISTATS 2020: 1823-1832 - [c67]Haitham Ammar, Victor Gabillon, Rasul Tutunov, Michal Valko:
Derivative-Free & Order-Robust Optimisation. AISTATS 2020: 2293-2303 - [c66]Côme Fiegel, Victor Gabillon, Michal Valko:
Adaptive multi-fidelity optimization with fast learning rates. AISTATS 2020: 3493-3502 - [c65]Julien Seznec, Pierre Ménard, Alessandro Lazaric, Michal Valko:
A single algorithm for both restless and rested rotting bandits. AISTATS 2020: 3784-3794 - [c64]Pierre Perrault, Michal Valko, Vianney Perchet:
Covariance-adapting algorithm for semi-bandits with application to sparse outcomes. COLT 2020: 3152-3184 - [c63]Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco:
Near-linear time Gaussian process optimization with adaptive batching and resparsification. ICML 2020: 1295-1305 - [c62]Rémy Degenne, Pierre Ménard, Xuedong Shang, Michal Valko:
Gamification of Pure Exploration for Linear Bandits. ICML 2020: 2432-2442 - [c61]Anne Gael Manegueu, Claire Vernade, Alexandra Carpentier, Michal Valko:
Stochastic bandits with arm-dependent delays. ICML 2020: 3348-3356 - [c60]Jean-Bastien Grill, Florent Altché, Yunhao Tang, Thomas Hubert, Michal Valko, Ioannis Antonoglou, Rémi Munos:
Monte-Carlo Tree Search as Regularized Policy Optimization. ICML 2020: 3769-3778 - [c59]Pierre Perrault, Jennifer Healey, Zheng Wen, Michal Valko:
Budgeted Online Influence Maximization. ICML 2020: 7620-7631 - [c58]Aadirupa Saha, Pierre Gaillard, Michal Valko:
Improved Sleeping Bandits with Stochastic Action Sets and Adversarial Rewards. ICML 2020: 8357-8366 - [c57]Yunhao Tang, Michal Valko, Rémi Munos:
Taylor Expansion Policy Optimization. ICML 2020: 9397-9406 - [c56]Jean Tarbouriech, Evrard Garcelon, Michal Valko, Matteo Pirotta, Alessandro Lazaric:
No-Regret Exploration in Goal-Oriented Reinforcement Learning. ICML 2020: 9428-9437 - [c55]Daniele Calandriello, Michal Derezinski, Michal Valko:
Sampling from a k-DPP without looking at all items. NeurIPS 2020 - [c54]Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Ávila Pires, Zhaohan Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko:
Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning. NeurIPS 2020 - [c53]Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko:
Planning in Markov Decision Processes with Gap-Dependent Sample Complexity. NeurIPS 2020 - [c52]Pierre Perrault, Etienne Boursier, Michal Valko, Vianney Perchet:
Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits. NeurIPS 2020 - [c51]Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric:
Improved Sample Complexity for Incremental Autonomous Exploration in MDPs. NeurIPS 2020 - [i43]Daniele Calandriello, Luigi Carratino, Alessandro Lazaric, Michal Valko, Lorenzo Rosasco:
Near-linear Time Gaussian Process Optimization with Adaptive Batching and Resparsification. CoRR abs/2002.09954 (2020) - [i42]Yunhao Tang, Michal Valko, Rémi Munos:
Taylor Expansion Policy Optimization. CoRR abs/2003.06259 (2020) - [i41]Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko:
Regret Bounds for Kernel-Based Reinforcement Learning. CoRR abs/2004.05599 (2020) - [i40]Aadirupa Saha, Pierre Gaillard, Michal Valko:
Improved Sleeping Bandits with Stochastic Actions Sets and Adversarial Rewards. CoRR abs/2004.06248 (2020) - [i39]Anders Jonsson, Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Edouard Leurent, Michal Valko:
Planning in Markov Decision Processes with Gap-Dependent Sample Complexity. CoRR abs/2006.05879 (2020) - [i38]Emilie Kaufmann, Pierre Ménard, Omar Darwiche Domingues, Anders Jonsson, Edouard Leurent, Michal Valko:
Adaptive Reward-Free Exploration. CoRR abs/2006.06294 (2020) - [i37]Pierre Perrault, Etienne Boursier, Vianney Perchet, Michal Valko:
Statistical Efficiency of Thompson Sampling for Combinatorial Semi-Bandits. CoRR abs/2006.06613 (2020) - [i36]Jean-Bastien Grill, Florian Strub, Florent Altché, Corentin Tallec, Pierre H. Richemond, Elena Buchatskaya, Carl Doersch, Bernardo Ávila Pires, Zhaohan Daniel Guo, Mohammad Gheshlaghi Azar, Bilal Piot, Koray Kavukcuoglu, Rémi Munos, Michal Valko:
Bootstrap Your Own Latent: A New Approach to Self-Supervised Learning. CoRR abs/2006.07733 (2020) - [i35]Anne Gael Manegueu, Claire Vernade, Alexandra Carpentier, Michal Valko:
Stochastic bandits with arm-dependent delays. CoRR abs/2006.10459 (2020) - [i34]Daniele Calandriello, Michal Derezinski, Michal Valko:
Sampling from a k-DPP without looking at all items. CoRR abs/2006.16947 (2020) - [i33]Rémy Degenne, Pierre Ménard, Xuedong Shang, Michal Valko:
Gamification of Pure Exploration for Linear Bandits. CoRR abs/2007.00953 (2020) - [i32]Omar Darwiche Domingues, Pierre Ménard, Matteo Pirotta, Emilie Kaufmann, Michal Valko:
A Kernel-Based Approach to Non-Stationary Reinforcement Learning in Metric Spaces. CoRR abs/2007.05078 (2020) - [i31]Jean Tarbouriech, Matteo Pirotta, Michal Valko, Alessandro Lazaric:
A Provably Efficient Sample Collection Strategy for Reinforcement Learning. CoRR abs/2007.06437 (2020) - [i30]