Daniil Tiapkin

@hse.ru

HSE University

EDUCATION

MSc in Computer Science 2021-2023: HSE University, specialization: Math of Machine Learning
BSc in Computer Science 2017-2021: HSE University, specialization: Theoretical Computer Science

RESEARCH INTERESTS

Reinforcement learning, stochastic optimization

Scholar Citations: 407
Scholar h-index: 11
Scholar i10-index: 13

RECENT SCHOLAR PUBLICATIONS

  • Learning Shortest Paths with Generative Flow Networks
    N Morozov, I Maksimov, D Tiapkin, S Samsonov
    arXiv preprint arXiv:2603.01786, 2026
    Citations: 1
  • Beyond Softmax and Entropy: Convergence Rates of Policy Gradients with -SoftArgmax Parameterization Coupled Regularization
    S Labbi, D Tiapkin, P Mangold, E Moulines
    The Fourteenth International Conference on Learning Representations, 2026
    Citations: 1
  • On Global Convergence Rates for Federated Softmax Policy Gradient under Heterogeneous Environments
    S Labbi, P Mangold, D Tiapkin, E Moulines
    The 29th International Conference on Artificial Intelligence and Statistics, 2026
    Citations: 5
  • gfnx: Fast and Scalable Library for Generative Flow Networks in JAX
    D Tiapkin, A Agarkov, N Morozov, I Maksimov, A Tsyganov, T Gritsaev, ...
    arXiv preprint arXiv:2511.16592, 2025
    Citations: 2
  • Sample-Efficient Reinforcement Learning: Exploration, Imitation, and Online Learning
    D Tiapkin
    Institut polytechnique de Paris, 2025
  • Adaptive Destruction Processes for Diffusion Samplers
    T Gritsaev, N Morozov, K Tamogashev, D Tiapkin, S Samsonov, ...
    arXiv preprint arXiv:2506.01541, 2025
    Citations: 6
  • Finite-Sample Convergence Bounds for Trust Region Policy Optimization in Mean-Field Games
    A Ocello, D Tiapkin, L Mancini, M Lauriere, E Moulines
    arXiv preprint arXiv:2505.22781, 2025
    Citations: 3
  • Proximal Point Nash Learning from Human Feedback
    D Tiapkin, D Calandriello, D Belomestny, E Moulines, A Naumov, K Rasul, ...
    arXiv preprint arXiv:2505.19731v2, 2025
    Citations: 8
  • Optimizing backward policies in GFlowNets via trajectory likelihood maximization
    T Gritsaev, N Morozov, S Samsonov, D Tiapkin
    International Conference on Learning Representations 2025, 98281-98301, 2025
    Citations: 6
  • Revisiting Non-Acyclic GFlowNets in Discrete Environments
    N Morozov, I Maksimov, D Tiapkin, S Samsonov
    arXiv preprint arXiv:2502.07735, 2025
    Citations: 9
  • On Teacher Hacking in Language Model Distillation
    D Tiapkin, D Calandriello, J Ferret, S Perrin, N Vieillard, A Ramé, ...
    arXiv preprint arXiv:2502.02671, 2025
    Citations: 6
  • Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents
    S Labbi, D Tiapkin, L Mancini, P Mangold, E Moulines
    arXiv preprint arXiv:2410.22908, 2024
    Citations: 7
  • A New Bound on the Cumulant Generating Function of Dirichlet Processes
    P Perrault, D Belomestny, P Ménard, É Moulines, A Naumov, D Tiapkin, ...
    arXiv preprint arXiv:2409.18621, 2024
    Citations: 2
  • Narrowing the Gap between Adversarial and Stochastic MDPs via Policy Optimization
    D Tiapkin, E Chzhen, G Stoltz
    arXiv preprint arXiv:2407.05704, 2024
    Citations: 3
  • Improved High-Probability Bounds for the Temporal Difference Learning Algorithm via Exponential Stability
    S Samsonov, D Tiapkin, A Naumov, E Moulines
    The Thirty Seventh Annual Conference on Learning Theory, 4511-4547, 2024
    Citations: 29
  • Improving GFlowNets with Monte Carlo Tree Search
    N Morozov, D Tiapkin, S Samsonov, A Naumov, D Vetrov
    arXiv preprint arXiv:2406.13655, 2024
    Citations: 11
  • Incentivized Learning in Principal-Agent Bandit Games
    A Scheid, D Tiapkin, E Boursier, A Capitaine, EME Mhamdi, É Moulines, ...
    arXiv preprint arXiv:2403.03811, 2024
    Citations: 23
  • Model-free posterior sampling via learning rate randomization
    D Tiapkin, D Belomestny, D Calandriello, E Moulines, R Munos, ...
    Advances in Neural Information Processing Systems 36, 73719-73774, 2023
    Citations: 10
  • Demonstration-Regularized RL
    D Tiapkin, D Belomestny, D Calandriello, E Moulines, A Naumov, ...
    ICLR-2024, 2023
    Citations: 18
  • Generative Flow Networks as Entropy-Regularized RL
    D Tiapkin, N Morozov, A Naumov, D Vetrov
    AISTATS-2024, 2023
    Citations: 62

MOST CITED SCHOLAR PUBLICATIONS

  • Generative Flow Networks as Entropy-Regularized RL
    D Tiapkin, N Morozov, A Naumov, D Vetrov
    AISTATS-2024, 2023
    Citations: 62
  • Fast Rates for Maximum Entropy Exploration
    D Tiapkin, D Belomestny, D Calandriello, E Moulines, R Munos, ...
    International Conference on Machine Learning, 2023
    Citations: 47
  • From Dirichlet to Rubin: Optimistic Exploration in RL without Bonuses
    D Tiapkin, D Belomestny, E Moulines, A Naumov, S Samsonov, Y Tang, ...
    International Conference on Machine Learning, 21380-21431, 2022
    Citations: 31
  • Improved High-Probability Bounds for the Temporal Difference Learning Algorithm via Exponential Stability
    S Samsonov, D Tiapkin, A Naumov, E Moulines
    The Thirty Seventh Annual Conference on Learning Theory, 4511-4547, 2024
    Citations: 29
  • Improved complexity bounds in Wasserstein barycenter problem
    D Dvinskikh, D Tiapkin
    International Conference on Artificial Intelligence and Statistics, 1738-1746, 2021
    Citations: 28
  • Orthogonal Directions Constrained Gradient Method: from non-linear equality constraints to Stiefel manifold
    S Schechtman, D Tiapkin, M Muehlebach, E Moulines
    The Thirty Sixth Annual Conference on Learning Theory, 1228-1258, 2023
    Citations: 26
  • Incentivized Learning in Principal-Agent Bandit Games
    A Scheid, D Tiapkin, E Boursier, A Capitaine, EME Mhamdi, É Moulines, ...
    arXiv preprint arXiv:2403.03811, 2024
    Citations: 23
  • Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees
    D Tiapkin, D Belomestny, D Calandriello, E Moulines, R Munos, ...
    Neural Information Processing Systems, 2022
    Citations: 20
  • Demonstration-Regularized RL
    D Tiapkin, D Belomestny, D Calandriello, E Moulines, A Naumov, ...
    ICLR-2024, 2023
    Citations: 18
  • Primal-Dual Stochastic Mirror Descent for MDPs
    D Tiapkin, A Gasnikov
    International Conference on Artificial Intelligence and Statistics, 9723-9740, 2022
    Citations: 18
  • Stochastic saddle-point optimization for the Wasserstein barycenter problem
    D Tiapkin, A Gasnikov, P Dvurechensky
    Optimization Letters 16 (7), 2145-2175, 2022
    Citations: 15
  • Improving GFlowNets with Monte Carlo Tree Search
    N Morozov, D Tiapkin, S Samsonov, A Naumov, D Vetrov
    arXiv preprint arXiv:2406.13655, 2024
    Citations: 11
  • Model-free posterior sampling via learning rate randomization
    D Tiapkin, D Belomestny, D Calandriello, E Moulines, R Munos, ...
    Advances in Neural Information Processing Systems 36, 73719-73774, 2023
    Citations: 10
  • Revisiting Non-Acyclic GFlowNets in Discrete Environments
    N Morozov, I Maksimov, D Tiapkin, S Samsonov
    arXiv preprint arXiv:2502.07735, 2025
    Citations: 9
  • Proximal Point Nash Learning from Human Feedback
    D Tiapkin, D Calandriello, D Belomestny, E Moulines, A Naumov, K Rasul, ...
    arXiv preprint arXiv:2505.19731v2, 2025
    Citations: 8
  • Federated UCBVI: Communication-Efficient Federated Regret Minimization with Heterogeneous Agents
    S Labbi, D Tiapkin, L Mancini, P Mangold, E Moulines
    arXiv preprint arXiv:2410.22908, 2024
    Citations: 7
  • Adaptive Destruction Processes for Diffusion Samplers
    T Gritsaev, N Morozov, K Tamogashev, D Tiapkin, S Samsonov, ...
    arXiv preprint arXiv:2506.01541, 2025
    Citations: 6
  • Optimizing backward policies in GFlowNets via trajectory likelihood maximization
    T Gritsaev, N Morozov, S Samsonov, D Tiapkin
    International Conference on Learning Representations 2025, 98281-98301, 2025
    Citations: 6
  • On Teacher Hacking in Language Model Distillation
    D Tiapkin, D Calandriello, J Ferret, S Perrin, N Vieillard, A Ramé, ...
    arXiv preprint arXiv:2502.02671, 2025
    Citations: 6
  • On Global Convergence Rates for Federated Softmax Policy Gradient under Heterogeneous Environments
    S Labbi, P Mangold, D Tiapkin, E Moulines
    The 29th International Conference on Artificial Intelligence and Statistics, 2026
    Citations: 5