


PPoPP 2025: Las Vegas, NV, USA
- Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, PPoPP 2025, Las Vegas, NV, USA, March 1-5, 2025. ACM 2025, ISBN 979-8-4007-1443-6
Keynote
- Charles E. Leiserson:
  Setting a Course for Post-Moore Software Performance. 1
Graph Neural Networks
- Jie Sun, Zuocheng Shi, Li Su, Wenting Shen, Zeke Wang, Yong Li, Wenyuan Yu, Wei Lin, Fei Wu, Bingsheng He, Jingren Zhou:
  Helios: Efficient Distributed Dynamic Graph Sampling for Online GNN Inference. 2-15
- Jou-An Chen, Hsin-Hsuan Sung, Ruifeng Zhang, Ang Li, Xipeng Shen:
  Accelerating GNNs on GPU Sparse Tensor Cores through N:M Sparsity-Oriented Graph Reordering. 16-28
- Kaihao Ma, Renjie Liu, Xiao Yan, Zhenkun Cai, Xiang Song, Minjie Wang, Yichao Li, James Cheng:
  Adaptive Parallel Training for Graph Neural Networks. 29-42
GPU I
- Vani Nagarajan, Rohan Gangaraju, Kirshanthan Sundararajah, Artem Pelenitsyn, Milind Kulkarni:
  RT-BarnesHut: Accelerating Barnes-Hut Using Ray-Tracing Hardware. 43-56
- Anna Yue, Pen-Chung Yew, Sanyam Mehta:
  EVeREST: An Effective and Versatile Runtime Energy Saving Tool for GPUs. 57-69
- Shixun Wu, Yujia Zhai, Jinyang Liu, Jiajun Huang, Zizhe Jian, Huangliang Dai, Sheng Di, Franck Cappello, Zizhong Chen:
  TurboFFT: Co-Designed High-Performance and Fault-Tolerant Fast Fourier Transform on GPUs. 70-84
Concurrent Data Structures and Synchronization I
- Dave Dice, Alex Kogan:
  Reciprocating Locks. 85-98
- Younghun Roh, Yuanhao Wei, Eric Ruppert, Panagiota Fatourou, Siddhartha Jayanti, Julian Shun:
  Aggregating Funnels for Faster Fetch&Add and Queues. 99-114
- Takashi Hoshino, Kenjiro Taura:
  Fairer and More Scalable Reader-Writer Locks by Optimizing Queue Management. 115-127
- Ajay Singh, Trevor Brown:
  Publish on Ping: A Better Way to Publish Reservations in Memory Reclamation for Concurrent Data Structures. 128-141
Memory
- Fulin Nan, Ronglong Wu, Zhirong Shen, Jiahui Yang, Li Cheng, Zheng Chen, Yiming Zhang, Jiwu Shu:
  AC-Cache: A Memory-Efficient Caching System for Small Objects via Exploiting Access Correlations. 142-155
- Yun Wang, Liang Chen, Tianmai Deng, Ben Luo, Yibin Shen, Zhixiang Wei, Yixiao Xu, Minglang Huang, Zhengwei Qi:
  Effectively Virtual Page Prefetching via Spatial-Temporal Patterns for Memory-intensive Cloud Applications. 156-169
- Hulin Wang, Yaqi Xia, Donglin Yang, Xiaobo Zhou, Dazhao Cheng:
  Harnessing Inter-GPU Shared Memory for Seamless MoE Communication-Computation Fusion. 170-182
Deep Neural Networks
- Runxin Zhong, Yuyang Jin, Chen Zhang, Kinman Lei, Shuangyu Li, Jidong Zhai:
  FlashTensor: Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property. 183-196
- Weijian Liu, Mingzhen Li, Guangming Tan, Weile Jia:
  Mario: Near Zero-cost Activation Checkpointing in Pipeline Parallelism. 197-211
- Baixi Sun, Weijin Liu, J. Gregory Pauloski, Jiannan Tian, Jinda Jia, Daoce Wang, Boyuan Zhang, Mingkai Zheng, Sheng Di, Sian Jin, Zhao Zhang, Xiaodong Yu, Kamil A. Iskra, Pete Beckman, Guangming Tan, Dingwen Tao:
  COMPSO: Optimizing Gradient Compression for Distributed Training with Second-Order Optimizers. 212-224
Large Language Models
- Junfeng Lin, Ziming Liu, Yang You, Jun Wang, Weihao Zhang, Rong Zhao:
  WeiPipe: Weight Pipeline Parallelism for Communication-Effective Long-Context Large Model Training. 225-238
- Elias Frantar, Roberto L. Castro, Jiale Chen, Torsten Hoefler, Dan Alistarh:
  MARLIN: Mixed-Precision Auto-Regressive Parallel Inference on Large Language Models. 239-251
- Yuhang Liang, Xinyi Li, Jie Ren, Ang Li, Bo Fang, Jieyang Chen:
  ATTNChecker: Highly-Optimized Fault Tolerant Attention for Large Language Model Training. 252-266
Scheduling and Resource Management
- Yongkang Zhang, Haoxuan Yu, Chenxia Han, Cheng Wang, Baotong Lu, Yunzhe Li, Zhifeng Jiang, Yang Li, Xiaowen Chu, Huaicheng Li:
  SGDRC: Software-Defined Dynamic Resource Control for Concurrent DNN Inference on NVIDIA GPUs. 267-281
- Zhengqing Liu, Musa Unal, Matthew J. Parkinson, Marios Kogias:
  DORADD: Deterministic Parallel Execution in the Era of Microsecond-Scale Computing. 282-296
- Yankai Jiang, Rohan Basu Roy, Raghavendra Kanakagiri, Devesh Tiwari:
  WaterWise: Co-optimizing Carbon- and Water-Footprint Toward Environmentally Sustainable Cloud Computing. 297-311
Tensor Cores
- Jinliang Shi, Shigang Li, Youxuan Xu, Rongtian Fu, Xueying Wang, Tong Wu:
  FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores. 312-325
- Haisha Zhao, San Li, Jiaheng Wang, Chunbao Zhou, Jue Wang, Zhikuang Xin, Shunde Li, Zhiqiang Liang, Zhijie Pan, Fang Liu, Yan Zeng, Yangang Wang, Xuebin Chi:
  Acc-SpMM: Accelerating General-purpose Sparse Matrix-Matrix Multiplication with GPU Tensor Cores. 326-338
- Yuyao Niu, Marc Casas:
  BerryBees: Breadth First Search by Bit-Tensor-Cores. 339-354
- Haozhi Han, Kun Li, Wei Cui, Donglin Bai, Yiwei Zhang, Liang Yuan, Yifeng Chen, Yunquan Zhang, Ting Cao, Mao Yang:
  FlashFFTStencil: Bridging Fast Fourier Transforms to Memory-Efficient Stencil Computations on Tensor Core Units. 355-368
Concurrent Data Structures and Synchronization II
- Xizhe Yin, Chao Gao, Zhijia Zhao, Rajiv Gupta:
  PANNS: Enhancing Graph-based Approximate Nearest Neighbor Search through Recency-aware Construction and Parameterized Search. 369-381
- Kåre von Geijer, Philippas Tsigas, Elias Johansson, Sebastian Hermansson:
  Balanced Allocations over Efficient Queues: A Fast Relaxed FIFO Queue. 382-395
- Liang Geng, Rubao Lee, Xiaodong Zhang:
  LibRTS: A Spatial Indexing Library by Ray Tracing. 396-411
- Hao Wang, Minghao Pan, Jiaping Wang:
  Crystality: A Programming Model for Smart Contracts on Parallel EVMs. 412-425
GPU II
- Julian Bellavita, Thomas Pasquali, Laura Del Rio Martin, Flavio Vella, Giulia Guidi:
  Popcorn: Accelerating Kernel K-means on GPUs through Sparse Linear Algebra. 426-440
- Zhibin Wang, Xi Lin, Xue Li, Pinhuan Wang, Ziheng Meng, Hang Liu, Chen Tian, Sheng Zhong:
  Swift Unfolding of Communities: GPU-Accelerated Louvain Algorithm. 441-454
- Weichen Cao, Ke Meng, Zhiheng Lin, Guangming Tan:
  GLumin: Fast Connectivity Check Based on LUTs For Efficient Graph Pattern Mining. 455-468
- Hansheng Wang, Zhekai Duan, Zitian Zhao, Siqi Wu, Saiqi Zheng, Qiao Li, Xu Jiang, Shaoshuai Zhang:
  Improving Tridiagonalization Performance on GPU Architectures. 469-480
Parallel Algorithms and Applications
- Yiwei Zhang, Kun Li, Liang Yuan, Haozhi Han, Yunquan Zhang, Ting Cao, Mao Yang:
  Jigsaw: Toward Conflict-free Vectorized Stencil Computation by Tessellating Swizzled Registers. 481-495
- Yi Zong, Chensong Zhang, Longjiang Mu, Jianchun Wang, Jian Sun, Xiaowen Xu, Xinliang Wang, Peinan Yu, Wei Xue:
  Semi-StructMG: A Fast and Scalable Semi-Structured Algebraic Multigrid. 496-511
- Weicong Chen, Hao Qi, Curtis Tatsuoka, Xiaoyi Lu:
  SBMGT: Scaling Bayesian Multinomial Group Testing. 512-523
- Xiaohui Duan, Yi Zhang, Kai Xu, Haohuan Fu, Bin Yang, Yiming Wang, Yilun Han, Siyuan Chen, Zhuangzhuang Zhou, Chenyu Wang, Dongqiang Huang, Huihai An, Xiting Ju, Haopeng Huang, Zhuang Liu, Wei Xue, Weiguo Liu, Bowen Yan, Jianye Hou, Maoxue Yu, Wenguang Chen, Jian Li, Zhao Jing, Hailong Liu, Lixin Wu:
  An AI-Enhanced 1km-Resolution Seamless Global Weather and Climate Model to Achieve Year-Scale Simulation Speed using 34 Million Cores. 524-538
POSTER SESSION: Posters
- Daniel Anderson, Guy E. Blelloch, Siddhartha V. Jayanti:
  Big Atomics and Fast Hash Tables. 539-541
- Xinmiao Zhang, Cheng Liu, Shengwen Liang, Chenwei Xiong, Yu Zhang, Lei Zhang, Huawei Li, Xiaowei Li:
  Frontier-guided Graph Reordering. 542-544
- Yaodong Sheng, Ahmed Hassan, Michael F. Spear:
  Transactional Data Structures with Orthogonal Metadata. 545-547
- Ao Li, Wenhai Li, Yuan Chen, Lingfeng Deng:
  Boost Lock-free Queue and Stack with Batching. 548-550
- Yucheng Ouyang, Ying Liu, Honghui Shang, Zhenchuan Chen, Jiahao Shan, Huimin Cui, Xiaobing Feng, Xin Chen, Xingyu Gao, Lifang Wang, Haifeng Song, Xin Chen, Rongfen Lin, Fang Li:
  TensorMD: Molecular Dynamics Simulation with Ab Initio Accuracy of 50 Billion Atoms. 551-553
- Zhonghai Zhang, Yewen Li, Ke Meng, Chunming Zhang, Guangming Tan:
  FastBWA: Practical and Cost-Efficient Genome Sequence Alignment Pipeline. 554-556
- Boyuan Zhang, Luanzheng Guo, Jiannan Tian, Jinyang Liu, Daoce Wang, Fanjiang Ye, Chengming Zhang, Jan Strube, Nathan R. Tallent, Dingwen Tao:
  High-performance Visual Semantics Compression for AI-Driven Science. 557-559
- YuAng Chen, Jeffrey Xu Yu:
  Triangle Counting on Tensor Cores. 560-562
- Zhanyuan Di, Leping Wang, Ziyi Ren, En Shao, Jie Zhao, Siyuan Feng, Dingwen Tao, Guangming Tan, Ninghui Sun:
  Magneto: Accelerating Parallel Structures in DNNs via Co-Optimization of Operators. 563-565
- Chen Zhuang, Peng Chen, Xin Liu, Rio Yokota, Nikoli Dryden, Lingqi Zhang, Toshio Endo, Satoshi Matsuoka, Mohamed Wahib:
  A General and Scalable GCN Training Framework on CPU Supercomputers. 566-568
- Angelo Borsotti, Luca Breveglieri, Angelo Morzenti, Stefano Crespi-Reghizzi:
  Minimizing speculation overhead in a parallel recognizer for regular texts. 569-572
