


35th IPDPS 2021: Portland, OR, USA
- 35th IEEE International Parallel and Distributed Processing Symposium, IPDPS 2021, Portland, OR, USA, May 17-21, 2021. IEEE 2021, ISBN 978-1-6654-4066-0

- Ilkay Altintas:
A Tale of Two C's: Convergence and Composability. 1
- Alessio Netti, Daniele Tafani, Michael Ott, Martin Schulz:
Correlation-wise Smoothing: Lightweight Knowledge Extraction for HPC Monitoring Data. 2-12
- Jinyoung Choi, Sergey Blagodurov, Hung-Wei Tseng:
Dancing in the Dark: Profiling for Tiered Memory. 13-22
- Marcus Ritter, Alexander Geiß, Johannes Wehrstein, Alexandru Calotoiu, Thorsten Reimann, Torsten Hoefler, Felix Wolf:
Noise-Resilient Empirical Performance Modeling with Deep Neural Networks. 23-34
- Srinivasan Ramesh, Allen D. Malony, Philip H. Carns, Robert B. Ross, Matthieu Dorier, Jérome Soumagne, Shane Snyder:
SYMBIOSYS: A Methodology for Performance Analysis of Composable HPC Data Services. 35-45
- Edward Hutter, Edgar Solomonik:
Accelerating Distributed-Memory Autotuning via Statistical Analysis of Execution Paths. 46-57
- Thomas B. Rolinger, Christopher D. Krieger, Alan Sussman:
Optimizing Memory-Compute Colocation for Irregular Applications on a Migratory Thread Architecture. 58-67
- Yuyao Niu, Zhengyang Lu, Meichen Dong, Zhou Jin, Weifeng Liu, Guangming Tan:
TileSpMV: A Tiled Algorithm for Sparse Matrix-Vector Multiplication on GPUs. 68-78
- Qinglei Cao, Yu Pei, Kadir Akbudak, George Bosilca, Hatem Ltaief, David E. Keyes, Jack J. Dongarra:
Leveraging PaRSEC Runtime Support to Tackle Challenging 3D Data-Sparse Matrix Problems. 79-89
- Md Taufique Hussain, Oguz Selvitopi, Aydin Buluç, Ariful Azad:
Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale. 90-100
- Weiling Yang, Jianbin Fang, Dezun Dong:
Characterizing Small-Scale Matrix Multiplications on ARMv8-based Many-Core Architectures. 101-110
- Alberto Parravicini, Arnaud Delamare, Marco Arnaboldi, Marco D. Santambrogio:
DAG-based Scheduling with Resource Sharing for Multi-task Applications in a Polyglot GPU Runtime. 111-120
- Zhuoran Ji, Cho-Li Wang:
CTXBack: Enabling Low Latency GPU Context Switching via Context Flashback. 121-130
- Nelson Mimura Gonzalez, Tonia Elengikal:
Transparent I/O-Aware GPU Virtualization for Efficient Resource Consolidation. 131-140
- Tyler N. Allen, Rong Ge:
Demystifying GPU UVM Cost with Deep Runtime and Workload Analysis. 141-150
- Minjia Zhang, Zehua Hu, Mingqin Li:
DUET: A Compiler-Runtime Subgraph Scheduling Approach for Tensor Programs on a Coupled CPU-GPU Architecture. 151-161
- Suzhen Wu, Chunfeng Du, Haijun Li, Hong Jiang, Zhirong Shen, Bo Mao:
CAGC: A Content-aware Garbage Collection Scheme for Ultra-Low Latency Flash-based SSDs. 162-171
- Shashank Gugnani, Tianxi Li, Xiaoyi Lu:
NVMe-CR: A Scalable Ephemeral Storage Runtime for Checkpoint/Restart with NVMe-over-Fabrics. 172-181
- Qinzhe Wu, Jonathan Beard, Ashen Ekanayake, Andreas Gerstlauer, Lizy K. John:
Virtual-Link: A Scalable Multi-Producer Multi-Consumer Message Queue Architecture for Cross-Core Communication. 182-191
- Vito Giovanni Castellana, Antonino Tumeo, Fabrizio Ferrandi:
High-Level Synthesis of Parallel Specifications Coupling Static and Dynamic Controllers. 192-202
- Ryan E. Grant, Michael J. Levenhagen, Matthew G. F. Dosanjh, Patrick M. Widener:
RVMA: Remote Virtual Memory Access. 203-212
- Michael S. Gilbert, Seher Acer, Erik G. Boman, Kamesh Madduri, Sivasankaran Rajamanickam:
Performance-Portable Graph Coarsening for Efficient Multilevel Graph Analysis. 213-222
- John Augustine, Kishore Kothapalli, Gopal Pandurangan:
Efficient Distributed Algorithms in the k-machine model via PRAM Simulations. 223-232
- Adam Polak, Adrian Siwiec, Michal Stobierski:
Euler Meets GPU: Practical Graph Algorithms with Theoretical Guarantees. 233-244
- Kiran Kumar Matam, Hanieh Hashemi, Murali Annavaram:
MultiLogVC: Efficient Out-of-Core Graph Processing Framework for Flash Storage. 245-255
- Md. Khaledur Rahman, Majedul Haque Sujon, Ariful Azad:
FusedMM: A Unified SDDMM-SpMM Kernel for Graph Embedding and Graph Neural Networks. 256-266
- Anwesha Das, Frank Mueller, Barry Rountree:
Systemic Assessment of Node Failures in HPC Production Platforms. 267-276
- Masoud Gholami, Florian Schintke:
Combining XOR and Partner Checkpointing for Resilient Multilevel Checkpoint/Restart. 277-288
- Fernando Fernandes dos Santos, Siva Kumar Sastry Hari, Pedro Martins Basso, Luigi Carro, Paolo Rech:
Demystifying GPU Reliability: Comparing and Combining Beam Experiments, Fault Simulation, and Profiling. 289-298
- Alvaro Frank, Manuel Baumgartner, Reza Salkhordeh, André Brinkmann:
Improving checkpointing intervals by considering individual job failure probabilities. 299-309
- Nicholas Gordon, John R. Lange:
Covirt: Lightweight Fault Isolation and Resource Protection for Co-Kernels. 310-319
- Daniel C. Wilson, Siddhartha Jana, Aniruddha Marathe, Stephanie Brink, Christopher M. Cantalupo, Diana R. Guttman, Brad Geltz, Lowren H. Lawson, Asma H. Al-Rawi, Ali Mohammad, Fuat Keceli, Federico Ardanaz, Jonathan M. Eastep, Ayse K. Coskun:
Introducing Application Awareness Into a Unified Power Management Stack. 320-329
- Jinsu Park, Seongbeom Park, Myeonggyun Han, Woongki Baek:
PALM: Progress- and Locality-Aware Adaptive Task Migration for Efficient Thread Packing. 330-339
- Sudheer Chunduri, Kevin Harms, Taylor L. Groves, Peter Mendygral, Justs Zarins, Michèle Weiland, Yasaman Ghadar:
Performance Evaluation of Adaptive Routing on Dragonfly-based Production Systems. 340-349
- Thaleia Dimitra Doudali, Daniel Zahka, Ada Gavrilovska:
Cori: Dancing to the Right Beat of Periodic Data Movements over Hybrid Memory Systems. 350-359
- Florian Schmaus, Nicolas Pfeiffer, Wolfgang Schröder-Preikschat, Timo Hönig, Jörg Nolte:
Nowa: A Wait-Free Continuation-Stealing Concurrency Platform. 360-371
- Mehran Sadeghi Lahijani, Abu Naser, Cong Wu, Mohsen Gavahi, Viet Tung Hoang, Zhi Wang, Xin Yuan:
Efficient Algorithms for Encrypted All-gather Operation. 372-381
- Otávio Augusto de Oliviera Souza, Olga Goussevskaia, Stefan Schmid:
CBNet: Minimizing Adjustments in Concurrent Demand-Aware Tree Networks. 382-391
- Yang Xia, Peng Jiang, Gagan Agrawal, Rajiv Ramnath:
Scaling Sparse Matrix Multiplication on CPU-GPU Nodes. 392-401
- Huizhang Luo, Junqi Wang, Qing Liu, Jieyang Chen, Scott Klasky, Norbert Podhorszki:
zMesh: Exploring Application Characteristics to Improve Lossy Compression Ratio for Adaptive Mesh Refinement. 402-411
- Linjian Ma, Edgar Solomonik:
Efficient parallel CP decomposition with pairwise perturbation and multi-sweep dimension tree. 412-421
- Lorena A. Barba:
12 Ways to Fool the Masses with Irreproducible Results. 422
- Karl Bäckström, Ivan Walulya, Marina Papatriantafilou, Philippas Tsigas:
Consistent Lock-free Parallel Stochastic Gradient Descent for Fast and Stable Convergence. 423-432
- Xinyuan Li, Huang Ye, Jian Zhang:
Redesigning Peridigm on SIMT Accelerators for High-performance Peridynamics Simulations. 433-443
- Qinghua Zhou, C. Chu, N. S. Kumar, Pouya Kousha, Seyedeh Mahdieh Ghazimirsaeed, Hari Subramoni, Dhabaleswar K. Panda:
Designing High-Performance MPI Libraries with On-the-fly Compression for Modern GPU Clusters*. 444-453
- Xi Wang, John D. Leidel, Brody Williams, Alan Ehret, Miguel Mark, Michel A. Kinsy, Yong Chen:
xBGAS: A Global Address Space Extension on RISC-V for High Performance Computing. 454-463
- Lechen Yu, Joachim Protze, Oscar R. Hernandez, Vivek Sarkar:
ARBALEST: Dynamic Detection of Data Mapping Issues in Heterogeneous OpenMP Applications. 464-474
- Jan Hückelheim, Johannes Doerfert:
Spray: Sparse Reductions of Arrays in OPENMP. 475-484
- Larisa Stoltzfus, Brian Hamilton, Michel Steuwer, Lu Li, Christophe Dubach:
Code Generation for Room Acoustics Simulations with Complex Boundary Conditions. 485-496
- George Bisbas, Fabio Luporini, Mathias Louboutin, Rhodri Nelson, Gerard J. Gorman, Paul H. J. Kelly:
Temporal blocking of finite-difference stencil operators with sparse "off-the-grid" sources. 497-506
- Louis Pisha, Lukasz Ligowski:
Accelerating non-power-of-2 size Fourier transforms with GPU Tensor Cores. 507-516
- Giulia Guidi, Oguz Selvitopi, Marquita Ellis, Leonid Oliker, Katherine A. Yelick, Aydin Buluç:
Parallel String Graph Construction and Transitive Reduction for De Novo Genome Assembly. 517-526
- Israt Nisa, Prashant Pandey, Marquita Ellis, Leonid Oliker, Aydin Buluç, Katherine A. Yelick:
Distributed-Memory k-mer Counting on GPUs. 527-536
- Thomas Hérault, Yves Robert, George Bosilca, Robert J. Harrison, Cannada A. Lewis, Edward F. Valeev, Jack J. Dongarra:
Distributed-memory multi-GPU block-sparse tensor contraction for electronic structure. 537-546
- Will Usher, Xuan Huang, Steve Petruzza, Sidharth Kumar, Stuart R. Slattery, Samuel Temple Reeve, Feng Wang, Chris R. Johnson, Valerio Pascucci:
Adaptive Spatially Aware I/O for Multiresolution Particle Data Layouts. 547-556
- Bing Xie, Zilong Tan, Philip H. Carns, Jeffrey S. Chase, Kevin Harms, Jay F. Lofstead, Sarp Oral, Sudharshan S. Vazhkudai, Feiyi Wang:
Interpreting Write Performance of Supercomputer I/O Systems with Regression Models. 557-566
- Jiwoo Bang, Chungyong Kim, Sunggon Kim, Qichen Chen, Cheongjun Lee, Eun-Kyu Byun, Jaehwan Lee, Hyeonsang Eom:
Finer-LRU: A Scalable Page Management Scheme for HPC Manycore Architectures. 567-576
- Jean Luca Bez, Alberto Miranda, Ramon Nou, Francieli Zanon Boito, Toni Cortes, Philippe O. A. Navaux:
Arbitration Policies for On-Demand User-Level I/O Forwarding on HPC Platforms. 577-586
- Aaron Handleman, Arthur G. Rattew, I-Ting Angelina Lee, Tao B. Schardl:
A Hybrid Scheduling Scheme for Parallel Loops. 587-598
- Hao Lan, Li Chen, Baochun Li:
EAGLE: Expedited Device Placement with Automatic Grouping for Large Models. 599-608
- Qiming Zheng, Quan Chen, Kaihao Bai, Huifeng Guo, Yong Gao, Xiuqiang He, Minyi Guo:
BiPS: Hotness-aware Bi-tier Parameter Synchronization for Recommendation Models. 609-618
- Yuke Wang, Boyuan Feng, Yufei Ding:
DSXplore: Optimizing Convolutional Neural Networks via Sliding-Channel Convolutions. 619-628
- Arpan Jain, Tim Moon, Tom Benson, Hari Subramoni, Sam Adé Jacobs, Dhabaleswar K. Panda, Brian Van Essen:
SUPER: SUb-Graph Parallelism for TransformERs. 629-638
- Dustin Machi, Parantapa Bhattacharya, Stefan Hoops, Jiangzhuo Chen, Henning S. Mortveit, Srinivasan Venkatramanan, Bryan L. Lewis, Mandy L. Wilson, Arindam Fadikar, Tom Maiden, Christopher L. Barrett, Madhav V. Marathe:
Scalable Epidemiological Workflows to Support COVID-19 Planning and Response. 639-650
- Yubo Qin, Ivan Rodero, Manish Parashar:
Facilitating Data Discovery for Large-scale Science Facilities using Knowledge Networks. 651-660
- Laércio Lima Pilla:
Optimal Task Assignment for Heterogeneous Federated Learning Devices. 661-670
- Zhipin Gu, Yuexiang Yang:
Detecting Malicious Model Updates from Federated Learning on Conditional Variational Autoencoder. 671-680
- Guy E. Blelloch:
Is Asymptotic Cost Analysis Useful in Developing Practical Parallel Algorithms. 681
- Jason Cong:
From Parallelization to Customization - Challenges and Opportunities. 682
- Yongseok Soh, Patrick Flick, Xing Liu, Shaden Smith, Fabio Checconi, Fabrizio Petrini, Jee W. Choi:
High Performance Streaming Tensor Decomposition. 683-692
- Le Li, Shigeyuki Sato, Qiheng Liu, Kenjiro Taura:
Plex: Scaling Parallel Lexing with Backtrack-Free Prescanning. 693-702
- Daniel Mlakar, Martin Winter, Mathias Parger, Markus Steinberger:
Speculative Parallel Reverse Cuthill-McKee Reordering on Multi- and Many-core Architectures. 703-713
- Brendan L. West, Jeffrey A. Fessler, Thomas F. Wenisch:
Jigsaw: A Slice-and-Dice Approach to Non-uniform FFT Acceleration for MRI Image Reconstruction. 714-723
- Bo Peng, Jiayu Li, Selahattin Akkas, Takuya Araki, Ohno Yoshiyuki, Judy Qiu:
Rank Position Forecasting in Car Racing. 724-733
- Yuan Xu, Tianwei Zhang, Jimin Han, Sa Wang, Yungang Bao:
Towards Practical Cloud Offloading for Low-cost Ground Vehicle Workloads. 734-745
- Loïck Bonniot, Christoph Neumann, François Taïani:
Towards Internet-Scale Convolutional Root-Cause Analysis with DIAGNET. 746-755
- Jananie Jarachanthan, Li Chen, Fei Xu, Bo Li:
Astra: Autonomous Serverless Analytics with Cost-Efficiency and QoS-Awareness. 756-765
- Anne Benoit, Redouane Elghazi, Yves Robert:
Max-Stretch Minimization on an Edge-Cloud Platform. 766-775
- Janick Edinger, Martin Breitbach, Niklas Gabrisch, Dominik Schäfer, Christian Becker, Amr Rizk:
Decentralized Low-Latency Task Scheduling for Ad-Hoc Computing. 776-785
- Tim Shaffer, Zhuozhao Li, Ben Tovar, Yadu N. Babuji, T. J. Dasso, Zoe Surma, Kyle Chard, Ian T. Foster, Douglas Thain:
Lightweight Function Monitors for Fine-Grained Management in Large Scale Python Applications. 786-796
- Xiaofeng Hou, Chao Li, Jiacheng Liu, Lu Zhang, Shaolei Ren, Jingwen Leng, Quan Chen, Minyi Guo:
AlphaR: Learning-Powered Resource Management for Irregular, Dynamic Microservice Graph. 797-806
- Yuping Fan, Zhiling Lan, J. Taylor Childers, Paul Rich, William E. Allcock, Michael E. Papka:
Deep Reinforcement Agent for Scheduling in HPC. 807-816
- Bin Xu, Jianzhong Huang, Qiang Cao, Xiao Qin, Ping Xie:
F-Write: Fast RDMA-supported Writes in Erasure-coded In-memory Clusters. 817-826
- Sijie Wu, Hanhua Chen, Yonghui Wang, Hai Jin:
Argus: Efficient Job Scheduling in RDMA-assisted Big Data Processing. 827-836
- Sajal Dash, Qais Al-Hajri, Wu-chun Feng, Harold R. Garner, Ramu Anandakrishnan:
Scaling Out a Combinatorial Algorithm for Discovering Carcinogenic Gene Combinations to Thousands of GPUs. 837-846
- Zihao Wang, Xiaohua Wan, Zhiyong Liu, Qianshuo Fan, Fa Zhang, Guangming Tan:
A Multi-GPU Design for Large Size Cryo-EM 3D Reconstruction. 847-858
- Jieyang Chen, Lipeng Wan, Xin Liang, Ben Whitney, Qing Liu, David Pugmire, Nicholas Thompson, Jong Youl Choi, Matthew Wolf, Todd S. Munson, Ian T. Foster, Scott Klasky:
Accelerating Multigrid-based Hierarchical Scientific Data Refactoring on GPUs. 859-868
- Long Qu, Loris Lucido, Marie Bonnasse-Gahot, Pascal Vezolle, Diego Klahr:
Extremely Fast and Energy Efficient One-way Wave Equation Migration on GPU-based heterogeneous architecture. 869-880
- Jiannan Tian, Cody Rivera, Sheng Di, Jieyang Chen, Xin Liang, Dingwen Tao, Franck Cappello:
Revisiting Huffman Coding: Toward Extreme Performance on Modern GPU Architectures. 881-891
- Zhehan Lin, Hanchen Guo, Chentao Wu, Jie Li, Guangtao Xue, Minyi Guo:
Rack-Scaling: An efficient rack-based redistribution method to accelerate the scaling of cloud disk arrays. 892-901
- Xiaoyi Zhang, Feng Zhu, Shu Li, Kun Wang, Wei Xu, Dengcai Xu:
Optimizing Performance for Open-Channel SSDs in Cloud Storage System. 902-911
- Liang Zhang, Wenli Zheng, Chao Li, Yao Shen, Minyi Guo:
AuTraScale: An Automated and Transfer Learning Solution for Streaming System Auto-Scaling. 912-921
- Kishori M. Konwar, Wyatt Lloyd, Haonan Lu, Nancy A. Lynch:
SNOW Revisited: Understanding When Ideal READ Transactions Are Possible. 922-931
- Kaihua Fu, Wei Zhang, Quan Chen, Deze Zeng, Xin Peng, Wenli Zheng, Minyi Guo:
QoS-Aware and Resource Efficient Microservice Deployment in Cloud-Edge Continuum. 932-941
- Anisur Rahaman Molla, Kaushik Mondal, William K. Moses Jr.:
Byzantine Dispersion on Graphs. 942-951
- Pankaj Khanchandani, Roger Wattenhofer:
Byzantine Agreement with Unknown Participants and Failures. 952-961
- Abdullah T. Mughrabi, Mohannad Ibrahim, Gregory T. Byrd:
QPR: Quantizing PageRank with Coherent Shared Memory Accelerators. 962-972
- Gurbinder Gill, Roshan Dathathri, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Olli Saarikivi:
Distributed Training of Embeddings using Graph Analytics. 973-983
- Joseph Renzullo, Westley Weimer, Stephanie Forrest:
Multiplicative Weights Algorithms for Parallel Automated Software Repair. 984-993
- Yun-Yong Ko, Kibong Choi, Jiwon Seo, Sang-Wook Kim:
An In-Depth Analysis of Distributed Training of Deep Neural Networks. 994-1003
- Masahiro Tanaka, Kenjiro Taura, Toshihiro Hanawa, Kentaro Torisawa:
Automatic Graph Partitioning for Very Large-scale Deep Learning. 1004-1013
- Eric Qin, Geonhwa Jeong, William Won, Sheng-Chun Kao, Hyoukjun Kwon, Sudarshan Srinivasan, Dipankar Das, Gordon Euhyun Moon, Sivasankaran Rajamanickam, Tushar Krishna:
Extending Sparse Tensor Accelerators to Support Multiple Compression Formats. 1014-1024
- Venmugil Elango:
Pase: Parallelization Strategies for Efficient DNN Training. 1025-1034
- Horng-Ruey Huang, Ding-Yong Hong, Jan-Jan Wu, Pangfeng Liu, Wei-Chung Hsu:
Efficient Video Captioning on Heterogeneous System Architectures. 1035-1045
- George Michelogiannakis, Darren Lyles, Patricia Gonzalez-Guerrero, Meriam Gay Bautista, Dilip Vasudevan, Anastasiia Butko:
SRNoC: A Statically-Scheduled Circuit-Switched Superconducting Race Logic NoC. 1046-1055
- Jens Domke, Emil Vatai, Aleksandr Drozd, Peng Chen, Yosuke Oyama, Lingqi Zhang, Shweta Salaria, Daichi Mukunoki, Artur Podobas, Mohamed Wahib, Satoshi Matsuoka:
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws? 1056-1065
- Ayaz Akram, Anna Giannakou, Venkatesh Akella, Jason Lowe-Power, Sean Peisert:
Performance Analysis of Scientific Computing Workloads on General Purpose TEEs. 1066-1076
- Martin Karp, Artur Podobas, Niclas Jansson, Tobias Kenter, Christian Plessl, Philipp Schlatter, Stefano Markidis:
High-Performance Spectral Element Methods on Field-Programmable Gate Arrays: Implementation, Evaluation, and Future Projection. 1077-1086
- Kamalakkannan Kamalavasan, Gihan R. Mudalige, István Z. Reguly, Suhaib A. Fahmy:
High-Level FPGA Accelerator Design for Structured-Mesh-Based Explicit Numerical Solvers. 1087-1096















