


IPDPS 2021: Portland, OR, USA - Workshops
- IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPS Workshops 2021, Portland, OR, USA, June 17-21, 2021. IEEE 2021, ISBN 978-1-6654-3577-2

HCW: Heterogeneity in Computing Workshop
- Yujing Ma, Florin Rusu, Kesheng Wu, Alexander Sim:
  Adaptive Stochastic Gradient Descent for Deep Learning on Heterogeneous CPU+GPU Architectures. 6-15
- Vinícius Garcia Pinto, Lucas Leandro Nesi, Marcelo Cogo Miletto, Lucas Mello Schnorr:
  Providing In-depth Performance Analysis for Heterogeneous Task-based Applications with StarVZ. 16-25
- Francis O'Brien, Matthew Agostini, Tarek S. Abdelrahman:
  A Streaming Accelerator for Heterogeneous CPU-FPGA Processing of Graph Applications. 26-35
- Feng Li, Moon Gi Seok, Wentong Cai:
  A New Double Rank-based Multi-workflow Scheduling with Multi-objective Optimization in Cloud Environments. 36-45
- Caio S. Rohwedder, João P. L. de Carvalho, José Nelson Amaral, Guido Araújo, Giancarlo Colmenares, Kai-Ting Amy Wang:
  Pooling Acceleration in the DaVinci Architecture Using Im2col and Col2im Instructions. 46-55
- Ranjan Sarpangala Venkatesh, Tony Mason, Pradeep Fernando, Greg Eisenhauer, Ada Gavrilovska:
  Scheduling HPC Workflows with Intel Optane Persistent Memory. 56-65
- Rohan Kumar, Matt Baughman, Ryan Chard, Zhuozhao Li, Yadu N. Babuji, Ian T. Foster, Kyle Chard:
  Coding the Computing Continuum: Fluid Function Execution in Heterogeneous Computing Environments. 66-75
- Morris Riedel, Rocco Sedona, Chadi Barakat, Pétur Helgi Einarsson, Reza Hassanian, Gabriele Cavallaro, Matthias Book, Helmut Neukirchen, Andreas Lintermann:
  Practice and Experience in using Parallel and Scalable Machine Learning with Heterogenous Modular Supercomputing Architectures. 76-85
RAW: Reconfigurable Architectures Workshop
- Hirohisa Watanabe, Hiroki Matsutani:
  Accelerating ODE-Based Neural Networks on Low-Cost FPGAs. 88-95
- Hirohisa Watanabe, Mineto Tsukada, Hiroki Matsutani:
  An FPGA-Based On-Device Reinforcement Learning Approach using Online Sequential Learning. 96-103
- Lorenzo Farinelli, Daniele Valentino De Vincenti, Andrea Damiani, Luca Stornaiuolo, Rolando Brondolin, Marco D. Santambrogio, Donatella Sciuto:
  Plaster: an Embedded FPGA-based Cluster Orchestrator for Accelerated Distributed Algorithms. 104-107
- Nael Fasfous, Manoj Rohit Vemparala, Alexander Frickenstein, Lukas Frickenstein, Mohamed Badawy, Walter Stechele:
  BinaryCoP: Binary Neural Network-based COVID-19 Face-Mask Wear and Positioning Predictor on Edge Devices. 108-115
- Danielle Tchuinkou Kwadjo, Joel Mandebi Mbongue, Christophe Bobda:
  Exploring a Layer-based Pre-implemented Flow for Mapping CNN on FPGA. 116-123
- Timothy Martin, Gary Gréwal, Shawki Areibi:
  A Machine Learning Approach to Predict Timing Delays During FPGA Placement. 124-127
- Daniele Paletti, Davide Conficconi, Marco D. Santambrogio:
  Dovado: An Open-Source Design Space Exploration Framework. 128-135
- Lukas Weber, Lukas Sommer, Leonardo Solis-Vasquez, Tobias Vinçon, Christian Knödler, Arthur Bernhardt, Ilia Petrov, Andreas Koch:
  A Framework for the Automatic Generation of FPGA-based Near-Data Processing Accelerators in Smart Storage Systems. 136-143
- Renato Campos, João M. P. Cardoso:
  On Data Parallelism Code Restructuring for HLS Targeting FPGAs. 144-151
- Philipp Holzinger, Daniel Reiser, Tobias Hahn, Marc Reichenbach:
  Fast HBM Access with FPGAs: Analysis, Architectures, and Applications. 152-159
- Mohamed W. Hassan, Peter M. Athanas:
  Graph Analytics on Hybrid System (GAHS) Case Study: PageRank. 160-167
- Joel Mandebi Mbongue, Sujan Kumar Saha, Christophe Bobda:
  Performance Study of Multi-tenant Cloud FPGAs. 168-171
- Najdet Charaf, Ahmed Kamaleldin, Martin Thümmler, Diana Göhringer:
  RV-CAP: Enabling Dynamic Partial Reconfiguration for FPGA-Based RISC-V System-on-Chip. 172-179
- Quentin Berthet, Andres Upegui, Laurent Gantel, Alexandre Duc, Giulia Traverso:
  An Area-Efficient SPHINCS+ Post-Quantum Signature Coprocessor. 180-187
- Jianyu Chen, Maurice Daverveldt, Zaid Al-Ars:
  FPGA Acceleration of Zstd Compression Algorithm. 188-191
HiCOMB: High Performance Computational Biology
- Gulsum Gudukbay, Jashwant Raj Gunasekaran, Yilin Feng, Mahmut T. Kandemir, Anton Nekrutenko, Chita R. Das, Paul Medvedev, Björn A. Grüning, Nate Coraor, Nathan Roach, Enis Afgan:
  GYAN: Accelerating Bioinformatics Tools in Galaxy with GPU-Aware Computation Mapping. 194-203
- Bryce Kille, Yunxi Liu, Nicolae Sapoval, Michael Nute, Lawrence Rauchwerger, Nancy M. Amato, Todd J. Treangen:
  Accelerating SARS-CoV-2 low frequency variant calling on ultra deep sequencing datasets. 204-208
- Zülal Bingöl, Mohammed Alser, Onur Mutlu, Ozcan Ozturk, Can Alkan:
  GateKeeper-GPU: Fast and Accurate Pre-Alignment Filtering in Short Read Mapping. 209
- Ahmad Hesam, Lukas Breitwieser, Fons Rademakers, Zaid Al-Ars:
  GPU Acceleration of 3D Agent-Based Biological Simulations. 210-217
- Pierre Barbera, Alexandros Stamatakis:
  Efficient Memory Management in Likelihood-based Phylogenetic Placement. 218-227
- Chiranjeb Mondal, Sanjay V. Rajopadhye:
  Accelerating the BPMax Algorithm for RNA-RNA Interaction. 228-237
GrAPL: Graphs, Architectures, Programming, and Learning
- Gábor Szárnyas, David A. Bader, Timothy A. Davis, James Kitchen, Timothy G. Mattson, Scott McMillan, Erik Welch:
  LAGraph: Linear Algebra, Network Analysis Libraries, and the Study of Graph Algorithms. 243-252
- Benjamin Brock, Aydin Buluç, Timothy G. Mattson, Scott McMillan, José E. Moreira:
  Introduction to GraphBLAS 2.0. 253-262
- Jeremy Kepner, Timothy Davis, Vijay Gadepally, Hayden Jananthan, Lauren Milechin:
  Mathematics of Digital Hyperspace. 263-271
- Egor Orachev, Maria Karpenko, Artem Khoroshev, Semyon V. Grigorev:
  SPbLA: The Library of GPGPU-Powered Sparse Boolean Linear Algebra Operations. 272-275
- Kasimir Gabert, Ümit V. Çatalyürek:
  PIGO: A Parallel Graph Input/Output Library. 276-279
- Pat Devlin, Jeremy Kepner, Ashley Luo, Erin Meger:
  Hybrid Power-Law Models of Network Traffic. 280-287
- Zhaochen Gu, Sihai Tang, Beilei Jiang, Song Huang, Qiang Guan, Song Fu:
  Characterizing Job-Task Dependency in Cloud Workloads Using Graph Learning. 288-297
- Kuldeep R. Kurte, Neena Imam, Ramakrishnan Kannan, S. M. Shamimul Hasan, Srikanth B. Yoginath:
  Co-design of Advanced Architectures for Graph Analytics using Machine Learning. 298-307
- Catherine D. Schuman, Bill Kay, Prasanna Date, Ramakrishnan Kannan, Piyush Sao, Thomas E. Potok:
  Sparse Binary Matrix-Vector Multiplication on Neuromorphic Computers. 308-311
EduPar: NSF/TCPP Workshop on Parallel and Distributed Computing Education
- Jirí Dokulil:
  Let's Put the Memory Model Front and Center When Teaching Parallel Programming in C++. 315-320
- Sascha Hunold, Bartlomiej Przybylski:
  Teaching Complex Scheduling Algorithms. 321-327
- Sherif G. Aly, Haidar Harmanani, Rajendra K. Raj, Sanaa Sharafeddine:
  ABET Accreditation: A Way Forward for PDC Education. 328-335
- Jesús Cámara, José-Carlos Cano, Javier Cuenca, Toshiyuki Maeda, Mariano Saura-Sánchez, Lewis Tseng, Akiyoshi Wakatani, Martina Barnas:
  EduPar Virtual Poster Session. 336-341
- Joel C. Adams, Richard A. Brown, Suzanne J. Matthews, Elizabeth Shoop:
  Teaching PDC in the Time of COVID: Hands-on Materials for Remote Learning. 342-349
- Michael Gowanlock, Benoît Gallet:
  Data-Intensive Computing Modules for Teaching Parallel and Distributed Computing. 350-357
HIPS: High-level Parallel Programming Models and Supportive Environments
- Yong Wang, Yongfa Zhou, Qi Scott Wang, Yang Wang, Qing Xu, Chen Wang, Bo Peng, Zhaojun Zhu, Katayama Takuya, Dylan Wang:
  Developing medical ultrasound beamforming application on GPU and FPGA using oneAPI. 360-370
- Zheming Jin, Jeffrey S. Vetter:
  Evaluating CUDA Portability with HIPCL and DPCT. 371-376
- Gregor Daiß, Mikael Simberg, Auriane Reverdell, John Biddiscombe, Theresa Pollinger, Hartmut Kaiser, Dirk Pflüger:
  Beyond Fork-Join: Integration of Performance Portable Kokkos Kernels with HPX. 377-386
- Bo Qiao, Jürgen Teich, Frank Hannig:
  An Efficient Approach for Image Border Handling on GPUs via Iteration Space Partitioning. 387-396
- Xinyao Yi, David Stokes, Yonghong Yan, Chunhua Liao:
  CUDAMicroBench: Microbenchmarks to Assist CUDA Performance Programming. 397-406
- Poornima Nookala, Zafar Ahmad, Mohammad Mahdi Javanmard, Martin Kong, Rezaul Chowdhury, Robert J. Harrison:
  Understanding Recursive Divide-and-Conquer Dynamic Programs in Fork-Join and Data-Flow Execution Models. 407-416
- Donovan Snyder, Chen Ding:
  Measuring Cache Complexity Using Data Movement Distance (DMD). 417-419
- Aaron Welch, Oscar R. Hernandez, Barbara M. Chapman:
  Combining Static and Dynamic Analysis to Query Characteristics of HPC Applications. 420-429
AsHES: Accelerators and Hybrid Emerging Systems
- Tetsuro Nakamura, Shogo Saito, Kei Fujimoto, Masashi Kaneko, Akinori Shiraga:
  Time-Division Multiplexing for FPGA Considering CNN Model Switch Time. 433-438
- S. M. Shamimul Hasan, Neena Imam, Ramakrishnan Kannan, Srikanth B. Yoginath, Kuldeep R. Kurte:
  Design Space Exploration of Emerging Memory Technologies for Machine Learning Applications. 439-448
- Felix Liu, Niclas Jansson, Artur Podobas, Albin Fredriksson, Stefano Markidis:
  Accelerating Radiation Therapy Dose Calculation with Nvidia GPUs. 449-458
- Lena Oden, Jörg Keller:
  Improving Cryptanalytic Applications with Stochastic Runtimes on GPUs. 459-468
- Jennifer A. Loe, Christian A. Glusa, Ichitaro Yamazaki, Erik G. Boman, Sivasankaran Rajamanickam:
  Experimental Evaluation of Multiprecision Strategies for GMRES on GPUs. 469-478
- Jaemin Choi, Zane Fink, Sam White, Nitin Bhat, David F. Richards, Laxmikant V. Kalé:
  GPU-aware Communication with UCX in Parallel Programming Models: Charm++, MPI, and Python. 479-488
PDCO: Parallel / Distributed Combinatorics and Optimization
- Florian Fey, Sergei Gorlatch:
  CPRIC: Collaborative Parallelism for Randomized Incremental Constructions. 490-499
- Fekhr Eddine Keddous, H.-N. Nguyen, Amir Nakib:
  Characters Recognition based on CNN-RNN architecture and Metaheuristic. 500-507
- Roger L. Goodwin:
  Linearizing Computing the Power Set with OpenMP. 508-519
- Oswaldo Artiles, Fahad Saeed:
  TurboBFS: GPU Based Breadth-First Search (BFS) Algorithms in the Language of Linear Algebra. 520-528
- Ryan J. Marshall, Lakmali Weerasena, Anthony Skjellum:
  A Parallel Meta-Solver for the Multi-Objective Set Covering Problem. 529-538
- Peter Oostema, Franz Franchetti:
  Leveraging High Dimensional Spatial Graph Embedding as a Heuristic for Graph Algorithms. 539-547
- Mikhail G. Babenko, Andrei Tchernykh, Luis Bernardo Pulido-Gaytan, Jorge M. Cortés-Mendoza, Egor M. Shiryaev, Elena Golimblevskaia, Arutyun Avetisyan, Sergio Nesmachnow:
  RRNS Base Extension Error-Correcting Code for Performance Optimization of Scalable Reliable Distributed Cloud Data Storage. 548-553
APDCM: Advances in Parallel and Distributed Computational Models
- Jonas Posner, Lukas Reitz, Claudia Fohry:
  Checkpointing vs. Supervision Resilience Approaches for Dynamic Independent Tasks. 556-565
- Masahiro Shibata, Masaki Ohyabu, Yuichi Sudo, Junya Nakamura, Yonghwan Kim, Yoshiaki Katayama:
  Gathering of seven autonomous mobile robots on triangular grids. 566-575
- Kevin Buchin, Paola Flocchini, Irina Kostitsyna, Tom Peters, Nicola Santoro, Koichi Wada:
  Autonomous Mobile Robots: Refining the Computational Landscape. 576-585
- Shota Nagahama, Fukuhito Ooshita, Michiko Inoue:
  Terminating Grid Exploration with Myopic Luminous Robots. 586-595
- Hirotsugu Kakugawa, Sayaka Kamei:
  A self-stabilizing token circulation with graceful handover on bidirectional ring networks. 596-604
- Andreas Klos, Marius Rosenbaum, Wolfram Schiffmann:
  Scalable and Highly Available Multi-Objective Neural Architecture Search in Bare Metal Kubernetes Cluster. 605-610
- George Bosilca, Aurélien Bouteiller, Thomas Hérault, Valentin Le Fèvre, Yves Robert, Jack J. Dongarra:
  Revisiting Credit Distribution Algorithms for Distributed Termination Detection. 611-620
- Roman Iakymchuk, Amândio Faustino, Andrew P. J. Emerson, João Barreto, Valeria Bartsch, Rodrigo Rodrigues, José C. Monteiro:
  Efficient and Eventually Consistent Collective Operations. 621-630
- Andrew Rosen, Benjamin Levin, Anu G. Bourgeois:
  Autonomous Load Balancing in Distributed Hash Tables Using Churn and the Sybil Attack. 631-640
- Aparna Sasidharan:
  Performance Models for Hybrid Programs Accelerated by GPUs. 641-651
- Zheming Jin, Jeffrey S. Vetter:
  Evaluating the Performance of Integer Sum Reduction on an Intel GPU. 652-655
- Koji Nakano, Shotaro Aoki, Yasuaki Ito, Akihiko Kasagi:
  On the Computational Power of Convolution Pooling: A Theoretical Approach for Deep Learning. 656-665
PDSEC: Parallel and Distributed Scientific and Engineering Computing
- Pranav U. Gadikar, Patrick Diehl, Prashant K. Jha:
  Load balancing for distributed nonlocal models within asynchronous many-task systems. 669-678
- Sandra Catalán, Francisco D. Igual, Rafael Rodríguez-Sánchez, Enrique S. Quintana-Ortí:
  Scalable Hybrid Loop- and Task-Parallel Matrix Inversion for Multicore Processors. 679-687
- Yu-Hsuan Shih, Garrett Wright, Joakim Andén, Johannes P. Blaschke, Alex H. Barnett:
  cuFINUFFT: a load-balanced GPU library for general-purpose nonuniform FFTs. 688-697
- Amin Totounferoush, Neda Ebrahimi Pour, Sabine Roller, Miriam Mehl:
  Parallel Machine Learning of Partial Differential Equations. 698-703
- Jessica Imlau Dagostini, Henrique Corrêa Pereira da Silva, Vinícius Garcia Pinto, Roberto Machado Velho, Eduardo S. L. Gastal, Lucas Mello Schnorr:
  Improving Workload Balance of a Marine CSEM Inversion Application. 704-713
- Hadia Ahmed, David B. Williams-Young, Khaled Z. Ibrahim, Chao Yang:
  Performance Modeling and Tuning for DFT Calculations on Heterogeneous Architectures. 714-722
- Makoto Morishita, Satoshi Ohshima, Takahiro Katagiri, Toru Nagai:
  Parallelization of GKV benchmark using OpenACC. 723-729
- Sergio Barrachina, Adrián Castelló, Mar Catalán, Manuel F. Dolz, José I. Mestre:
  A Flexible Research-Oriented Framework for Distributed Training of Deep Neural Networks. 730-739
- Jan Verschelde:
  Accelerated Polynomial Evaluation and Differentiation at Power Series in Multiple Double Precision. 740-749
iWAPT: Automatic Performance Tuning
- Yuta Sasaki, Ayumu Ishizuka, Mulya Agung, Hiroyuki Takizawa:
  Evaluating I/O Acceleration Mechanisms of SX-Aurora TSUBASA. 752-759
- Kengo Nakajima, Takeshi Ogita, Masatoshi Kawai:
  Efficient Parallel Multigrid Methods on Manycore Clusters with Double/Single Precision Computing. 760-769
- Chia-Chun Liang, Che-Rung Lee:
  Automatic Selection of Tensor Decomposition for Compressing Convolutional Neural Networks A Case Study on VGG-type Networks. 770-778
- Kou Murakami, Kazuhiko Komatsu, Masayuki Sato, Hiroaki Kobayashi:
  A Processor Selection Method based on Execution Time Estimation for Machine Learning Programs. 779-788
- Naruya Kitai, Daisuke Takahashi, Franz Franchetti, Takahiro Katagiri, Satoshi Ohshima, Toru Nagai:
  An Auto-tuning with Adaptation of A64 Scalable Vector Extension for SPIRAL. 789-797
- Ayse Bagbaba, Xuan Wang:
  Improving the MPI-IO Performance of Applications with Genetic Algorithm based Auto-tuning. 798-805
- Jacob O. Tørring, Jan Christian Meyer, Anne C. Elster:
  Autotuning Benchmarking Techniques: A Roofline Model Case Study. 806-815
- Sai P. Chenna, Herman Lam, Greg Stitt, S. Balachandar:
  Scalable Performance Prediction of Irregular Workloads in Multi-Phase Particle-in-Cell Applications. 816-825
SNACS: Scalable Networks for Advanced Computing Systems Workshop
- Ryohei Sato, Hidetoshi Kawaguchi, Yuichi Nakatani:
  User Allocation for Real-Time Applications with State Sharing in Fog Computing Networks. 828-831
- Zaid Salamah A. Alzaid, Saptarshi Bhowmik, Xin Yuan:
  Multi-Path Routing in the Jellyfish Network. 832-841
PAISE: Parallel AI and Systems for the Edge
- Enrique Nueve, Sean Shahkarami, Seongha Park, Nicola J. Ferrier:
  Addressing the Constraints of Active Learning on the Edge. 845-849
- Xiaojun Ruan, Haiquan Chen:
  Informed Prefetching in I/O Bounded Distributed Deep Learning. 850-857
- Gaurav Verma, Yashi Gupta, Abid M. Malik, Barbara M. Chapman:
  Performance Evaluation of Deep Learning Compilers for Edge Inference. 858-865
- Martin Breitbach, Janick Edinger, Dominik Schäfer, Christian Becker:
  DataVinci: Proactive Data Placement for Ad-Hoc Computing. 866-873
- André Luckow, Kartik Rattan, Shantenu Jha:
  Pilot-Edge: Distributed Resource Management Along the Edge-to-Cloud Continuum. 874-878
- Bibek Shrestha, Richard Cziva, Engin Arslan:
  INT Based Network-Aware Task Scheduling for Edge Computing. 879-886
- Aravind Sankaran, Paolo Bientinesi:
  Performance Comparison for Scientific Computations on the Edge via Relative Performance. 887-895
RADR: Resource Arbitration for Dynamic Runtimes
- Liang Wei, Kazuyuki Shudo:
  Dynamic Computing Resources Allocation for Multiple Deep Learning Tasks. 899-905
ScaDL: Scalable Deep Learning over Parallel And Distributed Infrastructures
- Quentin Anthony, Lang Xu, Hari Subramoni, Dhabaleswar K. D. K. Panda:
  Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences. 923-932
- Medha Atre, Birendra Jha, Ashwini Rao:
  Distributed Deep Learning Using Volunteer Computing-Like Paradigm. 933-942
- Pankaj Rajak, Anikeya Aditya, Shogo Fukushima, Rajiv K. Kalia, Thomas Linker, Kuang Liu, Ye Luo, Aiichiro Nakano, Ken-ichi Nomura, Kohei Shimamura, Fuyuki Shimojo, Priya Vashishta:
  Ex-NNQMD: Extreme-Scale Neural Network Quantum Molecular Dynamics. 943-946
- Arissa Wongpanich, Hieu Pham, James Demmel, Mingxing Tan, Quoc V. Le, Yang You, Sameer Kumar:
  Training EfficientNets at Supercomputer Scale: 83% ImageNet Top-1 Accuracy in One Hour. 947-950
- Kaoutar El Maghraoui, Lorraine M. Herger, Chekuri Choudary, Kim Tran, Todd Deshane, David Hanson:
  Performance Analysis of Deep Learning Workloads on a Composable System. 951-954
HPS: High-Performance Storage
- Zhe Wang, Pradeep Subedi, Matthieu Dorier, Philip E. Davis, Manish Parashar:
  Facilitating Staging-based Unstructured Mesh Processing to Support Hybrid In-Situ Workflows. 960-964
- Ke Fan, Kristopher K. Micinski, Thomas Gilray, Sidharth Kumar:
  Exploring MPI Collective I/O and File-per-process I/O for Checkpointing a Logical Inference Task. 965-972
ParSocial: Parallel and Distributed Processing for Computational Social Systems
- Eunice E. Santos, Vairavan Murugappan, John Korah:
  Memory Efficient Edge Addition Designs for Large and Dynamic Social Networks. 975-984
- Bogdan Mucenic, Chaitanya Kaligotla, Abby Stevens, Jonathan Ozik, Nicholson T. Collier, Charles M. Macal:
  Load Balancing Schemes for Large Synthetic Population-Based Complex Simulators. 985-988
- Eric Tatara, John A. Schneider, Madeline Quasebarth, Nicholson T. Collier, Harold Pollack, Basmattee Boodram, Samuel H. Friedman, Elizabeth Salisbury-Afshar, Mary Ellen Mackesy-Amiti, Jonathan Ozik:
  Application of Distributed Agent-based Modeling to Investigate Opioid Use Outcomes in Justice Involved Populations. 989-997
- Kasimir Gabert, Ali Pinar, Ümit V. Çatalyürek:
  Shared-Memory Scalable k-Core Maintenance on Dynamic Graphs and Hypergraphs. 998-1007
- Petros Anastasiadis, Sergiy Gogolenko, Nikela Papadopoulou, Marcin Lawenda, Hamid Arabnejad, Alireza Jahani, Imran Mahmood, Derek Groen:
  P-Flee: An Efficient Parallel Algorithm for Simulating Human Migration. 1008-1011















