


Dhabaleswar K. Panda 0001
Dhabaleswar K. (D. K.) Panda – Dhabaleswar Kumar Panda 0001
Person information
- affiliation: Ohio State University, Columbus, USA
2020 – today
- 2025
- [i20]Lang Xu, Quentin Anthony, Jacob Hatef, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Scaling Large Language Model Training on Frontier with Low-Bandwidth Partitioning. CoRR abs/2501.04266 (2025) - 2024
- [j65]Dhabaleswar K. Panda, Vipin Chaudhary, Eric Fosler-Lussier, Raghu Machiraju, Amit Majumdar, Beth Plale, Rajiv Ramnath, Ponnuswamy Sadayappan, Neelima Savardekar, Karen Tomko:
Creating intelligent cyberinfrastructure for democratizing AI. AI Mag. 45(1): 22-28 (2024) - [j64]Tu Tran, Bharath Ramesh, Benjamin Michalowicz, Mustafa Abduljabbar, Hari Subramoni, Aamir Shafi, Dhabaleswar K. Panda:
Accelerating communication with multi-HCA aware collectives in MPI. Concurr. Comput. Pract. Exp. 36(1) (2024) - [c521]Hooyoung Ahn, Seonyoung Kim, Yoo-Mi Park, Woojong Han, Shinyoung Ahn, Tu Tran, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda:
MPI Allgather Utilizing CXL Shared Memory Pool in Multi-Node Computing Systems. IEEE Big Data 2024: 332-337 - [c520]Lang Xu, Quentin Anthony, Qinghua Zhou, Nawras Alnaasan, Radha Gulhane, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating Large Language Model Training with Hybrid GPU-based Compression. CCGrid 2024: 196-205 - [c519]Benjamin Michalowicz, Kaushik Kandadi Suresh, Hari Subramoni, Mustafa Abduljabbar, Dhabaleswar K. Panda, Steve Poole:
Effective and Efficient Offloading Designs for One-Sided Communication to SmartNICs. HiPC 2024: 23-33 - [c518]Lang Xu, Quentin Anthony, Jacob Hatef, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Scaling Large Language Model Training on Frontier with Low-Bandwidth Partitioning. HiPC 2024: 57-67 - [c517]Nawras Alnaasan, Bharath Ramesh, Jinghan Yao, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
HyperSack: Distributed Hyperparameter Optimization for Deep Learning using Resource-Aware Scheduling on Heterogeneous GPU Systems. HiPC 2024: 100-110 - [c516]Chen-Chun Chen, Goutham Kalikrishna Reddy Kuncham, Hari Subramoni, Dhabaleswar K. Panda:
Design and Implementation of Kernel-based MPI Reduction Operations for Intel GPUs. HiPC 2024: 122-131 - [c515]Kaushik Kandadi Suresh, Benjamin Michalowicz, Nick Contini, Bharath Ramesh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Using BlueField-3 SmartNICs to Offload Vector Operations in Krylov Subspace Methods. HiPC 2024: 155-165 - [c514]Nawras Alnaasan, Horng-Ruey Huang, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Characterizing Communication in Distributed Parameter-Efficient Fine-Tuning for Large Language Models. HOTI 2024: 11-19 - [c513]Tu Tran, Goutham Kalikrishna Reddy Kuncham, Bharath Ramesh, Shulei Xu, Hari Subramoni, Mustafa Abduljabbar, Dhabaleswar K. Panda:
OHIO: Improving RDMA Network Scalability in MPI_Alltoall Through Optimized Hierarchical and Intra/Inter-Node Communication Overlap Design. HOTI 2024: 47-56 - [c512]Quentin Anthony, Benjamin Michalowicz, Jacob Hatef, Lang Xu, Mustafa Abdul Jabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Demystifying the Communication Characteristics for Distributed Transformer Models. HOTI 2024: 57-65 - [c511]Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
The Case for Co-Designing Model Architectures with Hardware. ICPP 2024: 84-96 - [c510]Dhabaleswar K. Panda, Hari Subramoni:
Message from the HCW 2024 Technical Program Committee Co-Chairs. IPDPS (Workshops) 2024: 1 - [c509]Dhabaleswar K. Panda, Hari Subramoni:
Message from the HCW 2024 Technical Program Committee Co-Chairs. IPDPS (Workshops) 2024: 4 - [c508]Hooyoung Ahn, Seonyoung Kim, Yoo-Mi Park, Woojong Han, Nick Contini, Bharath Ramesh, Mustafa Abduljabbar, Dhabaleswar K. Panda:
Towards Accelerating k-NN with MPI and Near-Memory Processing. IPDPS (Workshops) 2024: 608-615 - [c507]Mingzhe Han, Goutham Kalikrishna Reddy Kuncham, Benjamin Michalowicz, Rahul Vaidya, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
PML-MPI: A Pre-Trained ML Framework for Efficient Collective Algorithm Selection in MPI. IPDPS (Workshops) 2024: 761-770 - [c506]Bharath Ramesh, Nick Contini, Nawras Alnaasan, Kaushik Kandadi Suresh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
HINT: Designing Cache-Efficient MPI_Alltoall using Hybrid Memory Copy Ordering and Non-Temporal Instructions. IPDPS 2024: 802-813 - [c505]Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference. IPDPS 2024: 915-925 - [c504]Qinghua Zhou, Bharath Ramesh, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating MPI AllReduce Communication with Efficient GPU-Based Compression Schemes on Modern GPU Clusters. ISC 2024: 1-12 - [c503]Nicholas Contini, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
OMB-FPGA: A Microbenchmark Suite for FPGA-aware MPIs using OpenCL and SYCL. PEARC 2024: 1:1-1:9 - [c502]Radha Gulhane, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Infer-HiRes: Accelerating Inference for High-Resolution Images with Quantization and Distributed Deep Learning. PEARC 2024: 5:1-5:9 - [c501]Chen-Chun Chen, Goutham Kalikrishna Reddy Kuncham, Pouya Kousha, Hari Subramoni, Dhabaleswar K. Panda:
Design and Implementation of an IPC-based Collective MPI Library for Intel GPUs. PEARC 2024: 17:1-17:9 - [c500]Tu Tran, Mustafa Abduljabbar, Hooyoung Ahn, Seonyoung Kim, Yoo-Mi Park, Woojong Han, Shin-Young Ahn, Hari Subramoni, Dhabaleswar K. Panda:
OMB-CXL: A Micro-Benchmark Suite for Evaluating MPI Communication Utilizing Compute Express Link Memory Devices. PEARC 2024: 27:1-27:8 - [i19]Jinghan Yao, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Exploiting Inter-Layer Expert Affinity for Accelerating Mixture-of-Experts Model Inference. CoRR abs/2401.08383 (2024) - [i18]Quentin Anthony, Jacob Hatef, Deepak Narayanan, Stella Biderman, Stas Bekman, Junqi Yin, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
The Case for Co-Designing Model Architectures with Hardware. CoRR abs/2401.14489 (2024) - [i17]Quentin Anthony, Benjamin Michalowicz, Jacob Hatef, Lang Xu, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Demystifying the Communication Characteristics for Distributed Transformer Models. CoRR abs/2408.10197 (2024) - [i16]Jinghan Yao, Sam Ade Jacobs, Masahiro Tanaka, Olatunji Ruwase, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer. CoRR abs/2408.16978 (2024) - [i15]Lang Xu, Quentin Anthony, Qinghua Zhou, Nawras Alnaasan, Radha Gulhane, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating Large Language Model Training with Hybrid GPU-based Compression. CoRR abs/2409.02423 (2024) - 2023
- [j63]Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
High Performance MPI over the Slingshot Interconnect. J. Comput. Sci. Technol. 38(1): 128-145 (2023) - [j62]Kaushik Kandadi Suresh, Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Network-Assisted Noncontiguous Transfers for GPU-Aware MPI Libraries. IEEE Micro 43(2): 131-139 (2023) - [c499]Pouya Kousha, Qinghua Zhou, Hari Subramoni, Dhabaleswar K. Panda:
Benchmarking Modern Databases for Storing and Profiling Very Large Scale HPC Communication Data. Bench 2023: 104-119 - [c498]Nawras Alnaasan, Matthew Lieber, Aamir Shafi, Hari Subramoni, Scott A. Shearer, Dhabaleswar K. Panda:
HARVEST: High-Performance Artificial Vision Framework for Expert Labeling using Semi-Supervised Training. IEEE Big Data 2023: 139-148 - [c497]Kinan Al-Attar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
MPI4Spark Meets YARN: Enhancing MPI4Spark through YARN support for HPC. IEEE Big Data 2023: 2265-2274 - [c496]Chen-Chun Chen, Kawthar Shafie Khorassani, Goutham Kalikrishna Reddy Kuncham, Rahul Vaidya, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Implementing and Optimizing a GPU-aware MPI Library for Intel GPUs: Early Experiences. CCGrid 2023: 131-140 - [c495]Quentin Anthony, Lang Xu, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
ScaMP: Scalable Meta-Parallelism for Deep Learning Search. CCGridW 2023: 346-348 - [c494]Quentin Anthony, Lang Xu, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
ScaMP: Scalable Meta-Parallelism for Deep Learning Search. CCGrid 2023: 391-402 - [c493]Dhabaleswar K. Panda:
How to Educate HPC-Enabled AI and Data Science to Students and Professionals in a Holistic Manner? HiPCW 2023: 4 - [c492]Shulei Xu, Goutham Kalikrishna Reddy Kuncham, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Optimized All-to-All Connection Establishment for High-Performance MPI Libraries Over InfiniBand. HiPC 2023: 41-50 - [c491]Jinghan Yao, Nawras Alnaasan, Tian Chen, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference. HiPC 2023: 107-116 - [c490]Bharath Ramesh, Goutham Kalikrishna Reddy Kuncham, Kaushik Kandadi Suresh, Rahul Vaidya, Nawras Alnaasan, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Designing In-network Computing Aware Reduction Collectives in MPI. HOTI 2023: 25-32 - [c489]Benjamin Michalowicz, Kaushik Kandadi Suresh, Hari Subramoni, Dhabaleswar K. Panda, Stephen W. Poole:
Battle of the BlueFields: An In-Depth Comparison of the BlueField-2 and BlueField-3 SmartNICs. HOTI 2023: 41-48 - [c488]Hyunho Ahn, Tian Chen, Nawras Alnaasan, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Performance Characterization of Using Quantization for DNN Inference on Edge Devices. ICFEC 2023: 1-6 - [c487]Nicholas Contini, Bharath Ramesh, Kaushik Kandadi Suresh, Tu Tran, Benjamin Michalowicz, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Enabling Reconfigurable HPC through MPI-based Inter-FPGA Communication. ICS 2023: 477-487 - [c486]Kaushik Kandadi Suresh, Benjamin Michalowicz, Bharath Ramesh, Nicholas Contini, Jinghan Yao, Shulei Xu, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
A Novel Framework for Efficient Offloading of Communication Operations to Bluefield SmartNICs. IPDPS 2023: 123-133 - [c485]Qinghua Zhou, Quentin Anthony, Lang Xu, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating Distributed Deep Learning Training with Compression Assisted Allgather and Reduce-Scatter Communication. IPDPS 2023: 134-144 - [c484]Benjamin Michalowicz, Kaushik Kandadi Suresh, Bharath Ramesh, Aamir Shafi, Hari Subramoni, Mustafa Abduljabbar, Dhabaleswar K. Panda:
In-Depth Evaluation of a Lower-Level Direct-Verbs API on InfiniBand-based Clusters: Early Experiences. IPDPS Workshops 2023: 354-363 - [c483]Kawthar Shafie Khorassani, Chen-Chun Chen, Hari Subramoni, Dhabaleswar K. Panda:
Designing and Optimizing GPU-aware Nonblocking MPI Neighborhood Collective Communication for PETSc*. IPDPS 2023: 646-656 - [c482]Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. IPDPS 2023: 996-1006 - [c481]Pouya Kousha, Vivekananda Sathu, Matthew Lieber, Hari Subramoni, Dhabaleswar K. Panda:
Democratizing HPC Access and Use with Knowledge Graphs. SC Workshops 2023: 242-251 - [c480]Chen-Chun Chen, Kawthar Shafie Khorassani, Pouya Kousha, Qinghua Zhou, Jinghan Yao, Hari Subramoni, Dhabaleswar K. Panda:
MPI-xCCL: A Portable MPI Library over Collective Communication Libraries for Various Accelerators. SC Workshops 2023: 847-854 - [c479]Pouya Kousha, Arpan Jain, Ayyappa Kolli, Matthew Lieber, Mingzhe Han, Nicholas Contini, Hari Subramoni, Dhabaleswar K. Panda:
SAI: AI-Enabled Speech Assistant Interface for Science Gateways in HPC. ISC 2023: 402-424 - [c478]Benjamin Michalowicz, Kaushik Kandadi Suresh, Hari Subramoni, Dhabaleswar K. Panda, Steve Poole:
DPU-Bench: A Micro-Benchmark Suite to Measure Offload Efficiency Of SmartNICs. PEARC 2023: 94-101 - [c477]Samuel Khuvis, Karen Tomko, Scott R. Brozell, Chen-Chun Chen, Hari Subramoni, Dhabaleswar K. Panda:
Optimizing Amber for Device-to-Device GPU Communication. PEARC 2023: 200-205 - [i14]Hyunho Ahn, Tian Chen, Nawras Alnaasan, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Performance Characterization of using Quantization for DNN Inference on Edge Devices: Extended Version. CoRR abs/2303.05016 (2023) - [i13]Quentin Anthony, Ammar Ahmad Awan, Jeff Rasley, Yuxiong He, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
MCR-DL: Mix-and-Match Communication Runtime for Deep Learning. CoRR abs/2303.08374 (2023) - [i12]Jinghan Yao, Nawras Alnaasan, Tian Chen, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Flover: A Temporal Fusion Framework for Efficient Autoregressive Model Parallel Inference. CoRR abs/2305.13484 (2023) - 2022
- [j61]Arpan Jain, Nawras Alnaasan, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Optimizing Distributed DNN Training Using CPUs and BlueField-2 DPUs. IEEE Micro 42(2): 53-60 (2022) - [c476]Kinan Al-Attar, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Spark Meets MPI: Towards High-Performance Communication Framework for Spark using MPI. CLUSTER 2022: 71-81 - [c475]Apan Qasem, Hartwig Anzt, Eduard Ayguadé, Katharine J. Cahill, Ramon Canal, Jany Chan, Eric Fosler-Lussier, Fritz Göbel, Arpan Jain, Marcel Koch, Mateusz Kuzak, Josep Llosa, Raghu Machiraju, Xavier Martorell, Pratik Nayak, Shameema Oottikkal, Marcin Ostasz, Dhabaleswar K. Panda, Dirk Pleiter, Rajiv Ramnath, Maria-Ribera Sancho, Alessio Sclocco, Aamir Shafi, Hanno Spreeuw, Hari Subramoni, Karen Tomko:
Lightning Talks of EduHPC 2022. EduHPC@SC 2022: 42-49 - [c474]Qinghua Zhou, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating Broadcast Communication with GPU Compression for Deep Learning Workloads. HIPC 2022: 22-31 - [c473]Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
AccDP: Accelerated Data-Parallel Distributed DNN Training for Modern GPU-Based HPC Clusters. HIPC 2022: 32-41 - [c472]Bharath Ramesh, Qinghua Zhou, Aamir Shafi, Mustafa Abduljabbar, Hari Subramoni, Dhabaleswar K. Panda:
Designing Efficient Pipelined Communication Schemes using Compression in MPI Libraries. HIPC 2022: 95-99 - [c471]Kaushik Kandadi Suresh, Akshay Paniraja Guptha, Benjamin Michalowicz, Bharath Ramesh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Efficient Personalized and Non-Personalized Alltoall Communication for Modern Multi-HCA GPU-Based Clusters. HIPC 2022: 100-104 - [c470]Kaushik Kandadi Suresh, Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Mustafa Abduljabbar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Network Assisted Non-Contiguous Transfers for GPU-Aware MPI Libraries. HOTI 2022: 13-20 - [c469]Tu Tran, Benjamin Michalowicz, Bharath Ramesh, Hari Subramoni, Aamir Shafi, Dhabaleswar K. Panda:
Designing Hierarchical Multi-HCA Aware Allgather in MPI. ICPP Workshops 2022: 28:1-28:10 - [c468]Dhabaleswar K. Panda:
Challenges and Opportunities in Designing High-Performance and Scalable Middleware for HPC and AI: Past, Present, and Future. IPDPS 2022: 1 - [c467]Chen-Chun Chen, Kawthar Shafie Khorassani, Quentin G. Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Highly Efficient Alltoall and Alltoallv Communication Algorithms for GPU Systems. IPDPS Workshops 2022: 24-33 - [c466]Shulei Xu, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Arm meets Cloud: A Case Study of MPI Library Performance on AWS Arm-based HPC Cloud with Elastic Fabric Adapter. IPDPS Workshops 2022: 449-456 - [c465]Kinan Al-Attar, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Towards Java-based HPC using the MVAPICH2 Library: Early Experiences. IPDPS Workshops 2022: 510-519 - [c464]Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems. IPDPS Workshops 2022: 870-879 - [c463]Qinghua Zhou, Pouya Kousha, Quentin Anthony, Kawthar Shafie Khorassani, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating MPI All-to-All Communication with Online Compression on Modern GPU Clusters. ISC 2022: 3-25 - [c462]Pouya Kousha, Arpan Jain, Ayyappa Kolli, Prasanna Sainath, Hari Subramoni, Aamir Shafi, Dhabaleswar K. Panda:
"Hey CAI" - Conversational AI Enabled User Interface for HPC Tools. ISC 2022: 87-108 - [c461]Arpan Jain, Aamir Shafi
, Quentin Anthony, Pouya Kousha, Hari Subramoni, Dhabaleswar K. Panda:
Hy-Fi: Hybrid Five-Dimensional Parallel DNN Training on High-Performance GPU Clusters. ISC 2022: 109-130 - [c460]Kawthar Shafie Khorassani, Chen-Chun Chen, Bharath Ramesh, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
High Performance MPI over the Slingshot Interconnect: Early Experiences. PEARC 2022: 15:1-15:7 - [e8]Dhabaleswar K. Panda, Michael B. Sullivan:
Supercomputing Frontiers - 7th Asian Conference, SCFA 2022, Singapore, March 1-3, 2022, Proceedings. Lecture Notes in Computer Science 13214, Springer 2022, ISBN 978-3-031-10418-3 [contents] - 2021
- [j60]Dhabaleswar Kumar Panda, Hari Subramoni, Ching-Hsiang Chu, Mohammadreza Bayatpour:
The MVAPICH project: Transforming research into high-performance MPI library for HPC community. J. Comput. Sci. 52: 101208 (2021) - [c459]Kawthar Shafie Khorassani, Ching-Hsiang Chu, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda:
Adaptive and Hierarchical Large Message All-to-all Communication Algorithms for Large-scale Dense GPU Systems. CCGRID 2021: 113-122 - [c458]Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
Efficient MPI-based Communication for GPU-Accelerated Dask Applications. CCGRID 2021: 277-286 - [c457]Bharath Ramesh, Jahanzeb Maqbool Hashmi, Shulei Xu, Aamir Shafi, Seyedeh Mahdieh Ghazimirsaeed, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda:
Towards Architecture-aware Hierarchical Communication Trees on Modern HPC Systems. HiPC 2021: 272-281 - [c456]Yuntian He, Saket Gurukar, Pouya Kousha, Hari Subramoni, Dhabaleswar K. Panda, Srinivasan Parthasarathy:
DistMILE: A Distributed Multi-Level Framework for Scalable Graph Embedding. HiPC 2021: 282-291 - [c455]Kaushik Kandadi Suresh, Bharath Ramesh, Chen-Chun Chen, Seyedeh Mahdieh Ghazimirsaeed, Mohammadreza Bayatpour, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Layout-aware Hardware-assisted Designs for Derived Data Types in MPI. HiPC 2021: 302-311 - [c454]Nick Sarkauskas, Mohammadreza Bayatpour, Tu Tran, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda:
Large-Message Nonblocking MPI_Iallgather and MPI_Ibcast Offload via BlueField-2 DPU. HiPC 2021: 388-393 - [c453]Arpan Jain, Nawras Alnaasan, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating CPU-based Distributed DNN Training on Modern HPC Clusters using BlueField-2 DPUs. HOTI 2021: 17-24 - [c452]Q. Zhou, C. Chu, N. S. Kumar, Pouya Kousha, Seyedeh Mahdieh Ghazimirsaeed, Hari Subramoni, Dhabaleswar K. Panda:
Designing High-Performance MPI Libraries with On-the-fly Compression for Modern GPU Clusters*. IPDPS 2021: 444-453 - [c451]Arpan Jain, Tim Moon, Tom Benson, Hari Subramoni, Sam Adé Jacobs, Dhabaleswar K. Panda, Brian Van Essen:
SUPER: SUb-Graph Parallelism for TransformERs. IPDPS 2021: 629-638 - [c450]Quentin Anthony, Lang Xu, Hari Subramoni, Dhabaleswar K. Panda:
Scaling Single-Image Super-Resolution Training on Modern HPC Clusters: Early Experiences. IPDPS Workshops 2021: 923-932 - [c449]Mohammadreza Bayatpour, Nick Sarkauskas, Hari Subramoni, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda:
BluesMPI: Efficient MPI Non-blocking Alltoall Offloading Designs on Modern BlueField Smart NICs. ISC 2021: 18-37 - [c448]Kawthar Shafie Khorassani, Jahanzeb Maqbool Hashmi, Ching-Hsiang Chu, Chen-Chun Chen, Hari Subramoni, Dhabaleswar K. Panda:
Designing a ROCm-Aware MPI Library for AMD GPUs: Early Experiences. ISC 2021: 118-136 - [c447]Pouya Kousha, Kamal Raj Sankarapandian Dayala Ganesh Ram, Mansa Kedia, Hari Subramoni, Arpan Jain, Aamir Shafi, Dhabaleswar K. Panda, Trey Dockendorf, Heechang Na, Karen Tomko:
INAM: Cross-stack Profiling and Analysis of Communication in MPI-based Applications. PEARC 2021: 14:1-14:11 - [i11]Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
Efficient MPI-based Communication for GPU-Accelerated Dask Applications. CoRR abs/2101.08878 (2021) - [i10]Pouya Kousha, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda:
Cross-layer Visualization and Profiling of Network and I/O Communication for HPC Clusters. CoRR abs/2109.08329 (2021) - [i9]Nawras Alnaasan, Arpan Jain, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
OMB-Py: Python Micro-Benchmarks for Evaluating Performance of MPI Libraries on HPC Systems. CoRR abs/2110.10659 (2021) - 2020
- [j59]Sourav Chakraborty, Ignacio Laguna, Murali Emani, Kathryn M. Mohror, Dhabaleswar K. Panda, Martin Schulz, Hari Subramoni:
EReinit: Scalable and efficient fault-tolerance for bulk-synchronous MPI applications. Concurr. Comput. Pract. Exp. 32(3) (2020) - [j58]Jahanzeb Maqbool Hashmi, Ching-Hsiang Chu, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda:
FALCON-X: Zero-copy MPI derived datatype processing on modern CPU and GPU architectures. J. Parallel Distributed Comput. 144: 1-13 (2020) - [j57]Ammar Ahmad Awan, Arpan Jain, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Communication Profiling and Characterization of Deep-Learning Workloads on Clusters With High-Performance Interconnects. IEEE Micro 40(1): 35-43 (2020) - [c446]Mohammadreza Bayatpour, Seyedeh Mahdieh Ghazimirsaeed, Shulei Xu, Hari Subramoni, Dhabaleswar K. Panda:
Design and Characterization of InfiniBand Hardware Tag Matching in MPI. CCGRID 2020: 101-110 - [c445]Ching-Hsiang Chu, Kawthar Shafie Khorassani, Qinghua Zhou, Hari Subramoni, Dhabaleswar K. Panda:
Dynamic Kernel Fusion for Bulk Non-contiguous Data Transfer on GPU Clusters. CLUSTER 2020: 130-141 - [c444]Aamir Shafi, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
Blink: Towards Efficient RDMA-based Communication Coroutines for Parallel Python Applications. HiPC 2020: 111-120 - [c443]Ching-Hsiang Chu, Pouya Kousha, Ammar Ahmad Awan, Kawthar Shafie Khorassani, Hari Subramoni, Dhabaleswar K. Panda:
NV-group: link-efficient reduction for distributed deep learning on modern dense GPU systems. ICS 2020: 6:1-6:12 - [c442]Jahanzeb Maqbool Hashmi, Shulei Xu, Bharath Ramesh, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda:
Machine-agnostic and Communication-aware Designs for MPI on Emerging Architectures. IPDPS 2020: 32-41 - [c441]Amit Ruhela, Shulei Xu, Karthik Vadambacheri Manian, Hari Subramoni, Dhabaleswar K. Panda:
Analyzing and Understanding the Impact of Interconnect Performance on HPC, Big Data, and Deep Learning Applications: A Case Study with InfiniBand EDR and HDR. IPDPS Workshops 2020: 869-878 - [c440]Kaushik Kandadi Suresh, Bharath Ramesh, Seyedeh Mahdieh Ghazimirsaeed, Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
Performance Characterization of Network Mechanisms for Non-Contiguous Data Transfers in MPI. IPDPS Workshops 2020: 896-905 - [c439]Quentin Anthony, Ammar Ahmad Awan, Arpan Jain, Hari Subramoni, Dhabaleswar K. Panda:
Efficient Training of Semantic Image Segmentation on Summit using Horovod and MVAPICH2-GDR. IPDPS Workshops 2020: 1015-1023 - [c438]Bharath Ramesh, Kaushik Kandadi Suresh, Nick Sarkauskas, Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
Scalable MPI Collectives using SHARP: Large Scale Performance Evaluation on the TACC Frontera System. ExaMPI@SC 2020: 11-20 - [c437]Seyedeh Mahdieh Ghazimirsaeed, Quentin Anthony, Aamir Shafi, Hari Subramoni, Dhabaleswar K. Panda:
Accelerating GPU-based Machine Learning in Python using MPI Library: A Case Study with MVAPICH2-GDR. MLHPC/AI4S@SC 2020: 17-28 - [c436]Shulei Xu, Seyedeh Mahdieh Ghazimirsaeed, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
MPI Meets Cloud: Case Study with Amazon EC2 and Microsoft Azure. IPDRM@SC 2020: 41-48 - [c435]Arpan Jain, Ammar Ahmad Awan, Asmaa M. Aljuhani, Jahanzeb Maqbool Hashmi, Quentin G. Anthony, Hari Subramoni, Dhabaleswar K. Panda, Raghu Machiraju, Anil Parwani:
GEMS: GPU-enabled memory-aware model-parallelism system for distributed DNN training. SC 2020: 45 - [c434]Samuel Khuvis, Karen Tomko, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda:
Exploring Hybrid MPI+Kokkos Tasks Programming Model. PAW-ATM@SC 2020: 66-73 - [c433]Ammar Ahmad Awan, Arpan Jain, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda:
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training with TensorFlow. ISC 2020: 83-103 - [c432]Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Kaushik Kandadi Suresh, Seyedeh Mahdieh Ghazimirsaeed, Bharath Ramesh, Hari Subramoni, Dhabaleswar K. Panda:
Communication-Aware Hardware-Assisted MPI Overlap Engine. ISC 2020: 517-535 - [c431]Dan Stanzione, John West, R. Todd Evans, Tommy Minyard, Omar Ghattas, Dhabaleswar K. Panda:
Frontera: The Evolution of Leadership Computing at the National Science Foundation. PEARC 2020: 106-111 - [c430]Pouya Kousha, Kamal Raj S. D., Hari Subramoni, Dhabaleswar K. Panda, Heechang Na, Trey Dockendorf, Karen Tomko:
Accelerated Real-time Network Monitoring and Profiling at Scale using OSU INAM. PEARC 2020: 215-223 - [e7]Dhabaleswar K. Panda:
Supercomputing Frontiers - 6th Asian Conference, SCFA 2020, Singapore, February 24-27, 2020, Proceedings. Lecture Notes in Computer Science 12082, Springer 2020, ISBN 978-3-030-48841-3 [contents] - [i8]Ritu Arora, Xiaosong Li, Bonnie Hurwitz, Daniel Fay, Dhabaleswar K. Panda, Edward F. Valeev, Shaowen Wang, Shirley Moore, Sunita Chandrasekaran, Ting Cao, Holly Bik, Matthew Curry, Tanzima Z. Islam:
Future Directions of the Cyberinfrastructure for Sustained Scientific Innovation (CSSI) Program. CoRR abs/2010.15584 (2020)
2010 – 2019
- 2019
- [j56]Depai Qian, Dhabaleswar K. Panda:
CCF THPC inaugural issue editorial. CCF Trans. High Perform. Comput. 1(1): 1-2 (2019) - [j55]Amit Ruhela, Hari Subramoni, Sourav Chakraborty, Mohammadreza Bayatpour, Pouya Kousha, Dhabaleswar K. Panda:
Efficient design for MPI asynchronous progress without dedicated resources. Parallel Comput. 85: 13-26 (2019) - [j54]Ammar Ahmad Awan, Karthik Vadambacheri Manian, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Optimized large-message broadcast for deep learning workloads: MPI, MPI+NCCL, or NCCL2? Parallel Comput. 85: 141-152 (2019) - [j53]Ching-Hsiang Chu, Xiaoyi Lu, Ammar Ahmad Awan, Hari Subramoni, Bracy Elton, Dhabaleswar K. Panda:
Exploiting Hardware Multicast and GPUDirect RDMA for Efficient Broadcast. IEEE Trans. Parallel Distributed Syst. 30(3): 575-588 (2019) - [c429]Karthik Vadambacheri Manian, A. A. Ammar, Amit Ruhela, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Characterizing CUDA Unified Memory (UM)-Aware MPI Designs on Modern GPU Architectures. GPGPU@ASPLOS 2019: 43-52 - [c428]Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda:
Design and Characterization of Shared Address Space MPI Collectives on Modern Architectures. CCGRID 2019: 410-419 - [c427]Ammar Ahmad Awan, Jeroen Bédorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation. CCGRID 2019: 498-507 - [c426]Arpan Jain, Ammar Ahmad Awan, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda:
Performance Characterization of DNN Training using TensorFlow and PyTorch on Modern Clusters. CLUSTER 2019: 1-11 - [c425]Pouya Kousha, Bharath Ramesh, Kaushik Kandadi Suresh, Ching-Hsiang Chu, Arpan Jain, Nick Sarkauskas, Hari Subramoni, Dhabaleswar K. Panda:
Designing a Profiling and Visualization Tool for Scalable and In-depth Analysis of High-Performance GPU Clusters. HiPC 2019: 93-102 - [c424]Dipti Shankar, Xiaoyi Lu, Dhabaleswar K. Panda:
SCOR-KV: SIMD-Aware Client-Centric and Optimistic RDMA-Based Key-Value Store for Emerging CPU Architectures. HiPC 2019: 257-266 - [c423]Ching-Hsiang Chu, Jahanzeb Maqbool Hashmi, Kawthar Shafie Khorassani, Hari Subramoni, Dhabaleswar K. Panda:
High-Performance Adaptive MPI Derived Datatype Communication for Modern Multi-GPU Systems. HiPC 2019: 267-276 - [c422]Sourav Chakraborty, Shulei Xu, Hari Subramoni, Dhabaleswar K. Panda:
Designing Scalable and High-Performance MPI Libraries on Amazon Elastic Fabric Adapter. Hot Interconnects 2019: 40-44 - [c421]Ammar Ahmad Awan, Arpan Jain, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Communication Profiling and Characterization of Deep Learning Workloads on Clusters with High-Performance Interconnects. Hot Interconnects 2019: 49-53 - [c420]Haiyang Shi, Xiaoyi Lu, Dipti Shankar, Dhabaleswar K. Panda:
UMR-EC: A Unified and Multi-Rail Erasure Coding Library for High-Performance Distributed Storage Systems. HPDC 2019: 219-230 - [c419]Dipti Shankar, Xiaoyi Lu, Dhabaleswar K. Panda:
SimdHT-Bench: Characterizing SIMD-Aware Hash Table Designs on Emerging CPU Architectures. IISWC 2019: 178-188 - [c418]Jie Zhang, Xiaoyi Lu, Ching-Hsiang Chu, Dhabaleswar K. Panda:
C-GDR: High-Performance Container-Aware GPUDirect MPI Communication Schemes on RDMA Networks. IPDPS 2019: 242-251 - [c417]Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda:
FALCON: Efficient Designs for Zero-Copy MPI Datatype Processing on Emerging Architectures. IPDPS 2019: 355-364 - [c416]Xiaoyi Lu, Jianfeng Zhan, Dhabaleswar K. Panda:
Introduction to HPBDC 2019. IPDPS Workshops 2019: 394 - [c415]Dhabaleswar K. Panda, Ammar Ahmad Awan, Hari Subramoni:
High performance distributed deep learning: a beginner's guide. PPoPP 2019: 452-454 - [c414]Amit Ruhela, Bharath Ramesh, Sourav Chakraborty, Hari Subramoni, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda:
Leveraging Network-level parallelism with Multiple Process-Endpoints for MPI Broadcast. IPDRM@SC 2019: 34-41 - [c413]Shulei Xu, Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Hari Subramoni, Dhabaleswar K. Panda:
Design and Evaluation of Shared Memory Communication Benchmarks on Emerging Architectures using MVAPICH2. IPDRM@SC 2019: 42-49 - [c412]Arpan Jain, Ammar Ahmad Awan, Hari Subramoni, Dhabaleswar K. Panda:
Scaling TensorFlow, PyTorch, and MXNet using MVAPICH2 for High-Performance Deep Learning on Frontera. DLS@SC 2019: 76-83 - [c411]Kawthar Shafie Khorassani, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Performance Evaluation of MPI Libraries on GPU-Enabled OpenPOWER Architectures: Early Experiences. ISC Workshops 2019: 361-378 - [i7]Ammar Ahmad Awan, Arpan Jain, Quentin Anthony, Hari Subramoni, Dhabaleswar K. Panda:
HyPar-Flow: Exploiting MPI and Keras for Scalable Hybrid-Parallel DNN Training using TensorFlow. CoRR abs/1911.05146 (2019) - 2018
- [j52]Md. Wasi-ur-Rahman, Nusrat Sharmin Islam, Xiaoyi Lu, Dipti Shankar, Dhabaleswar K. Panda:
MR-Advisor: A comprehensive tuning, profiling, and prediction tool for MapReduce execution frameworks on HPC clusters. J. Parallel Distributed Comput. 120: 237-250 (2018) - [j51]Dhabaleswar K. Panda, Xiaoyi Lu, Hari Subramoni:
Networking and communication challenges for post-exascale systems. Frontiers Inf. Technol. Electron. Eng. 19(10): 1230-1235 (2018) - [j50]Srinivasan Ramesh, Aurèle Mahéo, Sameer Shende, Allen D. Malony, Hari Subramoni, Amit Ruhela, Dhabaleswar K. Panda:
MPI performance engineering with the MPI tool interface: The integration of MVAPICH and TAU. Parallel Comput. 77: 19-37 (2018) - [j49]Xiaoyi Lu, Haiyang Shi, Rajarshi Biswas, M. Haseeb Javed, Dhabaleswar K. Panda:
DLoBD: A Comprehensive Study of Deep Learning over Big Data Stacks on HPC Clusters. IEEE Trans. Multi Scale Comput. Syst. 4(4): 635-648 (2018) - [c410]Haiyang Shi, Xiaoyi Lu, Dhabaleswar K. Panda:
EC-Bench: Benchmarking Onload and Offload Erasure Coders on Modern Hardware Architectures. Bench 2018: 215-230 - [c409]Xiaoyi Lu, Dipti Shankar, Haiyang Shi, Dhabaleswar K. Panda:
Spark-uDAPL: Cost-Saving Big Data Analytics on Microsoft Azure Cloud with RDMA Networks*. IEEE BigData 2018: 321-326 - [c408]Haiyang Shi, Xiaoyi Lu, Dipti Shankar, Dhabaleswar K. Panda:
High-Performance Multi-Rail Erasure Coding Library over Modern Data Center Architectures: Early Experiences. SoCC 2018: 530-531 - [c407]Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Hari Subramoni, Pouya Kousha, Dhabaleswar K. Panda:
SALaR: Scalable and Adaptive Designs for Large Message Reduction Collectives. CLUSTER 2018: 12-23 - [c406]M. Haseeb Javed, Xiaoyi Lu, Dhabaleswar K. Panda:
Cutting the Tail: Designing High Performance Message Brokers to Reduce Tail Latencies in Stream Processing. CLUSTER 2018: 223-233 - [c405]Rajarshi Biswas, Xiaoyi Lu, Dhabaleswar K. Panda:
Accelerating TensorFlow with Adaptive RDMA-Based gRPC. HiPC 2018: 2-11 - [c404]Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Xiaoyi Lu, Dhabaleswar K. Panda:
OC-DNN: Exploiting Advanced Unified Memory Capabilities in CUDA 9 and Volta GPUs for Out-of-Core DNN Training. HiPC 2018: 143-152 - [c403]Xiaoyi Lu, Jianfeng Zhan, Dhabaleswar K. Panda:
Introduction to HPBDC 2018. IPDPS Workshops 2018: 446 - [c402]Jahanzeb Maqbool Hashmi, Sourav Chakraborty, Mohammadreza Bayatpour, Hari Subramoni, Dhabaleswar K. Panda:
Designing Efficient Shared Address Space Reduction Collectives for Multi-/Many-cores. IPDPS 2018: 1020-1029 - [c401]Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? EuroMPI 2018: 2:1-2:9 - [c400]Mingzhe Li, Xiaoyi Lu, Hari Subramoni, Dhabaleswar K. Panda:
Multi-Threading and Lock-Free MPI RMA Based Graph Processing on KNL and POWER Architectures. EuroMPI 2018: 4:1-4:10 - [c399]Amit Ruhela, Hari Subramoni, Sourav Chakraborty, Mohammadreza Bayatpour, Pouya Kousha, Dhabaleswar K. Panda:
Efficient Asynchronous Communication Progress for MPI without Dedicated Resources. EuroMPI 2018: 14:1-14:11 - [c398]Sourav Chakraborty, Mohammadreza Bayatpour, Jahanzeb Maqbool Hashmi, Hari Subramoni, Dhabaleswar K. Panda:
Cooperative rendezvous protocols for improved performance and overlap. SC 2018: 28:1-28:13 - [c397]Shashank Gugnani, Xiaoyi Lu, Dhabaleswar K. Panda:
Analyzing, Modeling, and Provisioning QoS for NVMe SSDs. UCC 2018: 247-256 - [e6]Esam El-Araby, Dhabaleswar K. Panda, Sandra Gesing, Amy W. Apon, Volodymyr V. Kindratenko, Massimo Cafaro, Alfredo Cuzzocrea:
18th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing, CCGRID 2018, Washington, DC, USA, May 1-4, 2018. IEEE Computer Society 2018, ISBN 978-1-5386-5815-4 - [i6]Rajarshi Biswas, Xiaoyi Lu, Dhabaleswar K. Panda:
Designing a Micro-Benchmark Suite to Evaluate gRPC for TensorFlow: Early Experiences. CoRR abs/1804.01138 (2018) - [i5]Ammar Ahmad Awan, Jeroen Bédorf, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Scalable Distributed DNN Training using TensorFlow and CUDA-Aware MPI: Characterization, Designs, and Performance Evaluation. CoRR abs/1810.11112 (2018) - 2017
- [j48]Xiaoyi Lu, Dipti Shankar, Dhabaleswar K. Panda:
Scalable and Distributed Key-Value Store-based Data Management Using RDMA-Memcached. IEEE Data Eng. Bull. 40(1): 50-61 (2017) - [j47]Md. Wasi-ur-Rahman, Nusrat Sharmin Islam, Xiaoyi Lu, Dhabaleswar K. Panda:
A Comprehensive Study of MapReduce Over Lustre for Intermediate Data Placement and Shuffle Strategies on HPC Clusters. IEEE Trans. Parallel Distributed Syst. 28(3): 633-646 (2017) - [c396]M. Haseeb Javed, Xiaoyi Lu, Dhabaleswar K. Panda:
Characterization of Big Data Stream Processing Pipeline: A Case Study using Flink and Kafka. BDCAT 2017: 1-10 - [c395]Shashank Gugnani, Xiaoyi Lu, Houliang Qi, Li Zha, Dhabaleswar K. Panda:
Characterizing and accelerating indexing techniques on distributed ordered tables. IEEE BigData 2017: 173-182 - [c394]Xiaoyi Lu, Haiyang Shi, Dipti Shankar, Dhabaleswar K. Panda:
Performance characterization and acceleration of big data workloads on OpenPOWER system. IEEE BigData 2017: 213-222 - [c393]Md. Wasi-ur-Rahman, Nusrat Sharmin Islam, Xiaoyi Lu, Dhabaleswar K. Panda:
NVMD: Non-volatile memory assisted design for accelerating MapReduce and DAG execution frameworks on HPC systems. IEEE BigData 2017: 369-374 - [c392]Shashank Gugnani, Xiaoyi Lu, Dhabaleswar K. Panda:
Swift-X: Accelerating OpenStack Swift with RDMA for Building an Efficient HPC Cloud. CCGrid 2017: 238-247 - [c391]Sourav Chakraborty, Hari Subramoni, Dhabaleswar K. Panda:
Contention-Aware Kernel-Assisted MPI Collectives for Multi-/Many-Core Systems. CLUSTER 2017: 13-24 - [c390]Hari Subramoni, Xiaoyi Lu, Dhabaleswar K. Panda:
A Scalable Network-Based Performance Analysis Tool for MPI on Large-Scale HPC Systems. CLUSTER 2017: 354-358 - [c389]Mingzhe Li, Xiaoyi Lu, Hari Subramoni, Dhabaleswar K. Panda:
Designing Registration Caching Free High-Performance MPI Library with Implicit On-Demand Paging (ODP) of InfiniBand. HiPC 2017: 62-71 - [c388]Jahanzeb Maqbool Hashmi, Khaled Hamidouche, Hari Subramoni, Dhabaleswar K. Panda:
Kernel-Assisted Communication Engine for MPI on Emerging Manycore Processors. HiPC 2017: 84-93 - [c387]Shashank Gugnani, Xiaoyi Lu, Franco Pestilli, Cesar F. Caiafa, Dhabaleswar K. Panda:
MPI-LiFE: Designing High-Performance Linear Fascicle Evaluation of Brain Connectome with MPI. HiPC 2017: 213-222 - [c386]Xiaoyi Lu, Haiyang Shi, M. Haseeb Javed, Rajarshi Biswas, Dhabaleswar K. Panda:
Characterizing Deep Learning over Big Data (DLoBD) Stacks on RDMA-Capable Networks. Hot Interconnects 2017: 87-94 - [c385]Dipti Shankar, Xiaoyi Lu, Dhabaleswar K. Panda:
High-Performance and Resilient Key-Value Store with Online Erasure Coding for Big Data Workloads. ICDCS 2017: 527-537 - [c384]Akshay Venkatesh, Khaled Hamidouche, Sreeram Potluri, Davide Rossetti, Ching-Hsiang Chu, Dhabaleswar K. Panda:
MPI-GDS: High Performance MPI Designs with GPUDirect-aSync for CPU-GPU Control Flow Decoupling. ICPP 2017: 151-160 - [c383]Ching-Hsiang Chu, Xiaoyi Lu, Ammar Ahmad Awan, Hari Subramoni, Jahanzeb Maqbool Hashmi, Bracy Elton, Dhabaleswar K. Panda:
Efficient and Scalable Multi-Source Streaming Broadcast on GPU Clusters for Deep Learning. ICPP 2017: 161-170 - [c382]Jie Zhang, Xiaoyi Lu, Dhabaleswar K. Panda:
High-Performance Virtual Machine Migration Framework for MPI Applications on SR-IOV Enabled InfiniBand Clusters. IPDPS 2017: 143-152 - [c381]Xiaoyi Lu, Jianfeng Zhan, Dhabaleswar K. Panda:
Introduction to HPBDC Workshop. IPDPS Workshops 2017: 1020 - [c380]Jahanzeb Maqbool Hashmi, Mingzhe Li, Hari Subramoni, Dhabaleswar K. Panda:
Exploiting and Evaluating OpenSHMEM on KNL Architecture. OpenSHMEM 2017: 143-158 - [c379]Ammar Ahmad Awan, Khaled Hamidouche, Jahanzeb Maqbool Hashmi, Dhabaleswar K. Panda:
S-Caffe: Co-designing MPI Runtimes and Caffe for Scalable Deep Learning on Modern GPU Clusters. PPoPP 2017: 193-205 - [c378]Srinivasan Ramesh, Aurèle Mahéo, Sameer Shende, Allen D. Malony, Hari Subramoni, Dhabaleswar K. Panda:
MPI performance engineering with the MPI tool interface: the integration of MVAPICH and TAU. EuroMPI/USA 2017: 16:1-16:11 - [c377]Ammar Ahmad Awan, Hari Subramoni, Dhabaleswar K. Panda:
An In-depth Performance Characterization of CPU- and GPU-based DNN Training on Modern Architectures. MLHPC@SC 2017: 8:1-8:8 - [c376]Mohammadreza Bayatpour, Sourav Chakraborty, Hari Subramoni, Xiaoyi Lu, Dhabaleswar K. Panda:
Scalable reduction collectives with data partitioning-based multi-leader design. SC 2017: 64 - [c375]Hari Subramoni, Sourav Chakraborty, Dhabaleswar K. Panda:
Designing Dynamic and Adaptive MPI Point-to-Point Communication Protocols for Efficient Overlap of Computation and Communication. ISC 2017: 334-354 - [c374]Jie Zhang, Xiaoyi Lu, Dhabaleswar K. Panda:
Is Singularity-based Container Technology Ready for Running MPI Applications on HPC Clouds? UCC 2017: 151-160 - [c373]Dhabaleswar K. Panda, Xiaoyi Lu:
HPC Meets Cloud: Building Efficient Clouds for HPC, Big Data, and Deep Learning Middleware and Applications. UCC 2017: 189-190 - [c372]Jie Zhang, Xiaoyi Lu, Dhabaleswar K. Panda:
Designing Locality and NUMA Aware MPI Runtime for Nested Virtualization based HPC Cloud with SR-IOV Enabled InfiniBand. VEE 2017: 187-200 - [c371]Dan Stanzione, Bill Barth, Niall Gaffney, Kelly P. Gaither, Chris Hempel, Tommy Minyard, Susan Mehringer, Eric A. Wernert, H. Tufo, Dhabaleswar K. Panda, Patricia J. Teller:
Stampede 2: The Evolution of an XSEDE Supercomputer. PEARC 2017: 15:1-15:8 - [p1]Xiaoyi Lu, Jie Zhang, Dhabaleswar K. Panda:
Building Efficient HPC Cloud with SR-IOV-Enabled InfiniBand: The MVAPICH2 Approach. Research Advances in Cloud Computing 2017: 115-140 - [i4]Ammar Ahmad Awan, Ching-Hsiang Chu, Hari Subramoni, Dhabaleswar K. Panda:
Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? CoRR abs/1707.09414 (2017) - 2016
- [j46]Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Hari Subramoni, Ching-Hsiang Chu, Dhabaleswar K. Panda:
CUDA-Aware OpenSHMEM: Extensions and Designs for High Performance OpenSHMEM on GPU Clusters. Parallel Comput. 58: 27-36 (2016) - [j45]Dipti Shankar, Xiaoyi Lu, Md. Wasi-ur-Rahman, Nusrat S. Islam, Dhabaleswar K. Panda:
Characterizing and benchmarking stand-alone Hadoop MapReduce on modern HPC clusters. J. Supercomput. 72(12): 4573-4600 (2016) - [c370]Shashank Gugnani, Xiaoyi Lu, Dhabaleswar K. Panda:
Performance characterization of hadoop workloads on SR-IOV-enabled virtualized InfiniBand clusters. BDCAT 2016: 36-45 - [c369]Nusrat Sharmin Islam, Md. Wasi-ur-Rahman, Xiaoyi Lu, Dhabaleswar K. Panda:
Efficient data access strategies for Hadoop and Spark on HPC cluster with heterogeneous storage. IEEE BigData 2016: 223-232 - [c368]Xiaoyi Lu, Dipti Shankar, Shashank Gugnani, Dhabaleswar K. Panda:
High-performance design of apache spark with RDMA and its benefits on various workloads. IEEE BigData 2016: 253-262 - [c367]Dipti Shankar, Xiaoyi Lu, Dhabaleswar K. Panda:
Boldio: A hybrid and resilient burst-buffer over lustre for accelerating big data I/O. IEEE BigData 2016: 404-409 - [c366]Sourav Chakraborty, Hari Subramoni, Jonathan L. Perkins, Dhabaleswar K. Panda:
SHMEMPMI - Shared Memory Based PMI for Improved Performance and Scalability. CCGrid 2016: 60-69 - [c365]Ching-Hsiang Chu, Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Dhabaleswar K. Panda:
CUDA Kernel Based Collective Reduction Operations on Large-scale GPU Clusters. CCGrid 2016: 726-735 - [c364]Dip Sankar Banerjee, Khaled Hamidouche, Dhabaleswar K. Panda:
Re-Designing CNTK Deep Learning Framework on Modern GPU Enabled Clusters. CloudCom 2016: 144-151 - [c363]Shashank Gugnani, Xiaoyi Lu, Dhabaleswar K. Panda:
Designing Virtualization-Aware and Automatic Topology Detection Schemes for Accelerating Hadoop on SR-IOV-Enabled Clouds. CloudCom 2016: 152-159 - [c362]Xiaoyi Lu, Dipti Shankar, Shashank Gugnani, Hari Subramoni, Dhabaleswar K. Panda:
Impact of HPC Cloud Networking Technologies on Accelerating Hadoop RPC and HBase. CloudCom 2016: 310-317 - [c361]Mohammadreza Bayatpour, Hari Subramoni, Sourav Chakraborty, Dhabaleswar K. Panda:
Adaptive and Dynamic Design for MPI Tag Matching. CLUSTER 2016: 1-10 - [c360]Jie Zhang, Xiaoyi Lu, Sourav Chakraborty, Dhabaleswar K. Panda:
Slurm-V: Extending Slurm for Building Efficient HPC Cloud with SR-IOV and IVShmem. Euro-Par 2016: 349-362 - [c359]Mingzhe Li, Xiaoyi Lu, Khaled Hamidouche, Jie Zhang, Dhabaleswar K. Panda:
Mizan-RMA: Accelerating Mizan Graph Processing Framework with MPI RMA. HiPC 2016: 42-51 - [c358]Khaled Hamidouche, Ammar Ahmad Awan, Akshay Venkatesh, Dhabaleswar K. Panda:
CUDA M3: Designing Efficient CUDA Managed Memory-Aware MPI by Exploiting GDR and IPC. HiPC 2016: 52-61 - [c357]Jahanzeb Maqbool Hashmi, Khaled Hamidouche, Dhabaleswar K. Panda:
Enabling Performance Efficient Runtime Support for Hybrid MPI+UPC++ Programming Models. HPCC/SmartCity/DSS 2016: 1180-1187 - [c356]Jiajun Cao, Kapil Arya, Rohan Garg, L. Shawn Matott, Dhabaleswar K. Panda, Hari Subramoni, Jérôme Vienne, Gene Cooperman:
System-Level Scalable Checkpoint-Restart for Petascale Computing. ICPADS 2016: 932-941 - [c355]Jie Zhang, Xiaoyi Lu, Dhabaleswar K. Panda:
High Performance MPI Library for Container-Based HPC Cloud on InfiniBand Clusters. ICPP 2016: 268-277 - [c354]Nusrat Sharmin Islam, Md. Wasi-ur-Rahman, Xiaoyi Lu, Dhabaleswar K. Panda:
High Performance Design for HDFS with Byte-Addressability of NVM and RDMA. ICS 2016: 8:1-8:14 - [c353]Dipti Shankar, Xiaoyi Lu, Nusrat S. Islam, Md. Wasi-ur-Rahman, Dhabaleswar K. Panda:
High-Performance Hybrid Key-Value Store on Modern Clusters with RDMA Interconnects and SSDs: Non-blocking Extensions, Designs, and Benefits. IPDPS 2016: 393-402 - [c352]Ching-Hsiang Chu, Khaled Hamidouche, Akshay Venkatesh, Dip Sankar Banerjee, Hari Subramoni, Dhabaleswar K. Panda:
Exploiting Maximal Overlap for Non-Contiguous Data Movement Processing on Modern GPU-Enabled Systems. IPDPS 2016: 983-992 - [c351]Dhabaleswar K. Panda, Jianfeng Zhan, Xiaoyi Lu:
HPBDC Introduction and Committees. IPDPS Workshops 2016: 1596 - [c350]Jie Zhang, Xiaoyi Lu, Dhabaleswar K. Panda:
Performance Characterization of Hypervisor-and Container-Based Virtualization for HPC on SR-IOV Enabled InfiniBand Clusters. IPDPS Workshops 2016: 1777-1784 - [c349]Dip Sankar Banerjee, Khaled Hamidouche, Dhabaleswar K. Panda:
Designing high performance communication runtime for GPU managed memory: early experiences. GPGPU@PPoPP 2016: 82-91 - [c348]A. A. Awan, Khaled Hamidouche, Akshay Venkatesh, Dhabaleswar K. Panda:
Efficient Large Message Broadcast using NCCL and CUDA-Aware MPI for Deep Learning. EuroMPI 2016: 15-22 - [c347]Ching-Hsiang Chu, Khaled Hamidouche, Hari Subramoni, Akshay Venkatesh, Bracy Elton, Dhabaleswar K. Panda:
Designing High Performance Heterogeneous Broadcast for Streaming Applications on GPU Clusters. SBAC-PAD 2016: 59-66 - [c346]Md. Wasi-ur-Rahman, Nusrat Sharmin Islam, Xiaoyi Lu, Dipti Shankar, Dhabaleswar K. Panda:
MR-Advisor: A Comprehensive Tuning Tool for Advising HPC Users to Accelerate MapReduce Applications on Supercomputers. SBAC-PAD 2016: 198-205 - [c345]Khaled Hamidouche, Jie Zhang, Dhabaleswar K. Panda, Karen Tomko:
OpenSHMEM Non-blocking Data Movement Operations with MVAPICH2-X: Early Experiences. PAW@SC 2016: 9-16 - [c344]Md. Wasi-ur-Rahman, Nusrat Sharmin Islam, Xiaoyi Lu, Dhabaleswar K. Panda:
Can Non-volatile Memory Benefit MapReduce Applications on HPC Clusters? PDSW-DISCS@SC 2016: 19-24 - [c343]Ching-Hsiang Chu, Khaled Hamidouche, Hari Subramoni, Akshay Venkatesh, Bracy Elton, Dhabaleswar K. Panda:
Efficient Reliability Support for Hardware Multicast-Based Broadcast in GPU-enabled Streaming Applications. COMHPC@SC 2016: 29-38 - [c342]Mingzhe Li, Khaled Hamidouche, Xiaoyi Lu, Hari Subramoni, Jie Zhang, Dhabaleswar K. Panda:
Designing MPI library with on-demand paging (ODP) of infiniband: challenges and benefits. SC 2016: 433-443 - [c341]Hari Subramoni, Albert Mathews Augustine, Mark Daniel Arnold, Jonathan L. Perkins, Xiaoyi Lu, Khaled Hamidouche, Dhabaleswar K. Panda:
INAM2: InfiniBand Network Analysis and Monitoring with MPI. ISC 2016: 300-320 - [c340]Mahidhar Tatineni, Xiaoyi Lu, Dong Ju Choi, Amitava Majumdar, Dhabaleswar K. Panda:
Experiences and Benefits of Running RDMA Hadoop and Spark on SDSC Comet. XSEDE 2016: 23:1-23:5 - [i3]Jiajun Cao, Kapil Arya, Rohan Garg, L. Shawn Matott, Dhabaleswar K. Panda, Hari Subramoni, Jérôme Vienne, Gene Cooperman:
System-level Scalable Checkpoint-Restart for Petascale Computing. CoRR abs/1607.07995 (2016) - 2015
- [c339]Nusrat Sharmin Islam, Md. Wasi-ur-Rahman, Xiaoyi Lu, Dipti Shankar, Dhabaleswar K. Panda:
Performance characterization and acceleration of in-memory file systems for Hadoop and Spark applications on HPC clusters. IEEE BigData 2015: 243-252 - [c338]Dipti Shankar, Xiaoyi Lu, Md. Wasi-ur-Rahman, Nusrat S. Islam, Dhabaleswar K. Panda:
Benchmarking key-value stores on high-performance storage and interconnects for web-scale workloads. IEEE BigData 2015: 539-544 - [c337]Adithya Bhat, Nusrat Sharmin Islam, Xiaoyi Lu, Md. Wasi-ur-Rahman, Dipti Shankar, Dhabaleswar K. Panda:
A Plugin-Based Approach to Exploit RDMA Benefits for Apache and Enterprise HDFS. BPOE 2015: 119-132 - [c336]Jie Zhang, Xiaoyi Lu, Mark Daniel Arnold, Dhabaleswar K. Panda:
MVAPICH2 over OpenStack with SR-IOV: An Efficient Approach to Build HPC Clouds. CCGRID 2015: 71-80 - [c335]Nusrat Sharmin Islam, Xiaoyi Lu, Md. Wasi-ur-Rahman, Dipti Shankar, Dhabaleswar K. Panda:
Triple-H: A Hybrid Approach to Accelerate HDFS on HPC Clusters with Heterogeneous Storage Architecture. CCGRID 2015: 101-110 - [c334]Sourav Chakraborty, Hari Subramoni, Adam Moody, Akshay Venkatesh, Jonathan L. Perkins, Dhabaleswar K. Panda:
Non-Blocking PMI Extensions for Fast MPI Startup. CCGRID 2015: 131-140 - [c333]Raghunath Raja Chandrasekar, Akshay Venkatesh, Khaled Hamidouche, Dhabaleswar K. Panda:
Power-Check: An Energy-Efficient Checkpointing Framework for HPC Clusters. CCGRID 2015: 261-270 - [c332]Khaled Hamidouche, Akshay Venkatesh, Ammar Ahmad Awan, Hari Subramoni, Ching-Hsiang Chu, Dhabaleswar K. Panda:
Exploiting GPUDirect RDMA in Designing High Performance OpenSHMEM for NVIDIA GPU Clusters. CLUSTER 2015: 78-87 - [c331]Mingzhe Li, Hari Subramoni, Khaled Hamidouche, Xiaoyi Lu, Dhabaleswar K. Panda:
High Performance MPI Datatype Support with User-Mode Memory Registration: Challenges, Designs, and Benefits. CLUSTER 2015: 226-235 - [c330]Mingzhe Li, Khaled Hamidouche, Xiaoyi Lu, Jian Lin, Dhabaleswar K. Panda:
High-Performance and Scalable Design of MPI-3 RMA on Xeon Phi Clusters. Euro-Par 2015: 625-637 - [c329]Akshay Venkatesh, Khaled Hamidouche, Hari Subramoni, Dhabaleswar K. Panda:
Offloaded GPU Collectives Using CORE-Direct and CUDA Capabilities on InfiniBand Clusters. HiPC 2015: 234-243 - [c328]Mingzhe Li, Khaled Hamidouche, Xiaoyi Lu, Jie Zhang, Jian Lin, Dhabaleswar K. Panda:
High Performance OpenSHMEM Strided Communication Support with InfiniBand UMR. HiPC 2015: 244-253 - [c327]Hari Subramoni, Akshay Venkatesh, Khaled Hamidouche, Karen Tomko, Dhabaleswar K. Panda:
Impact of InfiniBand DC Transport Protocol on Energy Consumption of All-to-All Collective Algorithms. Hot Interconnects 2015: 60-67 - [c326]Nusrat Sharmin Islam, Dipti Shankar, Xiaoyi Lu, Md. Wasi-ur-Rahman, Dhabaleswar K. Panda:
Accelerating I/O Performance of Big Data Analytics on HPC Clusters through RDMA-Based Key-Value Store. ICPP 2015: 280-289 - [c325]Jian Lin, Khaled Hamidouche, Xiaoyi Lu, Mingzhe Li, Dhabaleswar K. Panda:
High-Performance Coarray Fortran Support with MVAPICH2-X: Initial Experience and Evaluation. IPDPS Workshops 2015: 225-234 - [c324]Sourav Chakraborty, Hari Subramoni, Jonathan L. Perkins, Ammar Ahmad Awan, Dhabaleswar K. Panda:
On-demand Connection Management for OpenSHMEM and OpenSHMEM+MPI. IPDPS Workshops 2015: 235-244 - [c323]Md. Wasi-ur-Rahman, Xiaoyi Lu, Nusrat Sharmin Islam, Raghunath Rajachandrasekar, Dhabaleswar K. Panda:
High-Performance Design of YARN MapReduce on Modern HPC Clusters with Lustre and RDMA. IPDPS 2015: 291-300 - [c322]Dipti Shankar, Xiaoyi Lu, Jithin Jose, Md. Wasi-ur-Rahman, Nusrat S. Islam, Dhabaleswar K. Panda:
Can RDMA benefit online data processing workloads on memcached and MySQL? ISPASS 2015: 159-160 - [c321]A. A. Awan, Khaled Hamidouche, Ching-Hsiang Chu, Dhabaleswar K. Panda:
A Case for Non-blocking Collectives in OpenSHMEM: Design, Implementation, and Performance Evaluation using MVAPICH2-X. OpenSHMEM 2015: 69-86 - [c320]Antonio Gómez-Iglesias, Jérôme Vienne, Khaled Hamidouche, Christopher S. Simmons, William L. Barth, Dhabaleswar K. Panda:
Scalable Out-of-core OpenSHMEM Library for HPC. OpenSHMEM 2015: 138-153 - [c319]Jian Lin, Khaled Hamidouche, Jie Zhang, Xiaoyi Lu, Abhinav Vishnu, Dhabaleswar K. Panda:
Accelerating k-NN Algorithm with Hybrid MPI and OpenSHMEM. OpenSHMEM 2015: 164-177 - [c318]A. A. Awan, Khaled Hamidouche, Akshay Venkatesh, Jonathan L. Perkins, Hari Subramoni, Dhabaleswar K. Panda:
GPU-Aware Design, Implementation, and Evaluation of Non-blocking Collective Benchmarks. EuroMPI 2015: 9:1-9:10 - [c317]Akshay Venkatesh, Abhinav Vishnu, Khaled Hamidouche, Nathan R. Tallent, Dhabaleswar K. Panda, Darren J. Kerbyson, Adolfy Hoisie:
A case for application-oblivious energy-efficient MPI runtime. SC 2015: 29:1-29:12 - [c316]Hari Subramoni, Ammar Ahmad Awan, Khaled Hamidouche, Dmitry Pekurovsky, Akshay Venkatesh, Sourav Chakraborty, Karen Tomko, Dhabaleswar K. Panda:
Designing Non-blocking Personalized Collectives with Near Perfect Overlap for RDMA-Enabled Clusters. ISC 2015: 434-453 - [c315]Dhabaleswar K. Panda:
Accelerating Big Data Processing on Modern Clusters. PABS@ICPE 2015: 1 - [e5]Dhabaleswar K. Panda, Karl W. Schulz, Khaled Hamidouche, Hari Subramoni:
Proceedings of the First International Workshop on Extreme Scale Programming Models and Middleware, ESPM 2015, Austin, Texas, USA, November 15, 2015. ACM 2015, ISBN 978-1-4503-3996-4 - 2014
- [j44]Hao Wang, Sreeram Potluri, Devendar Bureddy, Carlos Rosales, Dhabaleswar K. Panda:
GPU-Aware MPI on RDMA-Enabled Clusters: Design, Implementation and Evaluation. IEEE Trans. Parallel Distributed Syst. 25(10): 2595-2605 (2014) - [c314]Nusrat Sharmin Islam, Xiaoyi Lu, Md. Wasi-ur-Rahman, Raghunath Rajachandrasekar, Dhabaleswar K. Panda:
In-memory I/O and replication for HDFS with Memcached: Early experiences. IEEE BigData 2014: 213-218 - [c313]Jithin Jose, Khaled Hamidouche, Xiaoyi Lu, Sreeram Potluri, Jie Zhang, Karen Tomko, Dhabaleswar K. Panda:
High performance OpenSHMEM for Xeon Phi clusters: Extensions, runtime designs and application co-design. CLUSTER 2014: 10-18 - [c312]Mingzhe Li, Xiaoyi Lu, Sreeram Potluri, Khaled Hamidouche, Jithin Jose, Karen Tomko, Dhabaleswar K. Panda:
Scalable Graph500 design with MPI-3 RMA. CLUSTER 2014: 230-238 - [c311]Jie Zhang, Xiaoyi Lu, Jithin Jose, Rong Shi, Dhabaleswar K. Panda:
Can Inter-VM Shmem Benefit MPI Applications on SR-IOV Based Virtualized Infiniband Clusters? Euro-Par 2014: 342-353 - [c310]Md. Wasi-ur-Rahman, Xiaoyi Lu, Nusrat Sharmin Islam, Raghunath Rajachandrasekar, Dhabaleswar K. Panda:
MapReduce over Lustre: Can RDMA-Based Approach Benefit? Euro-Par 2014: 644-655 - [c309]Rong Shi, Sreeram Potluri, Khaled Hamidouche, Jonathan L. Perkins, Mingzhe Li, Davide Rossetti, Dhabaleswar K. Panda:
Designing efficient small message transfer mechanism for inter-node MPI communication on InfiniBand GPU clusters. HiPC 2014: 1-10 - [c308]Akshay Venkatesh, Hari Subramoni, Khaled Hamidouche, Dhabaleswar K. Panda:
A high performance broadcast design with hardware multicast and GPUDirect RDMA for streaming applications on Infiniband clusters. HiPC 2014: 1-10 - [c307]Jie Zhang, Xiaoyi Lu, Jithin Jose, Mingzhe Li, Rong Shi, Dhabaleswar K. Panda:
High performance MPI library over SR-IOV enabled infiniband clusters. HiPC 2014: 1-10 - [c306]Xiaoyi Lu, Md. Wasi-ur-Rahman, Nusrat S. Islam, Dipti Shankar, Dhabaleswar K. Panda:
Accelerating Spark with RDMA for Big Data Processing: Early Experiences. Hot Interconnects 2014: 9-16 - [c305]Raghunath Rajachandrasekar, Sreeram Potluri, Akshay Venkatesh, Khaled Hamidouche, Md. Wasi-ur-Rahman, Dhabaleswar K. Panda:
MIC-Check: a distributed check pointing framework for the intel many integrated cores architecture. HPDC 2014: 121-124 - [c304]Nusrat S. Islam, Xiaoyi Lu, Md. Wasi-ur-Rahman, Dhabaleswar K. Panda:
SOR-HDFS: a SEDA-based approach to maximize overlapping in RDMA-enhanced HDFS. HPDC 2014: 261-264 - [c303]Prasad Calyam, Alex Berryman, Erik Saule, Hari Subramoni, Paul Schopis, Gordon Springer, Ümit V. Çatalyürek, Dhabaleswar K. Panda:
Wide-area overlay networking to manage science DMZ accelerated flows. ICNC 2014: 269-275 - [c302]Dhabaleswar K. Panda, Jang-Ping Sheu:
Message from the general co-chairs IEEE ICPADS 2014. ICPADS 2014: xv - [c301]Md. Wasi-ur-Rahman, Xiaoyi Lu, Nusrat Sharmin Islam, Dhabaleswar K. Panda:
Performance Modeling for RDMA-Enhanced Hadoop MapReduce. ICPP 2014: 50-59 - [c300]Rong Shi, Xiaoyi Lu, Sreeram Potluri, Khaled Hamidouche, Jie Zhang, Dhabaleswar K. Panda:
HAND: A Hybrid Approach to Accelerate Non-contiguous Data Movement Using MPI Datatypes on GPU Clusters. ICPP 2014: 221-230 - [c299]Hari Subramoni, Krishna Chaitanya Kandalla, Jithin Jose, Karen Tomko, Karl W. Schulz, Dmitry Pekurovsky, Dhabaleswar K. Panda:
Designing Topology-Aware Communication Schedules for Alltoall Operations in Large InfiniBand Clusters. ICPP 2014: 231-240 - [c298]Md. Wasi-ur-Rahman, Xiaoyi Lu, Nusrat Sharmin Islam, Dhabaleswar K. Panda:
HOMR: a hybrid approach to exploit maximum overlapping in MapReduce over high performance interconnects. ICS 2014: 33-42 - [c297]Jithin Jose, Khaled Hamidouche, Jie Zhang, Akshay Venkatesh, Dhabaleswar K. Panda:
Optimizing Collective Communication in UPC. IPDPS Workshops 2014: 361-370 - [c296]Akshay Venkatesh, Sreeram Potluri, Raghunath Rajachandrasekar, Miao Luo, Khaled Hamidouche, Dhabaleswar K. Panda:
High Performance Alltoall and Allgather Designs for InfiniBand MIC Clusters. IPDPS 2014: 637-646 - [c295]Jithin Jose, Jie Zhang, Akshay Venkatesh, Sreeram Potluri, Dhabaleswar K. Panda:
A Comprehensive Performance Evaluation of OpenSHMEM Libraries on InfiniBand Clusters. OpenSHMEM 2014: 14-28 - [c294]Jithin Jose, Sreeram Potluri, Hari Subramoni, Xiaoyi Lu, Khaled Hamidouche, Karl W. Schulz, Hari Sundar, Dhabaleswar K. Panda:
Designing Scalable Out-of-core Sorting with Hybrid MPI+PGAS Programming Models. PGAS 2014: 7:1-7:9 - [c293]Mingzhe Li, Jian Lin, Xiaoyi Lu, Khaled Hamidouche, Karen Tomko, Dhabaleswar K. Panda:
Scalable MiniMD Design with Hybrid MPI and OpenSHMEM. PGAS 2014: 24:1-24:4 - [c292]Miao Luo, Xiaoyi Lu, Khaled Hamidouche, Krishna Chaitanya Kandalla, Dhabaleswar K. Panda:
Initial study of multi-endpoint runtime for MPI+OpenMP hybrid programming model on multi-core systems. PPoPP 2014: 395-396 - [c291]Sourav Chakraborty, Hari Subramoni, Jonathan L. Perkins, Adam Moody, Mark Daniel Arnold, Dhabaleswar K. Panda:
PMI Extensions for Scalable MPI Startup. EuroMPI/ASIA 2014: 21 - [c290]Raghunath Rajachandrasekar, Jonathan L. Perkins, Khaled Hamidouche, Mark Daniel Arnold, Dhabaleswar K. Panda:
Understanding the Memory-Utilization of MPI Libraries: Challenges and Designs in Implementing the MPI_T Interface. EuroMPI/ASIA 2014: 97 - [c289]Hari Subramoni, Khaled Hamidouche, Akshay Venkatesh, Sourav Chakraborty, Dhabaleswar K. Panda:
Designing MPI Library with Dynamic Connected Transport (DCT) of InfiniBand: Early Experiences. ISC 2014: 278-295 - [c288]Dipti Shankar, Xiaoyi Lu, Md. Wasi-ur-Rahman, Nusrat S. Islam, Dhabaleswar K. Panda:
A Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks. BPOE@ASPLOS/VLDB 2014: 19-33 - 2013
- [j43]Miao Luo, Hao Wang, Jérôme Vienne, Dhabaleswar K. Panda:
Redesigning MPI shared memory communication for large multi-core architecture. Comput. Sci. Res. Dev. 28(2-3): 137-146 (2013) - [c287]Sreeram Potluri, Akshay Venkatesh, Devendar Bureddy, Krishna Chaitanya Kandalla, Dhabaleswar K. Panda:
Efficient Intra-node Communication on Intel-MIC Clusters. CCGRID 2013: 128-135 - [c286]Jithin Jose, Mingzhe Li, Xiaoyi Lu, Krishna Chaitanya Kandalla, Mark Daniel Arnold, Dhabaleswar K. Panda:
SR-IOV Support for Virtualization on InfiniBand Clusters: Early Experience. CCGRID 2013: 385-392 - [c285]Md. Wasi-ur-Rahman, Xiaoyi Lu, Nusrat S. Islam, Dhabaleswar K. Panda:
Does RDMA-based enhanced Hadoop MapReduce need a new performance model? SoCC 2013: 45:1-45:2 - [c284]Rong Shi, Sreeram Potluri, Khaled Hamidouche, Xiaoyi Lu, Karen Tomko, Dhabaleswar K. Panda:
A scalable and portable approach to accelerate hybrid HPL on heterogeneous CPU-GPU clusters. CLUSTER 2013: 1-8 - [c283]Hari Subramoni, Devendar Bureddy, Krishna Chaitanya Kandalla, Karl W. Schulz, Bill Barth, Jonathan L. Perkins, Mark Daniel Arnold, Dhabaleswar K. Panda:
Design of network topology aware scheduling services for large InfiniBand clusters. CLUSTER 2013: 1-8 - [c282]Dhabaleswar K. Panda, Xiaoyi Lu:
Tutorials. Hot Interconnects 2013 - [c281]Krishna Chaitanya Kandalla, Akshay Venkatesh, Khaled Hamidouche, Sreeram Potluri, Devendar Bureddy, Dhabaleswar K. Panda:
Designing Optimized MPI Broadcast and Allreduce for Many Integrated Core (MIC) InfiniBand Clusters. Hot Interconnects 2013: 63-70 - [c280]Nusrat S. Islam, Xiaoyi Lu, Md. Wasi-ur-Rahman, Dhabaleswar K. Panda:
Can Parallel Replication Benefit Hadoop Distributed File System for High Performance Interconnects? Hot Interconnects 2013: 75-78 - [c279]Raghunath Rajachandrasekar, Adam Moody, Kathryn M. Mohror, Dhabaleswar K. Panda:
A 1 PB/s file system to checkpoint three million MPI tasks. HPDC 2013: 143-154 - [c278]Sreeram Potluri, Khaled Hamidouche, Akshay Venkatesh, Devendar Bureddy, Dhabaleswar K. Panda:
Efficient Inter-node MPI Communication Using GPUDirect RDMA for InfiniBand Clusters with NVIDIA GPUs. ICPP 2013: 80-89 - [c277]Krishna Chaitanya Kandalla, Hari Subramoni, Karen Tomko, Dmitry Pekurovsky, Dhabaleswar K. Panda:
A Novel Functional Partitioning Approach to Design High-Performance MPI-3 Non-blocking Alltoallv Collective on Multi-core Systems. ICPP 2013: 611-620 - [c276]Xiaoyi Lu, Nusrat S. Islam, Md. Wasi-ur-Rahman, Jithin Jose, Hari Subramoni, Hao Wang, Dhabaleswar K. Panda:
High-Performance Design of Hadoop RPC with RDMA over InfiniBand. ICPP 2013: 641-650 - [c275]Khaled Hamidouche, Sreeram Potluri, Hari Subramoni, Krishna Chaitanya Kandalla, Dhabaleswar K. Panda:
MIC-RO: enabling efficient remote offload on heterogeneous many integrated core (MIC) clusters with InfiniBand. ICS 2013: 399-408 - [c274]Akshay Venkatesh, Krishna Chaitanya Kandalla, Dhabaleswar K. Panda:
Evaluation of Energy Characteristics of MPI Communication Primitives with RAPL. IPDPS Workshops 2013: 938-945 - [c273]Sreeram Potluri, Devendar Bureddy, Hao Wang, Hari Subramoni, Dhabaleswar K. Panda:
Extending OpenSHMEM for GPU Computing. IPDPS 2013: 1001-1012 - [c272]Md. Wasi-ur-Rahman, Nusrat Sharmin Islam, Xiaoyi Lu, Jithin Jose, Hari Subramoni, Hao Wang, Dhabaleswar K. Panda:
High-Performance RDMA-based Design of Hadoop MapReduce over InfiniBand. IPDPS Workshops 2013: 1908-1917 - [c271]Mingzhe Li, Sreeram Potluri, Khaled Hamidouche, Jithin Jose, Dhabaleswar K. Panda:
Efficient and truly passive MPI-3 RMA using InfiniBand atomics. EuroMPI 2013: 91-96 - [c270]Sreeram Potluri, Devendar Bureddy, Khaled Hamidouche, Akshay Venkatesh, Krishna Chaitanya Kandalla, Hari Subramoni, Dhabaleswar K. Panda:
MVAPICH-PRISM: a proxy-based communication framework using InfiniBand and SCIF for intel MIC clusters. SC 2013: 54:1-54:11 - [c269]Jithin Jose, Mohammad Banikazemi, Wendy Belluomini, Chet Murthy, Dhabaleswar K. Panda:
MetaData persistence using storage class memory: experiences with flash-backed DRAM. INFLOW@SOSP 2013: 3:1-3:7 - [c268]Jithin Jose, Sreeram Potluri, Karen Tomko, Dhabaleswar K. Panda:
Designing Scalable Graph500 Benchmark with Hybrid MPI+OpenSHMEM Programming Models. ISC 2013: 109-124 - [c267]Xiaoyi Lu, Md. Wasi-ur-Rahman, Nusrat Sharmin Islam, Dhabaleswar K. Panda:
A Micro-benchmark Suite for Evaluating Hadoop RPC on High-Performance Networks. WBDB 2013: 32-42 - 2012
- [c266]Jithin Jose, Hari Subramoni, Krishna Chaitanya Kandalla, Md. Wasi-ur-Rahman, Hao Wang, Sundeep Narravula, Dhabaleswar K. Panda:
Scalable Memcached Design for InfiniBand Clusters Using Hybrid Transports. CCGRID 2012: 236-243 - [c265]Krishna Chaitanya Kandalla, Aydin Buluç, Hari Subramoni, Karen Tomko, Jérôme Vienne, Leonid Oliker, Dhabaleswar K. Panda:
Can Network-Offload Based Non-blocking Neighborhood MPI Collectives Improve Communication Overheads of Irregular Graph Algorithms? CLUSTER Workshops 2012: 222-230 - [c264]Raghunath Rajachandrasekar, Jai Jaswani, Hari Subramoni, Dhabaleswar K. Panda:
Minimizing Network Contention in InfiniBand Clusters with a QoS-Aware Data-Staging Framework. CLUSTER 2012: 329-336 - [c263]Hari Subramoni, Jérôme Vienne, Dhabaleswar K. Panda:
A Scalable InfiniBand Network Topology-Aware Performance Analysis Tool for MPI. Euro-Par Workshops 2012: 439-450 - [c262]Jérôme Vienne, Jitong Chen, Md. Wasi-ur-Rahman, Nusrat S. Islam, Hari Subramoni, Dhabaleswar K. Panda:
Performance Analysis and Evaluation of InfiniBand FDR and 40GigE RoCE on HPC and Cloud Computing Systems. Hot Interconnects 2012: 48-55 - [c261]Jithin Jose, Krishna Chaitanya Kandalla, Miao Luo, Dhabaleswar K. Panda:
Supporting Hybrid MPI and OpenSHMEM over InfiniBand: Design and Performance Evaluation. ICPP 2012: 219-228 - [c260]Xiangyong Ouyang, Nusrat S. Islam, Raghunath Rajachandrasekar, Jithin Jose, Miao Luo, Hao Wang, Dhabaleswar K. Panda:
SSD-Assisted Hybrid Memory to Accelerate Memcached over High Performance Networks. ICPP 2012: 470-479 - [c259]Miao Luo, Dhabaleswar K. Panda, Khaled Z. Ibrahim, Costin Iancu:
Congestion avoidance on manycore high performance computing systems. ICS 2012: 121-132 - [c258]Jian Huang, Xiangyong Ouyang, Jithin Jose, Md. Wasi-ur-Rahman, Hao Wang, Miao Luo, Hari Subramoni, Chet Murthy, Dhabaleswar K. Panda:
High-Performance Design of HBase with RDMA over InfiniBand. IPDPS 2012: 774-785 - [c257]Raghunath Rajachandrasekar, Xavier Besseron, Dhabaleswar K. Panda:
Monitoring and Predicting Hardware Failures in HPC Clusters with FTB-IPMI. IPDPS Workshops 2012: 1136-1143 - [c256]Krishna Chaitanya Kandalla, Ulrike Meier Yang, Jeff Keasler, Tzanio V. Kolev, Adam Moody, Hari Subramoni, Karen Tomko, Jérôme Vienne, Bronis R. de Supinski, Dhabaleswar K. Panda:
Designing Non-blocking Allreduce with Collective Offload on InfiniBand Clusters: A Case Study with Conjugate Gradient Solvers. IPDPS 2012: 1156-1167 - [c255]S. Pai Raikar, Hari Subramoni, Krishna Chaitanya Kandalla, Jérôme Vienne, Dhabaleswar K. Panda:
Designing Network Failover and Recovery in MPI for Multi-Rail InfiniBand Clusters. IPDPS Workshops 2012: 1160-1167 - [c254]Sreeram Potluri, Hao Wang, Devendar Bureddy, Ashish Kumar Singh, Carlos Rosales, Dhabaleswar K. Panda:
Optimizing MPI Communication on Multi-GPU Systems Using CUDA Inter-Process Communication. IPDPS Workshops 2012: 1848-1857 - [c253]Md. Wasi-ur-Rahman, Jian Huang, Jithin Jose, Xiangyong Ouyang, Hao Wang, Nusrat S. Islam, Hari Subramoni, Chet Murthy, Dhabaleswar K. Panda:
Understanding the communication characteristics in HBase: What are the fundamental bottlenecks? ISPASS 2012: 122-123 - [c252]Devendar Bureddy, Hao Wang, Akshay Venkatesh, Sreeram Potluri, Dhabaleswar K. Panda:
OMB-GPU: A Micro-Benchmark Suite for Evaluating MPI Libraries on GPU Clusters. EuroMPI 2012: 110-120 - [c251]Nusrat S. Islam, Md. Wasi-ur-Rahman, Jithin Jose, Raghunath Rajachandrasekar, Hao Wang, Hari Subramoni, Chet Murthy, Dhabaleswar K. Panda:
High performance RDMA-based design of HDFS over InfiniBand. SC 2012: 35 - [c250]Hari Subramoni, Sreeram Potluri, Krishna Chaitanya Kandalla, Bill Barth, Jérôme Vienne, Jeff Keasler, Karen A. Tomko, Karl W. Schulz, Adam Moody, Dhabaleswar K. Panda:
Design of a scalable InfiniBand topology service to enable network-topology-aware placement of processes. SC 2012: 70 - [c249]Nusrat Sharmin Islam, Xiaoyi Lu, Md. Wasi-ur-Rahman, Jithin Jose, Dhabaleswar K. Panda:
A Micro-benchmark Suite for Evaluating HDFS Operations on Modern Clusters. WBDB 2012: 129-147 - 2011
- [j42]Sayantan Sur, Sreeram Potluri, Krishna Chaitanya Kandalla, Hari Subramoni, Dhabaleswar K. Panda, Karen Tomko:
Codesign for InfiniBand Clusters. Computer 44(11): 31-36 (2011) - [j41]Krishna Chaitanya Kandalla, Hari Subramoni, Karen A. Tomko, Dmitry Pekurovsky, Sayantan Sur, Dhabaleswar K. Panda:
High-performance and scalable non-blocking all-to-all with collective offload on InfiniBand clusters: a study with parallel 3D FFT. Comput. Sci. Res. Dev. 26(3-4): 237-246 (2011) - [j40]Hao Wang, Sreeram Potluri, Miao Luo, Ashish Kumar Singh, Sayantan Sur, Dhabaleswar K. Panda:
MVAPICH2-GPU: optimized GPU to GPU communication for InfiniBand clusters. Comput. Sci. Res. Dev. 26(3-4): 257-266 (2011) - [c248]Xiangyong Ouyang, Raghunath Rajachandrasekar, Xavier Besseron, Dhabaleswar K. Panda:
High Performance Pipelined Process Migration with RDMA. CCGRID 2011: 314-323 - [c247]Hao Wang, Sreeram Potluri, Miao Luo, Ashish Kumar Singh, Xiangyong Ouyang, Sayantan Sur, Dhabaleswar K. Panda:
Optimized Non-contiguous MPI Datatype Communication for GPU Clusters: Design, Implementation and Evaluation with MVAPICH2. CLUSTER 2011: 308-316 - [c246]Hari Subramoni, Krishna Chaitanya Kandalla, Jérôme Vienne, Sayantan Sur, Bill Barth, Karen A. Tomko, Robert T. McLay, Karl W. Schulz, Dhabaleswar K. Panda:
Design and Evaluation of Network Topology-/Speed- Aware Broadcast Algorithms for InfiniBand Clusters. CLUSTER 2011: 317-325 - [c245]Ashish Kumar Singh, Sreeram Potluri, Hao Wang, Krishna Chaitanya Kandalla, Sayantan Sur, Dhabaleswar K. Panda:
MPI Alltoall Personalized Exchange on GPGPU Clusters: Design Alternatives and Benefit. CLUSTER 2011: 420-427 - [c244]Vilobh Meshram, Xavier Besseron, Xiangyong Ouyang, Raghunath Rajachandrasekar, Ravi Prakash, Dhabaleswar K. Panda:
Can a Decentralized Metadata Service Layer Benefit Parallel Filesystems? CLUSTER 2011: 484-493 - [c243]N. Dandapanthula, Hari Subramoni, Jérôme Vienne, Krishna Chaitanya Kandalla, Sayantan Sur, Dhabaleswar K. Panda, Ron Brightwell:
INAM - A Scalable InfiniBand Network Analysis and Monitoring Tool. Euro-Par Workshops (2) 2011: 166-177 - [c242]Raghunath Rajachandrasekar, Xiangyong Ouyang, Xavier Besseron, Vilobh Meshram, Dhabaleswar K. Panda:
Can Checkpoint/Restart Mechanisms Benefit from Hierarchical Data Staging? Euro-Par Workshops (2) 2011: 312-321 - [c241]Miao Luo, Jithin Jose, Sayantan Sur, Dhabaleswar K. Panda:
Multi-threaded UPC runtime with network endpoints: Design alternatives and evaluation on multi-core architectures. HiPC 2011: 1-10 - [c240]Krishna Chaitanya Kandalla, Hari Subramoni, Jérôme Vienne, S. Pai Raikar, Karen Tomko, Sayantan Sur, Dhabaleswar K. Panda:
Designing Non-blocking Broadcast with Collective Offload on InfiniBand Clusters: A Case Study with HPL. Hot Interconnects 2011: 27-34 - [c239]Xiangyong Ouyang, David W. Nellans, Robert Wipfel, David Flynn, Dhabaleswar K. Panda:
Beyond block I/O: Rethinking traditional storage primitives. HPCA 2011: 301-311 - [c238]Xiangyong Ouyang, Raghunath Rajachandrasekar, Xavier Besseron, Hao Wang, Jian Huang, Dhabaleswar K. Panda:
CRFS: A Lightweight User-Level Filesystem for Generic Checkpoint/Restart. ICPP 2011: 375-384 - [c237]Jithin Jose, Hari Subramoni, Miao Luo, Minjia Zhang, Jian Huang, Md. Wasi-ur-Rahman, Nusrat S. Islam, Xiangyong Ouyang, Hao Wang, Sayantan Sur, Dhabaleswar K. Panda:
Memcached Design on High Performance RDMA Capable Interconnects. ICPP 2011: 743-752 - [c236]Sreeram Potluri, Hao Wang, Vijay Dhanraj, Sayantan Sur, Dhabaleswar K. Panda:
Optimizing MPI One Sided Communication on Multi-core InfiniBand Clusters Using Shared Memory Backed Windows. EuroMPI 2011: 99-109 - [c235]Sreeram Potluri, Sayantan Sur, Devendar Bureddy, Dhabaleswar K. Panda:
Design and Implementation of Key Proposed MPI-3 One-Sided Communication Semantics on InfiniBand. EuroMPI 2011: 321-324 - [r2]Dhabaleswar K. Panda, Sayantan Sur, Hari Subramoni, Krishna Chaitanya Kandalla:
Collective Communication, Network Support For. Encyclopedia of Parallel Computing 2011: 327-334 - [r1]Dhabaleswar K. Panda, Sayantan Sur:
InfiniBand. Encyclopedia of Parallel Computing 2011: 927-935 - 2010
- [j39]Ping Lai, Sayantan Sur, Dhabaleswar K. Panda:
Designing truly one-sided MPI-2 RMA intra-node communication on multi-core systems. Comput. Sci. Res. Dev. 25(1-2): 3-14 (2010) - [c234]Emilio Pasquale Mancini, Gregory Marsh, Dhabaleswar K. Panda:
An MPI-Stream Hybrid Programming Model for Computational Clusters. CCGRID 2010: 323-330 - [c233]Hari Subramoni, Ping Lai, Rajkumar Kettimuthu, Dhabaleswar K. Panda:
High Performance Data Transfer in Grid Environment Using GridFTP over InfiniBand. CCGRID 2010: 557-564 - [c232]Xiangyong Ouyang, Sonya Marcarelli, Raghunath Rajachandrasekar, Dhabaleswar K. Panda:
RDMA-Based Job Migration Framework for MPI over InfiniBand. CLUSTER 2010: 116-125 - [c231]Hari Subramoni, Krishna Chaitanya Kandalla, Sayantan Sur, Dhabaleswar K. Panda:
Design and Evaluation of Generalized Collective Communication Primitives with Overlap Using ConnectX-2 Offload Engine. Hot Interconnects 2010: 40-49 - [c230]Dhabaleswar K. Panda, Sayantan Sur, Pavan Balaji:
Designing High-End Computing Systems with InfiniBand and High-Speed Ethernet. Hot Interconnects 2010: 125-127 - [c229]Krishna Chaitanya Kandalla, Emilio Pasquale Mancini, Sayantan Sur, Dhabaleswar K. Panda:
Designing Power-Aware Collective Communication Algorithms for InfiniBand Clusters. ICPP 2010: 218-227 - [c228]Hari Subramoni, Ping Lai, Sayantan Sur, Dhabaleswar K. Panda:
Improving Application Performance and Predictability Using Multiple Virtual Lanes in Modern Multi-core InfiniBand Clusters. ICPP 2010: 462-471 - [c227]Miao Luo, Sreeram Potluri, Ping Lai, Emilio Pasquale Mancini, Hari Subramoni, Krishna Chaitanya Kandalla, Sayantan Sur, Dhabaleswar K. Panda:
High Performance Design and Implementation of Nemesis Communication Layer for Two-Sided and One-Sided MPI Semantics in MVAPICH2. ICPP Workshops 2010: 377-386 - [c226]Sreeram Potluri, Ping Lai, Karen A. Tomko, Sayantan Sur, Yifeng Cui, Mahidhar Tatineni, Karl W. Schulz, William L. Barth, Amitava Majumdar, Dhabaleswar K. Panda:
Quantifying performance benefits of overlap using MPI-2 in a seismic modeling application. ICS 2010: 17-25 - [c225]Krishna Chaitanya Kandalla, Hari Subramoni, Abhinav Vishnu, Dhabaleswar K. Panda:
Designing topology-aware collective communication algorithms for large scale InfiniBand clusters: Case studies with Scatter and Gather. IPDPS Workshops 2010: 1-8 - [c224]Matthew J. Koop, Pavel Shamis, Ishai Rabinovitz, Dhabaleswar K. Panda:
Designing high-performance and resilient message passing on InfiniBand. IPDPS Workshops 2010: 1-7 - [c223]Jithin Jose, Miao Luo, Sayantan Sur, Dhabaleswar K. Panda:
Unifying UPC and MPI runtimes: experience with MVAPICH. PGAS 2010: 5 - [c222]Yifeng Cui, Kim B. Olsen, Thomas H. Jordan, Kwangyoon Lee, Jun Zhou, Patrick Small, Daniel Roten, Geoffrey Ely, Dhabaleswar K. Panda, Amit Chourasia, John M. Levesque, Steven M. Day, Philip Maechling:
Scalable Earthquake Simulation on Petascale Supercomputers. SC 2010: 1-20
2000 – 2009
- 2009
- [j38]Abhinav Vishnu, Matthew J. Koop, Adam Moody, Amith R. Mamidala, Sundeep Narravula, Dhabaleswar K. Panda:
Topology agnostic hot-spot avoidance with InfiniBand. Concurr. Comput. Pract. Exp. 21(3): 301-319 (2009) - [j37]Ping Lai, Pavan Balaji, Rajeev Thakur, Dhabaleswar K. Panda:
ProOnE: a general-purpose protocol onload engine for multi- and many-core architectures. Comput. Sci. Res. Dev. 23(3-4): 133-142 (2009) - [j36]Dhabaleswar K. Panda:
IPDPS 2007: Comments from the Guest Editor. J. Parallel Distributed Comput. 69(8): 679 (2009) - [c221]Gopalakrishnan Santhanaraman, Pavan Balaji, K. Gopalakrishnan, Rajeev Thakur, William Gropp, Dhabaleswar K. Panda:
Natively Supporting True One-Sided Communication in. CCGRID 2009: 380-387 - [c220]Matthew J. Koop, Miao Luo, Dhabaleswar K. Panda:
Reducing network contention with mixed workloads on modern multicore clusters. CLUSTER 2009: 1-10 - [c219]Gopalakrishnan Santhanaraman, Tejus Gangadharappa, Sundeep Narravula, Amith R. Mamidala, Dhabaleswar K. Panda:
Design alternatives for implementing fence synchronization in MPI-2 one-sided communication for InfiniBand clusters. CLUSTER 2009: 1-9 - [c218]Hari Subramoni, Ping Lai, Miao Luo, Dhabaleswar K. Panda:
RDMA over Ethernet - A preliminary study. CLUSTER 2009: 1-9 - [c217]Abhinav Vishnu, Manojkumar Krishnan, Dhabaleswar K. Panda:
An efficient hardware-software approach to network fault tolerance with InfiniBand. CLUSTER 2009: 1-9 - [c216]Xiangyong Ouyang, Karthik Gopalakrishnan, Tejus Gangadharappa, Dhabaleswar K. Panda:
Fast checkpointing by Write Aggregation with Dynamic Buffer and Interleaving on multicore architecture. HiPC 2009: 99-108 - [c215]Dhabaleswar K. Panda, Matthew J. Koop, Pavan Balaji:
Tutorial: Infiniband and 10-Gigabit Ethernet for Dummies. Hot Interconnects 2009 - [c214]Dhabaleswar K. Panda, Matthew J. Koop, Pavan Balaji:
Tutorial: Designing High-End Computing Systems with Infiniband and 10-Gigabit Ethernet. Hot Interconnects 2009 - [c213]Hari Subramoni, Matthew J. Koop, Dhabaleswar K. Panda:
Designing Next Generation Clusters: Evaluation of InfiniBand DDR/QDR on Intel Computing Platforms. Hot Interconnects 2009: 112-120 - [c212]Xiangyong Ouyang, Karthik Gopalakrishnan, Dhabaleswar K. Panda:
Accelerating Checkpoint Operation by Node-Level Write Aggregation on Multicore Systems. ICPP 2009: 34-41 - [c211]Ping Lai, Hari Subramoni, Sundeep Narravula, Amith R. Mamidala, Dhabaleswar K. Panda:
Designing Efficient FTP Mechanisms for High Performance Data-Transfer over InfiniBand. ICPP 2009: 156-163 - [c210]Rinku Gupta, Peter H. Beckman, Byung-Hoon Park, Ewing L. Lusk, Paul Hargrove, Al Geist, Dhabaleswar K. Panda, Andrew Lumsdaine, Jack J. Dongarra:
CIFTS: A Coordinated Infrastructure for Fault-Tolerant Systems. ICPP 2009: 237-245 - [c209]Tejus Gangadharappa, Matthew J. Koop, Dhabaleswar K. Panda:
Designing and Evaluating MPI-2 Dynamic Process Management Support for InfiniBand. ICPP Workshops 2009: 89-96 - [c208]Krishna Chaitanya Kandalla, Hari Subramoni, Gopalakrishnan Santhanaraman, Matthew J. Koop, Dhabaleswar K. Panda:
Designing multi-leader-based Allgather algorithms for multi-core clusters. IPDPS 2009: 1-8 - [c207]Matthew J. Koop, Jaidev K. Sridhar, Dhabaleswar K. Panda:
TupleQ: Fully-asynchronous and zero-copy MPI over InfiniBand. IPDPS 2009: 1-8 - [c206]Jaidev K. Sridhar, Dhabaleswar K. Panda:
Impact of Node Level Caching in MPI Job Launch Mechanisms. PVM/MPI 2009: 230-239 - 2008
- [c205]Amith R. Mamidala, Rahul Kumar, Debraj De, Dhabaleswar K. Panda:
MPI Collectives on Modern Multicore Clusters: Performance Optimizations and Communication Characteristics. CCGRID 2008: 130-137 - [c204]Karthikeyan Vaidyanathan, Ping Lai, Sundeep Narravula, Dhabaleswar K. Panda:
Optimized Distributed Data Sharing Substrate in Multi-core Commodity Clusters: A Comprehensive Study with Applications. CCGRID 2008: 138-145 - [c203]Ping Lai, Sundeep Narravula, Karthikeyan Vaidyanathan, Dhabaleswar K. Panda:
Advanced RDMA-Based Admission Control for Modern Data-Centers. CCGRID 2008: 384-391 - [c202]Wei Huang, Matthew J. Koop, Dhabaleswar K. Panda:
Efficient one-copy MPI shared memory communication in Virtual Machines. CLUSTER 2008: 107-115 - [c201]Dhabaleswar K. Panda:
Designing next generation clusters with InfiniBand and 10GE/iWARP: Opportunities and challenges. CLUSTER 2008: 202 - [c200]Matthew J. Koop, Jaidev K. Sridhar, Dhabaleswar K. Panda:
Scalable MPI design over InfiniBand using eXtended Reliable Connection. CLUSTER 2008: 203-212 - [c199]Jaidev K. Sridhar, Matthew J. Koop, Jonathan L. Perkins, Dhabaleswar K. Panda:
ScELA: Scalable and Extensible Launching Architecture for Clusters. HiPC 2008: 323-335 - [c198]Ranjit Noronha, Xiangyong Ouyang, Dhabaleswar K. Panda:
Designing a High-Performance Clustered NAS: A Case Study with pNFS over RDMA on InfiniBand. HiPC 2008: 465-477 - [c197]Pavan Balaji, Sitha Bhagvat, Rajeev Thakur, Dhabaleswar K. Panda:
Sockets Direct Protocol for Hybrid Network Stacks: A Case Study with iWARP over 10G Ethernet. HiPC 2008: 478-490 - [c196]Matthew J. Koop, Wei Huang, Karthik Gopalakrishnan, Dhabaleswar K. Panda:
Performance Analysis and Evaluation of PCIe 2.0 and Quad-Data Rate InfiniBand. Hot Interconnects 2008: 85-92 - [c195]Lei Chai, Ping Lai, Hyun-Wook Jin, Dhabaleswar K. Panda:
Designing an Efficient Kernel-Level and User-Level Hybrid Approach for MPI Intra-Node Communication on Multi-Core Systems. ICPP 2008: 222-229 - [c194]Sundeep Narravula, Hari Subramoni, Ping Lai, Ranjit Noronha, Dhabaleswar K. Panda:
Performance of HPC Middleware over InfiniBand WAN. ICPP 2008: 304-311 - [c193]Ranjit Noronha, Dhabaleswar K. Panda:
IMCa: A High Performance Caching Front-End for GlusterFS on InfiniBand. ICPP 2008: 462-469 - [c192]Matthew J. Koop, Rahul Kumar, Dhabaleswar K. Panda:
Can software reliability outperform hardware reliability on high performance interconnects?: a case study with MPI over infiniband. ICS 2008: 145-154 - [c191]Matthew J. Koop, Terry R. Jones, Dhabaleswar K. Panda:
MVAPICH-Aptus: Scalable high-performance multi-transport MPI over InfiniBand. IPDPS 2008: 1-12 - [c190]Rahul Kumar, Amith R. Mamidala, Dhabaleswar K. Panda:
Scaling alltoall collective on multi-core systems. IPDPS 2008: 1-8 - [c189]Gopalakrishnan Santhanaraman, Sundeep Narravula, Dhabaleswar K. Panda:
Designing passive synchronization for MPI-2 one-sided communication to maximize overlap. IPDPS 2008: 1-11 - [c188]Rahul Kumar, Amith R. Mamidala, Matthew J. Koop, Gopalakrishnan Santhanaraman, Dhabaleswar K. Panda:
Lock-Free Asynchronous Rendezvous Design for MPI Point-to-Point Communication. PVM/MPI 2008: 185-193 - [e4]Mark A. Franklin, Dhabaleswar K. Panda, Dimitrios Stiliadis:
Proceedings of the 2008 ACM/IEEE Symposium on Architecture for Networking and Communications Systems, ANCS 2008, San Jose, California, USA, November 6-7, 2008. ACM 2008, ISBN 978-1-60558-346-4 [contents] - 2007
- [c187]Lei Chai, Qi Gao, Dhabaleswar K. Panda:
Understanding the Impact of Multi-Core Architecture in Cluster Computing: A Case Study with Intel Dual-Core System. CCGRID 2007: 471-478 - [c186]Abhinav Vishnu, Matthew J. Koop, Adam Moody, Amith R. Mamidala, Sundeep Narravula, Dhabaleswar K. Panda:
Hot-Spot Avoidance With Multi-Pathing Over InfiniBand: An MPI Perspective. CCGRID 2007: 479-486 - [c185]Matthew J. Koop, Terry R. Jones, Dhabaleswar K. Panda:
Reducing Connection Memory Requirements of MPI for InfiniBand Clusters: A Message Coalescing Approach. CCGRID 2007: 495-504 - [c184]Sundeep Narravula, A. Marnidala, Abhinav Vishnu, Karthikeyan Vaidyanathan, Dhabaleswar K. Panda:
High Performance Distributed Lock Management Services using Network-based Remote Atomic Operations. CCGRID 2007: 583-590 - [c183]Wei Huang, Qi Gao, Jiuxing Liu, Dhabaleswar K. Panda:
High performance virtual machine migration with RDMA over modern interconnects. CLUSTER 2007: 11-20 - [c182]Karthikeyan Vaidyanathan, Lei Chai, Wei Huang, Dhabaleswar K. Panda:
Efficient asynchronous memory copy operations on multi-core systems and I/OAT. CLUSTER 2007: 159-168 - [c181]Matthew J. Koop, Sayantan Sur, Dhabaleswar K. Panda:
Zero-copy protocol for MPI using infiniband unreliable datagram. CLUSTER 2007: 179-186 - [c180]Hyun-Wook Jin, Sayantan Sur, Lei Chai, Dhabaleswar K. Panda:
Lightweight kernel-level primitives for high-performance MPI intra-node communication over multi-core systems. CLUSTER 2007: 446-451 - [c179]Dhabaleswar K. Panda, Pavan Balaji:
Designing high-end computing systems with InfiniBand and 10-Gigabit Ethernet iWARP. CLUSTER 2007 - [c178]Sayantan Sur, Matthew J. Koop, Lei Chai, Dhabaleswar K. Panda:
Performance Analysis and Evaluation of Mellanox ConnectX InfiniBand Architecture with Multi-Core Platforms. Hot Interconnects 2007: 125-134 - [c177]Sundeep Narravula, Amith R. Mamidala, Abhinav Vishnu, Gopalakrishnan Santhanaraman, Dhabaleswar K. Panda:
High Performance MPI over iWARP: Early Experiences. ICPP 2007: 46 - [c176]Qi Gao, Wei Huang, Matthew J. Koop, Dhabaleswar K. Panda:
Group-based Coordinated Checkpointing for MPI: A Case Study on InfiniBand. ICPP 2007: 47 - [c175]Ranjit Noronha, Lei Chai, Thomas Talpey, Dhabaleswar K. Panda:
Designing NFS with RDMA for Security, Performance and Scalability. ICPP 2007: 49 - [c174]Pavan Balaji, Sitha Bhagvat, Dhabaleswar K. Panda, Rajeev Thakur, William Gropp:
Advanced Flow-control Mechanisms for the Sockets Direct Protocol over InfiniBand. ICPP 2007: 73 - [c173]Matthew J. Koop, Sayantan Sur, Qi Gao, Dhabaleswar K. Panda:
High performance MPI design using unreliable datagram for ultra-scale InfiniBand clusters. ICS 2007: 180-189 - [c172]Ranjit Noronha, Dhabaleswar K. Panda:
Improving Scalability of OpenMP Applications on Multi-core Systems Using Large Page Support. IPDPS 2007: 1-8 - [c171]Karthikeyan Vaidyanathan, Wei Huang, Lei Chai, Dhabaleswar K. Panda:
Designing Efficient Asynchronous Memory Operations Using Hardware Copy Engine: A Case Study with I/OAT. IPDPS 2007: 1-8 - [c170]Karthikeyan Vaidyanathan, Sundeep Narravula, Pavan Balaji, Dhabaleswar K. Panda:
Designing Efficient Systems Services and Primitives for Next-Generation Data-Centers. IPDPS 2007: 1-6 - [c169]Abhinav Vishnu, Brad Benton, Dhabaleswar K. Panda:
High Performance MPI on IBM 12x InfiniBand Architecture. IPDPS 2007: 1-8 - [c168]Abhinav Vishnu, Amith R. Mamidala, Sundeep Narravula, Dhabaleswar K. Panda:
Automatic Path Migration over InfiniBand: Early Experiences. IPDPS 2007: 1-8 - [c167]Karthikeyan Vaidyanathan, Dhabaleswar K. Panda:
Benefits of I/O Acceleration Technology (I/OAT) in Clusters. ISPASS 2007: 220-229 - [c166]Amith R. Mamidala, Sundeep Narravula, Abhinav Vishnu, Gopalakrishnan Santhanaraman, Dhabaleswar K. Panda:
On using connection-oriented vs. connection-less transport for performance and scalability of collective and one-sided operations: trade-offs and impact. PPoPP 2007: 46-54 - [c165]Gopalakrishnan Santhanaraman, Sundeep Narravula, Amith R. Mamidala, Dhabaleswar K. Panda:
MPI-2 One-Sided Usage and Implementation for Read Modify Write Operations: A Case Study with HPCC. PVM/MPI 2007: 251-259 - [c164]Lei Chai, Xiangyong Ouyang, Ranjit Noronha, Dhabaleswar K. Panda:
pNFS/PVFS2 over InfiniBand: early experiences. PDSW 2007: 5-11 - [c163]Wei Huang, Matthew J. Koop, Qi Gao, Dhabaleswar K. Panda:
Virtual machine aware communication libraries for high performance computing. SC 2007: 9 - [c162]Qi Gao, Feng Qin, Dhabaleswar K. Panda:
DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements. SC 2007: 15 - [c161]Pavan Balaji, Wu-chun Feng, Sitha Bhagvat, Dhabaleswar K. Panda, Rajeev Thakur, William Gropp:
Analyzing the impact of supporting out-of-order communication on in-order performance with iWARP. SC 2007: 35 - [c160]Wei Huang, Jiuxing Liu, Matthew J. Koop, Bülent Abali, Dhabaleswar K. Panda:
Nomad: migrating OS-bypass networks in virtual machines. VEE 2007: 158-168 - [e3]John W. Lockwood, Fabrizio Petrini, Ron Brightwell, Dhabaleswar K. Panda:
15th Annual IEEE Symposium on High-Performance Interconnects, HOTI 2007, Stanford, CA, USA, August 22-24, 2007. IEEE Computer Society 2007, ISBN 978-0-7695-2979-0 [contents] - 2006
- [j35]Jarek Nieplocha, Vinod Tipparaju, Manojkumar Krishnan, Dhabaleswar K. Panda:
High Performance Remote Memory Access Communication: The Armci Approach. Int. J. High Perform. Comput. Appl. 20(2): 233-253 (2006) - [j34]Fabrizio Petrini, Adam Moody, Juan Fernández Peinador, Eitan Frachtenberg, Dhabaleswar K. Panda:
NIC-based reduction algorithms for large-scale clusters. Int. J. High Perform. Comput. Netw. 4(3/4): 122-136 (2006) - [j33]Pavan Balaji, Wu-chun Feng, Dhabaleswar K. Panda:
Bridging the Ethernet-Ethernot Performance Gap. IEEE Micro 26(3): 24-40 (2006) - [c159]Lei Chai, Ranjit Noronha, Dhabaleswar K. Panda:
MPI over uDAPL: Can High Performance and Portability Exist Across Architectures? CCGRID 2006: 19-26 - [c158]Wei Huang, Gopalakrishnan Santhanaraman, Hyun-Wook Jin, Qi Gao, Dhabaleswar K. Panda:
Design of High Performance MVAPICH2: MPI2 over InfiniBand. CCGRID 2006: 43-48 - [c157]Sundeep Narravula, Hyun-Wook Jin, Karthikeyan Vaidyanathan, Dhabaleswar K. Panda:
Designing Efficient Cooperative Caching Schemes for Multi-Tier Data-Centers over RDMA-enabled Networks. CCGRID 2006: 401-408 - [c156]Lei Chai, Albert Hartono, Dhabaleswar K. Panda:
Designing High Performance and Scalable MPI Intra-node Communication Support for Clusters. CLUSTER 2006 - [c155]Karthikeyan Vaidyanathan, Hyun-Wook Jin, Dhabaleswar K. Panda:
Exploiting RDMA operations for Providing Efficient Fine-Grained Resource Monitoring in Cluster-based Servers. CLUSTER 2006 - [c154]Karthikeyan Vaidyanathan, Sundeep Narravula, Dhabaleswar K. Panda:
DDSS: A Low-Overhead Distributed Data Sharing Substrate for Cluster-Based Data-Centers over Modern Interconnects. HiPC 2006: 472-484 - [c153]Matthew J. Koop, Wei Huang, Abhinav Vishnu, Dhabaleswar K. Panda:
Memory Scalability Evaluation of the Next-Generation Intel Bensley Platform with InfiniBand. Hot Interconnects 2006: 52-60 - [c152]Hyun-Wook Jin, Sundeep Narravula, Karthikeyan Vaidyanathan, Dhabaleswar K. Panda:
NemC: A Network Emulator for Cluster-of-Clusters. ICCCN 2006: 177-182 - [c151]Shuang Liang, Weikuan Yu, Dhabaleswar K. Panda:
High Performance Block I/O for Global File System (GFS) with InfiniBand RDMA. ICPP 2006: 391-398 - [c150]Qi Gao, Weikuan Yu, Wei Huang, Dhabaleswar K. Panda:
Application-Transparent Checkpoint/Restart for MPI Programs over InfiniBand. ICPP 2006: 471-478 - [c149]Wei Huang, Jiuxing Liu, Bülent Abali, Dhabaleswar K. Panda:
A case for high performance computing with virtual machines. ICS 2006: 125-134 - [c148]Pavan Balaji, Sitha Bhagvat, Hyun-Wook Jin, Dhabaleswar K. Panda:
Asynchronous zero-copy communication for synchronous sockets in the sockets direct protocol (SDP) over InfiniBand. IPDPS 2006 - [c147]Pavan Balaji, Karthikeyan Vaidyanathan, Sundeep Narravula, Hyun-Wook Jin, Dhabaleswar K. Panda:
Designing next generation data-centers with advanced communication protocols and systems services. IPDPS 2006 - [c146]Amith R. Mamidala, Lei Chai, Hyun-Wook Jin, Dhabaleswar K. Panda:
Efficient SMP-aware MPI-level broadcast over InfiniBand's hardware multicast. IPDPS 2006 - [c145]Sayantan Sur, Lei Chai, Hyun-Wook Jin, Dhabaleswar K. Panda:
Shared receive queue based scalable MPI design for InfiniBand clusters. IPDPS 2006 - [c144]Weikuan Yu, Qi Gao, Dhabaleswar K. Panda:
Adaptive connection management for scalable MPI over InfiniBand. IPDPS 2006 - [c143]Weikuan Yu, Ranjit Noronha, Shuang Liang, Dhabaleswar K. Panda:
Benefits of high speed interconnects to cluster file systems: a case study with Lustre. IPDPS 2006 - [c142]Sayantan Sur, Hyun-Wook Jin, Lei Chai, Dhabaleswar K. Panda:
RDMA read based rendezvous protocol for MPI over InfiniBand: design alternatives and benefits. PPoPP 2006: 32-39 - [c141]Amith R. Mamidala, Abhinav Vishnu, Dhabaleswar K. Panda:
Efficient Shared Memory and RDMA Based Design for MPI_Allgather over InfiniBand. PVM/MPI 2006: 66-75 - [c140]Leslie S. Perkins, Phil Andrews, Dhabaleswar K. Panda, Dave Morton, Ron Bonica, Nick Henry Werstiuk, Randy Kreiser:
Panel: Data intensive computing. SC 2006: 69 - [c139]Abhinav Vishnu, Prachi Gupta, Amith R. Mamidala, Dhabaleswar K. Panda:
Scalable systems software - A software based approach for providing network fault tolerance in clusters with uDAPL interface: MPI level design and performance evaluation. SC 2006: 85 - [c138]Sayantan Sur, Matthew J. Koop, Dhabaleswar K. Panda:
MPI and communication - High-performance and scalable MPI over InfiniBand with reduced memory usage: an in-depth performance analysis. SC 2006: 105 - [c137]Jiuxing Liu, Wei Huang, Bülent Abali, Dhabaleswar K. Panda:
High Performance VMM-Bypass I/O in Virtual Machines. USENIX ATC, General Track 2006: 29-42 - 2005
- [j32]Gopalakrishnan Santhanaraman, Jiesheng Wu, Wei Huang, Dhabaleswar K. Panda:
Designing Zero-Copy Message Passing Interface Derived Datatype Communication Over Infiniband: Alternative Approaches and Performance Evaluation. Int. J. High Perform. Comput. Appl. 19(2): 129-142 (2005) - [j31]Weikuan Yu, Sayantan Sur, Dhabaleswar K. Panda, Rob T. Aulwes, Richard L. Graham:
High Performance Broadcast Support in La-Mpi Over Quadrics. Int. J. High Perform. Comput. Appl. 19(4): 453-463 (2005) - [j30]Rajkumar Kettimuthu, Vijay Subramani, Srividya Srinivasan, Thiagaraja Gopalsamy, Dhabaleswar K. Panda, P. Sadayappan:
Selective preemption strategies for parallel job scheduling. Int. J. High Perform. Comput. Netw. 3(2/3): 122-152 (2005) - [j29]Hyun-Wook Jin, Pavan Balaji, Chuck Yoo, Jin-Young Choi, Dhabaleswar K. Panda:
Exploiting NIC architectural support for enhancing IP-based protocols on high-performance networks. J. Parallel Distributed Comput. 65(11): 1348-1365 (2005) - [j28]Jiuxing Liu, Amith R. Mamidala, Abhinav Vishnu, Dhabaleswar K. Panda:
Evaluating InfiniBand Performance with PCI Express. IEEE Micro 25(1): 20-29 (2005) - [c136]Sundeep Narravula, Pavan Balaji, Karthikeyan Vaidyanathan, Hyun-Wook Jin, Dhabaleswar K. Panda:
Architecture for caching responses with multiple dynamic dependencies in multi-tier data-centers over InfiniBand. CCGRID 2005: 374-381 - [c135]Ranjit Noronha, Dhabaleswar K. Panda:
Can high performance software DSM systems designed with InfiniBand features benefit from PCI-Express? CCGRID 2005: 945-952 - [c134]Pavan Balaji, Wu-chun Feng, Qi Gao, Ranjit Noronha, Weikuan Yu, Dhabaleswar K. Panda:
Head-to-TOE Evaluation of High-Performance Sockets over Protocol Offload Engines. CLUSTER 2005: 1-10 - [c133]Pavan Balaji, Hyun-Wook Jin, Karthikeyan Vaidyanathan, Dhabaleswar K. Panda:
Supporting iWARP Compatibility and Features for Regular Network Adapters. CLUSTER 2005: 1-10 - [c132]Shuang Liang, Ranjit Noronha, Dhabaleswar K. Panda:
Swapping to Remote Memory over InfiniBand: An Approach using a High Performance Network Block Device. CLUSTER 2005: 1-10 - [c131]Ranjit Noronha, Dhabaleswar K. Panda:
Performance Evaluation of MM5 on Clusters with Modern Interconnects: Scalability and Impact. Euro-Par 2005: 134-145 - [c130]Abhinav Vishnu, Gopalakrishnan Santhanaraman, Wei Huang, Hyun-Wook Jin, Dhabaleswar K. Panda:
Supporting MPI-2 One Sided Communication on Multi-rail InfiniBand Clusters: Design Challenges and Performance Benefits. HiPC 2005: 137-147 - [c129]Sayantan Sur, Uday Bondhugula, Amith R. Mamidala, Hyun-Wook Jin, Dhabaleswar K. Panda:
High Performance RDMA Based All-to-All Broadcast for InfiniBand Clusters. HiPC 2005: 148-157 - [c128]Sayantan Sur, Abhinav Vishnu, Hyun-Wook Jin, Wei Huang, Dhabaleswar K. Panda:
Can Memory-Less Network Adapters Benefit Next-Generation InfiniBand Systems? Hot Interconnects 2005: 45-50 - [c127]Wu-chun Feng, Pavan Balaji, Christopher Baron, Laxmi N. Bhuyan, Dhabaleswar K. Panda:
Performance Characterization of a 10-Gigabit Ethernet TOE. Hot Interconnects 2005: 58-63 - [c126]Hyun-Wook Jin, Sayantan Sur, Lei Chai, Dhabaleswar K. Panda:
LiMIC: Support for High-Performance MPI Intra-node Communication on Linux Cluster. ICPP 2005: 184-191 - [c125]Weikuan Yu, Shuang Liang, Dhabaleswar K. Panda:
High performance support of parallel virtual file system (PVFS2) over Quadrics. ICS 2005: 323-331 - [c124]Lei Chai, Sayantan Sur, Hyun-Wook Jin, Dhabaleswar K. Panda:
Analysis of Design Considerations for Optimizing Multi-Channel MPI over InfiniBand. IPDPS 2005 - [c123]Wei Huang, Gopalakrishnan Santhanaraman, Hyun-Wook Jin, Dhabaleswar K. Panda:
Scheduling of MPI-2 One Sided Operations over InfiniBand. IPDPS 2005 - [c122]Abhinav Vishnu, Amith R. Mamidala, Hyun-Wook Jin, Dhabaleswar K. Panda:
Performance Modeling of Subnet Management on Fat Tree InfiniBand Networks using OpenSM. IPDPS 2005 - [c121]Weikuan Yu, Timothy S. Woodall, Richard L. Graham, Dhabaleswar K. Panda:
Design and Implementation of Open MPI over Quadrics/Elan4. IPDPS 2005 - [c120]Pavan Balaji, Sundeep Narravula, Karthikeyan Vaidyanathan, Hyun-Wook Jin, Dhabaleswar K. Panda:
On the provision of prioritization and soft qos in dynamically reconfigurable shared data-centers over infiniband. ISPASS 2005: 280-289 - [c119]Wei Huang, Gopalakrishnan Santhanaraman, Hyun-Wook Jin, Dhabaleswar K. Panda:
Design Alternatives and Performance Trade-Offs for Implementing MPI-2 over InfiniBand. PVM/MPI 2005: 191-199 - [c118]Lei Chai, Ranjit Noronha, Prachi Gupta, G. Brown, Dhabaleswar K. Panda:
Designing a Portable MPI-2 over Modern Interconnects Using uDAPL Interface. PVM/MPI 2005: 200-208 - [c117]Amith R. Mamidala, Hyun-Wook Jin, Dhabaleswar K. Panda:
Efficient Hardware Multicast Group Management for Multiple MPI Communicators over InfiniBand. PVM/MPI 2005: 388-398 - 2004
- [j27]Adam Wagner, Darius Buntinas, Ron Brightwell, Dhabaleswar K. Panda:
Application-bypass reduction for large-scale clusters. Int. J. High Perform. Comput. Netw. 2(2/3/4): 99-109 (2004) - [j26]Jarek Nieplocha, Vinod Tipparaju, Manojkumar Krishnan, Gopalakrishnan Santhanaraman, Dhabaleswar K. Panda:
Optimisation and performance evaluation of mechanisms for latency tolerance in remote memory access communication on clusters. Int. J. High Perform. Comput. Netw. 2(2/3/4): 198-209 (2004) - [j25]Jiuxing Liu, Jiesheng Wu, Dhabaleswar K. Panda:
High Performance RDMA-Based MPI Implementation over InfiniBand. Int. J. Parallel Program. 32(3): 167-198 (2004) - [j24]Jiuxing Liu, B. Chandrasekaran, Weikuan Yu, Jiesheng Wu, Darius Buntinas, Sushmitha P. Kini, Dhabaleswar K. Panda, Pete Wyckoff:
Microbenchmark Performance Comparison of High-Speed Cluster Interconnects. IEEE Micro 24(1): 42-51 (2004) - [c116]Ranjit Noronha, Dhabaleswar K. Panda:
Designing high performance DSM systems using InfiniBand features. CCGRID 2004: 467-474 - [c115]Jiesheng Wu, Pete Wyckoff, Dhabaleswar K. Panda, Robert B. Ross:
Unifier: unifying cache management and communication buffer management for PVFS over InfiniBand. CCGRID 2004: 523-530 - [c114]Weihang Jiang, Jiuxing Liu, Hyun-Wook Jin, Dhabaleswar K. Panda, William Gropp, Rajeev Thakur:
High performance MPI-2 one-sided communication over InfiniBand. CCGRID 2004: 531-538 - [c113]Dhabaleswar K. Panda:
State of InfiniBand in designing HPC clusters, storage/file systems, and datacenters. CLUSTER 2004: 3 - [c112]Weikuan Yu, Dhabaleswar K. Panda, Darius Buntinas:
Scalable, high-performance NIC-based all-to-all broadcast over Myrinet/GM. CLUSTER 2004: 125-134 - [c111]Amith R. Mamidala, Jiuxing Liu, Dhabaleswar K. Panda:
Efficient Barrier and Allreduce on Infiniband clusters using multicast and adaptive algorithms. CLUSTER 2004: 135-144 - [c110]Adam Wagner, Hyun-Wook Jin, Dhabaleswar K. Panda, Rolf Riesen:
NIC-based offload of dynamic user-defined modules for Myrinet clusters. CLUSTER 2004: 205-214 - [c109]Mohammad Islam, Pavan Balaji, P. Sadayappan, Dhabaleswar K. Panda:
Towards provision of quality of service guarantees in job scheduling. CLUSTER 2004: 245-254 - [c108]Weikuan Yu, Jiesheng Wu, Dhabaleswar K. Panda:
Fast and Scalable Startup of MPI Programs in InfiniBand Clusters. HiPC 2004: 440-449 - [c107]Jiuxing Liu, Amith R. Mamidala, Abhinav Vishnu, Dhabaleswar K. Panda:
Performance evaluation of InfiniBand with PCI Express. Hot Interconnects 2004: 13-19 - [c106]Sayantan Sur, Hyun-Wook Jin, Dhabaleswar K. Panda:
Efficient and Scalable All-to-All Personalized Exchange for InfiniBand-Based Clusters. ICPP 2004: 275-282 - [c105]Qingda Lu, Jiesheng Wu, Dhabaleswar K. Panda, P. Sadayappan:
Applying MPI Derived Datatypes to the NAS Benchmarks: A Case Study. ICPP Workshops 2004: 538-545 - [c104]Jiuxing Liu, Weihang Jiang, Pete Wyckoff, Dhabaleswar K. Panda, David Ashton, Darius Buntinas, William D. Gropp, Brian R. Toonen:
Design and Implementation of MPICH2 over InfiniBand with RDMA Support. IPDPS 2004 - [c103]Jiuxing Liu, Amith R. Mamidala, Dhabaleswar K. Panda:
Fast and Scalable MPI-Level Broadcast Using InfiniBand's Hardware Multicast Support. IPDPS 2004 - [c102]Jiuxing Liu, Dhabaleswar K. Panda:
Implementing Efficient and Scalable Flow Control Schemes in MPI over InfiniBand. IPDPS 2004 - [c101]Vinod Tipparaju, Gopalakrishnan Santhanaraman, Jarek Nieplocha, Dhabaleswar K. Panda:
Host-Assisted Zero-Copy Remote Memory Access Communication on InfiniBand. IPDPS 2004 - [c100]Jiesheng Wu, Pete Wyckoff, Dhabaleswar K. Panda:
High Performance Implementation of MPI Derived Datatype Communication over InfiniBand. IPDPS 2004 - [c99]Weikuan Yu, Darius Buntinas, Richard L. Graham, Dhabaleswar K. Panda:
Efficient and Scalable Barrier over Quadrics and Myrinet with a New NIC-Based Collective Message Passing Protocol. IPDPS 2004 - [c98]Pavan Balaji, Sundeep Narravula, Karthikeyan Vaidyanathan, Savitha Krishnamoorthy, Jiesheng Wu, Dhabaleswar K. Panda:
Sockets Direct Protocol over InfiniBand in clusters: is it beneficial? ISPASS 2004: 28-35 - [c97]Gopalakrishnan Santhanaraman, Jiesheng Wu, Dhabaleswar K. Panda:
Zero-Copy MPI Derived Datatype Communication over InfiniBand. PVM/MPI 2004: 47-56 - [c96]Weihang Jiang, Jiuxing Liu, Hyun-Wook Jin, Dhabaleswar K. Panda, Darius Buntinas, Rajeev Thakur, William D. Gropp:
Efficient Implementation of MPI-2 Passive One-Sided Communication on InfiniBand Clusters. PVM/MPI 2004: 68-76 - [c95]Jiuxing Liu, Abhinav Vishnu, Dhabaleswar K. Panda:
Building Multirail InfiniBand Clusters: MPI-Level Design and Performance Evaluation. SC 2004: 33 - [i2]Weikuan Yu, Darius Buntinas, Richard L. Graham, Dhabaleswar K. Panda:
Efficient and Scalable Barrier over Quadrics and Myrinet with a New NIC-Based Collective Message Passing Protocol. CoRR cs.DC/0402027 (2004) - 2003
- [c94]Darius Buntinas, Dhabaleswar K. Panda, Ron Brightwell:
Application-Bypass Broadcast in MPICH over GM. CCGRID 2003: 2-9 - [c93]Jarek Nieplocha, Vinod Tipparaju, Manojkumar Krishnan, Gopalakrishnan Santhanaraman, Dhabaleswar K. Panda:
Optimizing Mechanisms for Latency Tolerance in Remote Memory Access Communication on Clusters. CLUSTER 2003: 138-147 - [c92]Dhabaleswar K. Panda:
Designing Next Generation Clusters with Infiniband: Opportunities and Challenges. CLUSTER 2003 - [c91]Jiesheng Wu, Pete Wyckoff, Dhabaleswar K. Panda:
Supporting Efficient Noncontiguous Access in PVFS over InfiniBand. CLUSTER 2003: 344- - [c90]Adam Wagner, Darius Buntinas, Dhabaleswar K. Panda, Ron Brightwell:
Application-Bypass Reduction for Large-Scale Clusters. CLUSTER 2003: 404-411 - [c89]B. Chandrasekaran, Pete Wyckoff, Dhabaleswar K. Panda:
MIBA: A Micro-Benchmark Suite for Evaluating InfiniBand Architecture Implementations. Computer Performance Evaluation / TOOLS 2003: 29-46 - [c88]Vinod Tipparaju, Manojkumar Krishnan, Jarek Nieplocha, Gopalakrishnan Santhanaraman, Dhabaleswar K. Panda:
Exploiting Non-blocking Remote Memory Access Communication in Scientific Benchmarks. HiPC 2003: 248-258 - [c87]Jiuxing Liu, Balasubramanian Chandrasekaran, Weikuan Yu, Jiesheng Wu, Darius Buntinas, Sushmitha P. Kini, Pete Wyckoff, Dhabaleswar K. Panda:
Micro-benchmark level performance comparison of high-speed cluster interconnects. Hot Interconnects 2003: 60-65 - [c86]Pavan Balaji, Jiesheng Wu, Tahsin M. Kurç, Ümit V. Çatalyürek, Dhabaleswar K. Panda, Joel H. Saltz:
Impact of High Performance Sockets on Data Intensive Applications. HPDC 2003: 24-33 - [c85]S. Senapathi, B. Chandrasekaran, Don Stredney, Han-Wei Shen, Dhabaleswar K. Panda:
QoS-Aware Middleware for Cluster-Based Servers to support Interactive and Resource-Adaptive Applications. HPDC 2003: 205-215 - [c84]Jiesheng Wu, Pete Wyckoff, Dhabaleswar K. Panda:
PVFS over InfiniBand: Design and Performance Evaluation. ICPP 2003: 125-132 - [c83]Weikuan Yu, Darius Buntinas, Dhabaleswar K. Panda:
High Performance and Reliable NIC-Based Multicast over Myrinet/GM-2. ICPP 2003: 197-204 - [c82]Jiuxing Liu, Jiesheng Wu, Sushmitha P. Kini, Pete Wyckoff, Dhabaleswar K. Panda:
High performance RDMA-based MPI implementation over InfiniBand. ICS 2003: 295-304 - [c81]Rinku Gupta, Pavan Balaji, Dhabaleswar K. Panda, Jarek Nieplocha:
Efficient Collective Operations Using Remote Memory Operations on VIA-Based Clusters. IPDPS 2003: 46 - [c80]Vinod Tipparaju, Jarek Nieplocha, Dhabaleswar K. Panda:
Fast Collective Operations Using Shared and Remote Memory Access Protocols on Clusters. IPDPS 2003: 84 - [c79]Darius Buntinas, Amina Saify, Dhabaleswar K. Panda, Jarek Nieplocha:
Optimizing Synchronization Operations for Remote Memory Communication Systems. IPDPS 2003: 199 - [c78]Ranjit Noronha, Dhabaleswar K. Panda:
Implementing TreadMarks over GM on Myrinet: Challenges, Design Experience, and Performance Evaluation. IPDPS 2003: 200 - [c77]Mohammad Islam, Pavan Balaji, P. Sadayappan, Dhabaleswar K. Panda:
QoPS: A QoS Based Scheme for Parallel Job Scheduling. JSSPP 2003: 252-268 - [c76]Matthew Eric Otey, Srinivasan Parthasarathy, Amol Ghoting, G. Li, Sundeep Narravula, Dhabaleswar K. Panda:
Towards NIC-based intrusion detection. KDD 2003: 723-728 - [c75]Sushmitha P. Kini, Jiuxing Liu, Jiesheng Wu, Pete Wyckoff, Dhabaleswar K. Panda:
Fast and Scalable Barrier Using RDMA and Multicast Mechanisms for InfiniBand-Based Clusters. PVM/MPI 2003: 369-378 - [c74]Jiuxing Liu, B. Chandrasekaran, Jiesheng Wu, Weihang Jiang, Sushmitha P. Kini, Weikuan Yu, Darius Buntinas, Pete Wyckoff, Dhabaleswar K. Panda:
Performance Comparison of MPI Implementations over InfiniBand, Myrinet and Quadrics. SC 2003: 58 - [c73]Adam Moody, Juan Fernández, Fabrizio Petrini, Dhabaleswar K. Panda:
Scalable NIC-based Reduction on Large-scale Clusters. SC 2003: 59 - [c72]Jiesheng Wu, Pete Wyckoff, Dhabaleswar K. Panda:
Demotion-based exclusive caching through demote buffering: design and evaluations over different networks. SNAPI@PACT 2003: 73-80 - [i1]Jiuxing Liu, Weihang Jiang, Pete Wyckoff, Dhabaleswar K. Panda, David Ashton, Darius Buntinas, William Gropp, Brian R. Toonen:
Design and Implementation of MPICH2 over InfiniBand with RDMA Support. CoRR cs.AR/0310059 (2003) - 2002
- [j23]Rajeev Sivaram, Craig B. Stunkel, Dhabaleswar K. Panda:
HIPIQS: A High-Performance Switch Architecture Using Input Queuing. IEEE Trans. Parallel Distributed Syst. 13(3): 275-289 (2002) - [c71]Rinku Gupta, Vinod Tipparaju, Jarek Nieplocha, Dhabaleswar K. Panda:
Efficient Barrier Using Remote Memory Operations on VIA-Based Clusters. CLUSTER 2002: 83- - [c70]Jiesheng Wu, Jiuxing Liu, Pete Wyckoff, Dhabaleswar K. Panda:
Impact of On-Demand Connection Management in MPI over VIA. CLUSTER 2002: 152-159 - [c69]Pavan Balaji, Piyush Shivam, Pete Wyckoff, Dhabaleswar K. Panda:
High Performance User Level Sockets over Gigabit Ethernet. CLUSTER 2002: 179-186 - [c68]Dhabaleswar K. Panda:
Tutorial 2: InfiniBand Architecture and Where it is Headed. Hot Interconnects 2002: 157-158 - [c67]Thiagaraja Gopalsamy, Mukesh Singhal, Dhabaleswar K. Panda, P. Sadayappan:
A Reliable Multicast Algorithm for Mobile Ad Hoc Networks. ICDCS 2002: 563-570 - [c66]Jarek Nieplocha, Vinod Tipparaju, Amina Saify, Dhabaleswar K. Panda:
Protocols and Strategies for Optimizing Performance of Remote Memory Operations on Clusters. IPDPS 2002 - [c65]Dhabaleswar K. Panda, José Duato, Craig B. Stunkel:
Workshop Introduction. IPDPS 2002 - [c64]Piyush Shivam, Pete Wyckoff, Dhabaleswar K. Panda:
Can User-Level Protocols Take Advantage of Multi-CPU NICs?. IPDPS 2002 - [c63]Jiesheng Wu, Dhabaleswar K. Panda:
MPI/IO on DAFS over VIA: Implementation and Performance Evaluation. IPDPS 2002 - [c62]Dhabaleswar K. Panda:
Active Network Interface: Opportunities and Challenges. LCN 2002: 605 - [c61]Naveen Kumar Polapally, Raghu Machiraju, Dhabaleswar K. Panda:
Feature estimation for efficient streaming. VolVis 2002: 107-114 - 2001
- [j22]Bülent Abali, Craig B. Stunkel, Jay Herring, Mohammad Banikazemi, Dhabaleswar K. Panda, Cevdet Aykanat, Yucel Aydogan:
Adaptive Routing on the New Switch Chip for IBM SP Systems. J. Parallel Distributed Comput. 61(9): 1148-1179 (2001) - [j21]Mohammad Banikazemi, Bülent Abali, Lorraine Herger, Dhabaleswar K. Panda:
Design Alternatives for Virtual Interface Architecture and an Implementation on IBM Netfinity NT Cluster. J. Parallel Distributed Comput. 61(11): 1512-1545 (2001) - [j20]Rajeev Sivaram, Ram Kesavan, Dhabaleswar K. Panda, Craig B. Stunkel:
Architectural Support for Efficient Multicasting in Irregular Networks. IEEE Trans. Parallel Distributed Syst. 12(5): 489-513 (2001) - [j19]Ram Kesavan, Dhabaleswar K. Panda:
Efficient Multicast on Irregular Switch-Based Cut-Through Networks with Up-Down Routing. IEEE Trans. Parallel Distributed Syst. 12(8): 808-828 (2001) - [j18]Mohammad Banikazemi, Rama Govindaraju, Robert Blackmore, Dhabaleswar K. Panda:
MPI-LAPI: An Efficient Implementation of MPI for IBM RS/6000 SP Systems. IEEE Trans. Parallel Distributed Syst. 12(10): 1081-1093 (2001) - [j17]N. S. Sundar, Doddaballapur Narasimha-Murthy Jayasimha, Dhabaleswar K. Panda:
Hybrid Algorithms for Complete Exchange in 2D Meshes. IEEE Trans. Parallel Distributed Syst. 12(12): 1201-1218 (2001) - [c60]Mohammad Banikazemi, Jiuxing Liu, Dhabaleswar K. Panda, P. Sadayappan:
Implementing TreadMarks over VIA on Myrinet and Gigabit Ethernet: Challenges, Design Experience, and Performance Evaluation. ICPP 2001: 167-174 - [c59]Abhishek Gulati, Dhabaleswar K. Panda, P. Sadayappan, Pete Wyckoff:
NIC-Based Rate Control for Proportional Bandwidth Allocation in Myrinet Clusters. ICPP 2001: 305-312 - [c58]Mohammad Banikazemi, Jiuxing Liu, S. Kutlug, P. Sadayappan, H. Shah, Dhabaleswar K. Panda:
VIBe: A Micro-benchmark Suite for Evaluating Virtual Interface Architecture (VIA) Implementations. IPDPS 2001: 24 - [c57]Darius Buntinas, Dhabaleswar K. Panda, P. Sadayappan:
Fast NIC-Based Barrier over Myrinet/GM. IPDPS 2001: 52 - [c56]Amit Singhal, Mohammad Banikazemi, P. Sadayappan, Dhabaleswar K. Panda:
Efficient Multicast Algorithms for Heterogeneous Switch-based Irregular Networks of Workstations. IPDPS 2001: 71 - [c55]Darius Buntinas, Dhabaleswar K. Panda, P. Sadayappan:
Performance Benefits of NIC-Based Barrier on Myrinet/GM. IPDPS 2001: 166 - [c54]Piyush Shivam, Pete Wyckoff, Dhabaleswar K. Panda:
EMP: zero-copy OS-bypass NIC-driven gigabit ethernet message passing. SC 2001: 57 - 2000
- [j16]Rajeev Sivaram, Craig B. Stunkel, Dhabaleswar K. Panda:
Implementing Multidestination Worms in Switch-Based Parallel Systems: Architectural Alternatives and Their Impact. IEEE Trans. Parallel Distributed Syst. 11(8): 794-812 (2000) - [c53]Vijay Moorthy, Dhabaleswar K. Panda, P. Sadayappan:
Fast Collective Communication Algorithms for Reflective Memory Network Clusters. CANPC 2000: 100-114 - [c52]Darius Buntinas, Dhabaleswar K. Panda, José Duato, P. Sadayappan:
Broadcast/Multicast over Myrinet Using NIC-Assisted Multidestination Messages. CANPC 2000: 115-129 - [c51]Mohammad Banikazemi, Bülent Abali, Dhabaleswar K. Panda:
Comparison and Evaluation of Design Choices for Implementing the Virtual Interface Architecture (VIA). CANPC 2000: 145-161 - [c50]Praveen Holenarsipur, Vladimir Yarmolenko, José Duato, Dhabaleswar K. Panda, P. Sadayappan:
Characterization and enhancement of Static Mapping Heuristics for Heterogeneous Systems. HiPC 2000: 37-48 - [c49]Mohammad Banikazemi, Dhabaleswar K. Panda:
Can Scatter Communication Take Advantage of Multidestination Message Passing? HiPC 2000: 204-211 - [c48]Vladimir Yarmolenko, José Duato, Dhabaleswar K. Panda, P. Sadayappan:
Characterization and Enhancement of Dynamic Mapping Heuristics for Heterogeneous Systems. ICPP Workshops 2000: 437-446 - [c47]Arindam Paul, Wu-chi Feng, Dhabaleswar K. Panda, P. Sadayappan:
Balancing Web Server Load for Adaptable Video Distribution. ICPP Workshops 2000: 469-478 - [c46]Mohammad Banikazemi, Vijay Moorthy, Dhabaleswar K. Panda, Lorraine Herger, Bülent Abali:
Efficient Virtual Interface Architecture (VIA) Support for the IBM SP Switch-Connected NT Clusters. IPDPS 2000: 33-42 - [c45]Mohammad Banikazemi, Dhabaleswar K. Panda, Craig B. Stunkel, Bülent Abali:
Adaptive Routing in RS/6000 SP-Like Bidirectional Multistage Interconnection Networks. IPDPS 2000: 43-52
1990 – 1999
- 1999
- [j15]Donglai Dai, Dhabaleswar K. Panda:
Exploiting the Benefits of Multiple-Path Network in DSM Systems: Architectural Alternatives and Performance Evaluation. IEEE Trans. Computers 48(2): 236-244 (1999) - [j14]Dhabaleswar K. Panda, Sanjay Singal, Ram Kesavan:
Multidestination Message Passing in Wormhole k-ary n-cube Networks with Base Routing Conformed Paths. IEEE Trans. Parallel Distributed Syst. 10(1): 76-96 (1999) - [j13]Ram Kesavan, Dhabaleswar K. Panda:
Multiple Multicast with Minimized Node Contention on Wormhole k-ary n-cube Networks. IEEE Trans. Parallel Distributed Syst. 10(4): 371-393 (1999) - [c44]Matthew G. Jacunski, Vijay Moorthy, Peter P. Ware, Manoj Pillai, Dhabaleswar K. Panda, P. Sadayappan:
Low Latency Message-Passing for Reflective Memory Networks. CANPC 1999: 211-224 - [c43]Mohammad Banikazemi, Jayanthi Sampathkumar, Sandeep Prabhu, Dhabaleswar K. Panda, P. Sadayappan:
Communication Modeling of Heterogeneous Networks of Workstations for Performance Characterization of Collective Operations. Heterogeneous Computing Workshop 1999: 125- - [c42]Vijay Moorthy, Matthew G. Jacunski, Manoj Pillai, Peter P. Ware, Dhabaleswar K. Panda, Thomas W. Page Jr., P. Sadayappan, V. Nagarajan, Johns Daniel:
Low-Latency Message Passing on Workstation Clusters using SCRAMNet. IPPS/SPDP 1999: 148-152 - [c41]Mohammad Banikazemi, Rama Govindaraju, Robert Blackmore, Dhabaleswar K. Panda:
Implementing Efficient MPI on LAPI for IBM RS/6000 SP Systems: Experiences and Performance Evaluation. IPPS/SPDP 1999: 183-190 - [c40]Matthew G. Jacunski, P. Sadayappan, Dhabaleswar K. Panda:
All-to-All Broadcast on Switch-Based Clusters of Workstations. IPPS/SPDP 1999: 325-329 - 1998
- [j12]Ravi Prakash, Dhabaleswar K. Panda:
Designing communication strategies for heterogeneous parallel systems. Parallel Comput. 24(14): 2035-2052 (1998) - [j11]Debashis Basak, Dhabaleswar K. Panda:
Alleviating Consumption Channel Bottleneck in Wormhole-Routed k-ary n-Cube Systems. IEEE Trans. Parallel Distributed Syst. 9(5): 481-496 (1998) - [j10]Rajeev Sivaram, Dhabaleswar K. Panda, Craig B. Stunkel:
Efficient Broadcast and Multicast on Multistage Interconnection Networks Using Multiport Encoding. IEEE Trans. Parallel Distributed Syst. 9(10): 1004-1028 (1998) - [c39]Federico Silla, Manuel P. Malumbres, José Duato, Donglai Dai, Dhabaleswar K. Panda:
Impact of Adaptivity on the Behaviour of Networks of Workstations under Bursty Traffic. ICPP 1998: 88-95 - [c38]Rajeev Sivaram, Ram Kesavan, Dhabaleswar K. Panda, Craig B. Stunkel:
Where to Provide Support for Efficient Multicasting in Irregular Networks: Network Interface or Switch? ICPP 1998: 452-459 - [c37]Mohammad Banikazemi, Vijay Moorthy, Dhabaleswar K. Panda:
Efficient Collective Communication on Heterogeneous Networks of Workstations. ICPP 1998: 460-467 - [c36]Aravind Bala, Darshat Shah, Wu-chi Feng, Dhabaleswar K. Panda:
Experiences with Software MPEG-2 Video Decompression on an SMP PC. ICPP Workshops 1998: 29-37 - [c35]Rajeev Sivaram, Craig B. Stunkel, Dhabaleswar K. Panda:
HIPIQS: A High-Performance Switch Architecture Using Input Queuing. IPPS/SPDP 1998: 134-143 - [e2]Dhabaleswar K. Panda, Craig B. Stunkel:
Network-Based Parallel Computing: Communication, Architecture, and Applications, Second International Workshop, CANPC '98, Las Vegas, Nevada, USA, January 31 - February 1, 1998, Proceedings. Lecture Notes in Computer Science 1362, Springer 1998, ISBN 3-540-64140-8 [contents] - 1997
- [j9]Dhabaleswar K. Panda, Lionel M. Ni:
Special Issue on Workstation Clusters and Network-Based Computing: Guest Editors' Introduction. J. Parallel Distributed Comput. 40(1): 1-3 (1997) - [j8]Dhabaleswar K. Panda, Lionel M. Ni:
Special Issue on Workstation Clusters and Network-Based Computing: Guest Editors' Introduction. J. Parallel Distributed Comput. 43(2): 63-64 (1997) - [j7]Yu-Chee Tseng, Ting-Hsien Lin, Sandeep K. S. Gupta, Dhabaleswar K. Panda:
Bandwidth-Optimal Complete Exchange on Wormhole-Routed 2D/3D Torus Networks: A Diagonal-Propagation Approach. IEEE Trans. Parallel Distributed Syst. 8(4): 380-396 (1997) - [c34]Abdel-Halim Smai, Dhabaleswar K. Panda, Lars-Erik Thorelli:
Prioritized demand multiplexing (PDM): a low-latency virtual channel flow control framework for prioritized traffic. HiPC 1997: 449-454 - [c33]Ram Kesavan, Kiran Bondalapati, Dhabaleswar K. Panda:
Multicast on Irregular Switch-Based Networks with Wormhole Routing. HPCA 1997: 48-57 - [c32]Ram Kesavan, Dhabaleswar K. Panda:
Optimal Multicast with Packetization and Network Interface Support. ICPP 1997: 370-377 - [c31]Donglai Dai, Dhabaleswar K. Panda:
How Much Does Network Contention Affect Distributed Shared Memory Performance? ICPP 1997: 454-461 - [c30]Rajeev Sivaram, Craig B. Stunkel, Dhabaleswar K. Panda:
A Reliable Hardware Barrier Synchronization Scheme. IPPS 1997: 274-280 - [c29]Craig B. Stunkel, Rajeev Sivaram, Dhabaleswar K. Panda:
Implementing Multidestination Worms in Switch-Based Parallel Systems: Architectural Alternatives and their Impact. ISCA 1997: 50-61 - [c28]Rajeev Sivaram, Dhabaleswar K. Panda, Craig B. Stunkel:
Multicasting in Irregular Networks with Cut-Through Switches Using Tree-Based Multidestination Worms. PCRCW 1997: 39-54 - [c27]Dhabaleswar K. Panda:
Designing High-Performance Communication Subsystems: Top Five Problems to Solve and Five Problems Not to Solve During the Next Five Years (Panel). PCRCW 1997: 153-158 - [c26]Donglai Dai, Dhabaleswar K. Panda:
How Can We Design Better Networks for DSM Systems? PCRCW 1997: 171-184 - [c25]Ram Kesavan, Dhabaleswar K. Panda:
Multicasting on Switch-Based Irregular Networks Using Multi-drop Path-Based Multidestination Worms. PCRCW 1997: 217-230 - [c24]Dhabaleswar K. Panda, Debashis Basak, Donglai Dai, Ram Kesavan, Rajeev Sivaram, Mohammad Banikazemi, Vijay Moorthy:
Simulation of Modern Parallel Systems: A CSIM-based Approach. WSC 1997: 1013-1020 - [e1]Dhabaleswar K. Panda, Craig B. Stunkel:
Communication and Architectural Support for Network-Based Parallel Computing, First International Workshop, CANPC '97, San Antonio, Texas, USA, February 1-2, 1997, Proceedings. Lecture Notes in Computer Science 1199, Springer 1997, ISBN 3-540-62573-9 [contents] - 1996
- [j6]Yu-Chee Tseng, Dhabaleswar K. Panda, Ten-Hwang Lai:
A Trip-Based Multicasting Model in Wormhole-Routed Networks with Virtual Channels. IEEE Trans. Parallel Distributed Syst. 7(2): 138-150 (1996) - [j5]Debashis Basak, Dhabaleswar K. Panda:
Designing Clustered Multiprocessor Systems under Packaging and Technological Advancements. IEEE Trans. Parallel Distributed Syst. 7(9): 962-978 (1996) - [c23]Donglai Dai, Dhabaleswar K. Panda:
Reducing Cache Invalidation Overheads in Wormhole Routed DSMs Using Multidestination Message Passing. ICPP, Vol. 1 1996: 138-145 - [c22]Ram Kesavan, Dhabaleswar K. Panda:
Minimizing Node Contention in Multiple Multicast on Wormhole k-ary N-Cube Networks. ICPP, Vol. 1 1996: 188-195 - [c21]Debashis Basak, Dhabaleswar K. Panda:
Designing Processor-Cluster Based Systems: Interplay Between Organizations and Broadcasting Algorithms. ICPP, Vol. 1 1996: 271-274 - [c20]N. S. Sundar, Doddaballapur Narasimha-Murthy Jayasimha, Dhabaleswar K. Panda, P. Sadayappan:
Hybrid Algorithms for Complete Exchange in 2D Meshes. International Conference on Supercomputing 1996: 181-188 - [c19]Debashis Basak, Dhabaleswar K. Panda, Mohammad Banikazemi:
Benefits of Processor Clustering in Designing Large Parallel Systems: When and How? IPPS 1996: 286-290 - [c18]Rajeev Sivaram, Dhabaleswar K. Panda, Craig B. Stunkel:
Efficient broadcast and multicast on multistage interconnection networks using multiport encoding. SPDP 1996: 36-45 - 1995
- [j4]Dhabaleswar K. Panda:
Fast barrier synchronization in wormhole k-ary n-cube networks with multidestination worms. Future Gener. Comput. Syst. 11(6): 585-602 (1995) - [c17]Dhabaleswar K. Panda:
Fast Barrier Synchronization in Wormhole k-ary n-cube Networks with Multidestination Worms. HPCA 1995: 200-209 - [c16]Yu-Chee Tseng, Sandeep K. S. Gupta, Dhabaleswar K. Panda:
An efficient scheme for complete exchange in 2D tori. IPPS 1995: 532-536 - [c15]Dhabaleswar K. Panda:
Global reduction in wormhole k-ary n-cube networks with multidestination exchange worms. IPPS 1995: 652-659 - 1994
- [c14]Debashis Basak, Dhabaleswar K. Panda:
Designing Large Hierarchical Multiprocessor Systems under Processor, Interconnection, and Packaging Advancements. ICPP (1) 1994: 63-66 - [c13]Vibha A. Dixit-Radiya, Dhabaleswar K. Panda:
Clustering and Intra-Processor Scheduling for Explicitly-Parallel Programs on Distributed-Memory Systems. IPPS 1994: 609-616 - [c12]Ravi Prakash, Dhabaleswar K. Panda:
Architectural issues in designing heterogeneous parallel systems with passive star-coupled optical interconnection. ISPAN 1994: 246-253 - [c11]Dhabaleswar K. Panda, Sanjay Singal, Pradeep Prabhakaran:
Multidestination Message Passing Mechanism Conforming to Base Wormhole Routing Scheme. PCRCW 1994: 131-145 - 1993
- [c10]Shobana Balakrishnan, Dhabaleswar K. Panda:
Impact of Multiple Consumption Channels on Wormhole Routed k-ary n-cube Networks. IPPS 1993: 163-167 - [c9]