Stanimire Tomov
Stan Tomov
Person information
- affiliation: University of Tennessee, Knoxville, TN, USA
2020 – today
- 2024
- [j55]Piotr Luszczek, Ahmad Abdelfattah, Hartwig Anzt, Atsushi Suzuki, Stanimire Tomov:
Batched sparse and mixed-precision linear algebra interface for efficient use of GPU hardware accelerators in scientific applications. Future Gener. Comput. Syst. 160: 359-374 (2024)
- [c115]Julian Halloy, Stephen Qiu, Stanimire Tomov, Kwai Wong:
PyMAGMA: A Python Interface for MAGMA. PEARC 2024: 40:1-40:4
- [c114]Noah Dahle, Meghan Kwon, Kwai Wong, Stanimire Tomov:
Using Graph Neural Networks to Predict Gene-Autoimmune Disease Associations. PEARC 2024: 93:1-93:4
- [c113]Kristina Wilson, Clifford Li, Hon Man Lau, Kwai Wong, Stanimire Tomov:
Implementing Single-precision and Half-precision Tensor Operations. PEARC 2024: 109:1-109:4
- [i13]Ahmad Abdelfattah, Willow Ahrens, Hartwig Anzt, Chris Armstrong, Ben Brock, Aydin Buluç, Federico Busato, Terry Cojean, Timothy A. Davis, Jim Demmel, Grace Dinh, David Gardener, Jan Fiala, Mark Gates, Azzam Haider, Toshiyuki Imamura, Pedro Valero-Lara, José E. Moreira, Xiaoye Sherry Li, Piotr Luszczek, Max Melichenko, Jose Moeira, Yvan Mokwinski, Riley Murray, Spencer Patty, Slaven Peles, Tobias Ribizel, E. Jason Riedy, Siva Rajamanickam, Piyush Sao, Manu Shantharam, Keita Teranishi, Stan Tomov, Yu-Hsiang Tsai, Heiko K. Weichelt:
Interface for Sparse Linear Algebra Operations. CoRR abs/2411.13259 (2024)
- 2023
- [c112]Wissam M. Sid-Lakhdar, Sébastien Cayrols, Daniel Bielich, Ahmad Abdelfattah, Piotr Luszczek, Mark Gates, Stanimire Tomov, Hans Johansen, David B. Williams-Young, Timothy A. Davis, Jack J. Dongarra, Hartwig Anzt:
PAQR: Pivoting Avoiding QR factorization. IPDPS 2023: 322-332
- [c111]Ahmad Abdelfattah, Stanimire Tomov, Piotr Luszczek, Hartwig Anzt, Jack J. Dongarra:
GPU-based LU Factorization and Solve on Batches of Matrices with Band Structure. SC Workshops 2023: 1670-1679
- [d5]Jed Brown, Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Jean-Sylvain Camier, Veselin Dobrev, Yohann Dudouit, Leila Ghaffari, Tzanio V. Kolev, David S. Medina, Will Pazner, Thilina Ratnayaka, Rezgar Shakeri, Jeremy L. Thompson, Stanimire Tomov, James Wright:
libCEED: Efficient Extensible Discretization. Version v0.12.0. Zenodo, 2023
- 2022
- [c110]Sébastien Cayrols, Jiali Li, George Bosilca, Stanimire Tomov, Alan Ayala, Jack J. Dongarra:
Lossy all-to-all exchange for accelerating parallel 3-D FFTs on hybrid architectures with GPUs. CLUSTER 2022: 152-160
- [c109]Chiang-Heng Chien, Hongyi Fan, Ahmad Abdelfattah, Elias P. Tsigaridas, Stanimire Tomov, Benjamin B. Kimia:
GPU-Based Homotopy Continuation for Minimal Problems in Computer Vision. CVPR 2022: 15744-15755
- [c108]Ahmad Abdelfattah, Stan Tomov, Jack J. Dongarra:
Batch QR Factorization on GPUs: Design, Optimization, and Tuning. ICCS (1) 2022: 60-74
- [c107]Alan Ayala, Stan Tomov, Miroslav Stoyanov, Azzam Haidar, Jack J. Dongarra:
Performance Analysis of Parallel FFT on Large Multi-GPU Systems. IPDPS Workshops 2022: 372-381
- [c106]Ahmad Abdelfattah, Pieter Ghysels, Wajih Boukaram, Stanimire Tomov, Xiaoye Sherry Li, Jack J. Dongarra:
Addressing Irregular Patterns of Matrix Computations on GPUs and Their Impact on Applications Powered by Sparse Direct Solvers. SC 2022: 26:1-26:14
- [c105]Anna Fortenberry, Stanimire Tomov:
Extending MAGMA Portability with OneAPI. WACCPD@SC 2022: 22-31
- [d4]Jed Brown, Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Jean-Sylvain Camier, Veselin Dobrev, Yohann Dudouit, Leila Ghaffari, Tzanio V. Kolev, David S. Medina, Will Pazner, Thilina Ratnayaka, Rezgar Shakeri, Jeremy L. Thompson, Stanimire Tomov, James Wright:
libCEED: Efficient Extensible Discretization. Version 0.11.0. Zenodo, 2022
- 2021
- [j54]Zafar Iqbal, Saeid Nooshabadi, Ichitaro Yamazaki, Stanimire Tomov, Jack J. Dongarra:
Exploiting Block Structures of KKT Matrices for Efficient Solution of Convex Optimization Problems. IEEE Access 9: 116604-116611 (2021)
- [j53]Ahmad Abdelfattah, Hartwig Anzt, Erik G. Boman, Erin C. Carson, Terry Cojean, Jack J. Dongarra, Alyson Fox, Mark Gates, Nicholas J. Higham, Xiaoye S. Li, Jennifer A. Loe, Piotr Luszczek, Srikara Pranesh, Siva Rajamanickam, Tobias Ribizel, Barry F. Smith, Kasia Swirydowicz, Stephen J. Thomas, Stanimire Tomov, Yaohung M. Tsai, Ulrike Meier Yang:
A survey of numerical linear algebra methods utilizing mixed-precision arithmetic. Int. J. High Perform. Comput. Appl. 35(4) (2021)
- [j52]Tzanio V. Kolev, Paul F. Fischer, Misun Min, Jack J. Dongarra, Jed Brown, Veselin Dobrev, Tim Warburton, Stanimire Tomov, Mark S. Shephard, Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Jean-Sylvain Camier, Noel Chalmers, Yohann Dudouit, Ali Karakus, Ian Karlin, Stefan Kerkemeier, Yu-Hsiang Lan, David S. Medina, Elia Merzari, Aleksandr Obabko, Will Pazner, Thilina Rathnayake, Cameron W. Smith, Lukas Spies, Kasia Swirydowicz, Jeremy L. Thompson, Ananias Tomboulides, Vladimir Z. Tomov:
Efficient exascale discretizations: High-order finite element methods. Int. J. High Perform. Comput. Appl. 35(6): 527-552 (2021)
- [j51]Jack J. Dongarra, Mark Gates, Piotr Luszczek, Stanimire Tomov:
Translational process: Mathematical software perspective. J. Comput. Sci. 52: 101216 (2021)
- [j50]Jed Brown, Ahmad Abdelfattah, Valeria Barra, Natalie N. Beams, Jean-Sylvain Camier, Veselin Dobrev, Yohann Dudouit, Leila Ghaffari, Tzanio V. Kolev, David S. Medina, Will Pazner, Thilina Rathnayake, Jeremy L. Thompson, Stan Tomov:
libCEED: Fast algebra for high-order element-based discretizations. J. Open Source Softw. 6(63): 2945 (2021)
- [j49]Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Ryan Bleile, Jed Brown, Jean-Sylvain Camier, Robert Carson, Noel Chalmers, Veselin Dobrev, Yohann Dudouit, Paul F. Fischer, Ali Karakus, Stefan Kerkemeier, Tzanio V. Kolev, Yu-Hsiang Lan, Elia Merzari, Misun Min, Malachi Phillips, Thilina Rathnayake, Robert N. Rieben, Thomas Stitt, Ananias Tomboulides, Stanimire Tomov, Vladimir Z. Tomov, Arturo Vargas, Tim Warburton, Kenneth Weiss:
GPU algorithms for Efficient Exascale Discretizations. Parallel Comput. 108: 102841 (2021)
- [j48]Ahmad Abdelfattah, Timothy B. Costa, Jack J. Dongarra, Mark Gates, Azzam Haidar, Sven Hammarling, Nicholas J. Higham, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, Mawussi Zounon:
A Set of Batched Basic Linear Algebra Subprograms and LAPACK Routines. ACM Trans. Math. Softw. 47(3): 21:1-21:23 (2021)
- [c104]Alan Ayala, Stan Tomov, Miroslav Stoyanov, Azzam Haidar, Jack J. Dongarra:
Accelerating Multi-Process Communication for Parallel 3-D FFT. ExaMPI@SC 2021: 46-53
- [c103]Daniel Sharp, Miroslav Stoyanov, Stanimire Tomov, Jack J. Dongarra:
A More Portable HeFFTe: Implementing a Fallback Algorithm for Scalable Fourier Transforms. HPEC 2021: 1-5
- [c102]Alan Ayala, Stanimire Tomov, Miroslav Stoyanov, Jack J. Dongarra:
Scalability Issues in FFT Computation. PaCT 2021: 279-287
- [d3]Jed Brown, Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Jean-Sylvain Camier, Veselin Dobrev, Yohann Dudouit, Leila Ghaffari, Tzanio V. Kolev, David S. Medina, Will Pazner, Thilina Ratnayaka, Jeremy L. Thompson, Stan Tomov:
CEED/libCEED: v0.9.0. Version v0.9.0. Zenodo, 2021
- [d2]Jed Brown, Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Jean-Sylvain Camier, Veselin Dobrev, Yohann Dudouit, Leila Ghaffari, Tzanio V. Kolev, David S. Medina, Will Pazner, Thilina Ratnayaka, Jeremy L. Thompson, Stanimire Tomov:
libCEED: Efficient Extensible Discretization. Version 0.10.0. Zenodo, 2021
- [d1]Jed Brown, Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Jean-Sylvain Camier, Veselin Dobrev, Yohann Dudouit, Leila Ghaffari, Tzanio V. Kolev, David S. Medina, Will Pazner, Thilina Ratnayaka, Jeremy L. Thompson, Stanimire Tomov:
libCEED: Efficient Extensible Discretization. Version 0.10.1. Zenodo, 2021
- [i12]Tzanio V. Kolev, Paul F. Fischer, Misun Min, Jack J. Dongarra, Jed Brown, Veselin Dobrev, Tim Warburton, Stanimire Tomov, Mark S. Shephard, Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Jean-Sylvain Camier, Noel Chalmers, Yohann Dudouit, Ali Karakus, Ian Karlin, Stefan Kerkemeier, Yu-Hsiang Lan, David S. Medina, Elia Merzari, Aleksandr Obabko, Will Pazner, Thilina Rathnayake, Cameron W. Smith, Lukas Spies, Kasia Swirydowicz, Jeremy L. Thompson, Ananias Tomboulides, Vladimir Z. Tomov:
Efficient Exascale Discretizations: High-Order Finite Element Methods. CoRR abs/2109.04996 (2021)
- [i11]Ahmad Abdelfattah, Valeria Barra, Natalie Beams, Ryan Bleile, Jed Brown, Jean-Sylvain Camier, Robert Carson, Noel Chalmers, Veselin Dobrev, Yohann Dudouit, Paul F. Fischer, Ali Karakus, Stefan Kerkemeier, Tzanio V. Kolev, Yu-Hsiang Lan, Elia Merzari, Misun Min, Malachi Phillips, Thilina Rathnayake, Robert N. Rieben, Thomas Stitt, Ananias Tomboulides, Stanimire Tomov, Vladimir Z. Tomov, Arturo Vargas, Tim Warburton, Kenneth Weiss:
GPU Algorithms for Efficient Exascale Discretizations. CoRR abs/2109.05072 (2021)
- [i10]Chiang-Heng Chien, Hongyi Fan, Ahmad Abdelfattah, Elias P. Tsigaridas, Stanimire Tomov, Benjamin B. Kimia:
GPU-Based Homotopy Continuation for Minimal Problems in Computer Vision. CoRR abs/2112.03444 (2021)
- 2020
- [j47]Yuechao Lu, Ichitaro Yamazaki, Fumihiko Ino, Yasuyuki Matsushita, Stanimire Tomov, Jack J. Dongarra:
Reducing the amount of out-of-core data access for GPU-accelerated randomized SVD. Concurr. Comput. Pract. Exp. 32(19) (2020)
- [j46]Mohammed A. Al Farhan, Ahmad Abdelfattah, Stanimire Tomov, Mark Gates, Dalal Sukkari, Azzam Haidar, Robert Rosenberg, Jack J. Dongarra:
MAGMA templates for scalable linear algebra on emerging architectures. Int. J. High Perform. Comput. Appl. 34(6) (2020)
- [j45]Ahmad Abdelfattah, Stanimire Tomov, Jack J. Dongarra:
Matrix multiplication on batches of small matrices in half and half-complex precisions. J. Parallel Distributed Comput. 145: 188-201 (2020)
- [j44]Hartwig Anzt, Terry Cojean, Chen Yen-Chen, Jack J. Dongarra, Goran Flegar, Pratik Nayak, Stanimire Tomov, Yuhsiang M. Tsai, Weichung Wang:
Load-balancing Sparse Matrix Vector Product Kernels on GPUs. ACM Trans. Parallel Comput. 7(1): 2:1-2:26 (2020)
- [c101]Cade Brown, Ahmad Abdelfattah, Stanimire Tomov, Jack J. Dongarra:
Design, Optimization, and Benchmarking of Dense Linear Algebra Algorithms on AMD GPUs. HPEC 2020: 1-7
- [c100]Ahmad Abdelfattah, Stan Tomov, Jack J. Dongarra:
Investigating the Benefit of FP16-Enabled Mixed-Precision Solvers for Symmetric Positive Definite Matrices Using GPUs. ICCS (2) 2020: 237-250
- [c99]Alan Ayala, Stanimire Tomov, Azzam Haidar, Jack J. Dongarra:
heFFTe: Highly Efficient FFT for Exascale. ICCS (1) 2020: 262-275
- [c98]Florent Lopez, Edmond Chow, Stanimire Tomov, Jack J. Dongarra:
Asynchronous SGD for DNN training on Shared-memory Parallel Architectures. IPDPS Workshops 2020: 995-998
- [c97]Natalie Beams, Ahmad Abdelfattah, Stan Tomov, Jack J. Dongarra, Tzanio V. Kolev, Yohann Dudouit:
High-Order Finite Element Method using Standard and Device-Level Batch GEMM on GPUs. ScalA@SC 2020: 53-60
- [c96]Rick Archibald, Edmond Chow, Eduardo F. D'Azevedo, Jack J. Dongarra, Markus Eisenbach, Rocco Febbo, Florent Lopez, Daniel Nichols, Stanimire Tomov, Kwai Wong, Junqi Yin:
Integrating Deep Learning in Domain Sciences at Exascale. SMC 2020: 35-50
- [i9]Ahmad Abdelfattah, Hartwig Anzt, Erik G. Boman, Erin C. Carson, Terry Cojean, Jack J. Dongarra, Mark Gates, Thomas Grützmacher, Nicholas J. Higham, Xiaoye Sherry Li, Neil Lindquist, Yang Liu, Jennifer A. Loe, Piotr Luszczek, Pratik Nayak, Srikara Pranesh, Sivasankaran Rajamanickam, Tobias Ribizel, Barry Smith, Kasia Swirydowicz, Stephen J. Thomas, Stanimire Tomov, Yaohung M. Tsai, Ichitaro Yamazaki, Ulrike Meier Yang:
A Survey of Numerical Methods Utilizing Mixed Precision Arithmetic. CoRR abs/2007.06674 (2020)
- [i8]Rick Archibald, Edmond Chow, Eduardo F. D'Azevedo, Jack J. Dongarra, Markus Eisenbach, Rocco Febbo, Florent Lopez, Daniel Nichols, Stanimire Tomov, Kwai Wong, Junqi Yin:
Integrating Deep Learning in Domain Sciences at Exascale. CoRR abs/2011.11188 (2020)
2010 – 2019
- 2019
- [j43]Azzam Haidar, Heike Jagode, Phil Vaccaro, Asim YarKhan, Stanimire Tomov, Jack J. Dongarra:
Investigating power capping toward energy-efficient scientific applications. Concurr. Comput. Pract. Exp. 31(6) (2019)
- [j42]M. Graham Lopez, Wayne Joubert, Verónica G. Vergara Larrea, Oscar R. Hernandez, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra:
Evaluation of directive-based performance portable programming models. Int. J. High Perform. Comput. Netw. 14(2): 165-182 (2019)
- [j41]Ian Masliah, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Marc Baboulin, Joël Falcou, Jack J. Dongarra:
Algorithms and optimization techniques for high-performance matrix-matrix multiplications of very small matrices. Parallel Comput. 81: 1-21 (2019)
- [j40]Dmitry Zaitsev, Stanimire Tomov, Jack J. Dongarra:
Solving Linear Diophantine Systems on Parallel Architectures. IEEE Trans. Parallel Distributed Syst. 30(5): 1158-1169 (2019)
- [c95]Ahmad Abdelfattah, Stanimire Tomov, Jack J. Dongarra:
Progressive Optimization of Batched LU Factorization on GPUs. HPEC 2019: 1-6
- [c94]Ahmad Abdelfattah, Stanimire Tomov, Jack J. Dongarra:
Fast Batched Matrix Multiplication for Small Sizes Using Half-Precision Arithmetic on GPUs. IPDPS 2019: 111-122
- [c93]Ahmad Abdelfattah, Stanimire Tomov, Jack J. Dongarra:
Towards Half-Precision Computation for Complex Matrices: A Case Study for Mixed Precision Solvers on GPUs. ScalA@SC 2019: 17-24
- [c92]Daniel Nichols, Nathalie-Sofia Tomov, Frank Betancourt, Stanimire Tomov, Kwai Wong, Jack J. Dongarra:
MagmaDNN: Towards High-Performance Data Analytics and Machine Learning for Data-Driven Scientific Computing. ISC Workshops 2019: 490-503
- [c91]Kwai Wong, Stanimire Tomov, Jack J. Dongarra:
Hands-On Research and Training in High Performance Data Sciences, Data Analytics, and Machine Learning for Emerging Environments. ISC Workshops 2019: 643-655
- [c90]Frank Betancourt, Kwai Wong, Efosa Asemota, Quindell Marshall, Daniel Nichols, Stanimire Tomov:
openDIEL: A Parallel Workflow Engine and Data Analytics Framework. PEARC 2019: 20:1-20:7
- [c89]Daniel Nichols, Kwai Wong, Stanimire Tomov, Lucien Ng, Sihan Chen, Alex Gessinger:
MagmaDNN: Accelerated Deep Learning Using MAGMA. PEARC 2019: 71:1-71:6
- 2018
- [j39]Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra:
Batched one-sided factorizations of tiny matrices using GPUs: Challenges and countermeasures. J. Comput. Sci. 26: 226-236 (2018)
- [j38]Tingxing Dong, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra:
Accelerating the SVD bi-diagonalization of a batch of small matrices using GPUs. J. Comput. Sci. 26: 237-245 (2018)
- [j37]Mark Gates, Stanimire Tomov, Jack J. Dongarra:
Accelerating the SVD two stage bidiagonal reduction and divide and conquer using GPUs. Parallel Comput. 74: 3-18 (2018)
- [j36]Jack J. Dongarra, Mark Gates, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, Ichitaro Yamazaki:
The Singular Value Decomposition: Anatomy of Optimizing an Algorithm for Extreme Scale. SIAM Rev. 60(4): 808-865 (2018)
- [j35]Azzam Haidar, Ahmad Abdelfattah, Mawussi Zounon, Stanimire Tomov, Jack J. Dongarra:
A Guide for Achieving High Performance with Very Small Matrices on GPU: A Case Study of Batched LU and Cholesky Factorizations. IEEE Trans. Parallel Distributed Syst. 29(5): 973-984 (2018)
- [j34]Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra:
Analysis and Design Techniques towards High-Performance and Energy-Efficient Dense Linear Solvers on GPUs. IEEE Trans. Parallel Distributed Syst. 29(12): 2700-2712 (2018)
- [c88]Anumeena Sorna, Xiaohe Cheng, Eduardo F. D'Azevedo, Kwai Wong, Stanimire Tomov:
Optimizing the Fast Fourier Transform Using Mixed Precision on Tensor Core Hardware. HiPC Workshops 2018: 3-7
- [c87]Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra:
Optimizing GPU Kernels for Irregular Batch Workloads: A Case Study for Cholesky Factorization. HPEC 2018: 1-7
- [c86]Azzam Haidar, Ahmad Abdelfattah, Mawussi Zounon, Panruo Wu, Srikara Pranesh, Stanimire Tomov, Jack J. Dongarra:
The Design of Fast and Energy-Efficient Linear Solvers: On the Potential of Half-Precision Arithmetic and Iterative Refinement Techniques. ICCS (1) 2018: 586-600
- [c85]Ichitaro Yamazaki, Ahmad Abdelfattah, Akihiro Ida, Satoshi Ohshima, Stanimire Tomov, Rio Yokota, Jack J. Dongarra:
Performance of Hierarchical-matrix BiCGStab Solver on GPU Clusters. IPDPS 2018: 930-939
- [c84]Azzam Haidar, Stanimire Tomov, Jack J. Dongarra, Nicholas J. Higham:
Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers. SC 2018: 47:1-47:11
- [i7]Nathalie-Sofia Tomov, Stanimire Tomov:
On Deep Neural Networks for Detecting Heart Disease. CoRR abs/1808.07168 (2018)
- 2017
- [j33]Ichitaro Yamazaki, Stanimire Tomov, Jack J. Dongarra:
Non-GPU-resident symmetric indefinite factorization. Concurr. Comput. Pract. Exp. 29(5) (2017)
- [j32]Marc Baboulin, Jack J. Dongarra, Adrien Rémy, Stanimire Tomov, Ichitaro Yamazaki:
Solving dense symmetric indefinite systems using GPUs. Concurr. Comput. Pract. Exp. 29(9) (2017)
- [j31]Jack J. Dongarra, Stanimire Tomov, Piotr Luszczek, Jakub Kurzak, Mark Gates, Ichitaro Yamazaki, Hartwig Anzt, Azzam Haidar, Ahmad Abdelfattah:
With Extreme Computing, the Rules Have Changed. Comput. Sci. Eng. 19(3): 52-62 (2017)
- [j30]Ichitaro Yamazaki, Saeid Nooshabadi, Stanimire Tomov, Jack J. Dongarra:
Structure-Aware Linear Solver for Realtime Convex Optimization for Embedded Systems. IEEE Embed. Syst. Lett. 9(3): 61-64 (2017)
- [j29]Hartwig Anzt, Stanimire Tomov, Jack J. Dongarra:
On the performance and energy efficiency of sparse linear algebra on GPUs. Int. J. High Perform. Comput. Appl. 31(5): 375-390 (2017)
- [j28]Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra:
Fast Cholesky factorization on GPUs for batch and native modes in MAGMA. J. Comput. Sci. 20: 85-93 (2017)
- [c83]Ichitaro Yamazaki, Stanimire Tomov, Jack J. Dongarra:
Sampling algorithms to update truncated SVD. IEEE BigData 2017: 817-826
- [c82]Azzam Haidar, Heike Jagode, Asim YarKhan, Phil Vaccaro, Stanimire Tomov, Jack J. Dongarra:
Power-aware computing: Measurement, control, and performance analysis for Intel Xeon Phi. HPEC 2017: 1-7
- [c81]Azzam Haidar, Khairul Kabir, Diana Fayad, Stanimire Tomov, Jack J. Dongarra:
Out of memory SVD solver for big data. HPEC 2017: 1-7
- [c80]Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra:
Factorization and Inversion of a Million Matrices using GPUs: Challenges and Countermeasures. ICCS 2017: 606-615
- [c79]Tingxing Dong, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra:
Optimizing the SVD Bidiagonalization Process for a Batch of Small Matrices. ICCS 2017: 1008-1018
- [c78]Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra:
Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs. ICS 2017: 5:1-5:10
- [c77]Azzam Haidar, Ahmad Abdelfattah, Stanimire Tomov, Jack J. Dongarra:
High-performance Cholesky factorization for GPU-only execution. GPGPU@PPoPP 2017: 42-52
- [c76]Azzam Haidar, Panruo Wu, Stanimire Tomov, Jack J. Dongarra:
Investigating half precision arithmetic to accelerate dense linear system solvers. ScalA@SC 2017: 10:1-10:8
- [c75]Khairul Kabir, Azzam Haidar, Stanimire Tomov, Aurélien Bouteiller, Jack J. Dongarra:
A Framework for Out of Memory SVD Algorithms. ISC 2017: 158-178
- [p5]Hartwig Anzt, Jack J. Dongarra, Mark Gates, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, Ichitaro Yamazaki:
Bringing High Performance Computing to Big Data Algorithms. Handbook of Big Data Technologies 2017: 777-806
- 2016
- [j27]Ahmad Abdelfattah, Hartwig Anzt, Jack J. Dongarra, Mark Gates, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, Ichitaro Yamazaki, Asim YarKhan:
Linear algebra software for large-scale accelerated multicore computing. Acta Numer. 25: 1-160 (2016)
- [j26]Ichitaro Yamazaki, Stanimire Tomov, Jack J. Dongarra:
Stability and Performance of Various Singular Value QR Implementations on Multicore CPU with a GPU. ACM Trans. Math. Softw. 43(2): 10:1-10:18 (2016)
- [c74]Ian Masliah, Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Marc Baboulin, Joël Falcou, Jack J. Dongarra:
High-Performance Matrix-Matrix Multiplications of Very Small Matrices. Euro-Par 2016: 659-671
- [c73]Azzam Haidar, Benjamin Brock, Stanimire Tomov, Michael Guidry, Jay Jay Billings, Daniel Shyles, Jack J. Dongarra:
Performance analysis and acceleration of explicit integration for large kinetic networks using batched GPU computations. HPEC 2016: 1-7
- [c72]Azzam Haidar, Stanimire Tomov, Konstantin Arturov, Murat Efe Guney, Shane Story, Jack J. Dongarra:
LU, QR, and Cholesky factorizations: Programming model, performance analysis and optimization techniques for the Intel Knights Landing Xeon Phi. HPEC 2016: 1-7
- [c71]Ahmad Abdelfattah, Marc Baboulin, Veselin Dobrev, Jack J. Dongarra, Christopher W. Earl, Joel Falcou, Azzam Haidar, Ian Karlin, Tzanio V. Kolev, Ian Masliah, Stanimire Tomov:
High-Performance Tensor Contractions for GPUs. ICCS 2016: 108-118
- [c70]Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra:
Performance Tuning and Optimization Techniques of Fixed and Variable Size Batched Cholesky Factorization on GPUs. ICCS 2016: 119-130
- [c69]Chris J. Newburn, Gaurav Bansal, Michael Wood, Luis Crivelli, Judit Planas, Alejandro Duran, Paulo Souza, Leonardo Borges, Piotr Luszczek, Stanimire Tomov, Jack J. Dongarra, Hartwig Anzt, Mark Gates, Azzam Haidar, Yulu Jia, Khairul Kabir, Ichitaro Yamazaki, Jesús Labarta:
Heterogeneous Streaming. IPDPS Workshops 2016: 611-620
- [c68]Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra:
On the Development of Variable Size Batched Computation for Heterogeneous Parallel Architectures. IPDPS Workshops 2016: 1249-1258
- [c67]M. Graham Lopez, Verónica G. Vergara Larrea, Wayne Joubert, Oscar R. Hernandez, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra:
Towards Achieving Performance Portability Using Directives for Accelerators. WACCPD@SC 2016: 13-24
- [c66]Ahmad Abdelfattah, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra:
Performance, Design, and Autotuning of Batched GEMM for GPUs. ISC 2016: 21-38
- 2015
- [j25]Hartwig Anzt, Stanimire Tomov, Piotr Luszczek, William B. Sawyer, Jack J. Dongarra:
Acceleration of GPU-based Krylov solvers via data transfer reduction. Int. J. High Perform. Comput. Appl. 29(3): 366-383 (2015)
- [j24]Ichitaro Yamazaki, Stanimire Tomov, Jack J. Dongarra:
Mixed-Precision Cholesky QR Factorization and Its Case Studies on Multicore CPU with Multiple GPUs. SIAM J. Sci. Comput. 37(3) (2015)
- [j23]Ichitaro Yamazaki, Stanimire Tomov, Jack J. Dongarra:
Computing Low-Rank Approximation of a Dense Matrix on Multicore CPUs with a GPU and Its Application to Solving a Hierarchically Semiseparable Linear System of Equations. Sci. Program. 2015: 246019:1-246019:17 (2015)
- [j22]Jack J. Dongarra, Mark Gates, Azzam Haidar, Yulu Jia, Khairul Kabir, Piotr Luszczek, Stanimire Tomov:
HPC Programming on Intel Many-Integrated-Core Hardware with MAGMA Port to Xeon Phi. Sci. Program. 2015: 502593:1-502593:11 (2015)
- [j21]Jack J. Dongarra, Maksims Abalenkovs, Ahmad Abdelfattah, Mark Gates, Azzam Haidar, Jakub Kurzak, Piotr Luszczek, Stanimire Tomov, Ichitaro Yamazaki, Asim YarKhan:
Parallel Programming Models for Dense Linear Algebra on Heterogeneous Systems. Supercomput. Front. Innov. 2(4): 67-86 (2015)
- [c65]Azzam Haidar, Asim YarKhan, Chongxiao Cao, Piotr Luszczek, Stanimire Tomov, Jack J. Dongarra:
Flexible Linear Algebra Development and Scheduling with Cholesky Factorization. HPCC/CSS/ICESS 2015: 861-864
- [c64]Azzam Haidar, Stanimire Tomov, Piotr Luszczek, Jack J. Dongarra:
MAGMA embedded: Towards a dense linear algebra library for energy efficient extreme computing. HPEC 2015: 1-6
- [c63]Khairul Kabir, Azzam Haidar, Stanimire Tomov, Jack J. Dongarra:
Performance Analysis and Optimisation of Two-sided Factorization Algorithms for Heterogeneous Platform. ICCS 2015: 180-190
- [c62]Marc Baboulin, Jack J. Dongarra, Adrien Rémy, Stanimire Tomov, Ichitaro Yamazaki:
Dense Symmetric Indefinite Factorization on GPU Accelerated Architectures. PPAM (1) 2015: 86-95
- [c61]Hartwig Anzt, Stanimire Tomov, Jack J. Dongarra:
Energy efficiency and performance frontiers for sparse computations on GPU supercomputers. PMAM@PPoPP 2015: 1-10
- [c60]Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stanimire Tomov, Jack J. Dongarra:
Optimization for performance and energy for batched matrix computations on GPUs. GPGPU@PPoPP 2015: 59-69
- [c59]Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stanimire Tomov, Jack J. Dongarra:
Towards batched linear solvers on accelerated hardware platforms. PPoPP 2015: 261-262
- [c58]