Gennady Pekhimenko
Person information
- affiliation: University of Toronto
- affiliation: Microsoft Research
- affiliation (former): Carnegie Mellon University
2020 – today
- 2024
- [c61]Qidong Su, Jiacheng Yang, Gennady Pekhimenko: BOOM: Use your Desktop to Accurately Predict the Performance of Large Deep Neural Networks. PACT 2024: 284-296
- [c60]Jiacheng Yang, Christina Giannoula, Jun Wu, Mostafa Elhoushi, James Gleeson, Gennady Pekhimenko: Minuet: Accelerating 3D Sparse Convolutions on GPUs. EuroSys 2024: 786-802
- [c59]Renbo Tu, Colin White, Jean Kossaifi, Boris Bonev, Gennady Pekhimenko, Kamyar Azizzadenesheli, Anima Anandkumar: Guaranteed Approximation Bounds for Mixed-Precision Neural Operators. ICLR 2024
- [c58]Baorun Mu, Christina Giannoula, Shang Wang, Gennady Pekhimenko: Sylva: Sparse Embedded Adapters via Hierarchical Approximate Second-Order Information. ICS 2024: 485-497
- [c57]Yubo Gao, Maryam Haghifam, Christina Giannoula, Renbo Tu, Gennady Pekhimenko, Nandita Vijaykumar: Proteus: Preserving Model Confidentiality during Graph Optimizations. MLSys 2024
- [e1]Phillip B. Gibbons, Gennady Pekhimenko, Christopher De Sa: Proceedings of the Seventh Annual Conference on Machine Learning and Systems, MLSys 2024, Santa Clara, CA, USA, May 13-16, 2024. mlsys.org 2024
- [i48]Jiacheng Yang, Christina Giannoula, Jun Wu, Mostafa Elhoushi, James Gleeson, Gennady Pekhimenko: Minuet: Accelerating 3D Sparse Convolutions on GPUs. CoRR abs/2401.06145 (2024)
- [i47]Christina Giannoula, Peiming Yang, Ivan Fernandez Vega, Jiacheng Yang, Yu Xin Li, Juan Gómez-Luna, Mohammad Sadrosadati, Onur Mutlu, Gennady Pekhimenko: Accelerating Graph Neural Networks on Real Processing-In-Memory Systems. CoRR abs/2402.16731 (2024)
- [i46]Yubo Gao, Maryam Haghifam, Christina Giannoula, Renbo Tu, Gennady Pekhimenko, Nandita Vijaykumar: Proteus: Preserving Model Confidentiality during Graph Optimizations. CoRR abs/2404.12512 (2024)
- [i45]Honghua Dong, Qidong Su, Yubo Gao, Zhaoyu Li, Yangjun Ruan, Gennady Pekhimenko, Chris J. Maddison, Xujie Si: APPL: A Prompt Programming Language for Harmonious Integration of Programs and Large Language Model Prompts. CoRR abs/2406.13161 (2024)
- [i44]Wei Zhao, Anand Jayarajan, Gennady Pekhimenko: Tally: Non-Intrusive Performance Isolation for Concurrent Deep Learning Workloads. CoRR abs/2410.07381 (2024)
- 2023
- [j9]Alexandros Karargyris, Renato Umeton, Micah J. Sheller, Alejandro Aristizabal, Johnu George, Anna Wuest, Sarthak Pati, Hasan Kassem, Maximilian Zenk, Ujjwal Baid, Prakash Narayana Moorthy, Alexander Chowdhury, Junyi Guo, Sahil S. Nalawade, Jacob Rosenthal, David Kanter, Maria Xenochristou, Daniel J. Beutel, Verena Chung, Timothy Bergquist, James A. Eddy, Abubakar Abid, Lewis Tunstall, Omar Sanseviero, Dimitrios Dimitriadis, Yiming Qian, Xinxing Xu, Yong Liu, Rick Siow Mong Goh, Srini Bala, Victor Bittorf, Sreekar Reddy Puchala, Biagio Ricciuti, Soujanya Samineni, Eshna Sengupta, Akshay Chaudhari, Cody Coleman, Bala Desinghu, Gregory F. Diamos, Debo Dutta, Diane Feddema, Grigori Fursin, Xinyuan Huang, Satyananda Kashyap, Nicholas D. Lane, Indranil Mallick, Pietro Mascagni, Virendra Mehta, Cassiano Ferro Moraes, Vivek Natarajan, Nikola Nikolov, Nicolas Padoy, Gennady Pekhimenko, Vijay Janapa Reddi, G. Anthony Reina, Pablo Ribalta, Abhishek Singh, Jayaraman J. Thiagarajan, Jacob Albrecht, Thomas Wolf, Geralyn Miller, Huazhu Fu, Prashant Shah, Daguang Xu, Poonam Yadav, David Talby, Mark M. Awad, Jeremy P. Howard, Michael Rosenthal, Luigi Marchionni, Massimo Loda, Jason M. Johnson, Spyridon Bakas, Peter Mattson: Federated benchmarking of medical artificial intelligence with MedPerf. Nat. Mac. Intell. 5(7): 799-810 (2023)
- [c56]Qidong Su, Chuqin Geng, Gennady Pekhimenko, Xujie Si: TorchProbe: Fuzzing Dynamic Deep Learning Compilers. APLAS 2023: 310-331
- [c55]Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko: Hidet: Task-Mapping Programming Paradigm for Deep Learning Tensor Programs. ASPLOS (2) 2023: 370-384
- [c54]Anand Jayarajan, Wei Zhao, Yudi Sun, Gennady Pekhimenko: TiLT: A Time-Centric Approach for Stream Query Optimization and Parallelization. ASPLOS (2) 2023: 818-832
- [c53]Bojian Zheng, Cody Hao Yu, Jie Wang, Yaoyao Ding, Yizhi Liu, Yida Wang, Gennady Pekhimenko: Grape: Practical and Efficient Graphed Execution for Dynamic Deep Neural Networks on GPUs. MICRO 2023: 1364-1380
- [c52]Daniel Snider, Fanny Chevalier, Gennady Pekhimenko: Hotline Profiler: Automatic Annotation and A Multi-Scale Timeline for Visualizing Time-Use in DNN Training. MLSys 2023
- [c51]Chenhao Jiang, Anand Jayarajan, Hao Lu, Gennady Pekhimenko: Arbitor: A Numerically Accurate Hardware Emulation Tool for DNN Accelerators. USENIX ATC 2023: 519-536
- [i43]Anand Jayarajan, Wei Zhao, Yudi Sun, Gennady Pekhimenko: TiLT: A Time-Centric Approach for Stream Query Optimization and Parallelization. CoRR abs/2301.12030 (2023)
- [i42]Colin White, Renbo Tu, Jean Kossaifi, Gennady Pekhimenko, Kamyar Azizzadenesheli, Anima Anandkumar: Speeding up Fourier Neural Operators via Mixed Precision. CoRR abs/2307.15034 (2023)
- [i41]Qidong Su, Christina Giannoula, Gennady Pekhimenko: The Synergy of Speculative Decoding and Batching in Serving Large Language Models. CoRR abs/2310.18813 (2023)
- [i40]Qidong Su, Chuqin Geng, Gennady Pekhimenko, Xujie Si: TorchProbe: Fuzzing Dynamic Deep Learning Compilers. CoRR abs/2310.20078 (2023)
- [i39]Kevin Song, Jiacheng Yang, Sihang Liu, Gennady Pekhimenko: Lightweight Frequency-Based Tiering for CXL Memory Systems. CoRR abs/2312.04789 (2023)
- 2022
- [c50]Han Jie Qiu, Sihang Liu, Xinyang Song, Samira Manabi Khan, Gennady Pekhimenko: Pavise: Integrating Fault Tolerance Support for Persistent Memory Applications. PACT 2022: 109-123
- [c49]Xiaodan Serina Tan, Pavel Golikov, Nandita Vijaykumar, Gennady Pekhimenko: GPUPool: A Holistic Approach to Fine-Grained GPU Sharing in the Cloud. PACT 2022: 317-332
- [c48]Ao Li, Bojian Zheng, Gennady Pekhimenko, Fan Long: Automatic Horizontal Fusion for GPU Kernels. CGO 2022: 14-27
- [c47]Sana Tonekaboni, Gabriela Morgenshtern, Azadeh Assadi, Aslesha Pokhrel, Xi Huang, Anand Jayarajan, Robert Greer, Gennady Pekhimenko, Melissa D. McCradden, Mjaye Mazwi, Anna Goldenberg: How to validate Machine Learning Models Prior to Deployment: Silent trial protocol for evaluation of real-time models at ICU. CHIL 2022: 169-182
- [c46]Gennady Pekhimenko: Keynote Talk 1: Efficient DNN Training at Scale: from Algorithms to Hardware. IPDPS Workshops 2022: 1244
- [c45]Bojian Zheng, Ziheng Jiang, Cody Hao Yu, Haichen Shen, Joshua Fromm, Yizhi Liu, Yida Wang, Luis Ceze, Tianqi Chen, Gennady Pekhimenko: DietCode: Automatic Optimization for Dynamic Tensor Programs. MLSys 2022
- [c44]Muralidhar Andoorveedu, Zhanda Zhu, Bojian Zheng, Gennady Pekhimenko: Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction. NeurIPS 2022
- [c43]Hongyu Zhu, Ruofan Wu, Yijia Diao, Shanbin Ke, Haoyu Li, Chen Zhang, Jilong Xue, Lingxiao Ma, Yuqing Xia, Wei Cui, Fan Yang, Mao Yang, Lidong Zhou, Asaf Cidon, Gennady Pekhimenko: ROLLER: Fast and Efficient Tensor Compilation for Deep Learning. OSDI 2022: 233-248
- [i38]James Gleeson, Daniel Snider, Yvonne Yang, Moshe Gabel, Eyal de Lara, Gennady Pekhimenko: Optimizing Data Collection in Deep Reinforcement Learning. CoRR abs/2207.07736 (2022)
- [i37]Yaoyao Ding, Cody Hao Yu, Bojian Zheng, Yizhi Liu, Yida Wang, Gennady Pekhimenko: Hidet: Task Mapping Programming Paradigm for Deep Learning Tensor Programs. CoRR abs/2210.09603 (2022)
- [i36]Muralidhar Andoorveedu, Zhanda Zhu, Bojian Zheng, Gennady Pekhimenko: Tempo: Accelerating Transformer-Based Model Training through Memory Footprint Reduction. CoRR abs/2210.10246 (2022)
- 2021
- [j8]Anirudh Mohan Kaushik, Gennady Pekhimenko, Hiren D. Patel: Gretch: A Hardware Prefetcher for Graph Analytics. ACM Trans. Archit. Code Optim. 18(2): 18:1-18:25 (2021)
- [c42]Anand Jayarajan, Kimberly Hau, Andrew Goodwin, Gennady Pekhimenko: LifeStream: a high-performance stream processing engine for periodic streams. ASPLOS 2021: 107-122
- [c41]Ziqi Wang, Chul-Hwan Choo, Michael A. Kozuch, Todd C. Mowry, Gennady Pekhimenko, Vivek Seshadri, Dimitrios Skarlatos: NVOverlay: Enabling Efficient and Scalable High-Frequency Snapshotting to NVM. ISCA 2021: 498-511
- [c40]Omar Mohamed Awad, Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Ciaran Bannon, Anand Jayarajan, Gennady Pekhimenko, Andreas Moshovos: FPRaker: A Processing Element For Accelerating Neural Network Training. MICRO 2021: 857-869
- [c39]James Gleeson, Moshe Gabel, Gennady Pekhimenko, Eyal de Lara, Srivatsan Krishnan, Vijay Janapa Reddi: RL-Scope: Cross-stack Profiling for Deep Reinforcement Learning Workloads. MLSys 2021
- [c38]Shang Wang, Peiming Yang, Yuxuan Zheng, Xin Li, Gennady Pekhimenko: Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models. MLSys 2021
- [c37]Yaoyao Ding, Ligeng Zhu, Zhihao Jia, Gennady Pekhimenko, Song Han: IOS: Inter-Operator Scheduler for CNN Acceleration. MLSys 2021
- [c36]Isak Edo Vivancos, Sayeh Sharify, Daniel Ly-Ma, Ameer Abdelhadi, Ciaran Bannon, Milos Nikolic, Mostafa Mahmoud, Alberto Delmas Lascorz, Gennady Pekhimenko, Andreas Moshovos: Boveda: Building an On-Chip Deep Learning Memory Hierarchy Brick by Brick. MLSys 2021
- [c35]Michael Diskin, Alexey Bukhtiyarov, Max Ryabinin, Lucile Saulnier, Quentin Lhoest, Anton Sinitsin, Dmitry Popov, Dmitry V. Pyrkin, Maxim Kashirin, Alexander Borzunov, Albert Villanova del Moral, Denis Mazur, Ilia Kobelev, Yacine Jernite, Thomas Wolf, Gennady Pekhimenko: Distributed Deep Learning In Open Collaborations. NeurIPS 2021: 7879-7897
- [c34]Max Ryabinin, Eduard Gorbunov, Vsevolod Plokhotnyuk, Gennady Pekhimenko: Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices. NeurIPS 2021: 18195-18211
- [c33]Geoffrey X. Yu, Yubo Gao, Pavel Golikov, Gennady Pekhimenko: Habitat: A Runtime-Based Computational Performance Predictor for Deep Neural Network Training. USENIX ATC 2021: 503-521
- [i35]Geoffrey X. Yu, Yubo Gao, Pavel Golikov, Gennady Pekhimenko: Computational Performance Predictions for Deep Neural Network Training: A Runtime-Based Approach. CoRR abs/2102.00527 (2021)
- [i34]Shang Wang, Peiming Yang, Yuxuan Zheng, Xin Li, Gennady Pekhimenko: Horizontally Fused Training Array: An Effective Hardware Utilization Squeezer for Training Novel Deep Learning Models. CoRR abs/2102.02344 (2021)
- [i33]James Gleeson, Srivatsan Krishnan, Moshe Gabel, Vijay Janapa Reddi, Eyal de Lara, Gennady Pekhimenko: RL-Scope: Cross-Stack Profiling for Deep Reinforcement Learning Workloads. CoRR abs/2102.04285 (2021)
- [i32]Max Ryabinin, Eduard Gorbunov, Vsevolod Plokhotnyuk, Gennady Pekhimenko: Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices. CoRR abs/2103.03239 (2021)
- [i31]Michael Diskin, Alexey Bukhtiyarov, Max Ryabinin, Lucile Saulnier, Quentin Lhoest, Anton Sinitsin, Dmitry Popov, Dmitry V. Pyrkin, Maxim Kashirin, Alexander Borzunov, Albert Villanova del Moral, Denis Mazur, Ilia Kobelev, Yacine Jernite, Thomas Wolf, Gennady Pekhimenko: Distributed Deep Learning in Open Collaborations. CoRR abs/2106.10207 (2021)
- [i30]Alexandros Karargyris, Renato Umeton, Micah J. Sheller, Alejandro Aristizabal, Johnu George, Srini Bala, Daniel J. Beutel, Victor Bittorf, Akshay Chaudhari, Alexander Chowdhury, Cody Coleman, Bala Desinghu, Gregory F. Diamos, Debo Dutta, Diane Feddema, Grigori Fursin, Junyi Guo, Xinyuan Huang, David Kanter, Satyananda Kashyap, Nicholas D. Lane, Indranil Mallick, Pietro Mascagni, Virendra Mehta, Vivek Natarajan, Nikola Nikolov, Nicolas Padoy, Gennady Pekhimenko, Vijay Janapa Reddi, G. Anthony Reina, Pablo Ribalta, Jacob Rosenthal, Abhishek Singh, Jayaraman J. Thiagarajan, Anna Wuest, Maria Xenochristou, Daguang Xu, Poonam Yadav, Michael Rosenthal, Massimo Loda, Jason M. Johnson, Peter Mattson: MedPerf: Open Benchmarking Platform for Medical Artificial Intelligence using Federated Evaluation. CoRR abs/2110.01406 (2021)
- 2020
- [c32]Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, Yuchen Zhou: MLPerf Inference Benchmark. ISCA 2020: 446-459
- [c31]Bojian Zheng, Nandita Vijaykumar, Gennady Pekhimenko: Echo: Compiler-based GPU Memory Footprint Reduction for LSTM RNN Training. ISCA 2020: 1089-1102
- [c30]Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Omar Mohamed Awad, Gennady Pekhimenko, Jorge Albericio, Andreas Moshovos: TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training. MICRO 2020: 781-795
- [c29]Shang Wang, Yifan Bai, Gennady Pekhimenko: BPPSA: Scaling Back-propagation by Parallel Scan Algorithm. MLSys 2020
- [c28]Peter Mattson, Christine Cheng, Gregory F. Diamos, Cody Coleman, Paulius Micikevicius, David A. Patterson, Hanlin Tang, Gu-Yeon Wei, Peter Bailis, Victor Bittorf, David Brooks, Dehao Chen, Debo Dutta, Udit Gupta, Kim M. Hazelwood, Andy Hock, Xinyuan Huang, Daniel Kang, David Kanter, Naveen Kumar, Jeffery Liao, Deepak Narayanan, Tayo Oguntebi, Gennady Pekhimenko, Lillian Pentecost, Vijay Janapa Reddi, Taylor Robie, Tom St. John, Carole-Jean Wu, Lingjie Xu, Cliff Young, Matei Zaharia: MLPerf Training Benchmark. MLSys 2020
- [c27]Geoffrey X. Yu, Tovi Grossman, Gennady Pekhimenko: Skyline: Interactive In-Editor Computational Performance Profiling for Deep Neural Network Training. UIST 2020: 126-139
- [c26]Hongyu Zhu, Amar Phanishayee, Gennady Pekhimenko: Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training. USENIX ATC 2020: 337-352
- [i29]Hongyu Zhu, Amar Phanishayee, Gennady Pekhimenko: Daydream: Accurately Estimating the Efficacy of Optimizations for DNN Training. CoRR abs/2006.03318 (2020)
- [i28]Ao Li, Bojian Zheng, Gennady Pekhimenko, Fan Long: Automatic Horizontal Fusion for GPU Kernels. CoRR abs/2007.01277 (2020)
- [i27]Jiahuang Lin, Xin Li, Gennady Pekhimenko: Multi-node Bert-pretraining: Cost-efficient Approach. CoRR abs/2008.00177 (2020)
- [i26]Geoffrey X. Yu, Tovi Grossman, Gennady Pekhimenko: Skyline: Interactive In-Editor Computational Performance Profiling for Deep Neural Network Training. CoRR abs/2008.06798 (2020)
- [i25]Mostafa Mahmoud, Isak Edo Vivancos, Ali Hadi Zadeh, Omar Mohamed Awad, Gennady Pekhimenko, Jorge Albericio, Andreas Moshovos: TensorDash: Exploiting Sparsity to Accelerate Deep Neural Network Training and Inference. CoRR abs/2009.00748 (2020)
- [i24]Omar Mohamed Awad, Mostafa Mahmoud, Isak Edo Vivancos, Ali Hadi Zadeh, Ciaran Bannon, Anand Jayarajan, Gennady Pekhimenko, Andreas Moshovos: FPRaker: A Processing Element For Accelerating Neural Network Training. CoRR abs/2010.08065 (2020)
- [i23]Yaoyao Ding, Ligeng Zhu, Zhihao Jia, Gennady Pekhimenko, Song Han: IOS: Inter-Operator Scheduler for CNN Acceleration. CoRR abs/2011.01302 (2020)
- [i22]Anand Jayarajan, Kimberly Hau, Andrew Goodwin, Gennady Pekhimenko: LifeStream: A High-performance Stream Processing Engine for Waveform Data. CoRR abs/2012.00192 (2020)
2010 – 2019
- 2019
- [c25]Hongyu Miao, Myeongjae Jeon, Gennady Pekhimenko, Kathryn S. McKinley, Felix Xiaozhu Lin: StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory. ASPLOS 2019: 167-181
- [c24]Sihang Liu, Korakit Seemakhupt, Gennady Pekhimenko, Aasheesh Kolli, Samira Manabi Khan: Janus: optimizing memory and storage support for non-volatile memory systems. ISCA 2019: 143-156
- [c23]Anand Jayarajan, Jinliang Wei, Garth Gibson, Alexandra Fedorova, Gennady Pekhimenko: Priority-based Parameter Propagation for Distributed DNN Training. SysML 2019
- [p2]Amir Yazdanbakhsh, Gennady Pekhimenko, Hadi Esmaeilzadeh, Onur Mutlu, Todd C. Mowry: Towards Breaking the Memory Bandwidth Wall Using Approximate Value Prediction. Approximate Circuits 2019: 417-441
- [i21]Hongyu Miao, Myeongjae Jeon, Gennady Pekhimenko, Kathryn S. McKinley, Felix Xiaozhu Lin: StreamBox-HBM: Stream Analytics on High Bandwidth Hybrid Memory. CoRR abs/1901.01328 (2019)
- [i20]Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Eric S. Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros G. Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim M. Hazelwood, Furong Huang, Martin Jaggi, Kevin G. Jamieson, Michael I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konecný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Gordon Murray, Dimitris S. Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Randall Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric P. Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar: SysML: The New Frontier of Machine Learning Systems. CoRR abs/1904.03257 (2019)
- [i19]Anand Jayarajan, Jinliang Wei, Garth Gibson, Alexandra Fedorova, Gennady Pekhimenko: Priority-based Parameter Propagation for Distributed DNN Training. CoRR abs/1905.03960 (2019)
- [i18]Shang Wang, Yifan Bai, Gennady Pekhimenko: Scaling Back-propagation by Parallel Scan Algorithm. CoRR abs/1907.10134 (2019)
- [i17]Peter Mattson, Christine Cheng, Cody Coleman, Greg Diamos, Paulius Micikevicius, David A. Patterson, Hanlin Tang, Gu-Yeon Wei, Peter Bailis, Victor Bittorf, David Brooks, Dehao Chen, Debojyoti Dutta, Udit Gupta, Kim M. Hazelwood, Andrew Hock, Xinyuan Huang, Bill Jia, Daniel Kang, David Kanter, Naveen Kumar, Jeffery Liao, Guokai Ma, Deepak Narayanan, Tayo Oguntebi, Gennady Pekhimenko, Lillian Pentecost, Vijay Janapa Reddi, Taylor Robie, Tom St. John, Carole-Jean Wu, Lingjie Xu, Cliff Young, Matei Zaharia: MLPerf Training Benchmark. CoRR abs/1910.01500 (2019)
- [i16]Vijay Janapa Reddi, Christine Cheng, David Kanter, Peter Mattson, Guenther Schmuelling, Carole-Jean Wu, Brian Anderson, Maximilien Breughe, Mark Charlebois, William Chou, Ramesh Chukka, Cody Coleman, Sam Davis, Pan Deng, Greg Diamos, Jared Duke, Dave Fick, J. Scott Gardner, Itay Hubara, Sachin Idgunji, Thomas B. Jablin, Jeff Jiao, Tom St. John, Pankaj Kanwar, David Lee, Jeffery Liao, Anton Lokhmotov, Francisco Massa, Peng Meng, Paulius Micikevicius, Colin Osborne, Gennady Pekhimenko, Arun Tejusve Raghunath Rajan, Dilip Sequeira, Ashish Sirasao, Fei Sun, Hanlin Tang, Michael Thomson, Frank Wei, Ephrem Wu, Lingjie Xu, Koichi Yamada, Bing Yu, George Yuan, Aaron Zhong, Peizhao Zhang, Yuchen Zhou: MLPerf Inference Benchmark. CoRR abs/1911.02549 (2019)
- 2018
- [c22]Gennady Pekhimenko, Ettore Tiotto: Compiler-driven performance workshop. CASCON 2018: 374-376
- [c21]Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Anand Jayarajan, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko: Benchmarking and Analyzing Deep Neural Network Training. IISWC 2018: 88-100
- [c20]Nandita Vijaykumar, Abhilasha Jain, Diptesh Majumdar, Kevin Hsieh, Gennady Pekhimenko, Eiman Ebrahimi, Nastaran Hajinazar, Phillip B. Gibbons, Onur Mutlu: A Case for Richer Cross-Layer Abstractions: Bridging the Semantic Gap with Expressive Memory. ISCA 2018: 207-220
- [c19]Animesh Jain, Amar Phanishayee, Jason Mars, Lingjia Tang, Gennady Pekhimenko: Gist: Efficient Data Encoding for Deep Neural Network Training. ISCA 2018: 776-789
- [c18]Gennady Pekhimenko, Chuanxiong Guo, Myeongjae Jeon, Peng Huang, Lidong Zhou: TerseCades: Efficient Data Compression in Stream Processing. USENIX ATC 2018: 307-320
- [i15]Nandita Vijaykumar, Kevin Hsieh, Gennady Pekhimenko, Samira Manabi Khan, Ashish Shrestha, Saugata Ghose, Phillip B. Gibbons, Onur Mutlu: Zorua: Enhancing Programming Ease, Portability, and Performance in GPUs by Decoupling Programming Models from Resource Management. CoRR abs/1802.02573 (2018)
- [i14]Hongyu Zhu, Mohamed Akrout, Bojian Zheng, Andrew Pelegris, Amar Phanishayee, Bianca Schroeder, Gennady Pekhimenko: TBD: Benchmarking and Analyzing Deep Neural Network Training. CoRR abs/1803.06905 (2018)
- [i13]Nandita Vijaykumar, Kevin Hsieh, Gennady Pekhimenko, Samira Manabi Khan, Ashish Shrestha, Saugata Ghose, Adwait Jog, Phillip B. Gibbons, Onur Mutlu: Decoupling GPU Programming Models from Resource Management for Enhanced Programming Ease, Portability, and Performance. CoRR abs/1805.02498 (2018)
- [i12]Donghyuk Lee, Yoongu Kim, Gennady Pekhimenko, Samira Manabi Khan, Vivek Seshadri, Kevin K. Chang, Onur Mutlu: Adaptive-Latency DRAM: Reducing DRAM Latency by Exploiting Timing Margins. CoRR abs/1805.03047 (2018)
- [i11]Kevin K. Chang, Abhijith Kashyap, Hasan Hassan, Saugata Ghose, Kevin Hsieh, Donghyuk Lee, Tianshi Li, Gennady Pekhimenko, Samira Manabi Khan, Onur Mutlu: Flexible-Latency DRAM: Understanding and Exploiting Latency Variation in Modern DRAM Chips. CoRR abs/1805.03154 (2018)
- [i10]Hasan Hassan, Nandita Vijaykumar, Samira Manabi Khan, Saugata Ghose, Kevin K. Chang, Gennady Pekhimenko, Donghyuk Lee, Oguz Ergin, Onur Mutlu: SoftMC: Practical DRAM Characterization Using an FPGA-Based Infrastructure. CoRR abs/1805.03195 (2018)
- [i9]Vivek Seshadri, Yoongu Kim, Chris Fallin, Donghyuk Lee, Rachata Ausavarungnirun, Gennady Pekhimenko, Yixin Luo, Onur Mutlu, Phillip B. Gibbons, Michael A. Kozuch, Todd C. Mowry: RowClone: Accelerating Data Movement and Initialization Using DRAM. CoRR abs/1805.03502 (2018)
- [i8]Hasan Hassan, Gennady Pekhimenko, Nandita Vijaykumar, Vivek Seshadri, Donghyuk Lee, Oguz Ergin, Onur Mutlu: Exploiting Row-Level Temporal Locality in DRAM to Reduce the Memory Access Latency. CoRR abs/1805.03969 (2018)
- [i7]Bojian Zheng, Akshay Nair, Qiongsi Wu, Nandita Vijaykumar, Gennady Pekhimenko: EcoRNN: Fused LSTM RNN Implementation with Data Layout Optimization. CoRR abs/1805.08899 (2018)
- 2017
- [j7]Donghyuk Lee, Samira Manabi Khan, Lavanya Subramanian, Saugata Ghose, Rachata Ausavarungnirun, Gennady Pekhimenko, Vivek Seshadri, Onur Mutlu: Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms. Proc. ACM Meas. Anal. Comput. Syst. 1(1): 26:1-26:36 (2017)
- [c17]Hasan Hassan, Nandita Vijaykumar, Samira Manabi Khan, Saugata Ghose, Kevin K. Chang, Gennady Pekhimenko, Donghyuk Lee, Oguz Ergin, Onur Mutlu: SoftMC: A Flexible and Practical Open-Source Infrastructure for Enabling Experimental DRAM Studies. HPCA 2017: 241-252
- [c16]Donghyuk Lee, Samira Manabi Khan, Lavanya Subramanian, Saugata Ghose, Rachata Ausavarungnirun, Gennady Pekhimenko, Vivek Seshadri, Onur Mutlu: Design-Induced Latency Variation in Modern DRAM Chips: Characterization, Analysis, and Latency Reduction Mechanisms. SIGMETRICS (Abstracts) 2017: 54
- [c15]Hongyu Miao, Heejin Park, Myeongjae Jeon, Gennady Pekhimenko, Kathryn S. McKinley, Felix Xiaozhu Lin: StreamBox: Modern Stream Processing on a Multicore Machine. USENIX ATC 2017: 617-629
- 2016
- [j6]Hongyi Xin, Sunny Nahar, Richard L. Zhu, John Emmons, Gennady Pekhimenko, Carl Kingsford, Can Alkan, Onur Mutlu: Optimal seed solver: optimizing seed selection in read mapping. Bioinform. 32(11): 1632-1642 (2016)
- [j5]Amir Yazdanbakhsh, Bradley Thwaites, Hadi Esmaeilzadeh, Gennady Pekhimenko, Onur Mutlu, Todd C. Mowry: Mitigating the Memory Bottleneck With Approximate Load Value Prediction. IEEE Des. Test 33(1): 32-42 (2016)
- [j4]Amir Yazdanbakhsh, Gennady Pekhimenko, Bradley Thwaites, Hadi Esmaeilzadeh, Onur Mutlu, Todd C. Mowry: RFVP: Rollback-Free Value Prediction with Safe-to-Approximate Loads. ACM Trans. Archit. Code Optim. 12(4): 62:1-62:26 (2016)
- [j3]Donghyuk Lee, Saugata Ghose, Gennady Pekhimenko, Samira Manabi Khan, Onur Mutlu: Simultaneous Multi-Layer Access: Improving 3D-Stacked Memory Bandwidth at Low Cost. ACM Trans. Archit. Code Optim. 12(4): 63:1-63:29 (2016)
- [c14]Gennady Pekhimenko, Evgeny Bolotin, Nandita Vijaykumar, Onur Mutlu, Todd C. Mowry, Stephen W. Keckler: A case for toggle-aware compression for GPU systems. HPCA 2016: 188-200
- [c13]Hasan Hassan, Gennady Pekhimenko, Nandita Vijaykumar, Vivek Seshadri, Donghyuk Lee, Oguz Ergin, Onur Mutlu: ChargeCache: Reducing DRAM latency by exploiting row access locality. HPCA 2016: 581-593
- [c12]Nandita Vijaykumar, Kevin Hsieh, Gennady Pekhimenko, Samira Manabi Khan, Ashish Shrestha, Saugata Ghose, Adwait Jog, Phillip B. Gibbons, Onur Mutlu: Zorua: A holistic approach to resource virtualization in GPUs. MICRO 2016: 15:1-15:14
- [c11]Kevin K. Chang, Abhijith Kashyap, Hasan Hassan, Saugata Ghose, Kevin Hsieh, Donghyuk Lee, Tianshi Li, Gennady Pekhimenko, Samira Manabi Khan, Onur Mutlu: Understanding Latency Variation in Modern DRAM Chips: Experimental Characterization, Analysis, and Optimization. SIGMETRICS 2016: 323-336
- [i6]Nandita Vijaykumar, Gennady Pekhimenko, Adwait Jog, Saugata Ghose, Abhishek Bhowmick, Rachata Ausavarungnirun, Chita R. Das, Mahmut T. Kandemir, Todd C. Mowry, Onur Mutlu: A Framework for Accelerating Bottlenecks in GPU Execution with Assist Warps. CoRR abs/1602.01348 (2016)
- [i5]Donghyuk Lee, Yoongu Kim, Gennady Pekhimenko, Samira Manabi Khan, Vivek Seshadri, Kevin Kai-Wei Chang, Onur Mutlu: Adaptive-Latency DRAM (AL-DRAM). CoRR abs/1603.08454 (2016)
- [i4]Gennady Pekhimenko: Practical Data Compression for Modern Memory Hierarchies. CoRR abs/1609.02067 (2016)
- [i3]Donghyuk Lee, Samira Manabi Khan, Lavanya Subramanian, Rachata Ausavarungnirun, Gennady Pekhimenko, Vivek Seshadri, Saugata Ghose, Onur Mutlu: Reducing DRAM Latency by Exploiting Design-Induced Latency Variation in Modern DRAM Chips. CoRR abs/1610.09604 (2016)
- 2015
- [j2]Hongyi Xin, John Greth, John Emmons, Gennady Pekhimenko, Carl Kingsford, Can Alkan, Onur Mutlu: Shifted Hamming distance: a fast and accurate SIMD-friendly filter to accelerate alignment verification in read mapping. Bioinform. 31(10): 1553-1560 (2015)
- [j1]Gennady Pekhimenko, Evgeny Bolotin, Mike O'Connor, Onur Mutlu, Todd C. Mowry, Stephen W. Keckler: Toggle-Aware Compression for GPUs. IEEE Comput. Archit. Lett. 14(2): 164-168 (2015)
- [c10]Gennady Pekhimenko, Tyler Huberty, Rui Cai, Onur Mutlu,