William J. Dally
William (Bill) J. Dally – Bill Dally
Person information
- affiliation: Stanford University, USA
- affiliation: NVIDIA
- award (2010): Eckert-Mauchly Award
2020 – today
- 2024
- [j78]Yoshinori Nishi, John W. Poulton, Walker J. Turner, Xi Chen, Sanquan Song, Brian Zimmer, Stephen G. Tell, Nikola Nedovic, John M. Wilson, William J. Dally, C. Thomas Gray:
A 0.190-pJ/bit 25.2-Gb/s/wire Inverter-Based AC-Coupled Transceiver for Short-Reach Die-to-Die Interfaces in 5-nm CMOS. IEEE J. Solid State Circuits 59(4): 1146-1157 (2024) - [c177]Walker J. Turner, John W. Poulton, Yoshinori Nishi, Xi Chen, Brian Zimmer, Sanquan Song, John M. Wilson, William J. Dally, C. Thomas Gray:
Leveraging Micro-Bump Pitch Scaling to Accelerate Interposer Link Bandwidths for Future High-Performance Compute Applications. CICC 2024: 1-7 - 2023
- [j77]Yoshinori Nishi, John W. Poulton, Walker J. Turner, Xi Chen, Sanquan Song, Brian Zimmer, Stephen G. Tell, Nikola Nedovic, John M. Wilson, William J. Dally, C. Thomas Gray:
A 0.297-pJ/Bit 50.4-Gb/s/Wire Inverter-Based Short-Reach Simultaneous Bi-Directional Transceiver for Die-to-Die Interface in 5-nm CMOS. IEEE J. Solid State Circuits 58(4): 1062-1073 (2023) - [j76]Ben Keller, Rangharajan Venkatesan, Steve Dai, Stephen G. Tell, Brian Zimmer, Charbel Sakr, William J. Dally, C. Thomas Gray, Brucek Khailany:
A 95.6-TOPS/W Deep Learning Inference Accelerator With Per-Vector Scaled 4-bit Quantization in 5 nm. IEEE J. Solid State Circuits 58(4): 1129-1141 (2023) - [j75]Tuofei Chen, Lei Gu, William J. Dally, Juan Rivas-Davila, John D. Fox:
A Novel High-Efficiency Three-Phase Multilevel PV Inverter With Reduced DC-Link Capacitance. IEEE Trans. Ind. Electron. 70(5): 4751-4761 (2023) - [c176]Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seungwon Min, Amna Masood, Jeongmin Brian Park, Jinjun Xiong, Chris J. Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William J. Dally, Wen-Mei W. Hwu:
GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture. ASPLOS (2) 2023: 325-339 - [c175]Bill Dally:
Hardware for Deep Learning. HCS 2023: 1-58 - [c174]Yoshinori Nishi, John W. Poulton, Xi Chen, Sanquan Song, Brian Zimmer, Walker J. Turner, Stephen G. Tell, Nikola Nedovic, John M. Wilson, William J. Dally, C. Thomas Gray:
A 0.190-pJ/bit 25.2-Gb/s/wire Inverter-Based AC-Coupled Transceiver for Short-Reach Die-to-Die Interfaces in 5-nm CMOS. VLSI Technology and Circuits 2023: 1-2 - [i25]Chenzhuo Zhu, Alexander C. Rucker, Yawen Wang, William J. Dally:
SatIn: Hardware for Boolean Satisfiability Inference. CoRR abs/2303.02588 (2023) - [i24]Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally:
Retrospective: EIE: Efficient Inference Engine on Sparse and Compressed Neural Network. CoRR abs/2306.09552 (2023) - [i23]Mingjie Liu, Teodor-Dumitru Ene, Robert Kirby, Chris Cheng, Nathaniel Ross Pinckney, Rongjian Liang, Jonah Alben, Himyanshu Anand, Sanmitra Banerjee, Ismet Bayraktaroglu, Bonita Bhaskaran, Bryan Catanzaro, Arjun Chaudhuri, Sharon Clay, Bill Dally, Laura Dang, Parikshit Deshpande, Siddhanth Dhodhi, Sameer Halepete, Eric Hill, Jiashang Hu, Sumit Jain, Brucek Khailany, Kishor Kunal, Xiaowei Li, Hao Liu, Stuart F. Oberman, Sujeet Omar, Sreedhar Pratty, Jonathan Raiman, Ambar Sarkar, Zhengjiang Shao, Hanfei Sun, Pratik P. Suthar, Varun Tej, Kaizhe Xu, Haoxing Ren:
ChipNeMo: Domain-Adapted LLMs for Chip Design. CoRR abs/2311.00176 (2023) - 2022
- [j74]William J. Dally:
On the model of computation: point. Commun. ACM 65(9): 30-32 (2022) - [j73]Jiawei Zhao, Steve Dai, Rangharajan Venkatesan, Brian Zimmer, Mustafa Fayez Ali, Ming-Yu Liu, Brucek Khailany, William J. Dally, Anima Anandkumar:
LNS-Madam: Low-Precision Training in Logarithmic Number System Using Multiplicative Weight Update. IEEE Trans. Computers 71(12): 3179-3190 (2022) - [c173]Charbel Sakr, Steve Dai, Rangharajan Venkatesan, Brian Zimmer, William J. Dally, Brucek Khailany:
Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training. ICML 2022: 19123-19138 - [c172]Peter M. Kogge, William J. Dally:
Frontier vs the Exascale Report: Why so long? and Are We Really There Yet? PMBS@SC 2022: 26-35 - [c171]Ben Keller, Rangharajan Venkatesan, Steve Dai, Stephen G. Tell, Brian Zimmer, William J. Dally, C. Thomas Gray, Brucek Khailany:
A 17-95.6 TOPS/W Deep Learning Inference Accelerator with Per-Vector Scaled 4-bit Quantization for Transformers in 5nm. VLSI Technology and Circuits 2022: 16-17 - [c170]Yoshinori Nishi, John W. Poulton, Xi Chen, Sanquan Song, Brian Zimmer, Walker J. Turner, Stephen G. Tell, Nikola Nedovic, John M. Wilson, William J. Dally, C. Thomas Gray:
A 0.297-pJ/bit 50.4-Gb/s/wire Inverter-Based Short-Reach Simultaneous Bidirectional Transceiver for Die-to-Die Interface in 5nm CMOS. VLSI Technology and Circuits 2022: 154-155 - [d1]Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seungwon Min, Amna Masood, Jeongmin Brian Park, Jinjun Xiong, Chris J. Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William J. Dally, Wen-mei W. Hwu:
GPU-Initiated On-Demand High-Throughput Storage Access in the BaM System Architecture. Zenodo, 2022 - [i22]Zaid Qureshi, Vikram Sharma Mailthody, Isaac Gelado, Seungwon Min, Amna Masood, Jeongmin Brian Park, Jinjun Xiong, Chris J. Newburn, Dmitri Vainbrand, I-Hsin Chung, Michael Garland, William J. Dally, Wen-Mei W. Hwu:
BaM: A Case for Enabling Fine-grain High Throughput GPU-Orchestrated Access to Storage. CoRR abs/2203.04910 (2022) - [i21]Charbel Sakr, Steve Dai, Rangharajan Venkatesan, Brian Zimmer, William J. Dally, Brucek Khailany:
Optimal Clipping and Magnitude-aware Differentiation for Improved Quantization-aware Training. CoRR abs/2206.06501 (2022) - 2021
- [j72]Yakun Sophia Shao, Jason Clemons, Rangharajan Venkatesan, Brian Zimmer, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Ross Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel S. Emer, C. Thomas Gray, Brucek Khailany, Stephen W. Keckler:
Simba: scaling deep-learning inference with chiplet-based architecture. Commun. ACM 64(6): 107-116 (2021) - [j71]William J. Dally, Stephen W. Keckler, David Blair Kirk:
Evolution of the Graphics Processing Unit (GPU). IEEE Micro 41(6): 42-51 (2021) - [j70]William J. Dally:
OP-VENT: A Low-Cost, Easily Assembled, Open-Source Medical Ventilator. GetMobile Mob. Comput. Commun. 25(4): 12-18 (2021) - [c169]Steve Dai, Rangharajan Venkatesan, Mark Ren, Brian Zimmer, William J. Dally, Brucek Khailany:
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference. MLSys 2021 - [c168]Guy E. Blelloch, William J. Dally, Margaret Martonosi, Uzi Vishkin, Katherine A. Yelick:
SPAA'21 Panel Paper: Architecture-Friendly Algorithms versus Algorithm-Friendly Architectures. SPAA 2021: 1-7 - [i20]Steve Dai, Rangharajan Venkatesan, Haoxing Ren, Brian Zimmer, William J. Dally, Brucek Khailany:
VS-Quant: Per-vector Scaled Quantization for Accurate Low-Precision Neural Network Inference. CoRR abs/2102.04503 (2021) - [i19]Huizi Mao, Sibo Zhu, Song Han, William J. Dally:
PatchNet - Short-range Template Matching for Efficient Video Processing. CoRR abs/2103.07371 (2021) - [i18]Jiawei Zhao, Steve Dai, Rangharajan Venkatesan, Ming-Yu Liu, Brucek Khailany, Bill Dally, Anima Anandkumar:
Low-Precision Training in Logarithmic Number System using Multiplicative Weight Update. CoRR abs/2106.13914 (2021) - 2020
- [j69]William J. Dally, Yatish Turakhia, Song Han:
Domain-specific hardware accelerators. Commun. ACM 63(7): 48-57 (2020) - [j68]Brian Zimmer, Rangharajan Venkatesan, Yakun Sophia Shao, Jason Clemons, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Ross Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel S. Emer, C. Thomas Gray, Stephen W. Keckler, Brucek Khailany:
A 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Inference Accelerator With Ground-Referenced Signaling in 16 nm. IEEE J. Solid State Circuits 55(4): 920-932 (2020) - [j67]Brucek Khailany, Haoxing Ren, Steve Dai, Saad Godil, Ben Keller, Robert Kirby, Alicia Klinefelter, Rangharajan Venkatesan, Yanqing Zhang, Bryan Catanzaro, William J. Dally:
Accelerating Chip Design With Machine Learning. IEEE Micro 40(6): 23-32 (2020) - [j66]Milad Mohammadi, Song Han, Ehsan Atoofian, Amirali Baniasadi, Tor M. Aamodt, William J. Dally:
Energy Efficient On-Demand Dynamic Branch Prediction Models. IEEE Trans. Computers 69(3): 453-465 (2020) - [c167]Jongho Kim, Youngsuk Park, John D. Fox, Stephen P. Boyd, William J. Dally:
Optimal Operation of a Plug-in Hybrid Vehicle with Battery Thermal and Degradation Model. ACC 2020: 3083-3090 - [c166]Zhekai Zhang, Hanrui Wang, Song Han, William J. Dally:
SpArch: Efficient Architecture for Sparse Matrix Multiplication. HPCA 2020: 261-274 - [i17]Zhekai Zhang, Hanrui Wang, Song Han, William J. Dally:
SpArch: Efficient Architecture for Sparse Matrix Multiplication. CoRR abs/2002.08947 (2020)
2010 – 2019
- 2019
- [j65]John W. Poulton, John M. Wilson, Walker J. Turner, Brian Zimmer, Xi Chen, Sudhir S. Kudva, Sanquan Song, Stephen G. Tell, Nikola Nedovic, Wenxu Zhao, Sunil R. Sudhakaran, C. Thomas Gray, William J. Dally:
A 1.17-pJ/b, 25-Gb/s/pin Ground-Referenced Single-Ended Serial Link for Off- and On-Package Communication Using a Process- and Temperature-Adaptive Voltage Regulator. IEEE J. Solid State Circuits 54(1): 43-54 (2019) - [j64]Yatish Turakhia, Gill Bejerano, William J. Dally:
Darwin: A Genomics Coprocessor. IEEE Micro 39(3): 29-37 (2019) - [c165]Matthew Fojtik, Ben Keller, Alicia Klinefelter, Nathaniel Ross Pinckney, Stephen G. Tell, Brian Zimmer, Tezaswi Raja, Kevin Zhou, William J. Dally, Brucek Khailany:
A Fine-Grained GALS SoC with Pausible Adaptive Clocking in 16 nm FinFET. ASYNC 2019: 27-35 - [c164]Sanquan Song, John Poulton, Xi Chen, Brian Zimmer, Stephen G. Tell, Walker J. Turner, Sudhir S. Kudva, Nikola Nedovic, John M. Wilson, C. Thomas Gray, William J. Dally:
A 2-to-20 GHz Multi-Phase Clock Generator with Phase Interpolators Using Injection-Locked Oscillation Buffers for High-Speed IOs in 16nm FinFET. CICC 2019: 1-4 - [c163]Angad S. Rekhi, Brian Zimmer, Nikola Nedovic, Ningxi Liu, Rangharajan Venkatesan, Miaorong Wang, Brucek Khailany, William J. Dally, C. Thomas Gray:
Analog/Mixed-Signal Hardware Error Modeling for Deep Learning Inference. DAC 2019: 81 - [c162]Rangharajan Venkatesan, Yakun Sophia Shao, Brian Zimmer, Jason Clemons, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Ross Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel S. Emer, C. Thomas Gray, Stephen W. Keckler, Brucek Khailany:
A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-Based Deep Neural Network Accelerator Designed with a High-Productivity VLSI Methodology. Hot Chips Symposium 2019: 1-24 - [c161]Yatish Turakhia, Sneha D. Goenka, Gill Bejerano, William J. Dally:
Darwin-WGA: A Co-processor Provides Increased Sensitivity in Whole Genome Alignments with High Speedup. HPCA 2019: 359-372 - [c160]Rangharajan Venkatesan, Yakun Sophia Shao, Miaorong Wang, Jason Clemons, Steve Dai, Matthew Fojtik, Ben Keller, Alicia Klinefelter, Nathaniel Ross Pinckney, Priyanka Raina, Yanqing Zhang, Brian Zimmer, William J. Dally, Joel S. Emer, Stephen W. Keckler, Brucek Khailany:
MAGNet: A Modular Accelerator Generator for Neural Networks. ICCAD 2019: 1-8 - [c159]Huizi Mao, Xiaodong Yang, Bill Dally:
A Delay Metric for Video Object Detection: What Average Precision Fails to Tell. ICCV 2019: 573-582 - [c158]Yakun Sophia Shao, Jason Clemons, Rangharajan Venkatesan, Brian Zimmer, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Ross Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel S. Emer, C. Thomas Gray, Brucek Khailany, Stephen W. Keckler:
Simba: Scaling Deep-Learning Inference with Multi-Chip-Module-Based Architecture. MICRO 2019: 14-27 - [c157]Huizi Mao, Taeyoung Kong, Bill Dally:
CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video. SysML 2019 - [c156]Yatish Turakhia, Gill Bejerano, William J. Dally:
Darwin: A Genomics Co-processor Provides up to 15,000X Acceleration on Long Read Assembly. USENIX ATC 2019 - [c155]Brian Zimmer, Rangharajan Venkatesan, Yakun Sophia Shao, Jason Clemons, Matthew Fojtik, Nan Jiang, Ben Keller, Alicia Klinefelter, Nathaniel Ross Pinckney, Priyanka Raina, Stephen G. Tell, Yanqing Zhang, William J. Dally, Joel S. Emer, C. Thomas Gray, Stephen W. Keckler, Brucek Khailany:
A 0.11 pJ/Op, 0.32-128 TOPS, Scalable Multi-Chip-Module-based Deep Neural Network Accelerator with Ground-Reference Signaling in 16nm. VLSI Circuits 2019: 300- - [i16]Alexander Ratner, Dan Alistarh, Gustavo Alonso, David G. Andersen, Peter Bailis, Sarah Bird, Nicholas Carlini, Bryan Catanzaro, Eric S. Chung, Bill Dally, Jeff Dean, Inderjit S. Dhillon, Alexandros G. Dimakis, Pradeep Dubey, Charles Elkan, Grigori Fursin, Gregory R. Ganger, Lise Getoor, Phillip B. Gibbons, Garth A. Gibson, Joseph E. Gonzalez, Justin Gottschlich, Song Han, Kim M. Hazelwood, Furong Huang, Martin Jaggi, Kevin G. Jamieson, Michael I. Jordan, Gauri Joshi, Rania Khalaf, Jason Knight, Jakub Konecný, Tim Kraska, Arun Kumar, Anastasios Kyrillidis, Jing Li, Samuel Madden, H. Brendan McMahan, Erik Meijer, Ioannis Mitliagkas, Rajat Monga, Derek Gordon Murray, Dimitris S. Papailiopoulos, Gennady Pekhimenko, Theodoros Rekatsinas, Afshin Rostamizadeh, Christopher Ré, Christopher De Sa, Hanie Sedghi, Siddhartha Sen, Virginia Smith, Alex Smola, Dawn Song, Evan Randall Sparks, Ion Stoica, Vivienne Sze, Madeleine Udell, Joaquin Vanschoren, Shivaram Venkataraman, Rashmi Vinayak, Markus Weimer, Andrew Gordon Wilson, Eric P. Xing, Matei Zaharia, Ce Zhang, Ameet Talwalkar:
SysML: The New Frontier of Machine Learning Systems. CoRR abs/1904.03257 (2019) - [i15]Huizi Mao, Xiaodong Yang, William J. Dally:
A Delay Metric for Video Object Detection: What Average Precision Fails to Tell. CoRR abs/1908.06368 (2019) - 2018
- [j63]Jason A. Platt, Nicholas Moehle, John D. Fox, William J. Dally:
Optimal Operation of a Plug-In Hybrid Vehicle. IEEE Trans. Veh. Technol. 67(11): 10366-10377 (2018) - [c154]Yatish Turakhia, Gill Bejerano, William J. Dally:
Darwin: A Genomics Co-processor Provides up to 15,000X Acceleration on Long Read Assembly. ASPLOS 2018: 199-213 - [c153]Walker J. Turner, John W. Poulton, John M. Wilson, Xi Chen, Stephen G. Tell, Matthew Fojtik, Thomas H. Greer, Brian Zimmer, Sanquan Song, Nikola Nedovic, Sudhir S. Kudva, Sunil R. Sudhakaran, Rizwan Bashirullah, Wenxu Zhao, William J. Dally, C. Thomas Gray:
Ground-referenced signaling for intra-chip and short-reach chip-to-chip interconnects. CICC 2018: 1-8 - [c152]Song Han, William J. Dally:
Bandwidth-efficient deep learning. DAC 2018: 147:1-147:6 - [c151]Yujun Lin, Song Han, Huizi Mao, Yu Wang, Bill Dally:
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training. ICLR (Poster) 2018 - [c150]Xingyu Liu, Jeff Pool, Song Han, William J. Dally:
Efficient Sparse-Winograd Convolutional Neural Networks. ICLR (Poster) 2018 - [c149]John M. Wilson, Walker J. Turner, John W. Poulton, Brian Zimmer, Xi Chen, Sudhir S. Kudva, Sanquan Song, Stephen G. Tell, Nikola Nedovic, Wenxu Zhao, Sunil R. Sudhakaran, C. Thomas Gray, William J. Dally:
A 1.17pJ/b 25Gb/s/pin ground-referenced single-ended serial link for off- and on-package communication in 16nm CMOS using a process- and temperature-adaptive voltage regulator. ISSCC 2018: 276-278 - [c148]William J. Dally, C. Thomas Gray, John Poulton, Brucek Khailany, John M. Wilson, Larry R. Dennison:
Hardware-Enabled Artificial Intelligence. VLSI Circuits 2018: 3-6 - [i14]Xingyu Liu, Jeff Pool, Song Han, William J. Dally:
Efficient Sparse-Winograd Convolutional Neural Networks. CoRR abs/1802.06367 (2018) - [i13]Huizi Mao, Taeyoung Kong, William J. Dally:
CaTDet: Cascaded Tracked Detector for Efficient Object Detection from Video. CoRR abs/1810.00434 (2018) - 2017
- [j62]Babak Falsafi, Bill Dally, Desh Singh, Derek Chiou, Joshua J. Yi, Resit Sendag:
FPGAs versus GPUs in Data centers. IEEE Micro 37(1): 60-72 (2017) - [j61]Milad Mohammadi, Tor M. Aamodt, William J. Dally:
CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution Near In-Order Energy with Near Out-of-Order Performance. ACM Trans. Archit. Code Optim. 14(4): 39:1-39:26 (2017) - [c147]Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. Dally:
Exploring the Granularity of Sparsity in Convolutional Neural Networks. CVPR Workshops 2017: 1927-1934 - [c146]Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William (Bill) J. Dally:
ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA. FPGA 2017: 75-84 - [c145]Niladrish Chatterjee, Mike O'Connor, Donghyuk Lee, Daniel R. Johnson, Stephen W. Keckler, Minsoo Rhu, William J. Dally:
Architecting an Energy-Efficient DRAM System for GPUs. HPCA 2017: 73-84 - [c144]Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Enhao Gong, Shijian Tang, Erich Elsen, Peter Vajda, Manohar Paluri, John Tran, Bryan Catanzaro, William J. Dally:
DSD: Dense-Sparse-Dense Training for Deep Neural Networks. ICLR (Poster) 2017 - [c143]Xingyu Liu, Song Han, Huizi Mao, William J. Dally:
Efficient Sparse-Winograd Convolutional Neural Networks. ICLR (Workshop) 2017 - [c142]Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally:
Trained Ternary Quantization. ICLR (Poster) 2017 - [c141]Bill Dally:
Efficient methods and hardware for deep learning. TIML@ISCA 2017: 2 - [c140]Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel S. Emer, Stephen W. Keckler, William J. Dally:
SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. ISCA 2017: 27-40 - [c139]Mike O'Connor, Niladrish Chatterjee, Donghyuk Lee, John M. Wilson, Aditya Agrawal, Stephen W. Keckler, William J. Dally:
Fine-grained DRAM: energy-efficient DRAM for extreme bandwidth systems. MICRO 2017: 41-54 - [i12]Yatish Turakhia, Subhasis Das, Tor M. Aamodt, William J. Dally:
HoLiSwap: Reducing Wire Energy in L1 Caches. CoRR abs/1701.03878 (2017) - [i11]Huizi Mao, Song Han, Jeff Pool, Wenshuo Li, Xingyu Liu, Yu Wang, William J. Dally:
Exploring the Regularity of Sparse Structure in Convolutional Neural Networks. CoRR abs/1705.08922 (2017) - [i10]Morteza Mardani, Enhao Gong, Joseph Y. Cheng, Shreyas Vasanawala, Greg Zaharchuk, Marcus T. Alley, Neil Thakur, Song Han, William J. Dally, John M. Pauly, Lei Xing:
Deep Generative Adversarial Networks for Compressed Sensing Automates MRI. CoRR abs/1706.00051 (2017) - [i9]Angshuman Parashar, Minsoo Rhu, Anurag Mukkara, Antonio Puglielli, Rangharajan Venkatesan, Brucek Khailany, Joel S. Emer, Stephen W. Keckler, William J. Dally:
SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. CoRR abs/1708.04485 (2017) - [i8]Yujun Lin, Song Han, Huizi Mao, Yu Wang, William J. Dally:
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training. CoRR abs/1712.01887 (2017) - 2016
- [j60]Mahmut E. Sinangil, John W. Poulton, Matthew R. Fojtik, Thomas H. Greer, Stephen G. Tell, Andreas J. Gotterba, Jesse Wang, Jason Golbus, Brian Zimmer, William J. Dally, C. Thomas Gray:
A 28 nm 2 Mbit 6 T SRAM With Highly Configurable Low-Voltage Write-Ability Assist Implementation and Capacitor-Based Sense-Amplifier Input Offset Compensation. IEEE J. Solid State Circuits 51(2): 557-567 (2016) - [j59]Subhasis Das, Tor M. Aamodt, William J. Dally:
Reuse Distance-Based Probabilistic Cache Replacement. ACM Trans. Archit. Code Optim. 12(4): 33:1-33:22 (2016) - [c138]Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark Horowitz, Bill Dally:
Deep compression and EIE: Efficient inference engine on compressed deep neural network. Hot Chips Symposium 2016: 1-6 - [c137]Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally:
EIE: Efficient Inference Engine on Compressed Deep Neural Network. ISCA 2016: 243-254 - [c136]John M. Wilson, Matthew R. Fojtik, John W. Poulton, Xi Chen, Stephen G. Tell, Thomas H. Greer, C. Thomas Gray, William J. Dally:
8.6 A 6.5-to-23.3fJ/b/mm balanced charge-recycling bus in 16nm FinFET CMOS at 1.7-to-2.6Gb/s/wire with clock forwarding and low-crosstalk contraflow wiring. ISSCC 2016: 156-157 - [c135]Song Han, Huizi Mao, William J. Dally:
Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding. ICLR 2016 - [i7]Song Han, Xingyu Liu, Huizi Mao, Jing Pu, Ardavan Pedram, Mark A. Horowitz, William J. Dally:
EIE: Efficient Inference Engine on Compressed Deep Neural Network. CoRR abs/1602.01528 (2016) - [i6]Forrest N. Iandola, Matthew W. Moskewicz, Khalid Ashraf, Song Han, William J. Dally, Kurt Keutzer:
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size. CoRR abs/1602.07360 (2016) - [i5]Milad Mohammadi, Tor M. Aamodt, William J. Dally:
CG-OoO: Energy-Efficient Coarse-Grain Out-of-Order Execution. CoRR abs/1606.01607 (2016) - [i4]Song Han, Jeff Pool, Sharan Narang, Huizi Mao, Shijian Tang, Erich Elsen, Bryan Catanzaro, John Tran, William J. Dally:
DSD: Regularizing Deep Neural Networks with Dense-Sparse-Dense Training Flow. CoRR abs/1607.04381 (2016) - [i3]Song Han, Junlong Kang, Huizi Mao, Yiming Hu, Xin Li, Yubin Li, Dongliang Xie, Hong Luo, Song Yao, Yu Wang, Huazhong Yang, William J. Dally:
ESE: Efficient Speech Recognition Engine with Compressed LSTM on FPGA. CoRR abs/1612.00694 (2016) - [i2]Chenzhuo Zhu, Song Han, Huizi Mao, William J. Dally:
Trained Ternary Quantization. CoRR abs/1612.01064 (2016) - 2015
- [j58]Milad Mohammadi, Song Han, Tor M. Aamodt, William J. Dally:
On-Demand Dynamic Branch Prediction. IEEE Comput. Archit. Lett. 14(1): 50-53 (2015) - [j57]R. Curtis Harting, William J. Dally:
On-Chip Active Messages for Speed, Scalability, and Efficiency. IEEE Trans. Parallel Distributed Syst. 26(2): 507-515 (2015) - [c134]Subhasis Das, Tor M. Aamodt, William J. Dally:
SLIP: reducing wire energy in the memory hierarchy. ISCA 2015: 349-361 - [c133]Song Han, Jeff Pool, John Tran, William J. Dally:
Learning both Weights and Connections for Efficient Neural Network. NIPS 2015: 1135-1143 - [c132]Nan Jiang, Larry R. Dennison, William J. Dally:
Network endpoint congestion control for fine-grained communication. SC 2015: 35:1-35:12 - [i1]Song Han, Jeff Pool, John Tran, William J. Dally:
Learning both Weights and Connections for Efficient Neural Networks. CoRR abs/1506.02626 (2015) - 2014
- [c131]William J. Dally, James D. Balfour:
Author retrospective for design tradeoffs for tiled CMP on-chip networks. ICS 25th Anniversary 2014: 77-79 - [c130]Oreste Villa, Daniel R. Johnson, Mike O'Connor, Evgeny Bolotin, David W. Nellans, Justin Luitjens, Nikolai Sakharnykh, Peng Wang, Paulius Micikevicius, Anthony Scudiero, Stephen W. Keckler, William J. Dally:
Scaling the Power Wall: A Path to Exascale. SC 2014: 830-841 - 2013
- [j56]John W. Poulton, William J. Dally, Xi Chen, John G. Eyles, Thomas H. Greer, Stephen G. Tell, John M. Wilson, C. Thomas Gray:
A 0.54 pJ/b 20 Gb/s Ground-Referenced Single-Ended Short-Reach Serial Link in 28 nm CMOS for Advanced Packaging Applications. IEEE J. Solid State Circuits 48(12): 3206-3218 (2013) - [j55]George Michelogiannakis, William J. Dally:
Elastic Buffer Flow Control for On-Chip Networks. IEEE Trans. Computers 62(2): 295-309 (2013) - [c129]William J. Dally, Chris Malachowsky, Stephen W. Keckler:
21st century digital design tools. DAC 2013: 94:1-94:6 - [c128]Nan Jiang, Daniel U. Becker, George Michelogiannakis, James D. Balfour, Brian Towles, David E. Shaw, John Kim, William J. Dally:
A detailed and flexible cycle-accurate Network-on-Chip simulator. ISPASS 2013: 86-96 - [c127]John W. Poulton, William J. Dally, Xi Chen, John G. Eyles, Thomas H. Greer, Stephen G. Tell, C. Thomas Gray:
A 0.54pJ/b 20Gb/s ground-referenced single-ended short-haul serial link in 28nm CMOS for advanced packaging applications. ISSCC 2013: 404-405 - [c126]George Michelogiannakis, Nan Jiang, Daniel Becker, William J. Dally:
Channel reservation protocol for over-subscribed channels and destinations. SC 2013: 52:1-52:12 - 2012
- [j54]Mark Gebhart, Daniel R. Johnson, David Tarjan, Stephen W. Keckler, William J. Dally, Erik Lindholm, Kevin Skadron:
A Hierarchical Thread Scheduler and Register File for Energy-Efficient Throughput Processors. ACM Trans. Comput. Syst. 30(2): 8:1-8:38 (2012) - [c125]Nan Jiang, Daniel U. Becker, George Michelogiannakis, William J. Dally:
Network congestion avoidance through Speculative Reservation. HPCA 2012: 443-454 - [c124]Daniel U. Becker, Nan Jiang, George Michelogiannakis, William J. Dally:
Adaptive Backpressure: Efficient buffer management for on-chip networks. ICCD 2012: 419-426 - [c123]Mark Gebhart, Stephen W. Keckler, Brucek Khailany, Ronny Krashinsky, William J. Dally:
Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor. MICRO 2012: 96-106 - 2011
- [j53]George Michelogiannakis, Nan Jiang, Daniel Becker, William J. Dally:
Packet Chaining: Efficient Single-Cycle Allocation for On-Chip Networks. IEEE Comput. Archit. Lett. 10(2): 33-36 (2011) - [j52]Stephen W. Keckler, William J. Dally, Brucek Khailany, Michael Garland, David Glasco:
GPUs and the Future of Parallel Computing. IEEE Micro 31(5): 7-17 (2011) - [j51]George Michelogiannakis, Daniel Becker, William J. Dally:
Evaluating Elastic Buffer and Wormhole Flow Control. IEEE Trans. Computers 60(6): 896-903 (2011) - [c122]