


54th MICRO 2021: Virtual Event, Greece
- MICRO '21: 54th Annual IEEE/ACM International Symposium on Microarchitecture, Virtual Event, Greece, October 18-22, 2021. ACM 2021, ISBN 978-1-4503-8557-2

Session 1: Best Paper Session
- Zhiyao Xie, Xiaoqing Xu, Matt Walker, Joshua Knebel, Kumaraguru Palaniswamy, Nicolas Hebert, Jiang Hu, Huanrui Yang, Yiran Chen, Shidhartha Das:
APOLLO: An Automated Power Modeling Framework for Runtime Power Introspection in High-Volume Commercial Microprocessors. 1-14
- Björn Gottschall, Lieven Eeckhout, Magnus Jahre:
TIP: Time-Proportional Instruction Profiling. 15-27
- Yu-Chia Liu, Hung-Wei Tseng:
NDS: N-Dimensional Storage. 28-45
- Harini Muthukrishnan, Daniel Lustig, David W. Nellans, Thomas F. Wenisch:
GPS: A Global Publish-Subscribe Model for Multi-GPU Memory Management. 46-58
Session 2A: Non-Volatile Memory
- Congming Gao, Xin Xin, Youyou Lu, Youtao Zhang, Jun Yang, Jiwu Shu:
ParaBit: Processing Parallel Bitwise Operations in NAND Flash Memory based SSDs. 59-70
- Apostolos Kokolis, Antonis Psistakis, Benjamin Reidys, Jian Huang, Josep Torrellas:
Distributed Data Persistency. 71-85
- Marina Vemmou, Alexandros Daglis:
COSPlay: Leveraging Task-Level Parallelism for High-Throughput Synchronous Persistence. 86-99
- Minh S. Q. Truong, Eric Chen, Deanyone Su, Liting Shen, Alexander Glass, L. Richard Carley, James A. Bain, Saugata Ghose:
RACER: Bit-Pipelined Processing Using Resistive Memory. 100-116
- Md Hafizul Islam Chowdhuryy, Muhammad Rashedul Haq Rashed, Amro Awad, Rickard Ewetz, Fan Yao:
LADDER: Architecting Content and Location-aware Writes for Crossbar Resistive Memories. 117-130
Session 2B: Energy Efficiency & Low Power
- Seunghak Lee, Ki-Dong Kang, Hwanjun Lee, Hyungwon Park, Young Hoon Son, Nam Sung Kim, Daehoon Kim:
GreenDIMM: OS-assisted DRAM Power Management for DRAM with a Sub-array Granularity Power-Down State. 131-142
- Ki-Dong Kang, Gyeongseo Park, Hyosang Kim, Mohammad Alian, Nam Sung Kim, Daehoon Kim:
NMAP: Power Management Based on Network Packet Processing Mode Transition for Latency-Critical Workloads. 143-154
- Jawad Haj-Yahya, Jisung Park, Rahul Bera, Juan Gómez-Luna, Efraim Rotem, Taha Shahroodi, Jeremie S. Kim, Onur Mutlu:
BurstLink: Techniques for Energy-Efficient Video Display for Conventional and Virtual Reality Systems. 155-169
- Jianping Zeng, Jongouk Choi, Xinwei Fu, Ajay Paddayuru Shreepathi, Dongyoon Lee, Changwoo Min, Changhee Jung:
ReplayCache: Enabling Volatile Caches for Energy Harvesting Systems. 170-182
- Young Geun Kim, Carole-Jean Wu:
AutoFL: Enabling Heterogeneity-Aware Energy Efficient Federated Learning. 183-198
Session 3A: Security & Privacy I
- Luyi Kang, Yuqi Xue, Weiwei Jia, Xiaohao Wang, Jongryool Kim, Changhwan Youn, Myeong Joon Kang, Hyung Jin Lim, Bruce L. Jacob, Jian Huang:
IceClave: A Trusted Execution Environment for In-Storage Computing. 199-211
- Hanieh Hashemi, Yongqin Wang, Murali Annavaram:
DarKnight: An Accelerated Framework for Privacy and Integrity Preserving Deep Learning Using Trusted Hardware. 212-224
- Yonggan Fu, Yang Zhao, Qixuan Yu, Chaojian Li, Yingyan Lin:
2-in-1 Accelerator: Enabling Random Precision Switch for Winning Both Adversarial Robustness and Efficiency. 225-237
- Nikola Samardzic, Axel Feldmann, Aleksandar Krastev, Srinivas Devadas, Ronald G. Dreslinski, Christopher Peikert, Daniel Sánchez:
F1: A Fast and Programmable Accelerator for Fully Homomorphic Encryption. 238-252
- Michael LeMay, Joydeep Rakshit, Sergej Deutsch, David M. Durham, Santosh Ghosh, Anant Nori, Jayesh Gaur, Andrew Weiler, Salmin Sultana, Karanvir Grewal, Sreenivas Subramoney:
Cryptographic Capability Computing. 253-267
Session 3B: Processing In/Near Memory
- Jaehyun Park, Byeongho Kim, Sungmin Yun, Eojin Lee, Minsoo Rhu, Jung Ho Ahn:
TRiM: Enhancing Processor-Memory Interfaces with Scalable Tensor Reduction in Memory. 268-281
- Maciej Besta, Raghavendra Kanakagiri, Grzegorz Kwasniewski, Rachata Ausavarungnirun, Jakub Beránek, Konstantinos Kanellopoulos, Kacper Janda, Zur Vonarburg-Shmaria, Lukas Gianinazzi, Ioana Stefan, Juan Gómez-Luna, Jakub Golinowski, Marcin Copik, Lukas Kapp-Schwoerer, Salvatore Di Girolamo, Nils Blach, Marek Konieczny, Onur Mutlu, Torsten Hoefler:
SISA: Set-Centric Instruction Set Architecture for Graph Mining on Processing-in-Memory Systems. 282-297
- Anirban Nag, Rajeev Balasubramonian:
OrderLight: Lightweight Memory-Ordering Primitive for Efficient Fine-Grained PIM Computations. 298-310
- Elaheh Sadredini, Reza Rahimi, Mohsen Imani, Kevin Skadron:
Sunder: Enabling Low-Overhead and Scalable Near-Data Pattern Matching Acceleration. 311-323
- Xin Xin, Yanan Guo, Youtao Zhang, Jun Yang:
SAM: Accelerating Strided Memory Accesses. 324-336
Session 4A: Parallelism
- Eduardo José Gómez-Hernández, Juan M. Cebrian, J. Rubén Titos Gil, Stefanos Kaxiras, Alberto Ros:
Efficient, Distributed, and Non-Speculative Multi-Address Atomic Operations. 337-349
- Joseph Zuckerman, Davide Giri, Jihye Kwon, Paolo Mantovani, Luca P. Carloni:
Cohmeleon: Learning-Based Orchestration of Accelerator Coherence in Heterogeneous SoCs. 350-365
- Vanshika Baoni, Adarsh Mittal, Gurindar S. Sohi:
Fat Loads: Exploiting Locality Amongst Contemporaneous Load Operations to Optimize Cache Accesses. 366-379
- Aniket Deshmukh, Yale N. Patt:
Criticality Driven Fetch. 380-391
- Philip Bedoukian, Neil Adit, Edwin Peguero, Adrian Sampson:
Software-Defined Vector Processing on Manycore Fabrics. 392-406
Session 4B: Accelerators I
- Arash Pourhabibi Zarandi, Mark Sutherland, Alexandros Daglis, Babak Falsafi:
Cerebros: Evading the RPC Tax in Datacenters. 407-420
- Mario Drumond, Louis Coulon, Arash Pourhabibi Zarandi, Ahmet Caner Yüzügüler, Babak Falsafi, Martin Jaggi:
Equinox: Training (for Free) on a Custom Inference Accelerator. 421-433
- Seongyoung Kang, Jiyoung An, Jinpyo Kim, Sang-Woo Jun:
: Near-Storage Accelerator for High-Performance Log Analytics. 434-448
- Yujun Lin, Zhekai Zhang, Haotian Tang, Hanrui Wang, Song Han:
PointAcc: Efficient Point Cloud Accelerator. 449-461
- Sagar Karandikar, Chris Leary, Chris Kennelly, Jerry Zhao, Dinesh Parimi, Borivoje Nikolic, Krste Asanovic, Parthasarathy Ranganathan:
A Hardware Accelerator for Protocol Buffers. 462-478
Session 5A: Accelerators II
- Weizhuang Liu, Bo Yu, Yiming Gan, Qiang Liu, Jie Tang, Shaoshan Liu, Yuhao Zhu:
Archytas: A Framework for Synthesizing and Dynamically Optimizing Accelerators for Robotic Localization. 479-493
- Shulin Zhao, Haibo Zhang, Cyan Subhra Mishra, Sandeepa Bhuyan, Ziyu Ying, Mahmut Taylan Kandemir, Anand Sivasubramaniam, Chita R. Das:
HoloAR: On-the-fly Optimization of 3D Holographic Processing for Augmented Reality. 494-506
- David Trilla, John-David Wellman, Alper Buyuktosunoglu, Pradip Bose:
NOVIA: A Framework for Discovering Non-Conventional Inline Accelerators. 507-521
- Ameer M. S. Abdelhadi, Eugene Sha, Ciaran Bannon, Hendrik Steenland, Andreas Moshovos:
Noema: Hardware-Efficient Template Matching for Neural Population Pattern Detection. 522-534
- Timothy Dunn, Harisankar Sadasivan, Jack Wadden, Kush Goliya, Kuan-Yu Chen, David T. Blaauw, Reetuparna Das, Satish Narayanasamy:
SquiggleFilter: An Accelerator for Portable Virus Detection. 535-549
Session 5B: Security & Privacy II
- Joonsung Kim, Hamin Jang, Hunjun Lee, Seungho Lee, Jangwoo Kim:
UC-Check: Characterizing Micro-operation Caches in x86 Processors and Implications in Security and Performance. 550-564
- Jaeguk Ahn, Jiho Kim, Hans Kasan, Zhixian Jin, Leila Delshadtehrani, WonJun Song, Ajay Joshi, John Kim:
Network-on-Chip Microarchitecture-based Covert Channel in GPUs. 565-577
- Pablo Buiras, Hamed Nemati, Andreas Lindner, Roberto Guanciale:
Validation of Side-Channel Models via Observation Refinement. 578-591
- Sam Ainsworth:
GhostMinion: A Strictness-Ordered Cache System for Spectre Mitigation. 592-606
- Rutvik Choudhary, Jiyong Yu, Christopher W. Fletcher, Adam Morrison:
Speculative Privacy Tracking (SPT): Leaking Information From Speculative Execution Without Compromising Privacy. 607-622
Session 6A: Reliability & Verification
- Minesh Patel, Geraldo F. Oliveira, Onur Mutlu:
HARP: Practically and Effectively Identifying Uncorrectable Errors in Memory Chips That Use On-Die Error-Correcting Codes. 623-640
- Michael B. Sullivan, Nirmal R. Saxena, Mike O'Connor, Donghyuk Lee, Paul Racunas, Saurabh Hukerikar, Timothy Tsai, Siva Kumar Sastry Hari, Stephen W. Keckler:
Characterizing and Mitigating Soft Errors in GPU DRAM. 641-653
- Jianping Zeng, Hongjune Kim, Jaejin Lee, Changhee Jung:
Turnpike: Lightweight Soft Error Resilience for In-Order Cores. 654-666
- Nursultan Kabylkas, Tommy Thorn, Shreesha Srinath, Polychronis Xekalakis, Jose Renau:
Effective Processor Verification with Logic Fuzzer Enhanced Co-simulation. 667-678
- Yao Hsiao, Dominic P. Mulligan, Nikos Nikoleris, Gustavo Petri, Caroline Trippel:
Synthesizing Formal Models of Hardware from RTL for Efficient Verification of Memory Model Implementations. 679-694
Session 6B: GPGPU
- Jie Zhang, Myoungsoo Jung:
Ohm-GPU: Integrating New Optical Network and Heterogeneous Memory into GPU Multi-Processors. 695-708
- Lufei Liu, Wesley Chang, Francois Demoullin, Yuan-Hsi Chou, Mohammadreza Saed, David Pankratz, Tyler Nowicki, Tor M. Aamodt:
Intersection Prediction for Accelerated GPU Ray Tracing. 709-723
- Cesar Avalos Baddouh, Mahmoud Khairy, Roland N. Green, Mathias Payer, Timothy G. Rogers:
Principal Kernel Analysis: A Tractable Methodology to Simulate Scaled GPU Workloads. 724-737
- Vijay Kandiah, Scott Peverelle, Mahmoud Khairy, Junrui Pan, Amogh Manjunath, Timothy G. Rogers, Tor M. Aamodt, Nikos Hardavellas:
AccelWattch: A Power Modeling Framework for Modern GPUs. 738-753
- Blaise Tine, Krishna Praveen Yalamarthy, Fares Elsabbagh, Hyesoon Kim:
Vortex: Extending the RISC-V ISA for GPGPU and 3D-Graphics. 754-766
Session 7A: Microarchitecture I
- Stijn Eyerman, Wim Heirman, Sam Van den Steen, Ibrahim Hur:
Enabling Branch-Mispredict Level Parallelism by Selectively Flushing Instructions. 767-778
- Niranjan K. Soundararajan, Peter Braun, Tanvir Ahmed Khan, Baris Kasikci, Heiner Litz, Sreenivas Subramoney:
PDede: Partitioned, Deduplicated, Delta Branch Target Buffer. 779-791
- Arthur Perais:
Leveraging Targeted Value Prediction to Unlock New Hardware Strength Reduction Potential. 792-803
- Stephen Pruett, Yale N. Patt:
Branch Runahead: An Alternative to Branch Prediction for Impossible to Predict Branches. 804-815
- Tanvir Ahmed Khan, Nathan Brown, Akshitha Sriraman, Niranjan K. Soundararajan, Rakesh Kumar, Joseph Devietti, Sreenivas Subramoney, Gilles A. Pokam, Heiner Litz, Baris Kasikci:
Twig: Profile-Guided BTB Prefetching for Data Center Applications. 816-829
Session 7B: Accelerators III
- Thierry Tambe, Coleman Hooper, Lillian Pentecost, Tianyu Jia, En-Yu Yang, Marco Donato, Victor Sanh, Paul N. Whatmough, Alexander M. Rush, David Brooks, Gu-Yeon Wei:
EdgeBERT: Sentence-Level Energy Optimizations for Latency-Aware Multi-Task NLP Inference. 830-844
- Yaoyu Tao, Zhengya Zhang:
HiMA: A Fast and Scalable History-based Memory Access Engine for Differentiable Neural Computer. 845-856
- Omar Mohamed Awad, Mostafa Mahmoud, Isak Edo, Ali Hadi Zadeh, Ciaran Bannon, Anand Jayarajan, Gennady Pekhimenko, Andreas Moshovos:
FPRaker: A Processing Element For Accelerating Neural Network Training. 857-869
- Udit Gupta, Samuel Hsia, Jeff Zhang, Mark Wilkening, Javin Pombra, Hsien-Hsin Sean Lee, Gu-Yeon Wei, Carole-Jean Wu, David Brooks:
RecPipe: Co-designing Models and Hardware to Jointly Optimize Recommendation Quality and Performance. 870-884
- Qiyu Wan, Haojun Xia, Xingyao Zhang, Lening Wang, Shuaiwen Leon Song, Xin Fu:
Shift-BNN: Highly-Efficient Probabilistic Bayesian Neural Network Training via Memory-Friendly Pattern Retrieving. 885-897
Session 8A: Superconducting & Quantum
- Mengyu Zhang, Lei Xie, Zhenxing Zhang, Qiaonian Yu, Guanglei Xi, Hualiang Zhang, Fuming Liu, Yarui Zheng, Yicong Zheng, Shengyu Zhang:
Exploiting Different Levels of Parallelism in the Quantum Control Microarchitecture for Superconducting Qubits. 898-911
- Farzaneh Zokaee, Lei Jiang:
SMART: A Heterogeneous Scratchpad Memory Architecture for Superconductor SFQ-based Systolic CNN Accelerators. 912-924
- Fei Hua, Yan-Hao Chen, Yuwei Jin, Chi Zhang, Ari B. Hayes, Youtao Zhang, Eddy Z. Zhang:
AutoBraid: A Framework for Enabling Efficient Surface Code Communication in Quantum Computing. 925-936
- Poulami Das, Swamit S. Tannu, Moinuddin K. Qureshi:
JigSaw: Boosting Fidelity of NISQ Programs via Measurement Subsetting. 937-949
- Poulami Das, Swamit S. Tannu, Siddharth Dangwal, Moinuddin K. Qureshi:
ADAPT: Mitigating Idling Errors in Qubits via Adaptive Dynamical Decoupling. 950-962
Session 8B: Sparse Processing
- Hang Lu, Liang Chang, Chenglong Li, Zixuan Zhu, Shengjian Lu, Yanhuan Liu, Mingzhe Zhang:
Distilling Bit-level Sparsity Parallelism for General Purpose Deep Learning Acceleration. 963-976
- Liqiang Lu, Yicheng Jin, Hangrui Bi, Zizhang Luo, Peng Li, Tao Wang, Yun Liang:
Sanger: A Co-Design Framework for Enabling Sparse Attention using Reconfigurable Architecture. 977-991
- Shiyu Li, Edward Hanson, Xuehai Qian, Hai (Helen) Li, Yiran Chen:
ESCALATE: Boosting the Efficiency of Sparse CNN Accelerator with Kernel Decomposition. 992-1004
- Subhankar Pal, Aporva Amarnath, Siying Feng, Michael F. P. O'Boyle, Ronald G. Dreslinski, Christophe Dubach:
SparseAdapt: Runtime Control for Sparse Linear Algebra on a Reconfigurable Accelerator. 1005-1021
- Alexander Rucker, Matthew Vilim, Tian Zhao, Yaqi Zhang, Raghu Prabhakar, Kunle Olukotun:
Capstan: A Vector RDA for Sparsity. 1022-1035
Session 9A: Graph Processing
- Abanti Basak, Zheng Qu, Jilan Lin, Alaa R. Alameldeen, Zeshan Chishti, Yufei Ding, Yuan Xie:
Improving Streaming Graph Processing Performance using Input Knowledge. 1036-1050
- Tong Geng, Chunshu Wu, Yongan Zhang, Cheng Tan, Chenhao Xie, Haoran You, Martin C. Herbordt, Yingyan Lin, Ang Li:
I-GCN: A Graph Convolutional Network Accelerator with Runtime Locality Enhancement through Islandization. 1051-1063
- Quan M. Nguyen, Daniel Sánchez:
Fifer: Practical Acceleration of Irregular Applications on Reconfigurable Architectures. 1064-1077
- Jie-Fang Zhang, Zhengya Zhang:
Point-X: A Spatial-Locality-Aware Architecture for Energy-Efficient Graph-Based Point-Cloud Deep Learning. 1078-1090
- Shafiur Rahman, Mahbod Afarin, Nael B. Abu-Ghazaleh, Rajiv Gupta:
JetStream: Graph Analytics on Streaming Data with Event-Driven Hardware Accelerator. 1091-1105
Session 9B: Virtual Memory & Prefetching
- Venkat Sri Sai Ram, Ashish Panwar, Arkaprava Basu:
Trident: Harnessing Architectural Resources for All Page Sizes in x86 Processors. 1106-1120
- Rahul Bera, Konstantinos Kanellopoulos, Anant Nori, Taha Shahroodi, Sreenivas Subramoney, Onur Mutlu:
Pythia: A Customizable Hardware Prefetching Framework Using Online Reinforcement Learning. 1121-1137
- Georgios Vavouliotis, Lluc Alvarez, Boris Grot, Daniel A. Jiménez, Marc Casas:
Morrigan: A Composite Instruction TLB Prefetcher. 1138-1153
- Bingyao Li, Jieming Yin, Youtao Zhang, Xulong Tang:
Improving Address Translation in Multi-GPUs via Sharing and Spilling aware TLB Design. 1154-1168
- Jagadish B. Kotra, Michael LeBeane, Mahmut T. Kandemir, Gabriel H. Loh:
Increasing GPU Translation Reach by Leveraging Under-Utilized On-Chip Resources. 1169-1181
Session 10A: Security & Privacy III
- Lois Orosa, Abdullah Giray Yaglikçi, Haocong Luo, Ataberk Olgun, Jisung Park, Hasan Hassan, Minesh Patel, Jeremie S. Kim, Onur Mutlu:
A Deeper Look into RowHammer's Sensitivities: Experimental Analysis of Real DRAM Chips and Implications on Future Attacks and Defenses. 1182-1197
- Hasan Hassan, Yahya Can Tugrul, Jeremie S. Kim, Victor van der Veen, Kaveh Razavi, Onur Mutlu:
Uncovering In-DRAM RowHammer Protection Mechanisms: A New Methodology, Custom RowHammer Patterns, and Implications. 1198-1213
- Kazi Abu Zubair, Sudhanva Gurumurthi, Vilas Sridharan, Amro Awad:
Soteria: Towards Resilient Integrity-Protected and Encrypted Non-Volatile Memories. 1214-1226
- Alexander Freij, Huiyang Zhou, Yan Solihin:
Bonsai Merkle Forests: Efficiently Achieving Crash Consistency in Secure Persistent Memory. 1227-1240
- Xijing Han, James Tuck, Amro Awad:
Dolos: Improving the Performance of Persistent Applications in ADR-Supported Secure Memory. 1241-1253
Session 10B: Microarchitecture II
- Vasileios Tsoutsouras, Orestis Kaparounakis, Bilgesu Arif Bilgin, Chatura Samarakoon, James Timothy Meech, Jan Heck, Phillip Stanley-Marbell:
The Laplace Microarchitecture for Tracking Data Uncertainty and Its Implementation in a RISC-V Processor. 1254-1269
- Chanchal Kumar, Anirudh Seshadri, Aayush Chaudhary, Shubham Bhawalkar, Rohit Singh, Eric Rotenberg:
Post-Fabrication Microarchitecture. 1270-1281
- Yuanchao Xu, Mehmet Esat Belviranli, Xipeng Shen, Jeffrey S. Vetter:
PCCS: Processor-Centric Contention-aware Slowdown Model for Heterogeneous System-on-Chips. 1282-1295
- Josué Feliu, Alberto Ros, Manuel E. Acacio, Stefanos Kaxiras:
ITSLF: Inter-Thread Store-to-Load Forwarding in Simultaneous Multithreading. 1296-1308
- Liu Liu, Jilan Lin, Zheng Qu, Yufei Ding, Yuan Xie:
ENMC: Extreme Near-Memory Classification via Approximate Screening. 1309-1322















