


58th MICRO 2025: Seoul, Korea
- Proceedings of the 58th IEEE/ACM International Symposium on Microarchitecture, MICRO 2025, Seoul, Republic of Korea, October 18-22, 2025. ACM 2025, ISBN 979-8-4007-1573-0

1A: Systems for AI (LLMs) - 1
- Yue Pan, Zihan Xia, Po-Kai Hsu, Lanxiang Hu, Hyungyo Kim, Janak Sharda, Minxuan Zhou, Nam Sung Kim, Shimeng Yu, Tajana Rosing, Mingu Kang: Stratum: System-Hardware Co-Design with Tiered Monolithic 3D-Stackable DRAM for Efficient MoE Serving. 1-17
- Tianhua Xia, Sai Qian Zhang: Kelle: Co-design KV Caching and eDRAM for Efficient LLM Serving in Edge Computing. 18-33
- Derrick Quinn, E. Ezgi Yücel, Jinkwon Kim, José F. Martínez, Mohammad Alian: LongSight: Compute-Enabled Memory to Accelerate Large-Context LLMs via Sparse Attention. 34-48
1B: Processing-In-Memory - 1
- Seunghyuk Yu, Hyeonu Kim, Kyoungho Jeun, Sunyoung Hwang, Seongmin Cho, Eojin Lee: ComPASS: A Compatible PIM Protocol Architecture and Scheduling Solution for Processor-PIM Collaboration. 49-62
- Jeehyun Kim, Donghyeon Kim, Seokwon Kang, Bongjoon Hyun, Inho Lee, Yongjun Park: PIM-CCA: An Efficient PIM Architecture with Optimized Integration of Configurable Functional Units. 63-77
- Zhiheng Yue, Yang Wang, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin: 3D-PATH: A Hierarchy LUT Processing-in-memory Accelerator with Thermal-aware Hybrid Bonding Integration. 78-93
1C: Security and Privacy - Side Channels
- Silvan Niederer, Sandro Rüegge, Ali Hajiabadi, Kaveh Razavi: One Flew over the Stack Engine's Nest: Practical Microarchitectural Attacks on the Stack Engine. 94-110
- Md. Sadik Awal, Md Tauhidur Rahman: DExiM: Exposing Impedance-Based Data Leakage in Emerging Memories. 111-124
- Kanqi Zhang, Peinan Li, Miao Li, Xin Tian, Zelong Du, Quanchen Liu, Yongqiang Lyu, Yu Jiang, Dan Meng, Rui Hou: Sonar: A Hardware Fuzzing Framework to Uncover Contention Side Channels in Processors. 125-139
1D: Microarchitecture - Prefetching 1
- Gilead Posluns, Mark C. Jeffrey: Symbiotic Task Scheduling and Data Prefetching. 140-155
- Yanhua Chen, Jiong Feng, Zhe Wang, Christopher J. Hughes, Jiayi Huang: Software Prefetch Multicast: Sharer-Exposed Prefetching for Bandwidth Efficiency in Manycore Processors. 156-169
- Ningzhi Ai, Wenjian He, Hu He, Jing Xia, Heng Liao, Guowei Zhang: RICH Prefetcher: Storing Rich Information in Memory to Trade Capacity and Bandwidth for Latency Hiding. 170-183
2A: Systems for AI (LLMs) - 2
- Gerasimos Gerogiannis, Stijn Eyerman, Evangelos Georganas, Wim Heirman, Josep Torrellas: DECA: A Near-Core LLM Decompression Accelerator Grounded on a 3D Roofline Model. 184-200
- Hanchen Ye, Deming Chen: StreamTensor: Make Tensors Stream in Dataflow Accelerators for LLMs. 201-216
- Nikoleta Iliakopoulou, Jovan Stojkovic, Chloe Alverti, Tianyin Xu, Hubertus Franke, Josep Torrellas: Chameleon: Adaptive Caching and Scheduling for Many-Adapter LLM Inference Environments. 217-231
- Donghyeon Joo, Helya Hosseini, Ramyad Hadidi, Bahar Asgari: Coruscant: Co-Designing GPU Kernel and Sparse Tensor Core to Advocate Unstructured Sparsity in Efficient LLM Inference. 232-245
2B: Processing-In-Memory - 2
- Je-Woo Jang, Junyong Oh, Youngbae Kong, Jae-Youn Hong, Sung-Hyuk Cho, Jeongyeol Lee, Hoeseok Yang, Joon-Sung Yang: Accelerating Retrieval Augmented Language Model via PIM and PNM Integration. 246-262
- Ruiyang Chen, Zhuoran Song, Yicheng Zheng, Zeyu Zhu, Gang Li, Naifeng Jing, Xiaoyao Liang, Haibing Guan: HEAT: NPU-NDP HEterogeneous Architecture for Transformer-Empowered Graph Neural Networks. 263-276
- Mohammadreza Saed, Prashant J. Nair, Tor M. Aamodt: RayN: Ray Tracing Acceleration with Near-memory Computing. 277-291
- Wonung Kim, Yubin Lee, Yoonsung Kim, Jinwoo Hwang, Seongryong Oh, Jiyong Jung, Aziz Huseynov, Woong Gyu Park, Chang Hyun Park, Divya Mahajan, Jongse Park: Pimba: A Processing-in-Memory Acceleration for Post-Transformer Large Language Model Serving. 292-307
2C: Security and Privacy - Machine Learning
- Joshua Kalyanapu, Farshad Dizani, Darsh Asher, Azam Ghanbari, Rosario Cammarota, Aydin Aysu, Samira Mirbagher Ajorpaz: GateBleed: Exploiting On-Core Accelerator Power Gating for High Performance and Stealthy Attacks on AI. 308-325
- Yinghao Yang, Xicheng Xu, Liang Chang, Hang Lu, Xiaowei Li: Athena: Accelerating Quantized Convolutional Neural Networks under Fully Homomorphic Encryption. 326-339
- Chenxu Wang, Danqing Tang, Changxu Ci, Junjie Huang, Yankai Xu, Fengwei Zhang, Jiannong Cao, Jie Song, Shoumeng Yan, Tao Wei, Zhengyu He: ccAI: A Compatible and Confidential System for AI Computing. 340-353
- Chenqi Lin, Kang Yang, Tianshi Xu, Ling Liang, Yufei Wang, Zhaohui Chen, Runsheng Wang, Mingyu Gao, Meng Li: Ironman: Accelerating Oblivious Transfer Extension for Privacy-Preserving AI with Near-Memory Processing. 354-368
2D: GPU - 1
- Rodrigo Huerta, Mojtaba Abaie Shoushtary, José-Lorenzo Cruz, Antonio González: Dissecting and Modeling the Architecture of Modern GPU Cores. 369-384
- Tianao Ge, Xiaowen Chu, Hongyuan Liu: Interleaved Bitstream Execution for Multi-Pattern Regex Matching on GPUs. 385-400
- Sungbin Jang, Junhyeok Park, Yongho Lee, Osang Kwon, Donghyun Kim, Juyoung Seok, Seokin Hong: SoftWalker: Supporting Software Page Table Walk for Irregular GPU Applications. 401-417
- Yeonan Ha, Jiho Park, Hanna Cha, Jiwon Lee, Joonsung Kim, Won Woo Ro, Youngsok Kim: LATPC: Accelerating GPU Address Translation Using Locality-Aware TLB Prefetching and MSHR Compression. 418-431
3A: Systems for AI (Emerging Applications)
- Zihan Zou, Xinming Yan, Shun Zhang, Peng Zheng, Guang Yang, Hao Cai, Bo Liu: S-DMA: Sparse Diffusion Models Acceleration via Spatiality-Aware Prediction and Dimension-Adaptive Dataflow. 432-444
- Ceyu Xu, Yongji Wu, Xinyu Yang, Beidi Chen, Matthew Lentz, Danyang Zhuo, Lisa Wu Wills: LLM.265: Video Codecs are Secretly Tensor Codecs. 445-460
- In-Jun Jung, Gyeongrok Yang, Jaeha Min, Joo-Young Kim: HLX: A Unified Pipelined Architecture for Optimized Performance of Hybrid Transformer-Mamba Language Models. 461-475
- Sixu Li, Yuzhou Chen, Chaojian Li, Yonggan Fu, Zheng Wang, Zhongzhi Yu, Haoran You, Zhifan Ye, Wei Zhou, Yongan Zhang, Yingyan (Celine) Lin: ORCHES: Orchestrated Test-Time-Compute-based LLM Reasoning on Collaborative GPU-PIM HEterogeneous System. 476-489
3B: Microarchitecture I
- Márton Erdos, Utpal Bora, Akshay Bhosale, Bob Lytton, Ali Mustafa Zaidi, Alexandra W. Chadwick, Yuxin Guo, Giacomo Gabrielli, Timothy M. Jones: LoopFrog: In-Core Hint-Based Loop Parallelization. 490-503
- Qingxuan Kang, Trevor E. Carlson: Multi-Stream Squash Reuse for Control-Independent Processors. 504-518
- Sweta, Prerna Priyadarshini, Biswabandan Panda: Drishti: Do Not Forget Slicing While Designing Last-Level Cache Replacement Policies for Many-Core Systems. 519-532
- Henry Kao, Nikhil Sreekumar, Prabhdeep Singh Soni, Ali Sedaghati, Fang Su, Bryan Chan, Maziar Goudarzi, Reza Azimi: A TRRIP Down Memory Lane: Temperature-Based Re-Reference Interval Prediction For Instruction Caching. 533-546
3C: Quantum - 1
- Junpyo Kim, Jungmin Cho, Hyeonseong Jeong, Dongmoon Min, Junhyuk Choi, Juwon Hong, Jangwoo Kim: LANCER: Low-Overhead, Accurate, and Non-Destructive Calibration for Real-World Fault-Tolerant Quantum Applications. 547-563
- Yilun Zhao, Kangding Zhao, Peng Zhou, Dingdong Liu, Tingyu Luo, Yuzhen Zheng, Peng Luo, Shun Hu, Jin Lin, Cheng Guo, Yinhe Han, Ying Wang, Mingtang Deng, Junjie Wu, Xiang Fu: Distributed-HISQ: A Distributed Quantum Control Architecture. 564-578
- Chaithanya Naik Mude, Swamit Tannu: Accurate Leakage Speculation for Quantum Error Correction. 579-594
- Wuwei Tian, Liqiang Lu, Siwei Tan, Shiyu Li, Hengyi Li, Tianyao Chu, Xuhong Zhang, Mingshuai Chen, Jianwei Yin: YOUTIAO: Hybrid Multiplexing with Dynamic Qubit Grouping for Low-cost and Scalable Quantum Wiring. 595-608
4A: Systems for AI (Training)
- Jinghan Huang, Hyungyo Kim, Nachuan Wang, Jaeyoung Kang, Hrishi Shah, Eun Kyung Lee, Minjia Zhang, Fan Lai, Nam Sung Kim: NetZIP: Algorithm/Hardware Co-design of In-network Lossless Compression for Distributed Large Model Training. 609-625
- Seokjin Go, Joongun Park, Spandan More, Hanjiang Wu, Irene Wang, Aaron Jezghani, Tushar Krishna, Divya Mahajan: Characterizing the Efficiency of Distributed Training: A Power, Performance, and Thermal Perspective. 626-642
- Hans Kasan, Dennis Abts, Jungwook Choi, John Kim: SkipReduce: (Interconnection) Network Sparsity to Accelerate Distributed Machine Learning. 643-658
- Le Qin, Junwei Cui, Weilin Cai, Meng Niu, Yan Yang, Jiayi Huang: Optimizing All-to-All Collective Communication with Fault Tolerance on Torus Networks. 659-674
4B: Microarchitecture II
- Jiuyang Liu, Qinjun Li, Yunqian Luo, Hongbin Zhang, Jiongjia Lu, Shupei Fan, Jianhao Ye, Yang Liu, Xiaoyi Liu, Yanqi Yang, Zewen Ye, Yuhang Zeng, Ao Shen, Rui Huang, Wei Cong, Xuecheng Zou, Mingyu Gao: Titan-I: An Open-Source, High Performance RISC-V Vector Core. 675-690
- Ishita Chaturvedi, Bhargav Reddy Godala, Abiram Gangavaram, Daniel Flyer, Tyler Sorensen, Tor M. Aamodt, David I. August: SHADOW: Simultaneous Multi-Threading Architecture with Asymmetric Threads. 691-704
- Yinyuan Zhao, Surim Oh, Mingsheng Xu, Heiner Litz: ATR: Out-of-Order Register Release Exploiting Atomic Regions. 705-718
4C: Quantum - 2
- Kaiwen Zhou, Liqiang Lu, Debin Xiang, Chenning Tao, Anbang Wu, Jingwen Leng, Fangxin Liu, Mingshuai Chen, Jianwei Yin: Vegapunk: Accurate and Fast Decoding for Quantum LDPC Codes with Online Hierarchical Algorithm and Sparse Accelerator. 719-732
- Hezi Zhang, Jixuan Ruan, Dean Tullsen, Yufei Ding, Ang Li, Travis S. Humble: OneAdapt: Resource-Adaptive Compilation of Measurement-Based Quantum Computing for Photonic Hardware. 733-748
- Xian Wu, Chenghong Zhu, Jingbo Wang, Xin Wang: MUSS-TI: Multi-level Shuttle Scheduling for Large-Scale Entanglement Module Linked Trapped-Ion. 749-763
- Qifan Jiang, Liqiang Lu, Debin Xiang, Tianyao Chu, Tianze Zhu, Jingwen Leng, Yun Liang, Xiaoming Sun, Jianwei Yin: Rasengan: A Transition Hamiltonian-based Approximation Algorithm for Solving Constrained Binary Optimization Problems. 764-777
4D: Sparsity - 1
- Ubaid Bakhtiar, Amirmahdi Namjoo, Bahar Asgari: Chasoň: Supporting Cross HBM Channel Data Migration to Enable Efficient Sparse Algebraic Acceleration. 778-794
- Ritvik Sharma, Zi Yu Xue, Nathan Zhang, Rubens Lacouture, Fredrik Kjolstad, Sara Achour, Mark Horowitz: A Probabilistic Perspective on Tiling Sparse Tensor Algebra. 795-808
- Sanjali Yadav, Bahar Asgari: Bootes: Boosting the Efficiency of Sparse Accelerators Using Spectral Clustering. 809-823
- Sanjali Yadav, Amirmahdi Namjoo, Bahar Asgari: Misam: Machine Learning Assisted Dataflow Selection in Accelerators for Sparse Matrix Multiplication. 824-838
5A: Systems for AI (Quantization)
- Jiaxiang Zou, Yonghao Chen, Xingyu Chen, Chenxi Xu, Xinyu Chen: AxCore: A Quantization-Aware Approximate GEMM Unit for LLM Inference. 839-853
- Xilong Xie, Liang Wang, Limin Xiao, Meng Han, Lei Liu, Xiangrong Xu, Jinquan Wang, Zhen Song, Xiaojian Liao: Amove: Accelerating LLMs through Mitigating Outliers and Salient Points via Fine-Grained Grouped Vectorized Data Type. 854-868
- Jungi Lee, Junyong Park, Soohyun Cha, Jaehoon Cho, Jaewoong Sim: MX+: Pushing the Limits of Microscaling Formats for Efficient Large Language Model Serving. 869-883
5B: Microarchitecture - Prefetching 2
- Charles Block, Gerasimos Gerogiannis, Josep Torrellas: Micro-MAMA: Multi-Agent Reinforcement Learning for Multicore Prefetching. 884-898
- Yuxin Guo, Akshay Bhosale, Utpal Bora, Alexandra W. Chadwick, Márton Erdos, Giacomo Gabrielli, Timothy M. Jones: Ghost Threading: Helper-Thread Prefetching for Real Systems. 899-914
- Shuiyi He, Zicong Wang, Xuan Tang, Hao Tang, Dezun Dong, Liquan Xiao: Elevating Temporal Prefetching Through Instruction Correlation. 915-928
5C: Sparsity - 2
- Courtney Golden, Axel Feldmann, Joel S. Emer, Daniel Sánchez: Quartz: A Reconfigurable, Distributed-Memory Accelerator for Sparse Applications. 929-943
- Xintong Li, Jinchen Jiang, Mingyu Gao: SeaCache: Efficient and Adaptive Caching for Sparse Accelerators. 944-957
- Gerasimos Gerogiannis, Dimitrios Merkouriadis, Charles Block, Annus Zulfiqar, Filippos Tofalos, Muhammad Shahbaz, Josep Torrellas: NetSparse: In-Network Acceleration of Distributed Sparse Kernels. 958-974
5D: Superconducting Systems
- Ismail Emir Yuksel, Ataberk Olgun, Nisa Bostanci, Haocong Luo, Abdullah Giray Yaglikçi, Onur Mutlu: ColumnDisturb: Understanding Column-based Read Disturbance in Real DRAM Chips and Implications for Future Systems. 975-994
- Junhyuk Choi, Juwon Hong, Junpyo Kim, Jungmin Cho, Hyeonseong Jeong, Dongmoon Min, Masamitsu Tanaka, Koji Inoue, Jangwoo Kim: SuperSFQ: A Hardware Design to Realize High-Frequency Superconducting Processors. 995-1010
- Niansong Zhang, Wenbo Zhu, Courtney Golden, Dan Ilan, Hongzheng Chen, Christopher Batten, Zhiru Zhang: Characterizing and Optimizing Realistic Workloads on a Commercial Compute-in-SRAM Device. 1011-1025
6A: GPUs - 2
- Xiaojie Li, Mingyu Wang, Baiqing Zhong, Haiqiu Huang, Guangjie Cao, Zhiyi Yu: C3ache: Towards Hierarchical Cache-Centric Computing for Sparse Matrix Multiplication on GPGPUs. 1026-1039
- Junhyeok Park, Sungbin Jang, Osang Kwon, Yongho Lee, Seokin Hong: Leveraging Chiplet-Locality for Efficient Memory Mapping in Multi-Chip Module GPUs. 1040-1057
- Qizhong Wang, Xiangyue Huang, Yanan Guo, Yuanchao Xu: Security and Performance Implications of GPU Cache Eviction Priority Hints. 1058-1072
6B: Security and Privacy - Memory
- Haoran Geng, Xiaoyang Lu, Yuezhi Che, Ziang Tian, Dazhao Cheng, Xian-He Sun, Michael T. Niemier, X. Sharon Hu: COSMOS: RL-Enhanced Locality-Aware Counter Cache Optimization for Secure Memory. 1073-1086
- Debpratim Adak, Eric Rotenberg, Amro Awad, Huiyang Zhou: CryptoBTB: A Secure Hierarchical BTB for Diverse Instruction Footprint Workloads. 1087-1101
- Chuanhan Li, Jishen Zhao, Yuanchao Xu: Efficient Security Support for CXL Memory through Adaptive Incremental Offloaded (Re-)Encryption. 1102-1116
- Anish Saxena, Walter Wang, Alexandros Daglis: Citadel: Rethinking Memory Allocation to Safeguard Against Inter-Domain Rowhammer Exploits. 1117-1131
6C: Energy and Power
- Gyeongseo Park, Minho Kim, Ki-Dong Kang, Yunhyeong Jeon, Seulki Kim, Daehoon Kim: EcoCore: Dynamic Core Management for Improving Energy Efficiency in Latency-Critical Applications. 1132-1146
- Alireza Raisiardali, Konstantinos Iordanou, Jedrzej Kufel, Kowshik Gudimetla, Kris Myny, Emre Ozer: Flexing RISC-V Instruction Subset Processors to Extreme Edge. 1147-1159
- Yuqi Xue, Jian Huang: ReGate: Enabling Power Gating in Neural Processing Units. 1160-1177
- Pingyi Huo, Anusha Devulapally, Hasan Al Maruf, Nandhini Chandramoorthy, Meena Arunachalam, Gulsum Gudukbay Akbulut, Mahmut T. Kandemir, Vijaykrishnan Narayanan: Multi-Dimensional ML-Pipeline Optimization in Cost-Effective Disaggregated Datacenter. 1178-1192
6D: Reconfigurable Computing and Storage
- Hyunjin Kim, Seunghwan Song, Sukhyun Choi, Jeongin Choe, Sanghyeok Han, Jisung Park, Jinho Lee, Jae-Joon Kim: CrossBit: Bitwise Computing in NAND Flash Memory with Inter-Bitline Data Communication. 1193-1206
- Jaeyong Lee, Beomjun Kim, Myoungjun Chun, Myungsuk Kim, Jihong Kim: DEAR: Improving Performance and Lifetime of SSDs Using Dynamic Error-Aware Refresh. 1207-1220
- Rohan Juneja, Pranav Dangi, Thilini Kaushalya Bandara, Tulika Mitra, Li-Shiuan Peh: Nexus Machine: An Energy-Efficient Active Message Inspired Reconfigurable Architecture. 1221-1235
- Yufei Yang, Chenhao Xie, Chuliang Guo, Liansheng Liu, Xiyuan Peng, Datong Liu, Yu Peng: FexMo: Enabling Fuse Execution Mode for Multi-task CGRAs. 1236-1249
7A: Systems for AI (HW/SW Support)
- Yu Gong, Lingyi Huang, Haodong Chang, Rongjian Liang, Cheng Yang, Zhexiang Tang, Jiang Hu, Bo Yuan: Crane: Inter-Layer Scheduling Framework for DNN Inference and Training Co-Support on Tiled Architecture. 1250-1263
- Peng Gao, Yang Liu, Haonan Sun, Jiang Jiang, Jun Wang, Zonghui Hong, Jiali Qu: OASIS: A Commercial High Performance Terminal AI Processor Supporting RISC-V Tensor Extension Instructions. 1264-1283
- Yiqi Liu, Yuqi Xue, Noelle Crawford, Jilong Xue, Jian Huang: Elk: Exploring the Efficiency of Inter-core Connected AI Chips with Deep Learning Compiler Techniques. 1284-1299
- Mohammadreza Esmali Nojehdeh, Hossein Mokhtarnia, Julian Pavon, Narcís Rodas, Roger Figueras Bagué, Enrico Reggiani, Miquel Moretó, Osman S. Unsal, Adrián Cristal, Eduard Ayguadé: Empowering Vector Architectures for ML: The CAMP Architecture for Matrix Multiplication. 1300-1315
- Devansh Jain, Marco Frigo, Jai Arora, Akash Pardeshi, Zhihao Wang, Krut Patel, Charith Mendis: TAIDL: Tensor Accelerator ISA Definition Language with Auto-generation of Scalable Test Oracles. 1316-1333
7B: Tools and Simulators
- Rishov Sarkar, Cong Hao: OmniSim: Simulating Hardware with C Speed and RTL Accuracy for High-Level Synthesis Designs. 1334-1346
- Tiantian Lin, Cheng Qiu, Xiaohang Wang, Ling Wang, Zhulin Zheng, Yingtao Jiang, Amit Kumar Singh, Jieming Yin, Sihai Qiu, Xiaodong Li, Xin Tang, Jie Song, Mingzhe Zhang, Kui Ren: LEGOSim: A Unified Parallel Simulation Framework for Multi-chiplet Heterogeneous Integration. 1347-1362
- Wonhyuk Yang, Yunseon Shin, Okkyun Woo, Geonwoo Park, Hyungkyu Ham, Jeehoon Kang, Jongse Park, Gwangsun Kim: PyTorchSim: A Comprehensive, Fast, and Accurate NPU Simulation Framework. 1363-1380
- Kaiyan Chang, Wenlong Zhu, Shengwen Liang, Huawei Li, Ying Wang: LLMulator: Generalizable Cost Modeling for Dataflow Accelerators with Input-Adaptive Control Flow. 1381-1396
- Euijun Chung, Seonjin Na, Sung Ha Kang, Hyesoon Kim: Swift and Trustworthy Large-Scale GPU Simulation with Fine-Grained Error Modeling and Hierarchical Clustering. 1397-1411
7C: Reliability, Fault-tolerance
- F. Nisa Bostanci, Oguzhan Canpolat, Ataberk Olgun, Ismail Emir Yüksel, Konstantinos Kanellopoulos, Mohammad Sadrosadati, Abdullah Giray Yaglikçi, Onur Mutlu: Understanding and Mitigating Covert Channel and Side Channel Vulnerabilities Introduced by RowHammer Defenses. 1412-1432
- Weijie Chen, Shan Tang, Yulin Tang, Xiapu Luo, Yinqian Zhang, Weizhong Qiang: ρHammer: Reviving RowHammer Attacks on New Architectures via Prefetching. 1433-1447
- Hoiju Chung, Euisang Oh, Seungmin Baek, Hyeongshin Yoon, Jaesung Yoo, Sanghwan Lee, Yongjun Lee, Arhatha Bramhanand, Brett Dodds, Yang Zhou, Nam Sung Kim: DRAM Fault Classification through Large-Scale Field Monitoring for Robust Memory RAS Management. 1448-1461
- Kunlin You, Yinan Xu, Kehan Feng, Luoshan Cai, Yaoyang Zhou, Yungang Bao: DiffTest-H: Toward Semantic-Aware Communication in Hardware-Accelerated Processor Verification. 1462-1476
- Samit Shahnawaz Miftah, Amisha Srivastava, Hyunmin Kim, Shiyi Wei, Kanad Basu: SymbFuzz: Symbolic Execution Guided Hardware Fuzzing. 1477-1490
7D: Graph Processing and HPC
- Linxuan Zhang, José Nelson Amaral, Di Niu: TransFusion: End-to-End Transformer Acceleration via Graph Fusion and Pipelining. 1491-1504
- Chenxi Xu, Tianhui Shi, Shixuan Sun, Jidong Zhai, Xinyu Chen: X-SET: An Efficient Graph Pattern Matching Accelerator With Order-Aware Parallel Intersection Units. 1505-1519
- Changmin Shin, Jaeyong Song, Seongmin Na, Jun Sung, Hongsun Jang, Jinho Lee: FALA: Locality-Aware PIM-Host Cooperation for Graph Processing with Fine-Grained Column Access. 1520-1534
- Amir Ghazizadeh Ahsaei, Lingxiang Yin, Shilin Tian, Fangzhou Ye, Fan Yao, Hao Zheng: Rethinking Tiling and Dataflow for SpMM Acceleration: A Graph Transformation Framework. 1535-1548
- Lucas Morais, Juan Miguel De Haro Ruiz, Alfredo Goldman, Guido Araujo, Giacomo Pedretti, Jim Ignowski, Michael Frank, Xavier Martorell, Daniel Jiménez-González, Carlos Álvarez: Boosting Task Scheduling Data Locality with Low-latency, HW-accelerated Label Propagation. 1549-1564
8A: Systems for AI (Data Representations)
- Seunghyun Lee, Dongho Ha, Sungbin Kim, Sungwoo Kim, Hyunwuk Lee, Won Woo Ro: BitL: A Hybrid Bit-Serial and Parallel Deep Learning Accelerator for Critical Path Reduction. 1565-1578
- Yao Chen, Cheng Gong, Bingsheng He: HiPACK: Efficient Sub-8-Bit Direct Convolution with SIMD and Bitwise Management. 1579-1591
- Huizheng Wang, Zichuan Wang, Zhiheng Yue, Yousheng Long, Taiquan Wei, Jianxun Yang, Yang Wang, Chao Li, Shaojun Wei, Yang Hu, Shouyi Yin: MCBP: A Memory-Compute Efficient LLM Inference Accelerator Leveraging Bit-Slice-enabled Sparsity and Repetitiveness. 1592-1608
8B: Systems for AI (Processor Architecture)
- Cheng Zou, Ziling Wei, Jun Yan Lee, Chen Nie, Kang You, Zhezhi He: PolymorPIC: Embedding Polymorphic Processing-in-Cache in RISC-V based Processor for Full-stack Efficient AI Inference. 1609-1624
- Qizhe Wu, Jinyi Zhou, Zhanhe Hu, Zhichen Zeng, Huawen Liang, Jiuru Zhu, Linfeng Tao, Xin Zhang, Zekang Cheng, Letian Zhao, Wei Yuan, Xiaotian Wang, Xi Jin: MHE-TPE: Multi-Operand High-Radix Encoder for Mixed-Precision Fixed-Point Tensor Processing Engines. 1625-1639
- Sabuj Laskar, Pranati Majhi, Abdullah Muzahid, Eun Jung Kim: SuperMesh: Energy-Efficient Collective Communications for Accelerators. 1640-1655
8C: Emerging Applications - 1
- Max Doblas, Po Jui Shih, Oscar Lostes-Cazorla, Miquel Moreto, Christopher Batten, Santiago Marco-Sola: SMX: Heterogeneous Architecture for Universal Sequence Alignment Acceleration. 1656-1671
- Guy Eichler, Yatin Gilhotra, Nanyu Zeng, Martha A. Kim, Kenneth L. Shepard, Luca P. Carloni: MINDFUL: Safe, Implantable, Large-Scale Brain-Computer Interfaces from a System-Level Design Perspective. 1672-1689
- Chuan Liu, Chunshu Wu, Ruibing Song, Guangyan Sun, Ying Nian Wu, Yousu Chen, Ang Li, Tong Geng: DS-TIDE: Harnessing Dynamical Systems for Efficient Time-Independent Differential Equation Solving. 1690-1703
9A: Security and Privacy - Cryptography, Speculation and Computational Storage
- Naifeng Zhang, Sophia Fu, Franz Franchetti: Towards Closing the Performance Gap for Cryptographic Kernels Between CPUs and Specialized Hardware. 1704-1718
- Liang Kong, Shengyu Fan, Xianglong Deng, Lei Chen, Guang Fan, Guiming Shi, Yilan Zhu, Geng Yang, Shoumeng Yan, Mingzhe Zhang: HAWK: Fully Homomorphic Encryption Accelerator with Fixed-Word Key Decomposition Switching. 1719-1734
- Amund Bergland Kvalsvik, Magnus Själander: ShadowBinding: Realizing Effective Microarchitectures for In-Core Secure Speculation Schemes. 1735-1748
- Zehao Chen, Honghui You, Qian Wei, Hang Lu, Lei Ju, Zhaoyan Shen: SmartPIR: A Private Information Retrieval System using Computational Storage Devices. 1749-1762
9B: Memory
- Hwanjun Lee, Minho Kim, Yeji Jung, Seonmu Oh, Ki-Dong Kang, Seunghak Lee, Daehoon Kim: Beyond Page Migration: Enhancing Tiered Memory Performance via Integrated Last-Level Cache Management and Page Migration. 1763-1776
- Kaiyang Zhao, Yuang Chen, Xenia Xu, Dan Schatzberg, Nastaran Hajinaza, Rupin Vakharwala, Andy Anderson, Dimitrios Skarlatos: Learning to Walk: Architecting Learned Virtual Memory Translation. 1777-1792
- Víctor Soria Pardos, Adrià Armejach, Tiago Mück, Darío Suárez Gracia, José A. Joao, Miquel Moretó: A. Delegato: Locality-Aware Atomic Memory Operations on Chiplets. 1793-1808
- Houxiang Ji, Yifan Yuan, Yang Zhou, Ipoom Jeong, Ren Wang, Saksham Agarwal, Nam Sung Kim: Re-architecting End-host Networking with CXL: Coherence, Memory, and Offloading. 1809-1823
9C: Emerging Applications - 2
- Minnan Pei, Gang Li, Junwen Si, Zeyu Zhu, Zitao Mo, Peisong Wang, Zhuoran Song, Xiaoyao Liang, Jian Cheng: GCC: A 3DGS Inference Architecture with Gaussian-Wise and Cross-Stage Conditional Processing. 1824-1837
- Leshu Li, Jiayin Qin, Jie Peng, Zishen Wan, Huaizhi Qu, Ye Han, Pingqing Zheng, Hongsen Zhang, Yu Cao, Tianlong Chen, Yang Katie Zhao: RTGS: Real-Time 3D Gaussian Splatting SLAM via Multi-Level Redundancy Reduction. 1838-1851
- Hongyi Wang, Zhenhua Zhu, Tianchen Zhao, Yunfei Xiang, Zehao Wang, Jincheng Yu, Huazhong Yang, Yuan Xie, Yu Wang: REACT3D: Real-time Edge Accelerator for Incremental Training in 3D Gaussian Splatting based SLAM Systems. 1852-1866
- Meng Han, Liang Wang, Limin Xiao, Hao Zhang, Bowen Jiang, Xilong Xie, Jianfeng Zhu, Shaojun Wei, Leibo Liu: PointISA: ISA-Extensions for Efficient Point Cloud Analytics via Architecture and Algorithm Co-Design. 1867-1881
