


32nd HPCA 2026: Sydney, Australia
- IEEE International Symposium on High Performance Computer Architecture, HPCA 2026, Sydney, Australia, January 31 - Feb. 4, 2026. IEEE 2026, ISBN 979-8-3315-9302-5

- Junghoon Kim, Jongheon Jeong, Seokwon Moon, Seong Hoon Seo, Yeonhong Park, Jinkyu Jeong, Nam Sung Kim, Jae W. Lee:

MemSOS: OS-Guided Selective Memory Mirroring. 1-15 - Shuang Liang, Yuncheng Lu, Ce Guo, Paul H. J. Kelly, Wayne Luk, Hongxiang Fan:

Advancing Full-Stack Acceleration for Schrödinger-Style Quantum Simulation. 1-15 - Junguk Hong, Changmin Shin, Sukjin Kim, Si Ung Noh, Taehee Kwon, Seongyeon Park, Hanjun Kim, Youngsok Kim, Jinho Lee:

LoCaLUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM. 1-16 - Joongun Park, Yongqin Wang, Huan Xu, Hanjiang Wu, Mengyuan Li, Tushar Krishna:

SCALE: Tackling Communication Bottlenecks in Confidential Distributed Machine Learning. 1-14 - Xinhua Chen, Jiangbin Dong, Hongren Zheng, Tian Tang, Mingyu Gao:

CROPHE: Cross-Operator Dataflow Optimization for Fully Homomorphic Encryption Accelerators. 1-14 - Baiqing Zhong, Zhirong Ye, Xiaojie Li, Peilin Wang, Haiqiu Huang, Zhaolin Li, Zhiyi Yu, Mingyu Wang:

LRM-GPU: Alleviating Synchronization Overhead for Multi-Chiplet GPU Architecture. 1-14 - Fangzhou Ye, Lingxiang Yin, Hao Zheng:

Scaling Graph Neural Network Training via Geometric Optimization. 1-15 - Alexander Knapen, Guanchen Tao, Jacob Mack, Tomas Bruno, Mehdi Saligane, Dennis Sylvester, Qirui Zhang, Gokul Subramanian Ravi:

Pinball: A Cryogenic Predecoder for Quantum Error Correction Decoding Under Circuit-Level Noise. 1-17 - Minh S. Q. Truong, Yiqiu Sun, Dawei Xiong, Amol Shah, Alexander Glass, Abraham Farrell, James A. Bain, L. Richard Carley, Saugata Ghose:

The Memory Processing Unit: A Generalized Interface for End-to-End In-Memory Execution. 1-16 - Huizheng Wang, Hongbin Wang, Zichuan Wang, Zhiheng Yue, Yang Wang, Chao Li, Yang Hu, Shouyi Yin:

PADE: A Predictor-Free Sparse Attention Accelerator via Unified Execution and Stage Fusion. 1-19 - Lin Wang, Yuchong Hu, Ziling Duan, Mingqi Li, Chenxuan Yao, Feifan Liu, Xiaolu Li, Leihua Qin, Dan Feng:

SpotCC: Facilitating Coded Computation for Prediction Serving Systems on Spot Instances. 1-14 - Peter W. Deutsch, Harish Dattatraya Dixit, Gautham Vunnam, Carl Moran, Eleanor Ozer, Sriram Sankar:

PinDrop: Breaking the Silence on SDCs in a Large-Scale Fleet. 1-14 - Junseo Lee, Sangyun Jeon, Jungi Lee, Junyong Park, Jaewoong Sim:

GRTX: Efficient Ray Tracing for 3D Gaussian-Based Rendering. 1-14 - Zheng Xu, Dehao Kong, Jiaxin Liu, Dingcheng Jiang, Xu Dai, Jinyi Deng, Yang Hu, Shouyi Yin:

FACE: Fully Overlapped PD Scheduling and Multi-Level Architecture Co-Exploration on Wafer. 1-16 - Chuhao Xu, Zijun Li, Quan Chen, Han Zhao, Xueyan Tang, Minyi Guo:

Towards Resource-Efficient Serverless LLM Inference with SLINFER. 1-18 - Haoqi He, Zhiwei Wang, Lutan Zhao, Dian Jiao, Dan Meng, Rui Hou:

Peregrine: Accelerating TFHE Bootstrapping on GPUs via Multi-Level External Product Co-Design. 1-14 - David Schall, Mária Duracková, Boris Grot:

The Last-Level Branch Predictor Revisited. 1-16 - Qingyun Niu, Lutan Zhao, Ming Cai, Kai Li, Dan Meng, Rui Hou:

UniFHE: Faster Accelerator for FHE with Diverse Algebraic Structure and Balanced Memory System. 1-14 - Suhas K. Vittal, Moinuddin Qureshi:

BARD: Reducing Write Latency of DDR5 Memory by Exploiting Bank-Parallelism. 1-15 - Zifei Zhang, Yinan Xu, Sa Wang, Dan Tang, Yungang Bao:

TraceRTL: Agile Performance Evaluation for Microarchitecture Exploration. 1-15 - Benjamin F. Morris III, Tergel Molom-Ochir, Changchun Zhou, Yiran Chen, Alex K. Jones, Hai Li:

NP-CAM: Efficient and Scalable DNA Classification using a NoC-Partitioned CAM Architecture. 1-14 - Haomin Li, Yun Liang, Fangxin Liu, Bowen Zhu, Zongwu Wang, Yu Feng, Liqiang Lu, Li Jiang, Haibing Guan:

ORANGE: Exploring Ockham's Razor for Neural Rendering by Accelerating 3DGS on NPUs with GEMM-Friendly Blending and Balanced Workloads. 1-15 - Hyungyo Kim, Qirong Xia, Jinghan Huang, Nachuan Wang, Younjoo Lee, Jung Ho Ahn, Wajdi K. Feghali, Ren Wang, Nam Sung Kim:

LiLo: Harnessing the on-Chip Accelerators in Intel CPUs for Compressed LLM Inference Acceleration. 1-17 - Moinuddin K. Qureshi:

SALT: Track-and-Mitigate Subarrays, Not Rows, for Blast-Radius-Free Rowhammer Defense. 1-16 - Yang Zhong, Haoran Wu, Xueqi Li, Sa Wang, David Boland, Yungang Bao, Kan Shi:

TurboFuzz: FPGA Accelerated Hardware Fuzzing for Processor Agile Verification. 1-16 - Xinkai Wang, Chao Li, Yiming Zhuansun, Jinyang Guo, Xiaofeng Hou, Jing Wang, Luping Wang, Weigao Chen, Cheng Huang, Guodong Yang, Liping Zhang, Minyi Guo:

AUM: Unleashing the Efficiency Potential of Shared Processors with Accelerator Units for LLM Serving. 1-15 - Xiaotong Huang, He Zhu, Tianrui Ma, Yuxiang Xiong, Fangxin Liu, Zhezhi He, Yiming Gan, Zihan Liu, Jingwen Leng, Yu Feng, Minyi Guo:

SPLATONIC: Architectural Support for 3D Gaussian Splatting SLAM via Sparse Processing. 1-14 - Guangyang Deng, Zixiang Yu, Zhirong Shen, Qiangsheng Su, Zhinan Cheng, Jiwu Shu:

Pulse: Fine-Grained Hierarchical Hashing Index for Disaggregated Memory. 1-14 - Yanjing Wang, Lizhou Wu, Sunfeng Gao, Yibo Tang, Junhui Luo, Zicong Wang, Yang Ou, Dezun Dong, Nong Xiao, Mingche Lai:

Cohet: A CXL-Driven Coherent Heterogeneous Computing Framework with Hardware-Calibrated Full-System Simulation. 1-16 - Wenhao Huang, Zhaolin Duan, Laiping Zhao, Yuhao Zhang, Yanjie Wang, Yiming Li, Yihan Wang, Yichi Chen, Zhihang Tang, Kang Chen, Deze Zeng, Wenxin Li, Keqiu Li:

µShare: Non-Intrusive Kernel Co-Locating on NVIDIA GPUs. 1-14 - Hanyu Zhang, Fangxu Guo, Liqiang Lu, Long Wang, Yunfei Du, Zhe Wang, Jinghan Zhang, Jie Zhang, Chenli Xue, Chengpeng Wu, Ziyi Zhang, Yun Liang, Size Zheng, Jianwei Yin:

TENET-v2: Applying Relation-Centric Notation to Model and Optimize Data Swizzle in the Cache of Modern NPU. 1-15 - Anbang Wu, Liqiang Lu, Jianwei Yin, Jingwen Leng, Minyi Guo:

CLINE: Improving Control Flow Compilation of Quantum Programs with Control Line Encoding. 1-13 - Taishu Sheng, Guangyu Sun, Dezun Dong:

COMET: Communication and Memory Co-Design for Fine-Grained AI Inference in MCM Accelerators. 1-14 - Zhantong Qiu, Mahyar Samani, Jason Lowe-Power:

Nugget: Portable Program Snippets. 1-17 - Carlos Escuin, Paolo Salvatore Galfano, Davide Basilio Bartolini, Leeor Peled, Mehdi Alipour:

Tempranillo: Non-Speculative Early Register Release. 1-17 - Ming Wang, Ang Li, Frank Mueller:

Fully Parallelized BP Decoding for Quantum LDPC Codes Can Outperform BP-OSD. 1-14 - Xu Jiang, Xueliang Wei, Yifei Qu, Dan Feng, Yulai Xie, Wei Tong:

Secret Caching Sauce for High-Performance Secure Memory. 1-14 - Sehyeon Kim, Minkwan Kim, Chanho Park, Hanmok Park, Seonghoon Kim, Taigon Song, William J. Song:

NPUWattch: ML-Based Power, Area, and Timing Modeling for Neural Accelerators. 1-14 - Anatole Lefort, David Schall, Nicolò Carpentieri, Julian Pritzi, Soham Chakraborty, Nicolai Oswald, Pramod Bhatotia:

C³: CXL Coherence Controllers for Heterogeneous Architectures. 1-17 - Chang Liu, Hongpei Zheng, Xin Zhang, Dapeng Ju, Dongsheng Wang, Yinqian Zhang, Trevor E. Carlson:

SSBleed: Non-Speculative Side-Channel Attacks via Speculative Store Bypass on Armv9 CPUs. 1-15 - Fuyu Wang, Minghua Shen, Yufei Ding, Nong Xiao, Yutong Lu:

SFD: Towards Segment Fusion Dataflow for Spatial Accelerators. 1-14 - Nicholas Mosier, Hamed Nemati, John C. Mitchell, Caroline Trippel:

Protean: A Programmable Spectre Defense. 1-20 - Han Zhao, Weihao Cui, Zeshen Zhang, Wenhao Zhang, Jiangtong Li, Quan Chen, Pu Pang, Zijun Li, Zhenhua Han, Yuqing Yang, Minyi Guo:

LEGO: Supporting LLM-Enhanced Games with One Gaming GPU. 1-14 - Xinru Tang, Jingxiang Hou, Dingcheng Jiang, Taiquan Wei, Jiaxin Liu, Jinyi Deng, Huizheng Wang, Qize Yang, Haoran Shang, Chao Li, Yang Hu, Shouyi Yin:

MoEntwine: Unleashing the Potential of Wafer-Scale Chips for Large-Scale Expert Parallel Inference. 1-15 - Zehao Chen, Zhaoyan Shen, Qian Wei, Hang Lu, Lei Ju:

Conflux: A High-Performance Keyword Private Retrieval System for Dynamic Datasets. 1-14 - Yilan Zhu, Geng Yang, Xingyu Tian, Dilshan Kumarathunga, Liang Kong, Xianglong Deng, Shengyu Fan, Guang Fan, Guiming Shi, Lei Chen, Bo Zhang, Yisong Chang, Shoumeng Yan, Zhenman Fang, Mingzhe Zhang:

An Efficient and Scalable Hardware Architecture for Number Theoretic Transform on FPGA with Design Automation. 1-14 - Alhad Daftardar, Jianqiao Mo, Joey Ah-kiow, Benedikt Bünz, Siddharth Garg, Brandon Reagen:

zkPHIRE: A Programmable Accelerator for ZKPs over HIgh-degRee, Expressive Gates. 1-15 - Deanna Postles Dunn Berger, Alper Buyuktosunoglu, Craig R. Walters, Robert J. Sonnelitter, Hailey Nicholson, Ashraf ElSharif, Yamil Rivera, Avery Francois, Cédric Lichtenau, Jason Kohl:

Enterprise Class On-Chip Accelerator Integration. 1-15 - Xiaochuan Tang, Hao Qi, Jianbo Dong, Yinghao Yu, Zhennan Xue, Zhengyu Zhang, Daocheng Ying, Zheng Cao, Xiaoyi Lu:

eGPU: Production-Scale Elastic Sharing Over 10,000 GPUs. 1-14 - Sangjin Kim, Yuseon Chou, Byeongcheol Kim, Jungjun Oh, Hoi-Jun Yoo:

GyRot: Leveraging Hidden Synergy Between Rotation and Fine-Grained Group Quantization for Low-Bit LLM Inference. 1-15 - Yaoyun Zhou, Qian Wang:

HERO-Sign: Hierarchical Tuning and Efficient Compiler-Time GPU Optimizations for SPHINCS+ Signature Generation. 1-13 - Jiuchen Shi, Hang Zhang, Yixiao Wang, Quan Chen, Yizhou Shan, Kaihua Fu, Wei Wang, Minyi Guo:

ELORA: Efficient LoRA and KV Cache Management for Multi-LoRA LLM Serving. 1-14 - Jinwoo Park, John Kim:

N-DIPPER: A Distributed Inter-Die Peak Power Management Network for NAND Systems. 1-14 - Donghyuk Kim, Sejeong Yang, Wonjin Shin, Joo-Young Kim:

V-Rex: Real-Time Streaming Video LLM Acceleration via Dynamic KV Cache Retrieval. 1-14 - Wenjun Yu, Sitian Chen, Cheng Chen, Amelie Chi Zhou:

Near-Zero-Overhead Freshness for Recommendation Systems via Inference-Side Model Updates. 1-15 - Fan Li, Qiufeng Li, Yanan Guo, Weidong Cao, Xin Xin:

ASPA: Reassigning DDR5 Parity Bandwidth. 1-14 - Junkyum Kim, Divya Mahajan:

VectorLiteRAG: Latency-Aware and Fine-Grained Resource Partitioning for Efficient RAG. 1-15 - Hongshi Tan, Yao Chen, Xinyu Chen, Qizhen Zhang, Cheng Chen, Weng-Fai Wong, Bingsheng He:

RidgeWalker: Perfectly Pipelined Graph Random Walks on FPGAs. 1-15 - Chen Zhang, Qijun Zhang, Zhuoshan Zhou, Yijia Diao, Haibo Wang, Zhe Zhou, Zhipeng Tu, Zhiyao Li, Guangyu Sun, Zhuoran Song, Zhigang Ji, Jingwen Leng, Minyi Guo:

Towards Compute-Aware In-Switch Computing for LLMs Tensor-Parallelism on Multi-GPU Systems. 1-15 - Qixuan Yu, David Wentzlaff:

Area Bloating and the Future of Specialization. 1-14 - Yuzhe Fu, Changchun Zhou, Hancheng Ye, Bowen Duan, Qiyu Huang, Chiyue Wei, Cong Guo, Hai Helen Li, Yiran Chen:

FractalCloud: A Fractal-Inspired Architecture for Efficient Large-Scale Point Cloud Processing. 1-15 - Yi Li, Tsun-Yu Yang, Zhaoyan Shen, Ming-Chang Yang, Bingzhe Li:

VeloxGNN: Efficient Out-of-Core GNN Training with Delayed Gradient Propagation. 1-16 - Nicolás Meseguer, Daoxuan Xu, Yifan Sun, Michael Pellauer, José L. Abellán, Manuel E. Acacio:

QuCo: Efficient and Flexible Hardware-Driven Automatic Configuration of Tile Transfers in GPUs. 1-14 - Jianming Tong, Tianhao Huang, Jingtian Dang, Leo de Castro, Anirudh Itagi, Anupam Golder, Asra Ali, Jeremy Kun, Jevin Jiang, Arvind, G. Edward Suh, Tushar Krishna:

Leveraging ASIC AI Chips for Homomorphic Encryption. 1-18 - Huizheng Wang, Taiquan Wei, Zichuan Wang, Dingcheng Jiang, Qize Yang, Jiaxin Liu, Jingxiang Hou, Chao Li, Jinyi Deng, Yang Hu, Shouyi Yin:

TEMP: A Memory Efficient Physical-Aware Tensor Partition-Mapping Framework on Wafer-Scale Chips. 1-18 - Chengran Li, Huizheng Wang, Jiaxin Liu, Jingyao Liu, Zhiheng Yue, Xia Li, Shenfei Jiang, Jinyi Deng, Yang Hu, Shouyi Yin:

ReThermal: Co-Design of Thermal-Aware Static and Dynamic Scheduling for LLM Training on Liquid-Cooled Wafer-Scale Chips. 1-15 - Hyucksung Kwon, Kyungmo Koo, Janghyeon Kim, Woongkyu Lee, Minjae Lee, Gyeonggeun Jung, Hyungdeok Lee, Yousub Jung, Jaehan Park, Yosub Song, Byeongsu Yang, Haerang Choi, Guhyun Kim, Jongsoon Won, Woojae Shin, Changhyun Kim, Gyeongcheol Shin, Yongkee Kwon, Ilkon Kim, Euicheol Lim, John Kim, Jungwook Choi:

PIMphony: Overcoming Bandwidth and Capacity Inefficiency in PIM-Based Long-Context LLM Inference System. 1-21 - Runze Wang, Qinggang Wang, Haifeng Liu, Long Zheng, Xiaofei Liao, Hai Jin, Jingling Xue:

Adaptive Draft Sequence Length: Enhancing Speculative Decoding Throughput on PIM-Enabled Systems. 1-15 - Zhen He, Yiqi Wang, Zhiheng Yue, Zihan Wu, Huiming Han, Shaojun Wei, Yang Hu, Fengbin Tu, Shouyi Yin:

HR-DCIM: High-Reliability Floating-Point Digital CIM Architecture With Unified Low-Cost Iterative Error Correction. 1-15 - Sangwoo Hwang, Donghun Lee, Jahyun Koo, Jaeha Kung:

GustavSNN: Unleashing the Power of Gustavson's Algorithm on SNN Acceleration with Column-Parallel Tick-Batch Dataflow. 1-14 - Jinyu Hu, Huizhang Luo, Hong Jiang, Marc Casas, Kenli Li, Chubo Liu:

Swift: High-Performance Sparse-Dense Matrix Multiplication on GPUs. 1-16 - Zhiqiang Chen, Wenwen Fu, Yongwen Wang, Hongwei Zhou:

A Deadlock-Free Bridge Module for Inter-Chiplet Cache-Coherent Communication in an Open Chiplet Ecosystem. 1-13 - Yuanyuan Wang, Nana Tang, Yuyang Wang, Shu Pan, Dingding Yu, Zeyue Wang, Mou Sun, Kejie Fu, Fangyu Wang, Yunchuan Chen, Ning Sun, Fei Yang:

AutoHAAP: Automated Heterogeneity-Aware Asymmetric Partitioning for LLM Training. 1-17 - Huizheng Wang, Zichuan Wang, Hongbin Wang, Jingxiang Hou, Taiquan Wei, Chao Li, Yang Hu, Shouyi Yin:

WATOS: Efficient LLM Training Strategies and Architecture Co-Exploration for Wafer-Scale Chip. 1-19 - Hritvik Taneja, Ali Hajiabadi, Michele Marazzi, Kaveh Razavi, Moinuddin Qureshi:

MIRZA: Efficiently Mitigating Rowhammer with Randomization and ALERT. 1-13 - Sanghyun Kim, Jinhyeok Oh, Taehun Kim, Gyutae Kim, Youngsok Kim, Jaehyun Hwang, Joonsung Kim:

SMTcheck: Accurate SMT Interference Prediction to Improve Scheduling Efficiency in Datacenters. 1-15 - Hyunkyun Shin, Seongtae Bang, Hyungwon Park, Daehoon Kim:

ARIADNE: Adaptive UVM Management for Efficient GPU Memory Oversubscription. 1-15 - Nika Mansouri-Ghiasi, Talu Güloglu, Harun Mustafa, Can Firtina, Konstantina Koliogeorgi, Konstantinos Kanellopoulos, Haiyu Mao, Rakesh Nadig, Mohammad Sadrosadati, Jisung Park, Onur Mutlu:

SAGe: A Lightweight Algorithm-Architecture Co-Design for Mitigating the Data Preparation Bottleneck in Large-Scale Genome Sequence Analysis. 1-23 - Chihun Song, Austin Antony Cruz, Michael Jaemin Kim, Minbok Wi, Gaohan Ye, Kyungsan Kim, Sangyeol Lee, Jung Ho Ahn, Nam Sung Kim:

ReScue: Reliable and Secure CXL Memory. 1-16 - Rahul Bera, Zhenrong Lang, Caroline Hengartner, Konstantinos Kanellopoulos, Rakesh Kumar, Mohammad Sadrosadati, Onur Mutlu:

Athena: Synergizing Data Prefetching and Off-Chip Prediction via Online Reinforcement Learning. 1-19 - Ziyu Huang, Yangjie Zhou, Zihan Liu, Xinhao Luo, Yijia Diao, Minyi Guo, Jidong Zhai, Yu Feng, Chen Zhang, Anbang Wu, Jingwen Leng:

FlashFuser: Expanding the Scale of Kernel Fusion for Compute-Intensive Operators via Inter-Core Connection. 1-14 - Zishen Wan, Che-Kai Liu, Jiayi Qian, Hanchen Yang, Arijit Raychowdhury, Tushar Krishna:

REASON: Accelerating Probabilistic Logical Reasoning for Scalable Neuro-Symbolic Intelligence. 1-16 - Seungkwan Kang, Seungjun Lee, Donghyun Gouk, Miryeong Kwon, Hyunkyu Choi, Junhyeok Jang, Sangwon Lee, Huiwon Choi, Jie Zhang, Wonil Choi, Mahmut Taylan Kandemir, Myoungsoo Jung:

AutoGNN: End-to-End Hardware-Driven Graph Preprocessing for Enhanced GNN Performance. 1-17 - Dongjae Lee, Bongjoon Hyun, Youngjin Kwon, Minsoo Rhu:

PIM-Malloc: A Fast and Scalable Dynamic Memory Allocator for Processing-In-Memory (PIM) Architectures. 1-17 - João Paulo C. de Lima, Benjamin F. Morris III, Asif Ali Khan, Jerónimo Castrillón, Alex K. Jones:

Count2Multiply: Reliable In-Memory High-Radix Counting. 1-15 - Pranati Majhi, Sabuj Laskar, Abdullah Muzahid, Eun Jung Kim:

Compression-Aware Gradient Splitting for Collective Communications in Distributed Training. 1-16 - Ben Chen, Kunlin Li, Shuwen Deng, Dongsheng Wang, Yun Chen:

DSAssassin: Cross-VM Side-Channel Attacks by Exploiting Intel Data Streaming Accelerator. 1-15 - Changheon Lee, Hyungseok Kim, Seungwoo Choi, Youngmin Kim, Won Woo Ro:

D'ArQ: A QOC Framework with Causality-Aware Grouping and Basis Selection. 1-13 - Shunchen Shi, Qijia Yang, Fan Yang, Yu Huang, Youwei Zhuo, Zhichun Li, Ninghui Sun, Xueqi Li:

CoCoTree: A Computation-Capable Architecture for Collective Communication in Scalable PIM. 1-16 - Daoxuan Xu, Ying Li, Yuwei Sun, Jie Ren, Yifan Sun:

HDPAT: Hierarchical Distributed Page Address Translation for Wafer-Scale GPUs. 1-12 - Kosuke Matsushima, Yasuyuki Okoshi, Masato Motomura, Daichi Fujiki:

AQPIM: Breaking the PIM Capacity Wall for LLMs with in-Memory Activation Quantization. 1-17 - Yiquan Lin, Wenhai Lin, Yiquan Chen, Jiexiong Xu, Shishun Cai, Jiarong Ye, Zonghui Wang, Wenzhi Chen:

I-POP: Ignite Positive Prefetchers. 1-16 - Enhyeok Jang, Hyungseok Kim, Yongju Lee, Jaewon Kwon, Yipeng Huang, Won Woo Ro:

Toward Scalable Gate-Level Parallelism on Trapped-Ion Processors with Racetrack Electrodes. 1-17 - Chiyue Wei, Cong Guo, Junyao Zhang, Haoxuan Shan, Yifan Xu, Ziyue Zhang, Yudong Liu, Qinsi Wang, Changchun Zhou, Hai Helen Li, Yiran Chen:

Focus: A Streaming Concentration Architecture for Efficient Vision-Language Models. 1-18 - Yecheng Xue, Rui Yang, Zhiding Liang, Tongyang Li:

DC-MBQC: A Distributed Compilation Framework for Measurement-Based Quantum Computing. 1-14 - Rui Wen, Zhifei Yue, Tianbo Liu, Xinkai Song, Jin Li, Di Huang, Jiaming Guo, Xing Hu, Zidong Du, Qi Guo, Tianshi Chen:

Cambricon-GS: An Accelerator for 3D Gaussian Splatting Training With Gaussian-Pixel Hybrid Parallelism. 1-14 - Xujiang Xiang, Fengbin Tu:

VAR-Turbo: Unlocking the Potential of Visual Autoregressive Models Through Dual Redundancy. 1-16 - Theodoros Trochatos, Christopher Kang, Andrew Wang, Frederic T. Chong, Jakub Szefer:

TraceQ: Trace-Based Reconstruction of Quantum Circuit Dataflow in Surface-Code Fault-Tolerant Quantum Computing. 1-14 - Hamed Seyedroudbari, Alexandros Daglis:

Sassy: SmartNIC-Assisted Notification Delivery for μs-Scale RDMA Workloads. 1-14 - Dayou Du, Shijie Cao, Jianyi Cheng, Luo Mai, Ting Cao, Mao Yang:

BitDecoding: Unlocking Tensor Cores for Long-Context LLMs with Low-Bit KV Cache. 1-13 - Quang Duong, Calvin Lin:

Streamlined on-Chip Temporal Prefetching. 1-15 - Zhezheng Ren, Chenao Yuan, Yuke Zhang, Shiyu Su:

A PN-Free Digital 3-SAT Accelerator Using Crossbar Architecture and Frequency-Controlled Counters. 1-14 - Jiin Kim, Byeongjun Shin, Jinha Chung, Minsoo Rhu:

The Cost of Dynamic Reasoning: Demystifying AI Agents and Test-Time Scaling from an AI Infrastructure Perspective. 1-16 - Matthew Joseph Adiletta, Gu-Yeon Wei, David Brooks:

RPU - A Reasoning Processing Unit. 1-17 - Anjunyi Fan, Xuejie Liu, Anji Liu, Qiuping Wu, Jiaqi Yang, Yuchao Qin, Guy Van den Broeck, Yitao Liang, Bonan Yan:

ESTroM: Element-Flow Architecture for Processing Sparse Tractable Probabilistic Models. 1-15 - Xingyu Liu, Jiawei Liang, Yipu Zhang, Linfeng Du, Chaofang Ma, Hui Yu, Jiang Xu, Wei Zhang:

DRACO: A Hardware-Efficient Robot Rigid Body Dynamics Accelerator with Precision-Aware Quantization Framework. 1-13 - Hongrui Guo, Tianrui Ma, Zidong Du, Mo Zou, Yifan Hao, Yongwei Zhao, Rui Zhang, Wei Li, Xing Hu, Zhiwei Xu, Qi Guo, Tianshi Chen:

Cambricon-CIM: Enabling Energy-Efficient and Error-Resilient Analog CIM Acceleration via Reformation of Coding Bases. 1-16 - Chenglin Wang, Shouxin Wang, Zhirong Shen, Lu Tang, Shuyue Zhou, Ronglong Wu, Min Zhou, Jialiang Yu, Yiming Zhang:

Predicting DRAM Failures at Scale: A Two-Stage Approach for Heterogeneous Systems. 1-14 - Julien Eudine, Chu Li, Zhuo Cheng, Renzo Andri, Can Firtina, Mohammad Sadrosadati, Nika Mansouri-Ghiasi, Konstantina Koliogeorgi, Anirban Nag, Arash Tavakkol, Haiyu Mao, Onur Mutlu, Shai Bergman, Ji Zhang:

GenPairX: A Hardware-Algorithm Co-Designed Accelerator for Paired-End Read Mapping. 1-16 - Hwayong Nam, Seungmin Baek, Jumin Kim, Michael Jaemin Kim, Jung Ho Ahn:

RoMe: Row Granularity Access Memory System for Large Language Models. 1-15 - Sahil Khan, Abhinav Anand, Kenneth R. Brown, Jonathan M. Baker:

Cyclone: Designing Efficient and Highly Parallel QCCD Architectural Codesigns for Fault Tolerant Quantum Memory. 1-14 - Zhixing Jiang, Justin Garrigus, Allison Seigler, Ethan Syed, Yan-Lun Huang, Mehdi Sadi, Tawfik Rahal-Arabi, Lizy Kurian John:

Exploration of LLM Workload Reliability Based on di/dt Effects and Voltage Droops. 1-15 - Jovan Stojkovic, Abraham Farrell, Zhangxiaowen Gong, Christopher J. Hughes, Josep Torrellas:

AccelFlow: Orchestrating an On-Package Ensemble of Fine-Grained Accelerators for Microservices. 1-17 - Eunyeong Cho, Jehyeon Bang, Ranggi Hwang, Minsoo Rhu:

PASCAL: A Phase-Aware Scheduling Algorithm for Serving Reasoning-based Large Language Models. 1-16 - Gan Fang, Jianping Zeng, Yuchen Zhou, Changhee Jung:

Intermittence-Aware Cache Compression. 1-17 - Burak Ocalan, Chloe Alverti, Shashwat Jaiswal, Antonis Psistakis, David A. Koufaty, Suyash Mahar, Steven Swanson, Josep Torrellas:

PhasedStore: Supporting High-Performance Write-Through Cache-Coherence Protocols Under TSO. 1-14 - Jingwei Cai, Dehao Kong, Hantao Huang, Zishan Jiang, Zixuan Ma, Qingyu Guo, Zhenxing Zhang, Guiming Shi, Mingyu Gao, Kaisheng Ma, Minghui Yu:

Characterizing Cloud-Native LLM Inference at Bytedance and Exposing Optimization Challenges and Opportunities for Future AI Accelerators. 1-19 - Sangpyo Kim, Hyesung Ji, Jongmin Kim, Wonseok Choi, Jaiyoung Park, Jung Ho Ahn:

IVE: An Accelerator for Single-Server Private Information Retrieval Using Versatile Processing Elements. 1-15 - Rohan Basu Roy, Devesh Tiwari:

LowCarb: Carbon-Aware Scheduling of Serverless Functions. 1-16 - Anshu Gupta, Yingqi Cao, Jason Liang, Yatish Turakhia:

DP-HLS: A High-Level Synthesis Framework for Accelerating Dynamic Programming Algorithms in Bioinformatics. 1-17 - Haocheng Lian, Qiyue Zhang, Xinran Zhao, Meichen Dong, Yijie Nie, Zhengyi Zhao, Junzhong Shen, Wei Guo, Chun Huang, Bingcai Sui, Weifeng Liu:

Uni-STC: Unified Sparse Tensor Core. 1-18 - Rakesh Nadig, Vamanan Arulchelvan, Mayank Kabra, Harshita Gupta, Rahul Bera, Nika Mansouri-Ghiasi, Nanditha Rao, Qingcai Jiang, Andreas Kosmas Kakolyris, Yu Liang, Mohammad Sadrosadati, Onur Mutlu:

Conduit: Programmer-Transparent Near-Data Processing Using Multiple Compute-Capable Resources in Solid State Drives. 1-20















