


31st HPCA 2025: Las Vegas, NV, USA
- IEEE International Symposium on High Performance Computer Architecture, HPCA 2025, Las Vegas, NV, USA, March 1-5, 2025. IEEE 2025, ISBN 979-8-3315-0647-6
- Varun Gohil, Sundar Dev, Gaurang Upasani, David Lo, Parthasarathy Ranganathan, Christina Delimitrou:
The Importance of Generalizability in Machine Learning for Systems. 1 - Odysseas Chatzopoulos, Nikos Karystinos, George Papadimitriou, Dimitris Gizopoulos, Harish Dattatraya Dixit, Sriram Sankar:
Veritas - Demystifying Silent Data Corruptions: μArch-Level Modeling and Fleet Data of Modern x86 CPUs. 1-14 - Yuhui Cai, Shiyao Lin, Zhirong Shen, Jiahui Yang, Jiwu Shu:
ChameleonEC: Exploiting Tunability of Erasure Coding for Low-Interference Repair. 15-28 - Peng Jiang, Hanlin Jiang, Ruizhe Huang, Hanwen Lei, Zhineng Zhong, Shaokun Zhang, Yuxin Ren, Ning Jia, Xinwei Hu, Yao Guo, Xiangqun Chen, Ding Li:
DPUaudit: DPU-assisted Pull-based Architecture for Near-Zero Cost System Auditing. 29-43 - Anirudh Seshadri, Eric Rotenberg:
Delinquent Loop Pre-execution Using Predicated Helper Threads. 44-58 - Karl H. Mose, Sebastian S. Kim, Alberto Ros, Timothy M. Jones, Robert D. Mullins:
Mascot: Predicting Memory Dependencies and Opportunities for Speculative Memory Bypassing. 59-71 - Pierre Ravenel, Arthur Perais, Benoît Dupont de Dinechin, Frédéric Pétrot:
Architecting Value Prediction around In-Order Execution. 72-84 - Devrath Iyer, Sara Achour:
Efficient Optimization with Encoded Ising Models. 85-98 - Siddhartha Raman Sundara Raman, Lizy Kurian John, Jaydeep P. Kulkarni:
SPARK: Sparsity Aware, Low Area, Energy-Efficient, Near-memory Architecture for Accelerating Linear Programming Problems. 99-112 - Zhengbang Yang, Lutan Zhao, Peinan Li, Han Liu, Kai Li, Boyan Zhao, Dan Meng, Rui Hou:
LegoZK: A Dynamically Reconfigurable Accelerator for Zero-Knowledge Proof. 113-126 - Wan-Hsuan Lin, Daniel Bochen Tan, Jason Cong:
Reuse-Aware Compilation for Zoned Quantum Architectures Based on Neutral Atoms. 127-142 - Yuhao Liu, Kevin Yao, Jonathan Hong, Julien Froustey, Ermal Rrapaj, Costin Iancu, Gushu Li, Yunong Shi:
HATT: Hamiltonian Adaptive Ternary Tree for Optimizing Fermion-to-Qubit Mapping. 143-157 - Ji Liu, Alvin Gonzales, Benchen Huang, Zain Hamid Saleem, Paul D. Hovland:
QuCLEAR: Clifford Extraction and Absorption for Quantum Circuit Optimization. 158-172 - Zixiao Chen, Chentao Wu, Yunfei Gu, Ranhao Jia, Jie Li, Minyi Guo:
Gaze into the Pattern: Characterizing Spatial Patterns with Internal Temporal Correlations for Hardware Prefetching. 173-187 - Georgios Vavouliotis, Martí Torrents, Boris Grot, Kleovoulos Kalaitzidis, Leeor Peled, Marc Casas:
To Cross, or Not to Cross Pages for Prefetching? 188-203 - Mengming Li, Qijun Zhang, Yongqing Ren, Zhiyao Xie:
Integrating Prefetcher Selection with Dynamic Request Allocation Improves Prefetching Efficiency. 204-216 - Junseo Lee, Jaisung Kim, Junyong Park, Jaewoong Sim:
VR-Pipe: Streamlining Hardware Graphics Pipeline for Volume Rendering. 217-230 - Raúl Taranco, José-María Arnau, Antonio González:
IRIS: Unleashing ISP-Software Cooperation to Optimize the Machine Vision Pipeline. 231-245 - Chaojian Li, Sixu Li, Linrui Jiang, Jingqun Zhang, Yingyan Celine Lin:
Uni-Render: A Unified Accelerator for Real-Time Rendering Across Diverse Neural Renderers. 246-260 - Joshua Viszlai, Sophia Fuhui Lin, Siddharth Dangwal, Conor Bradley, Vikram Ramesh, Jonathan M. Baker, Hannes Bernien, Frederic T. Chong:
Interleaved Logical Qubits in Atom Arrays. 261-274 - Debin Xiang, Qifan Jiang, Liqiang Lu, Siwei Tan, Jianwei Yin:
Choco-Q: Commute Hamiltonian-based QAOA for Constrained Binary Optimization. 275-289 - Xian Wu, Chenghong Zhu, Jingbo Wang, Xin Wang:
BOSS: Blocking algorithm for optimizing shuttling scheduling in Ion Trap. 290-303 - Takumi Kobori, Yasunari Suzuki, Yosuke Ueno, Teruo Tanimoto, Synge Todo, Yuuki Tokunaga:
LSQCA: Resource-Efficient Load/Store Architecture for Limited-Scale Fault-Tolerant Quantum Computing. 304-320 - Lieven Eeckhout:
R.I.P. Geomean Speedup Use Equal-Work (Or Equal-Time) Harmonic Mean Speedup Instead. 322 - Minsik Cho, Keivan Alizadeh-Vahid, Qichen Fu, Saurabh Adya, Carlo C. del Mundo, Mohammad Rastegari, Devang Naik, Peter Zatloukal:
eDKM: An Efficient and Accurate Train-Time Weight Clustering for Large Language Models. 323 - Jaehoon Heo, Adiwena Putra, Jieon Yoon, Sungwoong Yune, Hangyeol Lee, Ji-Hoon Kim, Joo-Young Kim:
EXION: Exploiting Inter-and Intra-Iteration Output Sparsity for Diffusion Models. 324-337 - Sungbin Kim, Hyunwuk Lee, Wonho Cho, Mincheol Park, Won Woo Ro:
Ditto: Accelerating Diffusion Model via Temporal Value Similarity. 338-352 - Zhifan Ye, Yonggan Fu, Jingqun Zhang, Leshu Li, Yongan Zhang, Sixu Li, Cheng Wan, Chenxi Wan, Chaojian Li, Sreemanth Prathipati, Yingyan Celine Lin:
Gaussian Blending Unit: An Edge GPU Plug-in for Real-Time Gaussian-Based Rendering in AR/VR. 353-365 - Houshu He, Gang Li, Fangxin Liu, Li Jiang, Xiaoyao Liang, Zhuoran Song:
GSArch: Breaking Memory Barriers in 3D Gaussian Splatting Training via Architectural Support. 366-379 - Haojie Ye, Yuchen Xia, Yuhan Chen, Kuan-Yu Chen, Yichao Yuan, Shuwen Deng, Baris Kasikci, Trevor N. Mudge, Nishil Talati:
Palermo: Improving the Performance of Oblivious Memory using Protocol-Hardware Co-Design. 380-393 - Debpratim Adak, Huiyang Zhou, Eric Rotenberg, Amro Awad:
SpecMPK: Efficient In-Process Isolation with Speculative and Secure Permission Update Instruction. 394-408 - Hyosang Kim, Ki-Dong Kang, Gyeongseo Park, Seungkyu Lee, Daehoon Kim:
BrokenSleep: Remote Power Timing Attack Exploiting Processor Idle States. 409-422 - Muhammad Umar, Akhilesh Parag Marathe, Monami Dutta Gupta, Shubham Jogprakash Ghosh, G. Edward Suh, Wenjie Xiong:
Efficient Memory Side-Channel Protection for Embedding Generation in Machine Learning. 423-441 - Liren Zhu, Liujia Li, Jianyu Wu, Yiming Yao, Zhan Shi, Jie Zhang, Zhenlin Wang, Xiaolin Wang, Yingwei Luo, Diyu Zhou:
Criticality-Aware Instruction-Centric Bandwidth Partitioning for Data Center Applications. 442-457 - Jovan Stojkovic, Chloe Alverti, Alan Andrade, Nikoleta Iliakopoulou, Hubertus Franke, Tianyin Xu, Josep Torrellas:
Concord: Rethinking Distributed Coherence for Software Caches in Serverless Environments. 458-473 - Liao Chen, Chenyu Lin, Shutian Luo, Huanle Xu, Chengzhong Xu:
Grad: Intelligent Microservice Scaling by Harnessing Resource Fungibility. 474-486 - Alireza Khadem, Daichi Fujiki, Hilbert Chen, Yufeng Gu, Nishil Talati, Scott A. Mahlke, Reetuparna Das:
Multi-Dimensional Vector ISA Extension for Mobile In-Cache Computing. 487-503 - Zhen He, Yiqi Wang, Zihan Wu, Shaojun Wei, Yang Hu, Fengbin Tu, Shouyi Yin:
ER-DCIM: Error-Resilient Digital CIM Architecture with Run-Time MAC-Cell Error Correction. 504-517 - Liyan Chen, Dongxu Lyu, Jianfei Jiang, Qin Wang, Zhigang Mao, Naifeng Jing:
AsyncDIMM: Achieving Asynchronous Execution in DIMM-Based Near-Memory Processing. 518-532 - Jingwei Cai, Xuan Wang, Mingyu Gao, Sen Peng, Zijian Zhu, Yuchen Wei, Zuotong Wu, Kaisheng Ma:
SoMa: Identifying, Exploring, and Understanding the DRAM Communication Scheduling Space for DNN Accelerators. 533-548 - Zhiyao Li, Bohan Yang, Jiaxiang Li, Taijie Chen, Xintong Li, Mingyu Gao:
Adyna: Accelerating Dynamic Neural Networks with Adaptive Scheduling. 549-562 - Bo Ren Pao, I-Chia Chen, En-Hao Chang, Tsung Tai Yeh:
EDA: Energy-Efficient Inter-Layer Model Compilation for Edge DNN Inference Acceleration. 563-576 - Haoyang Zhang, Yuqi Xue, Yirui Eric Zhou, Shaobo Li, Jian Huang:
SkyByte: Architecting an Efficient Memory-Semantic CXL-based SSD with OS and Hardware Co-design. 577-593 - Tianyang Jiang, Guangyan Zhang, Xiaojian Liao, Yuqi Zhou:
Zebra: Efficient Redundant Array of Zoned Namespace SSDs Enabled by Zone Random Write Area (ZRWA). 594-607 - Yingjia Wang, Tao Lu, Yuhong Liang, Xiang Chen, Ming-Chang Yang:
Reviving In-Storage Hardware Compression on ZNS SSDs through Host-SSD Collaboration. 608-623 - Tongxin Xie, Zhenhua Zhu, Bing Li, Yukai He, Cong Li, Guangyu Sun, Huazhong Yang, Yuan Xie, Yu Wang:
UniNDP: A Unified Compilation and Simulation Tool for Near DRAM Processing Architectures. 624-640 - Changmin Shin, Jaeyong Song, Hongsun Jang, Dogeun Kim, Jun Sung, Taehee Kwon, Jae Hyung Ju, Frank Liu, YeonKyu Choi, Jinho Lee:
Piccolo: Large-Scale Graph Processing with Fine-Grained in-Memory Scatter-Gather. 641-656 - Siling Yang, Shuibing He, Wenjiong Wang, Yanlong Yin, Tong Wu, Weijian Chen, Xuechen Zhang, Xian-He Sun, Dan Feng:
GoPIM: GCN-Oriented Pipeline Optimization for PIM Accelerators. 657-670 - Guoyu Li, Shengyu Ye, Chunyun Chen, Yang Wang, Fan Yang, Ting Cao, Cheng Liu, Mohamed M. Sabry Aly, Mao Yang:
LUT-DLA: Lookup Table as Efficient Extreme Low-Bit Deep Learning Accelerator. 671-684 - Qizhe Wu, Huawen Liang, Yuchen Gui, Zhichen Zeng, Zerong He, Linfeng Tao, Xiaotian Wang, Letian Zhao, Zhaoxi Zeng, Wei Yuan, Wei Wu, Xi Jin:
Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs. 685-700 - Dongyun Kam, Myeongji Yun, Sunwoo Yoo, Seungwoo Hong, Zhengya Zhang, Youngjoo Lee:
Panacea: Novel DNN Accelerator using Accuracy-Preserving Asymmetric Quantization and Energy-Saving Bit-Slice Sparsity. 701-715 - Kan Zhu, Yilong Zhao, Yufei Gao, Peter Braun, Tanvir Ahmed Khan, Heiner Litz, Baris Kasikci, Shuwen Deng:
From Optimal to Practical: Efficient Micro-op Cache Replacement Policies for Data Center Applications. 716-731 - Gan Fang, Changhee Jung:
Rethinking Dead Block Prediction for Intermittent Computing. 732-744 - Maryam Babaie, Ayaz Akram, Wendy Elsasser, Brent Haukness, Michael R. Miller, Taeksang Song, Thomas Vogelsang, Steven C. Woo, Jason Lowe-Power:
Efficient Caching with A Tag-enhanced DRAM. 745-760 - Yihan Fu, Anjunyi Fan, Wenshuo Yue, Hongxiao Zhao, Daijing Shi, Qiuping Wu, Jiayi Li, Xiangyu Zhang, Yaoyu Tao, Yuchao Yang, Bonan Yan:
PROCA: Programmable Probabilistic Processing Unit Architecture with Accept/Reject Prediction & Multicore Pipelining for Causal Inference. 761-774 - Zishen Wan, Hanchen Yang, Ritik Raj, Che-Kai Liu, Ananda Samajdar, Arijit Raychowdhury, Tushar Krishna:
CogSys: Efficient and Scalable Neurosymbolic Cognition System via Algorithm-Hardware Co-Design. 775-789 - Ziming Yuan, Lei Dai, Wen Li, Jie Zhang, Shengwen Liang, Ying Wang, Cheng Liu, Huawei Li, Xiaowei Li, Jiafeng Guo, Peng Wang, Renhai Chen, Gong Zhang:
NeuVSA: A Unified and Efficient Accelerator for Neural Vector Search. 790-805 - Chiyue Wei, Cong Guo, Feng Cheng, Shiyu Li, Hao Frank Yang, Hai Helen Li, Yiran Chen:
Prosperity: Accelerating Spiking Neural Networks via Product Sparsity. 806-820 - Insu Choi, Young-Seo Yoon, Joon-Sung Yang:
Bit-slice Architecture for DNN Acceleration with Slice-level Sparsity Enhancement and Exploitation. 821-835 - Zhenyu Wu, Maolin Wang, Hayden Kwok-Hay So:
A Hardware-Software Design Framework for SpMV Acceleration with Flexible Access Pattern Portfolio. 836-848 - Ataberk Olgun, F. Nisa Bostanci, Ismail Emir Yüksel, Oguzhan Canpolat, Haocong Luo, Geraldo F. Oliveira, A. Giray Yaglikçi, Minesh Patel, Onur Mutlu:
Variable Read Disturbance: An Experimental Analysis of Temporal Variation in DRAM Read Disturbance. 849-866 - Yahya Can Tugrul, A. Giray Yaglikçi, Ismail Emir Yüksel, Ataberk Olgun, Oguzhan Canpolat, Nisa Bostanci, Mohammad Sadrosadati, Oguz Ergin, Onur Mutlu:
Understanding RowHammer Under Reduced Refresh Latency: Experimental Analysis of Real DRAM Chips and Implications on Future Solutions. 867-886 - Oguzhan Canpolat, A. Giray Yaglikçi, Geraldo F. Oliveira, Ataberk Olgun, Nisa Bostanci, Ismail Emir Yuksel, Haocong Luo, Oguz Ergin, Onur Mutlu:
Chronus: Understanding and Securing the Cutting-Edge Industry Solutions to DRAM Read Disturbance. 887-905 - Marjan Fariborz, Mahyar Samani, Austin York, S. J. Ben Yoo, Jason Lowe-Power, Venkatesh Akella:
NOVA: A Novel Vertex Management Architecture for Scalable Graph Processing. 906-919 - Wenju Zhao, Pengcheng Yao, Dan Chen, Long Zheng, Xiaofei Liao, Qinggang Wang, Shaobo Ma, Yu Li, Haifeng Liu, Wenjing Xiao, Yufei Sun, Bing Zhu, Hai Jin, Jingling Xue:
MeHyper: Accelerating Hypergraph Neural Networks by Exploring Implicit Dataflows. 920-933 - Zhifei Yue, Xinkai Song, Tianbo Liu, Xing Hu, Rui Zhang, Zidong Du, Wei Li, Qi Guo, Tianshi Chen:
Cambricon-DG: An Accelerator for Redundant-Free Dynamic Graph Neural Networks Based on Nonlinear Isolation. 934-948 - Jun Liu, Shulin Zeng, Junbo Zhao, Li Ding, Zeyu Wang, Jinhao Li, Zhenhua Zhu, Xuefei Ning, Chen Zhang, Yu Wang, Guohao Dai:
TB-STC: Transposable Block-wise N: M Structured Sparse Tensor Core. 949-962 - Fangxin Liu, Shiyuan Huang, Ning Yang, Zongwu Wang, Haomin Li, Li Jiang:
CROSS: Compiler-Driven Optimization of Sparse DNNs Using Sparse/Dense Computation Kernels. 963-976 - Jiaqi Zhai, Xuanhua Shi, Kaiyi Huang, Chencheng Ye, Weifang Hu, Bingsheng He, Hai Jin:
AccelES: Accelerating Top-K SpMV for Embedding Similarity via Low-bit Pruning. 977-990 - Moinuddin Qureshi:
AutoRFM: Scaling Low-Cost in-DRAM Trackers to Ultra-Low Rowhammer Thresholds. 991-1004 - Jeonghyun Woo, Prashant J. Nair:
DAPPER: A Performance-Attack-Resilient Tracker for RowHammer Defense. 1005-1020 - Jeonghyun Woo, Shaopeng Chris Lin, Prashant J. Nair, Aamer Jaleel, Gururaj Saileshwar:
QPRAC: Towards Secure and Practical PRAC-based Rowhammer Mitigation using Priority Queues. 1021-1037 - Jiaqi Yang, Hao Zheng, Ahmed Louri:
I-DGNN: A Graph Dissimilarity-based Framework for Designing Scalable and Efficient DGNN Accelerators. 1038-1051 - Jingji Chen, Zhuoming Chen, Xuehai Qian:
Mithril: A Scalable System for Deep GNN Training. 1052-1065 - Shuangyan Yang, Minjia Zhang, Dong Li:
Buffalo: Enabling Large-Scale GNN Training via Memory-Efficient Bucketization. 1066-1081 - Yuzong Chen, Ahmed F. AbouElhamayed, Xilai Dai, Yang Wang, Marta Andronic, George A. Constantinides, Mohamed S. Abdelfattah:
BitMoD: Bit-serial Mixture-of-Datatype LLM Acceleration. 1082-1097 - Gunho Park, Hyeokjun Kwon, Jiwoo Kim, Jeongin Bae, Baeseong Park, Dongsoo Lee, Youngjoo Lee:
FIGLUT: An Energy-Efficient Accelerator Design for FP-INT GEMM Using Look-Up Tables. 1098-1111 - Weiming Hu, Haoyan Zhang, Cong Guo, Yu Feng, Renyang Guan, Zhendong Hua, Zihan Liu, Yue Guan, Minyi Guo, Jingwen Leng:
M-ANT: Efficient Low-bit Group Quantization for LLMs via Mathematically Adaptive Numerical Type. 1112-1126 - Yongmo Park, Aporva Amarnath, Subhankar Pal, Karthik Swaminathan, Alper Buyuktosunoglu, Hayim Shaul, Ehud Aharoni, Nir Drucker, Wei D. Lu, Omri Soceanu, Pradip Bose:
FHENDI: A Near-DRAM Accelerator for Compiler-Generated Fully Homomorphic Encryption Applications. 1127-1142 - Yi Huang, Xinsheng Gong, Xiangyu Kong, Dibei Chen, Jianfeng Zhu, Wenping Zhu, Liangwei Li, Mingyu Gao, Shaojun Wei, Aoyang Zhang, Leibo Liu:
EFFACT: A Highly Efficient Full-Stack FHE Acceleration Platform. 1143-1157 - Jongmin Kim, Sungmin Yun, Hyesung Ji, Wonseok Choi, Sangpyo Kim, Jung Ho Ahn:
Anaheim: Architecture and Algorithms for Processing Fully Homomorphic Encryption in Memory. 1158-1173 - Yinghao Yang, Xicheng Xu, Haibin Zhang, Jie Song, Xin Tang, Hang Lu, Xiaowei Li:
Hydra: Scale-out FHE Accelerator Architecture for Secure Deep Learning on FPGA. 1174-1186 - Guang Fan, Mingzhe Zhang, Fangyu Zheng, Shengyu Fan, Tian Zhou, Xianglong Deng, Wenxu Tang, Liang Kong, Yixuan Song, Shoumeng Yan:
WarpDrive: GPU-Based Fully Homomorphic Encryption Acceleration Leveraging Tensor and CUDA Cores. 1187-1200 - Arya Tschand, Arun Tejusve Raghunath Rajan, Sachin Idgunji, Anirban Ghosh, Jeremy Holleman, Csaba Király, Pawan Ambalkar, Ritika Borkar, Ramesh Chukka, Trevor Cockrell, Oliver Curtis, Grigori Fursin, Miro Hodak, Hiwot Kassa, Anton Lokhmotov, Dejan Miskovic, Yuechao Pan, Manu Prasad Manmathan, Liz Raymond, Tom St. John, Arjun Suresh, Rowan Taubitz, Sean Zhan, Scott Wasson, David Kanter, Vijay Janapa Reddi:
MLPerf Power: Benchmarking the Energy Efficiency of Machine Learning Systems from μWatts to MWatts for Sustainable AI. 1201-1216 - Craig R. Walters, Deanna Postles Dunn Berger, Robert J. Sonnelitter, Alper Buyuktosunoglu:
Enterprise Class Modular Cache Hierarchy. 1217-1230 - Yaoguang Yong, Xiaoming Du, Xuhua Ma, Yuxiang Wang, Bin Yao, Xudong Zheng, Huite Yi:
Predicting DRAM-Caused Risky VMs in Large-Scale Clouds. 1231-1245 - Jianbo Dong, Bin Luo, Jun Zhang, Pengcheng Zhang, Fei Feng, Yikai Zhu, Ang Liu, Zian Chen, Yi Shi, Hairong Jiao, Gang Lu, Yu Guan, Ennan Zhai, Wencong Xiao, Hanyu Zhao, Man Yuan, Siran Yang, Xiang Li, Jiamang Wang, Rui Men, Jianwei Zhang, Chang Zhou, Dennis Cai, Yuan Xie, Binzhang Fu:
Enhancing Large-Scale AI Training Efficiency: The C4 Solution for Real-Time Anomaly Detection and Communication Optimization. 1246-1258 - Apostolos Kokolis, Michael Kuchnik, John Hoffman, Adithya Kumar, Parth Malani, Faye Ma, Zachary DeVito, Shubho Sengupta, Kalyan Saladi, Carole-Jean Wu:
Revisiting Reliability in Large-Scale Machine Learning Research Clusters. 1259-1274 - Joseph Rogers, Lieven Eeckhout, Magnus Jahre:
HILP: Accounting for Workload-Level Parallelism in System-on-Chip Design Space Exploration. 1275-1288 - Mariam Elgamal, Doug Carmean, Elnaz Ansari, Okay Zed, Ramesh Peri, Srilatha Manne, Udit Gupta, Gu-Yeon Wei, David Brooks, Gage Hills, Carole-Jean Wu:
CORDOBA: Carbon-Efficient Optimization Framework for Computing Systems. 1289-1303 - Nathan Bleier, Rick Eason, Michael Lembeck, Rakesh Kumar:
Architecting Space Microdatacenters: A System-level Approach. 1304-1319 - Subhankar Pal, Aporva Amarnath, Behzad Boroujerdian, Augusto Vega, Alper Buyuktosunoglu, John-David Wellman, Vijay Janapa Reddi, Pradip Bose:
ARTEMIS: Agile Discovery of Efficient Real-Time Systems-on-Chips in the Heterogeneous Era. 1320-1334 - Yujun Lin, Zhekai Zhang, Song Han:
LEGO: Spatial Accelerator Generation and Optimization for Tensor Applications. 1335-1347 - Jovan Stojkovic, Chaojie Zhang, Íñigo Goiri, Josep Torrellas, Esha Choukse:
DynamoLLM: Designing LLM Inference Clusters for Performance and Energy Efficiency. 1348-1362 - Andreas Kosmas Kakolyris, Dimosthenis Masouros, Petros Vavaroutsos, Sotirios Xydis, Dimitrios Soudris:
throttLL'eM: Predictive GPU Throttling for Energy Efficient LLM Inference Serving. 1363-1378 - Jie Zhang, Hongjing Huang, Xuzheng Chen, Xiang Li, Jieru Zhao, Ming Liu, Zeke Wang:
RpcNIC: Enabling Efficient Datacenter RPC Offloading on PCIe-attached SmartNICs. 1379-1394 - Yiquan Chen, Zhen Jin, Yijing Wang, Yi Chen, Jiexiong Xu, Hao Yu, Jinlong Chen, Wenhai Lin, Kanghua Fang, Keyao Zhang, Chengkun Wei, Qiang Liu, Yuan Xie, Wenzhi Chen:
NVMePass: A Lightweight, High-performance and Scalable NVMe Virtualization Architecture with I/O Queues Passthrough. 1395-1407 - Eunbi Jeong, Ipoom Jeong, Myung Kuk Yoon, Nam Sung Kim:
Warped-Compaction: Maximizing GPU Register File Bandwidth Utilization via Operand Compaction. 1408-1421 - Abubakr Nada, Giuseppe Maria Sarda, Erwan Lenormand:
Cooperative Warp Execution in Tensor Core for RISC-V GPGPU. 1422-1436 - Shinnung Jeong, Liam Paul Coopert, Ju Min Lee, Heelim Choi, Nicholas Parnenzini, Chihyo Ahn, Yongwoo Lee, Hanjun Kim, Hyesoon Kim:
SparseWeaver: Converting Sparse Operations as Dense Operations on GPUs for Graph Workloads. 1437-1451 - Min Wu, Huizhang Luo, Fenfang Li, Yiran Zhang, Zhuo Tang, Kenli Li, Jeff Zhang, Chubo Liu:
HSMU-SpGEMM: Achieving High Shared Memory Utilization for Parallel Sparse General Matrix-Matrix Multiplication on Modern GPUs. 1452-1466 - Chao Fang, Man Shi, Robin Geens, Arne Symons, Zhongfeng Wang, Marian Verhelst:
Anda: Unlocking Efficient LLM Inference with a Variable-Length Grouped Activation Data Format. 1467-1481 - Haoran Wang, Yuming Li, Haobo Xu, Ying Wang, Liqi Liu, Jun Yang, Yinhe Han:
LAD: Efficient Accelerator for Generative Inference of LLM with Locality Aware Decoding. 1482-1495 - Zihan Liu, Xinhao Luo, Junxian Guo, Wentao Ni, Yangjie Zhou, Yue Guan, Cong Guo, Weihao Cui, Yu Feng, Minyi Guo, Yuhao Zhu, Minjia Zhang, Chen Jin, Jingwen Leng:
VQ-LLM: High-performance Code Generation for Vector Quantization Augmented LLM Inference. 1496-1509 - Xiurui Pan, Endian Li, Qiao Li, Shengwen Liang, Yizhou Shan, Ke Zhou, Yingwei Luo, Xiaolin Wang, Jie Zhang:
InstAttention: In-Storage Attention Offloading for Cost-Effective Long-Context LLM Inference. 1510-1525 - Dongkyun Lim, John Kim:
TidalMesh: Topology-Driven AllReduce Collective Communication for Mesh Topology. 1526-1540 - Jiayi Huang, Yanhua Chen, Zhe Wang, Christopher J. Hughes, Yufei Ding, Yuan Xie:
Push Multicast: A Speculative and Coherent Interconnect for Mitigating Manycore CPU Communication Bottleneck. 1541-1556 - Hyojun Son, Gilbert Jonatan, Xiangyu Wu, Haeyoon Cho, Kaustubh Shivdikar, José L. Abellán, Ajay Joshi, David R. Kaeli, John Kim:
PIMnet: A Domain-Specific Network for Efficient Collective Communication in Scalable PIM. 1557-1572 - Siyao Jia, Bo Jiao, Haozhe Zhu, Chixiao Chen, Qi Liu, Ming Liu:
EIGEN: Enabling Efficient 3DIC Interconnect with Heterogeneous Dual-Layer Network-on-Active-Interposer. 1573-1587 - Yu Liang, Aofeng Shen, Chun Jason Xue, Riwei Pan, Haiyu Mao, Nika Mansouri-Ghiasi, Qingcai Jiang, Rakesh Nadig, Lei Li, Rachata Ausavarungnirun, Mohammad Sadrosadati, Onur Mutlu:
Ariadne: A Hotness-Aware and Size-Adaptive Compressed Swap Technique for Fast Application Relaunch and Reduced CPU Usage on Mobile Devices. 1588-1602 - Zhehua Zhang, Suzhen Wu, Wenyan You, Chunfeng Du, Bo Mao:
Gemina: A Coordinated and High-Performance Memory Deduplication Engine. 1603-1617 - Ashkan Asgharzadeh, Josué Feliu, Manuel E. Acacio, Stefanos Kaxiras, Alberto Ros:
No Rush in Executing Atomic Instructions. 1618-1630 - Jie Ren, Bin Ma, Shuangyan Yang, Benjamin Francis, Ehsan K. Ardestani, Min Si, Dong Li:
Machine Learning-Guided Memory Optimization for DLRM Inference on Tiered Memory. 1631-1647 - Jaewon Lee, Euijun Chung, Saurabh Singh, Seonjin Na, Yonghae Kim, Jaekyu Lee, Hyesoon Kim:
Let-Me-In: (Still) Employing In-pointer Bounds Metadata for Fine-grained GPU Memory Safety. 1648-1661 - Jiwon Lee, Gun Ko, Myung Kuk Yoon, Ipoom Jeong, Yunho Oh, Won Woo Ro:
Marching Page Walks: Batching and Concurrent Page Table Walks for Enhancing GPU Throughput. 1662-1677 - Yueqi Wang, Bingyao Li, Mohamed Tarek Ibn Ziad, Lieven Eeckhout, Jun Yang, Aamer Jaleel, Xulong Tang:
OASIS: Object-Aware Page Management for Multi-GPU Systems. 1678-1692 - Xia Zhao, Guangda Zhang, Lu Wang, Shiqing Zhang, Huadong Dai:
NearFetch: Saving Inter-Module Bandwidth in Many-Chip-Module GPUs. 1693-1706 - Hyojung Lee, Daehyeon Baek, Jimyoung Son, Jieun Choi, Kihyo Moon, Minsung Jang:
PAISE: PIM-Accelerated Inference Scheduling Engine for Transformer-based LLM. 1707-1719 - Seong Hoon Seo, Junghoon Kim, Donghyun Lee, Seonah Yoo, Seokwon Moon, Yeonhong Park, Jae W. Lee:
FACIL: Flexible DRAM Address Mapping for SoC-PIM Cooperative On-device LLM Inference. 1720-1733 - Weiyi Sun, Mingyu Gao, Zhaoshi Li, Aoyang Zhang, Iris Ying Chou, Jianfeng Zhu, Shaojun Wei, Leibo Liu:
Lincoln: Real-Time 50~100B LLM Inference on Consumer Devices with LPDDR-Interfaced, Compute-Enabled Flash Memory. 1734-1750 - Lian Liu, Shixin Zhao, Bing Li, Haimeng Ren, Zhaohui Xu, Mengdi Wang, Xiaowei Li, Yinhe Han, Ying Wang:
Make LLM Inference Affordable to Everyone: Augmenting GPU Memory with NDP-DIMM. 1751-1765
