


ACM Transactions on Architecture and Code Optimization, Volume 16
Volume 16, Number 1, March 2019
- Ghassan Shobaki, Austin Kerbow, Christopher Pulido, William Dobson:
Exploring an Alternative Cost Function for Combinatorial Register-Pressure-Aware Instruction Scheduling. 1:1-1:30
- Yu-Ping Liu, Ding-Yong Hong, Jan-Jan Wu, Sheng-Yu Fu, Wei-Chung Hsu:
Exploiting SIMD Asymmetry in ARM-to-x86 Dynamic Binary Translation. 2:1-2:24
- Mohammad Sadrosadati, Seyed Borna Ehsani, Hajar Falahati, Rachata Ausavarungnirun, Arash Tavakkol, Mojtaba Abaee, Lois Orosa, Yaohua Wang, Hamid Sarbazi-Azad, Onur Mutlu:
ITAP: Idle-Time-Aware Power Management for GPU Execution Units. 3:1-3:26
- Halit Dogan, Masab Ahmad, Brian Kahne, Omer Khan:
Accelerating Synchronization Using Moving Compute to Data Model at 1,000-core Multicore Scale. 4:1-4:27
- Leonid Azriel, Lukas Humbel, Reto Achermann, Alex Richardson, Moritz Hoffmann, Avi Mendelson, Timothy Roscoe, Robert N. M. Watson, Paolo Faraboschi, Dejan S. Milojicic:
Memory-Side Protection With a Capability Enforcement Co-Processor. 5:1-5:26
- Aamer Jaleel, Eiman Ebrahimi, Sam Duncan:
DUCATI: High-performance Address Translation by Extending TLB Reach of GPU-accelerated Systems. 6:1-6:24
Volume 16, Number 2, May 2019
- Yemao Xu, Dezun Dong, Weixia Xu, Xiangke Liao:
SketchDLC: A Sketch on Distributed Deep Learning Communication via Trace Capturing. 7:1-7:26
- Aristeidis Mastoras, Thomas R. Gross:
Efficient and Scalable Execution of Fine-Grained Dynamic Linear Pipelines. 8:1-8:26
- Tae Jun Ham, Juan L. Aragón, Margaret Martonosi:
Efficient Data Supply for Parallel Heterogeneous Architectures. 9:1-9:23
- Savvas Sioutas, Sander Stuijk, Luc Waeijen, Twan Basten, Henk Corporaal, Lou J. Somers:
Schedule Synthesis for Halide Pipelines through Reuse Analysis. 10:1-10:22
- Xiaoyuan Wang, Haikun Liu, Xiaofei Liao, Ji Chen, Hai Jin, Yu Zhang, Long Zheng, Bingsheng He, Song Jiang:
Supporting Superpages and Lightweight Page Migration in Hybrid Memory Systems. 11:1-11:26
- Sahar Sargaran, Naser Mohammadzadeh:
SAQIP: A Scalable Architecture for Quantum Information Processors. 12:1-12:21
- Prerna Budhkar, Ildar Absalyamov, Vasileios Zois, Skyler Windh, Walid A. Najjar, Vassilis J. Tsotras:
Accelerating In-Memory Database Selections Using Latency Masking Hardware Threads. 13:1-13:28
- Heinrich Riebler, Gavin Vaz, Tobias Kenter, Christian Plessl:
Transparent Acceleration for Heterogeneous Platforms With Compilation to OpenCL. 14:1-14:26
- Xun Gong, Xiang Gong, Leiming Yu, David R. Kaeli:
HAWS: Accelerating GPU Wavefront Execution through Selective Out-of-order Execution. 15:1-15:22
- Yang Song, Olivier Alavoine, Bill Lin:
A Self-aware Resource Management Framework for Heterogeneous Multicore SoCs with Diverse QoS Targets. 16:1-16:23
- Pedro Yébenes, Jose Rocher-Gonzalez, Jesús Escudero-Sahuquillo, Pedro Javier García, Francisco J. Alfaro, Francisco J. Quiles, Crispín Gómez Requena, José Duato:
Combining Source-adaptive and Oblivious Routing with Congestion Control in High-performance Interconnects using Hybrid and Direct Topologies. 17:1-17:26
- Mohammad A. Alshboul, Hussein Elnawawy, Reem Elkhouly, Keiji Kimura, James Tuck, Yan Solihin:
Efficient Checkpointing with Recompute Scheme for Non-volatile Main Memory. 18:1-18:27
- Zacharias Hadjilambrou, Marios Kleanthous, Georgia Antoniou, Antoni Portero, Yiannakis Sazeides:
Comprehensive Characterization of an Open Source Document Search Engine. 19:1-19:21
Volume 16, Number 3, August 2019
- Bingchao Li, Jizeng Wei, Jizhou Sun, Murali Annavaram, Nam Sung Kim:
An Efficient GPU Cache Architecture for Applications with Irregular Memory Access Patterns. 20:1-20:24
- Stephen I. Roberts, Steven A. Wright, Suhaib A. Fahmy, Stephen A. Jarvis:
The Power-optimised Software Envelope. 21:1-21:27
- Ram Srivatsa Kannan, Michael Laurenzano, Jeongseob Ahn, Jason Mars, Lingjia Tang:
Caliper: Interference Estimator for Multi-tenant Environments Sharing Architectural Resources. 22:1-22:25
- Zhen Lin, Hongwen Dai, Michael Mantor, Huiyang Zhou:
Coordinated CTA Combination and Bandwidth Partitioning for GPU Concurrent Kernel Execution. 23:1-23:27
- Keryan Didier, Dumitru Potop-Butucaru, Guillaume Iooss, Albert Cohen, Jean Souyris, Philippe Baufreton, Amaury Graillat:
Correct-by-Construction Parallelization of Hard Real-Time Avionics Applications on Off-the-Shelf Predictable Hardware. 24:1-24:27
- Pantea Zardoshti, Tingzhe Zhou, Pavithra Balaji, Michael L. Scott, Michael F. Spear:
Simplifying Transactional Memory Support in C++. 25:1-25:24
- Jungwoo Park, Myoungjun Lee, Soontae Kim, Minho Ju, Jeongkyu Hong:
MH Cache: A Multi-retention STT-RAM-based Low-power Last-level Cache for Mobile Hardware Rendering Systems. 26:1-26:26
- Jakob Leben, George Tzanetakis:
Polyhedral Compilation for Multi-dimensional Stream Processing. 27:1-27:26
- Mohammad Sadegh Sadeghi, Siavash Bayat Sarmadi, Shaahin Hessabi:
Toward On-chip Network Security Using Runtime Isolation Mapping. 28:1-28:25
- Stéphane Louise:
A First Step Toward Using Quantum Computing for Low-level WCETs Estimations. 29:1-29:22
- Artem Chikin, Taylor Lloyd, José Nelson Amaral, Ettore Tiotto, Muhammad Usman:
Memory-access-aware Safety and Profitability Analysis for Transformation of Accelerator-bound OpenMP Loops. 30:1-30:26
- Sanghoon Cha, Bokyeong Kim, Chang Hyun Park, Jaehyuk Huh:
Morphable DRAM Cache Design for Hybrid Memory Systems. 31:1-31:24
- Chao Luo, Yunsi Fei, David R. Kaeli:
Side-channel Timing Attack of RSA on a GPU. 32:1-32:18
- Liang Yuan, Chen Ding, Wesley Smith, Peter J. Denning, Yunquan Zhang:
A Relational Theory of Locality. 33:1-33:26
Volume 16, Number 4, January 2020
- Arun Thangamani, V. Krishna Nandivada:
Optimizing Remote Communication in X10. 34:1-34:26
- Sriseshan Srikanth, Anirudh Jain, Joseph M. Lennon, Thomas M. Conte, Erik DeBenedictis, Jeanine E. Cook:
MetaStrider: Architectures for Scalable Memory-centric Reduction of Sparse Data Streams. 35:1-35:26
- Mostafa Koraei, Omid Fatemi, Magnus Jahre:
DCMI: A Scalable Strategy for Accelerating Iterative Stencil Loops on FPGAs. 36:1-36:24
- Leeor Peled, Uri C. Weiser, Yoav Etsion:
A Neural Network Prefetcher for Arbitrary Memory Access Patterns. 37:1-37:27
- Nicolas Vasilache, Oleksandr Zinenko, Theodoros Theodoridis, Priya Goyal, Zachary DeVito, William S. Moses, Sven Verdoolaege, Andrew Adams, Albert Cohen:
The Next 700 Accelerated Layers: From Mathematical Expressions of Network Computation Graphs to Accelerated GPU Kernels, Automatically. 38:1-38:26
- Wenbin Jiang, Yang Ma, Bo Liu, Haikun Liu, Bing Bing Zhou, Jian Zhu, Song Wu, Hai Jin:
Layup: Layer-adaptive and Multi-type Intermediate-oriented Memory Optimization for GPU-based CNNs. 39:1-39:23
- Sergi Siso, Wes Armour, Jeyarajan Thiyagalingam:
Evaluating Auto-Vectorizing Compilers through Objective Withdrawal of Useful Information. 40:1-40:23
- Salonik Resch, S. Karen Khatamifard, Zamshed Iqbal Chowdhury, Masoud Zabihi, Zhengyang Zhao, Jianping Wang, Sachin S. Sapatnekar, Ulya R. Karpuzcu:
PIMBALL: Binary Neural Networks in Spintronic Memory. 41:1-41:26
- Zhen Hang Jiang, Yunsi Fei, David R. Kaeli:
Exploiting Bank Conflict-based Side-channel Timing Leakage of GPUs. 42:1-42:24
- Kyle Daruwalla, Heng Zhuo, Rohit Shukla, Mikko H. Lipasti:
BitSAD v2: Compiler Optimization and Analysis for Bitstream Computing. 43:1-43:25
- Aristeidis Mastoras, Thomas R. Gross:
Chunking for Dynamic Linear Pipelines. 44:1-44:25
- Manuel Selva, Fabian Gruber, Diogo Sampaio, Christophe Guillon, Louis-Noël Pouchet, Fabrice Rastello:
Building a Polyhedral Representation from an Instrumented Execution: Making Dynamic Analyses of Nonaffine Programs Scalable. 45:1-45:26
- Ahmad Yasin, Jawad Haj-Yahya, Yosi Ben-Asher, Avi Mendelson:
A Metric-Guided Method for Discovering Impactful Features and Architectural Insights for Skylake-Based Processors. 46:1-46:25
- Jie Zhao, Albert Cohen:
Flextended Tiles: A Flexible Extension of Overlapped Tiles for Polyhedral Compilation. 47:1-47:25
- Daniel Gerzhoy, Xiaowu Sun, Michael Zuzak, Donald Yeung:
Nested MIMD-SIMD Parallelization for Heterogeneous Microprocessors. 48:1-48:27
- Chunwei Xia, Jiacheng Zhao, Huimin Cui, Xiaobing Feng, Jingling Xue:
DNNTune: Automatic Benchmarking DNN Models for Mobile-cloud Computing. 49:1-49:26
- Ian Briggs, Arnab Das, Mark Baranowski, Vishal Chandra Sharma, Sriram Krishnamoorthy, Zvonimir Rakamaric, Ganesh Gopalakrishnan:
FailAmp: Relativization Transformation for Soft Error Detection in Structured Address Generation. 50:1-50:21
- Khalid Ahmad, Hari Sundar, Mary W. Hall:
Data-driven Mixed Precision Sparse Matrix Vector Multiplication for GPUs. 51:1-51:24
- Larisa Stoltzfus, Bastian Hagedorn, Michel Steuwer, Sergei Gorlatch, Christophe Dubach:
Tiling Optimizations for Stencil Computations Using Rewrite Rules in Lift. 52:1-52:25
- Michiel A. van der Vlag, Georgios Smaragdos, Zaid Al-Ars, Christos Strydis:
Exploring Complex Brain-Simulation Workloads on Multi-GPU Deployments. 53:1-53:25
- Reem Elkhouly, Mohammad A. Alshboul, Akihiro Hayashi, Yan Solihin, Keiji Kimura:
Compiler-support for Critical Data Persistence in NVM. 54:1-54:25
- Lorenzo Chelini, Oleksandr Zinenko, Tobias Grosser, Henk Corporaal:
Declarative Loop Tactics for Domain-specific Optimization. 55:1-55:25
- Asif Ali Khan, Fazal Hameed, Robin Bläsing, Stuart S. P. Parkin, Jerónimo Castrillón:
ShiftsReduce: Minimizing Shifts in Racetrack Memory 4.0. 56:1-56:23
