


20th PPoPP 2015: San Francisco, CA, USA
- Albert Cohen, David Grove:
Proceedings of the 20th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, PPoPP 2015, San Francisco, CA, USA, February 7-11, 2015. ACM 2015, ISBN 978-1-4503-3205-7
Concurrency
- Vincent Gramoli:
More than you ever wanted to know about synchronization: synchrobench, measuring the impact of the synchronization on concurrent algorithms. 1-10
- Dan Alistarh, Justin Kopinsky, Jerry Li, Nir Shavit:
The SprayList: a scalable relaxed priority queue. 11-20
- Maya Arbel, Adam Morrison:
Predicate RCU: an RCU for scalable concurrent updates. 21-30
- Guy Golan-Gueta, G. Ramalingam, Mooly Sagiv, Eran Yahav:
Automatic scalable atomicity via semantic locking. 31-41
Code Generation
- Austin R. Benson, Grey Ballard:
A framework for practical parallel fast matrix multiplication. 42-53
- Aravind Acharya, Uday Bondhugula:
PLUTO+: near-complete modeling of affine transformations for parallelism and locality. 54-64
- Mahesh Ravishankar, Roshan Dathathri, Venmugil Elango, Louis-Noël Pouchet, J. Ramanujam, Atanas Rountev, P. Sadayappan:
Distributed memory code generation for mixed Irregular/Regular computations. 65-75
Transactional Memory
- Lingxiang Xiang, Michael L. Scott:
Software partitioning of hardware transactions. 76-86
- Alexandro Baldassin, Edson Borin, Guido Araujo:
Performance implications of dynamic memory allocators on transactional memory systems. 87-96
- Minjia Zhang, Jipeng Huang, Man Cao, Michael D. Bond:
Low-overhead software transactional memory with progress guarantees and strong semantics. 97-108
Large Scale Parallelism
- Milind Chabbi, Wim Lavrijsen, Wibe de Jong, Koushik Sen, John M. Mellor-Crummey, Costin Iancu:
Barrier elision for production parallel programs. 109-119
- Loïc Thébault, Eric Petit, Quang Dinh:
Scalable and efficient implementation of 3d unstructured meshes computation: a case study on matrix assembly. 120-129
- Nathan R. Tallent, Abhinav Vishnu, Hubertus Van Dam, Jeff Daily, Darren J. Kerbyson, Adolfy Hoisie:
Diagnosing the causes and severity of one-sided message contention. 130-139
Verification and Accelerators
- Yen-Jung Chang, Vijay K. Garg:
A parallel algorithm for global states enumeration in concurrent systems. 140-149
- Tiago Cogumbreiro, Raymond Hu, Francisco Martins, Nobuko Yoshida:
Dynamic deadlock verification for general barrier synchronisation. 150-160
- Yi-Ping You, Hen-Jung Wu, Yeh-Ning Tsai, Yen-Ting Chao:
VirtCL: a framework for OpenCL device abstraction and management. 161-172
- Arash Ashari, Shirish Tatikonda, Matthias Boehm, Berthold Reinwald, Keith Campbell, John Keenleyside, P. Sadayappan:
On optimizing machine learning workloads via kernel fusion. 173-182
Algorithms
- Kaiyuan Zhang, Rong Chen, Haibo Chen:
NUMA-aware graph-structured analytics. 183-193
- Chenning Xie, Rong Chen, Haibing Guan, Binyu Zang, Haibo Chen:
SYNC or ASYNC: time to fuse for distributed graph-parallel computation. 194-204
- Yuan Tang, Ronghui You, Haibin Kan, Jesmin Jahan Tithi, Pramod Ganapathi, Rezaul Alam Chowdhury:
Cache-oblivious wavefront: improving parallelism of recursive dynamic programming algorithms without losing cache-efficiency. 205-214
Locking and Locality
- Milind Chabbi, Michael W. Fagan, John M. Mellor-Crummey:
High performance locks for multi-level NUMA systems. 215-226
- Zoltan Majó, Thomas R. Gross:
A library for portable and composable data locality optimizations for NUMA systems. 227-238
- Abdelhalim Amer, Huiwei Lu, Yanjie Wei, Pavan Balaji, Satoshi Matsuoka:
MPI+Threads: runtime contention and remedies. 239-248
Poster Abstracts
- Andrew J. McPherson, Vijay Nagarajan, Susmit Sarkar, Marcelo Cintra:
Fence placement for legacy data-race-free programs via synchronization read detection. 249-250
- Xianglan Piao, Channoh Kim, Younghwan Oh, Huiying Li, Jincheon Kim, Hanjun Kim, Jae W. Lee:
JAWS: a JavaScript framework for adaptive CPU-GPU work sharing. 251-252
- Hyunseok Seo, Jinwook Kim, Min-Soo Kim:
GStream: a graph streaming processing method for large-scale graphs on GPUs. 253-254
- Nabeel AlSaber, Milind Kulkarni:
SemCache++: semantics-aware caching for efficient multi-GPU offloading. 255-256
- Jungwon Kim, Seyong Lee, Jeffrey S. Vetter:
An OpenACC-based unified programming model for multi-accelerator systems. 257-258
- Paul Thomson, Alastair F. Donaldson:
The lazy happens-before relation: better partial-order reduction for systematic concurrency testing. 259-260
- Azzam Haidar, Tingxing Dong, Piotr Luszczek, Stanimire Tomov, Jack J. Dongarra:
Towards batched linear solvers on accelerated hardware platforms. 261-262
- Saurav Muralidharan, Michael Garland, Bryan Catanzaro, Albert Sidelnik, Mary W. Hall:
A collection-oriented programming model for performance portability. 263-264
- Yangzihao Wang, Andrew A. Davidson, Yuechao Pan, Yuduo Wu, Andy Riffel, John D. Owens:
Gunrock: a high-performance graph processing library on the GPU. 265-266
- Olga Pearce, Todd Gamblin, Bronis R. de Supinski, Martin Schulz, Nancy M. Amato:
Decoupled load balancing. 267-268
- Ye Jin, Mingliang Liu, Xiaosong Ma, Qing Liu, Jeremy Logan, Norbert Podhorszki, Jong Youl Choi, Scott Klasky:
Combining phase identification and statistic modeling for automated parallel benchmark generation. 269-270
- Xuanhua Shi, Junling Liang, Sheng Di, Bingsheng He, Hai Jin, Lu Lu, Zhixiang Wang, Xuan Luo, Jianlong Zhong:
Optimization of asynchronous graph processing on GPU with hybrid coloring model. 271-272
- Scott West, Sebastian Nanz, Bertrand Meyer:
Efficient and reasonable object-oriented concurrency. 273-274
- Vassilis Vassiliadis, Konstantinos Parasyris, Charalambos Chalios, Christos D. Antonopoulos, Spyros Lalis, Nikolaos Bellas, Hans Vandierendonck, Dimitrios S. Nikolopoulos:
A programming model and runtime system for significance-aware energy-efficient computing. 275-276
- Martin Wimmer, Jakob Gruber, Jesper Larsson Träff, Philippas Tsigas:
The lock-free k-LSM relaxed priority queue. 277-278
- Emmanuelle Saillard, Patrick Carribault, Denis Barthou:
Static/Dynamic validation of MPI collective communications in multi-threaded context. 279-280
- Arunmoezhi Ramachandran, Neeraj Mittal:
CASTLE: fast concurrent internal binary search tree using edge-based locking. 281-282
- Madan Mohan Das, Gabriel Southern, Jose Renau:
Section based program analysis to reduce overhead of detecting unsynchronized thread communication. 283-284
- Harshvardhan, Nancy M. Amato, Lawrence Rauchwerger:
A hierarchical approach to reducing communication in parallel graph algorithms. 285-286
- Yifeng Chen, Xiang Cui, Hong Mei:
Tiles: a new language mechanism for heterogeneous parallelism. 287-288
- Cosmin Radoi, Stephan Herhut, Jaswanth Sreeram, Danny Dig:
Are web applications ready for parallelism? 289-290
