Friday, March 18 • 10:40 - 11:20
Scalarization across threads


Some modern highly parallel architectures include separate vector arithmetic units to achieve better performance on parallel algorithms. Real-world applications, however, never operate on vector data only. Even so, in most cases the whole data flow is sent to the vector units for processing. Yet vector operations on some platforms (for instance, those with massive data parallelism) can be expensive, especially parallel memory operations. In some cases, instructions that operate on vectors of identical values can be transformed into an equivalent scalar form.

The goal of this presentation is to outline a technique that splits a program's data flow into separate vector and scalar parts, so that each part can be executed on the corresponding vector or scalar arithmetic unit.

The analysis has been implemented in the HSA compiler as an iterative solver over SSA form. The result of the analysis is the set of memory operations that can legitimately be transformed into scalar form. The subsequent transformations yielded a small performance increase across the board and gains of up to 10% in a few benchmarks, one of them being an HEVC decoder.
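
To make the idea of an iterative solver over SSA form more concrete, here is a minimal Python sketch of a uniformity analysis. It is an illustration under stated assumptions, not the HSA compiler's implementation: the `Instr` record, the opcode names, and the `find_uniform_loads` helper are all hypothetical, and the sketch ignores divergent control flow, stores, and aliasing, which a real analysis would have to handle.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    """A toy SSA instruction (hypothetical format, for illustration only)."""
    dest: str          # SSA name defined by this instruction
    op: str            # opcode, e.g. "add", "load", "phi"
    args: tuple = ()   # SSA names used as operands


def find_uniform_loads(instrs, varying_seeds):
    """Return the dests of loads whose address is identical in every thread.

    A value is marked "varying" if any operand is varying. The marking is
    repeated until nothing changes, so cycles through phi nodes converge.
    """
    varying = set(varying_seeds)
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for ins in instrs:
            if ins.dest not in varying and any(a in varying for a in ins.args):
                varying.add(ins.dest)
                changed = True
    # Loads with no varying operand are candidates for scalar form.
    return [ins.dest for ins in instrs
            if ins.op == "load" and not any(a in varying for a in ins.args)]


program = [
    Instr("tid", "thread_id"),            # differs per thread (the seed)
    Instr("base", "kernel_arg"),          # same for all threads
    Instr("c0", "const"),
    Instr("c1", "const"),
    Instr("c4", "const"),
    Instr("n", "load", ("base",)),        # uniform address
    Instr("off", "mul", ("tid", "c4")),
    Instr("addr", "add", ("base", "off")),
    Instr("x", "load", ("addr",)),        # per-thread address
    Instr("i", "phi", ("c0", "i_next")),  # loop counter, same in all threads
    Instr("i_next", "add", ("i", "c1")),
    Instr("p", "add", ("base", "i")),
    Instr("y", "load", ("p",)),           # uniform address inside a loop
    Instr("j", "phi", ("c0", "j_next")),  # becomes varying on the second pass
    Instr("j_next", "add", ("j", "tid")),
    Instr("q", "add", ("base", "j")),
    Instr("z", "load", ("q",)),           # per-thread address
]

uniform_loads = find_uniform_loads(program, {"tid"})
print(uniform_loads)
```

In this toy program the loads `n` and `y` are reported as scalarization candidates, while `x` and `z` depend on the thread id. The value `j` is only recognized as varying on the second pass, once the marking of `j_next` has propagated back through the phi node, which is why the solver has to iterate.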

Alexander Timofeev

Lead SW engineer, Luxoft
Engineering compilers since 2004, when I joined the Sun-Micro compiler team. Working on the AMD HSA compiler since 2012.
Interests: HPC, data-parallel processing
LinkedIn: ru.linkedin.com/in/alexandertimofeev

Friday March 18, 2016 10:40 - 11:20
