OpenACC Tutorial - Introduction

From Alliance Doc

Revision as of 15:25, 2 May 2016

Learning objectives
  • Understand the difference between a CPU and an accelerator.
  • Understand the difference between speed and throughput.
  • Understand the steps to take to port an existing code to an accelerator.


CPU vs accelerator

Historically, computing has developed around Central Processing Units (CPUs) that were optimized for sequential tasks. That is, they would do only one computation during a given clock cycle. The frequency of these units steadily increased until about 2005, when the top speed of high-end CPUs reached a plateau at around 4 GHz. Since then, for reasons well explained in this article, CPU clock frequency has barely moved, and is now often even lower than 4 GHz. Instead, manufacturers started adding multiple computation cores within a single chip, opening wide the era of parallel computing. Yet, even as of 2016, CPUs are mostly optimized for sequential tasks, for which they present some major advantages, but also some weaknesses. First, they have direct access to the main computer memory, which can be very large. Second, because of their very fast clock speed, they can run a small number of tasks very quickly. However, they have relatively low memory bandwidth. They use cache mechanisms to mitigate this, but this implies that cache misses are very costly. They are also rather power hungry compared to accelerators.

Typical accelerators, such as GPUs or coprocessors, are highly parallel chips. They are made of hundreds or thousands of relatively simple, low-frequency compute cores. Simply put, they are optimized for parallel computing. High-end GPUs usually have a few thousand compute cores. They also have high bandwidth to their own device memory. They offer significantly more compute resources than high-end CPUs, and provide much higher throughput and much better performance per watt. However, they embed a relatively small amount of memory, and have low per-thread performance.
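To make the distinction between sequential and parallel work more concrete, here is a minimal sketch in plain Python. The function names and input values are hypothetical and chosen only for illustration. The first function has iterations that do not depend on each other, which is the kind of work that maps well to many simple cores. The second, as written, needs the previous result at every step and therefore runs one step after another.

```python
# Illustrative only: names and values are hypothetical, not part of the tutorial.

def scaled_sum(a, x, y):
    """Each output element depends only on its own inputs, so all
    iterations are independent and could, in principle, run at once."""
    return [a * xi + yi for xi, yi in zip(x, y)]


def running_recurrence(a, x, start=0.0):
    """Each step uses the value produced by the previous step, so, as
    written, the iterations must run one after another."""
    out = []
    value = start
    for xi in x:
        value = a * value + xi
        out.append(value)
    return out


x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]

parallel_friendly = scaled_sum(2.0, x, y)      # independent iterations
sequential_only = running_recurrence(0.5, x)   # step-to-step dependency
```

Code shaped like the first function is a natural candidate for an accelerator, while code shaped like the second benefits more from high per-thread performance.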


Speed vs throughput: which is best?

Depending on what kind of task you are trying to accomplish, you may want to use a high speed device such as a CPU, or a high throughput device such as an accelerator.

A high speed device will accomplish a single task within a very short amount of time. This is probably what you want if you are trying to do a single sequential computation, such as the resolution of a one-dimensional differential equation. In real life, we could compare a high speed device to a racing motorcycle or a racing car. It will bring a single passenger from point A to point B very quickly.

A high throughput device will accomplish much more work, but in a longer amount of time. This is probably what you want if you are trying to solve a highly parallel problem. Examples of such tasks are numerous, and include matrix operations, Fourier transforms, multidimensional differential equations, etc. In real life, we could compare a high throughput device to a train or a bus. It will bring a lot of passengers from point A to point B, but in an admittedly longer time than a racing motorcycle or car.
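The trade-off between speed and throughput can be sketched with a toy timing model. Everything here is an assumption made for illustration: the `total_time` helper, the two hypothetical devices, and their numbers do not describe any real hardware. The model assumes that a device processes tasks in batches, that each batch takes a fixed time, and that there is no other overhead.

```python
import math


def total_time(n_tasks, time_per_task, lanes):
    """Time to finish n_tasks when `lanes` tasks run side by side and
    each batch takes `time_per_task` (arbitrary time units)."""
    batches = math.ceil(n_tasks / lanes)
    return batches * time_per_task


# Hypothetical devices; the numbers are chosen only to show the trade-off.
FAST = {"time_per_task": 1.0, "lanes": 4}      # the "motorcycle"
WIDE = {"time_per_task": 10.0, "lanes": 1000}  # the "train"

for n in (1, 100, 10000):
    t_fast = total_time(n, **FAST)
    t_wide = total_time(n, **WIDE)
    print(f"{n:>6} tasks: fast device {t_fast:>7.1f}, wide device {t_wide:>7.1f}")
```

Under these assumptions, the fast device wins when there is a single task, and the wide device wins as soon as there is enough independent work to fill its lanes.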

Throughput is like a train: slow, but it carries a lot of work in a single trip.
Speed is like a motorcycle: very fast, but it carries only one person at a time.