CUDA compute and copy engine queue limits

- August 15, 2010

I seem to have encountered a limit on the number of asynchronous kernel launches that can be queued in the compute engine queue. Once that limit is reached, the host blocks and GPU-CPU concurrency is lost. This is not mentioned in the CUDA Programming Guide.

1. What is the maximum number of asynchronous kernel launches that can be queued in the compute engine queue?
2. Does this maximum number depend in any way on the kernel being launched?
3. Does the time it takes the CPU to put a kernel launch in the compute engine queue depend on the kernel being launched?
4. What is the maximum number of asynchronous memcpys that can be queued in the copy engine queue?

I am not sure there is a universal answer to this question; to some degree it is platform and CUDA version specific, AFAIK. To answer your points in order:

1. The limit is the queue size, I believe: there is a maximum number of queued operations rather than of kernel launches. The same total limit should apply to any combination of kernels, copy operations, and stream events. The total number of operations depends on the platform and CUDA version.
2. No.
3. No, but once the driver queue has filled, the time taken to submit an asynchronous operation is considerably increased.
4. See the first point. I don't believe the driver distinguishes between copies, kernel launches, or events.

I can recall doing some benchmarking circa CUDA 2.1 and finding that it ran until 24 operations had been queued, after which the time taken for subsequent operations to be queued slowed. By the time CUDA 3.0 had been released, I didn't have code that hit the limit that existed in older versions, so something had changed. It should be trivial to write a benchmark to check what more modern CUDA versions do.
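A minimal sketch of such a benchmark is below. It is not the code I used back then, just one way to do it: a long-running kernel keeps the GPU busy, empty kernels are queued behind it, and each launch call is timed on the host. The kernel names `spin` and `empty`, the spin length of 2e9 clock cycles, the 10 ms stall threshold, and the cap of 20000 launches are all assumptions chosen for illustration, so adjust them for your hardware.

```cuda
#include <chrono>
#include <cstdio>
#include <cuda_runtime.h>

// Busy-waits for roughly `cycles` SM clock ticks so later launches pile up behind it.
__global__ void spin(long long cycles)
{
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Does nothing; only the cost of queuing it is of interest.
__global__ void empty() { }

int main()
{
    const int maxOps = 20000;            // assumed upper bound on launches to try
    const double stallUs = 10000.0;      // assumed threshold: a 10 ms launch counts as a stall
    const long long spinCycles = 2000000000LL; // assumed: about 1-2 s on a 1-2 GHz GPU

    // Warm-up: create the context and load both kernels before timing anything.
    empty<<<1, 1>>>();
    spin<<<1, 1>>>(1);
    if (cudaDeviceSynchronize() != cudaSuccess) {
        fprintf(stderr, "warm-up failed: %s\n", cudaGetErrorString(cudaGetLastError()));
        return 1;
    }

    // Occupy the GPU. Note that a display watchdog may kill a kernel this long.
    spin<<<1, 1>>>(spinCycles);

    int stalledAt = -1;
    for (int i = 0; i < maxOps; ++i) {
        auto t0 = std::chrono::steady_clock::now();
        empty<<<1, 1>>>();               // asynchronous unless the queue is full
        auto t1 = std::chrono::steady_clock::now();
        double us = std::chrono::duration<double, std::micro>(t1 - t0).count();
        if (us > stallUs) {
            stalledAt = i;
            printf("launch %d took %.0f us to queue\n", i, us);
            break;
        }
    }

    if (stalledAt < 0)
        printf("no stall seen within %d launches\n", maxOps);
    else
        printf("host first stalled after %d queued launches\n", stalledAt);

    cudaError_t err = cudaDeviceSynchronize(); // drain the queue before exiting
    if (err != cudaSuccess) {
        fprintf(stderr, "run failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    return 0;
}
```

Build it with something like `nvcc queue_depth.cu -o queue_depth` (the file name is arbitrary). If no stall is reported, the spin kernel may have finished before the queue filled, so try a larger `spinCycles` or `maxOps`. Mixing `cudaMemcpyAsync` and `cudaEventRecord` calls into the loop would be one way to check whether copies and events count against the same limit.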
