@@ -95,10 +95,11 @@ Implicit copy of data from source in Host memory to GPU memory are performed wit
...
All the work is done by a single block of 400 threads. Each thread executes exactly one load, one multiplication and one store from source to destination. This works because `index` is a bijective function of `threadIdx.x`, which is unique for every thread in a one-dimensional block, so each element is handled by exactly one thread.
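A minimal sketch of the single-block pattern described above. The kernel name `scale`, the helper `run_scale` and the multiplication factor are hypothetical, and the host-to-device transfer is done here with explicit `cudaMalloc`/`cudaMemcpy` calls, which may differ from how the original example moves data. The point it illustrates is that with one block of 400 threads, `index = threadIdx.x` already assigns each element to exactly one thread.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

#define N 400  // one element per thread; well under the 1024 threads-per-block limit

// Each thread handles exactly one element: threadIdx.x is unique within a
// one-dimensional block, so index = threadIdx.x maps threads to elements 1:1.
__global__ void scale(const float *src, float *dst, int n, float factor)
{
    int index = threadIdx.x;
    if (index < n)                         // guard in case blockDim.x > n
        dst[index] = src[index] * factor;  // one load, one multiply, one store
}

// Hypothetical host helper: copies src to the GPU, launches a single block of
// n threads, and copies the result back into dst.
cudaError_t run_scale(const float *src, float *dst, int n, float factor)
{
    float *d_src = nullptr, *d_dst = nullptr;
    size_t bytes = n * sizeof(float);

    cudaError_t err = cudaMalloc(&d_src, bytes);
    if (err != cudaSuccess) return err;
    err = cudaMalloc(&d_dst, bytes);
    if (err == cudaSuccess)
        err = cudaMemcpy(d_src, src, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        scale<<<1, n>>>(d_src, d_dst, n, factor);  // 1 block, n threads
        err = cudaGetLastError();                  // catch launch errors
    }
    if (err == cudaSuccess)  // this copy also waits for the kernel to finish
        err = cudaMemcpy(dst, d_dst, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_src);
    cudaFree(d_dst);  // cudaFree(nullptr) is a no-op
    return err;
}

int main(void)
{
    float src[N], dst[N];
    for (int i = 0; i < N; ++i) src[i] = (float)i;

    cudaError_t err = run_scale(src, dst, N, 2.0f);
    if (err != cudaSuccess) {
        fprintf(stderr, "CUDA error: %s\n", cudaGetErrorString(err));
        return 1;
    }
    assert(dst[N - 1] == 2.0f * (N - 1));  // last thread did its element
    printf("dst[%d] = %f\n", N - 1, dst[N - 1]);
    return 0;
}
```

Launching with `<<<1, N>>>` only scales up to the per-block thread limit (1024 on current GPUs); larger arrays would need several blocks and an index built from `blockIdx.x`, `blockDim.x` and `threadIdx.x`.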