Commit 2029ee7c authored by ANDRADE-BARROSO Guillermo

Add Arm NEO instructions set tutorial

change names to be more generic on SIMD instructions sets
parent 9b2eff62
......@@ -9,17 +9,17 @@ Work with these tutorials supposes that you have a python interpreter installed
## pthread simple tutorial
There is a simple tutorial to start working with threads using the POSIX Threads library [(pthread)](pthread/README.md)
## SSE and OpenMP
### Introduction to C++ SSE using Python and inline compilation
## SIMD and OpenMP
### Introduction to C++ SIMD instructions (Intel SSE or ARM NEON) using Python and inline compilation
There is a tutorial showing :
* how to include C++ code in python script with [inline module](https://github.com/GuillermoAndrade/inline)
* how to optimize C++ code using SSE and measure performance
[SSE tutorial using Python and Inline module](SSE/README.md)
* how to optimize C++ code using SIMD instructions and measure performance
[SIMD tutorial using Python and Inline module](SIMD/README.md)
### Multi-core programming with OpenMP and SSE using Python and Inline module
### Multi-core programming with OpenMP and SIMD instructions using Python and Inline module
There is a tutorial showing :
* how to optimize C++ code using OpenMP and SSE
[OpenMP and SSE tutorial using Python and Inline module](OpenMP/README.md)
* how to optimize C++ code using OpenMP and SIMD instructions
[OpenMP and SIMD instructions tutorial using Python and Inline module](OpenMP/README.md)
## CUDA
### Blocks and Grid on scalar multiplication kernel
......
## Goals of this tutorial ##
* Launch and modify Python scripts using the NumPy module
* Learn how to run C++ code inside Python using the [inline module](https://github.com/GuillermoAndrade/inline)
* Optimize C++ code using SIMD instructions :
    * with an Intel-compatible CPU, work with the SSE instruction set through its C intrinsics API
    * with an ARMv7-compatible CPU, work with the NEON instruction set through its C intrinsics API
## Very little introduction to **Python** and **NumPy** module ##
[Python](http://www.python.org/) is a programming language described in http://docs.python.org/2/tutorial/index.html as follows: *"Python is an easy to learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms."*
Python is well suited to tutorials on parallel computing thanks to modules like [**inline**](https://github.com/GuillermoAndrade/inline), **PyCuda** and **PyOpenCL**. This tutorial was written for Python 2.x, but all instructions also run with Python 3.x.
### Launch a Python console in your PC ###
Python comes with an interpreter console. You can launch it by typing:
```
$ python
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
[GCC 9.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> 5+7
12
>>> exit()
$
```
#### IPython ####
There is a very powerful and comfortable interactive console named IPython. IPython gives you access to the command history, command completion with the [TAB] key, help on commands or objects with "?", and script debugging facilities. I recommend working with it. To launch the IPython console, just type:
```
$ ipython
Python 3.8.5 (default, Jul 28 2020, 12:59:40)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.13.0 -- An enhanced Interactive Python. Type '?' for help.
In [1]:
```
##### Ask for in-line documentation #####
```
In [1]: range?
Init signature: range(self, /, *args, **kwargs)
Docstring:
range(stop) -> range object
range(start, stop[, step]) -> range object
Return an object that produces a sequence of integers from start (inclusive)
to stop (exclusive) by step. range(i, j) produces i, i+1, i+2, ..., j-1.
start defaults to 0, and stop is omitted! range(4) produces 0, 1, 2, 3.
These are exactly the valid indices for a list of 4 elements.
When step is given, it specifies the increment (or decrement).
Type: type
Subclasses:
```
##### Completion #####
Completion using the [TAB] key:
```
In [2]: mylist = li
license()
list
```
##### Using a variable and lists #####
Define a variable ```my_list``` that contains a list of integers:
```
In [2]: my_list = list(range(10))
In [3]: my_list
Out[3]: [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
In [4]: list.reverse?
Signature: list.reverse(self, /)
Docstring: Reverse *IN PLACE*.
Type: method_descriptor
In [5]: my_list.reverse()
In [6]: my_list
Out[6]: [9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
In [7]: exit()
$
```
### Write a python program ###
You can edit a Python program with a text editor such as vim or gedit (graphical mode only). Python files usually have the ".py" extension.
$ gedit my_test.py
By default, Python 2 interprets source files as ASCII (Python 3 defaults to UTF-8). You can set another character encoding by adding this kind of instruction at the beginning of the file (here, UTF-8):
```python
# -*- coding: utf-8 -*-
```
This allows you to type characters with French accents if you want.
You can write a comment with the "#" character; it runs to the end of the line:
```python
a= 5 # this is a comment
```
or multi-line comments (in fact, triple-quoted strings) :
```python
'''
This is a multiline
comment.
'''
```
#### Block syntax ####
Unlike C or C++, Python doesn't need semicolons to separate instructions. You can simply separate instructions with line feeds:
```python
a=5
b=8
print ("a=", a)
print ("b=", b)
print ("a+b=", a+b)
```
And unlike C or C++, instruction blocks in Python aren't delimited by "{}" but by indentation:
```python
if a> 2:
    c=1
    print ("a is bigger that 2")
else:
    print ("a is smaller or equal to 2")
    c=2
print ("c=",c)
```
#### Functions ####
You can define functions with parameters, and parameters with default values:
```python
def my_function(x, y=0):
    a= 5*y + x**2
    return a

w= my_function(3)          # y takes its default value 0
z= my_function(3,1)        # positional arguments
y= my_function(y=2, x=3)   # keyword arguments, in any order
print("(w,z,y)=",(w,z,y))
```
### Run your program ###
You can run your program with python:
```
$ python my_test.py
a= 5
b= 8
a+b= 13
a is bigger that 2
c= 1
(w,z,y)= (9, 14, 19)
$
```
Inside ipython you can run your program with:
```
In [1]: %run my_test.py
a= 5
b= 8
a+b= 13
a is bigger that 2
c= 1
(w,z,y)= (9, 14, 19)
```
After your program has run, you can inspect the state of variables or call functions:
```
In [2]: b
Out[2]: 8
In [3]: my_function(4)
Out[3]: 16
```
### NumPy ###
NumPy is a module that lets you efficiently manipulate arrays of data at a high level, with optimized operations.
You can find more information here : http://docs.scipy.org/doc/numpy/user/index.html
You can access NumPy functions by importing the module:
```python
import numpy
```
#### Manipulate arrays ####
##### Create an array of float32 #####
An empty linear array of 1000 elements:
```python
a= numpy.empty(1000,numpy.float32)
```
A linear array of 1000 elements with all values set to ```zero``` :
```python
a= numpy.zeros(1000,numpy.float32)
```
A linear array of 1000 elements with all values set to ```one``` :
```python
a= numpy.ones(1000,numpy.float32)
```
A linear array of 1000 elements with random values between 0 and 1:
```python
a=numpy.random.rand(1000).astype(numpy.float32)
```
A linear array of 1000 elements with linearly increasing values from 0 to 999 :
```python
a= numpy.arange(1000, dtype=numpy.float32)
```
##### Access to elements #####
Get the first element:
```python
In [3]: a[0]
Out[3]: 0.0
```
Get the last:
```python
In [4]: a[999]
Out[4]: 999.0
```
More generally, to get the last element of any linear array:
```python
In [5]: a[-1]
Out[5]: 999.0
```
##### Copy and reference of an array #####
Assignment by reference:
```python
In [3]: b=a
In [4]: a[0]
Out[4]: 0.0
In [5]: b[0]=5
In [6]: b[0]
Out[6]: 5.0
In [7]: a[0]
Out[7]: 5.0
```
To make a copy (a physical copy in memory):
```python
In [2]: a=numpy.arange(1000, dtype=numpy.float32)
In [3]: b=a.copy()
In [4]: b[0]=5
In [5]: a[0]
Out[5]: 0.0
```
#### Views and slicing ####
NumPy lets you get different views of an array using "slicing". A slice is created by index manipulation. For example:
```python
In [2]: a=numpy.arange(10, dtype=numpy.float32)
In [3]: a
Out[3]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], dtype=float32)
In [4]: a[0:9:2]
Out[4]: array([ 0., 2., 4., 6., 8.], dtype=float32)
```
The index `0:9:2` means from begin = 0 up to (but not including) end = 9, with a step of 2.
##### Reverse Indexing #####
Step can be a negative integer:
```python
In [5]: a[9:0:-2]
Out[5]: array([ 9., 7., 5., 3., 1.], dtype=float32)
```
In this case `a[9:0:-2]` is equivalent to `a[-1:0:-2]`.
A slice is a view of an existing array. It is not a copy of the data.
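The difference between a view and a copy can be checked directly. The following is a minimal sketch, not part of the original tutorial; the variable names `view` and `independent` are only illustrative:

```python
import numpy

a = numpy.arange(10, dtype=numpy.float32)
view = a[::2]                  # slice: shares memory with a
independent = a[::2].copy()    # explicit copy: separate memory

view[0] = 100.0                # this write is visible through a
independent[1] = -1.0          # this write is not visible through a

print(a[0], a[2])                             # 100.0 2.0
print(numpy.shares_memory(a, view))           # True
print(numpy.shares_memory(a, independent))    # False
```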
##### Default Index #####
It is possible to use default values for the begin or end index:
```python
In [6]: a[::-2]
Out[6]: array([ 9., 7., 5., 3., 1.], dtype=float32)
```
Or for step:
```python
In [7]: a[:]
Out[7]: array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.], dtype=float32)
In [8]: a[5:]
Out[8]: array([ 5., 6., 7., 8., 9.], dtype=float32)
In [9]: a[:5]
Out[9]: array([ 0., 1., 2., 3., 4.], dtype=float32)
```
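To see which indices a given `begin:end:step` expression selects, one option is to resolve it with Python's built-in `slice` and `range` objects. This is a small sketch using only the standard library; the helper name `selected_indices` is hypothetical and not part of NumPy:

```python
def selected_indices(start, stop, step, length):
    """Return the indices picked by [start:stop:step] on a sequence of `length` items."""
    # slice.indices() resolves omitted (None) and negative values for a given length
    return list(range(*slice(start, stop, step).indices(length)))

print(selected_indices(0, 9, 2, 10))          # [0, 2, 4, 6, 8]
print(selected_indices(9, 0, -2, 10))         # [9, 7, 5, 3, 1]
print(selected_indices(None, None, -2, 10))   # [9, 7, 5, 3, 1]
```

The end index is always excluded, which is why `9:0:-2` stops at 1 and never reaches 0.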
#### Array Operators ####
##### Scalar operation #####
A scalar operation on an array is equivalent to applying the operation between the scalar and every element of the array:
```python
In [1]: import numpy
In [2]: a=numpy.arange(10, dtype=numpy.float32)
In [3]: a*4
Out[3]: array([ 0., 4., 8., 12., 16., 20., 24., 28., 32., 36.], dtype=float32)
In [4]: 5+a
Out[4]: array([ 5., 6., 7., 8., 9., 10., 11., 12., 13., 14.], dtype=float32)
```
##### Array Vs. Array operation #####
Binary operations like "*" and "+" are done element-wise :
```python
In [1]: import numpy
In [2]: a=numpy.arange(10, dtype=numpy.float32)
In [3]: a*4
Out[3]: array([ 0., 4., 8., 12., 16., 20., 24., 28., 32., 36.], dtype=float32)
In [4]: 5+a
Out[4]: array([ 5., 6., 7., 8., 9., 10., 11., 12., 13., 14.], dtype=float32)
In [5]: a+a
Out[5]: array([ 0., 2., 4., 6., 8., 10., 12., 14., 16., 18.], dtype=float32)
In [6]: a*a
Out[6]: array([ 0., 1., 4., 9., 16., 25., 36., 49., 64., 81.], dtype=float32)
In [7]: c=-a
In [8]: a*c
Out[8]: array([ -0., -1., -4., -9., -16., -25., -36., -49., -64., -81.], dtype=float32)
```
Matrix dot product must be called explicitly :
```python
In [15]: numpy.dot(a,a)
Out[15]: 285.0
```
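As a quick cross-check of the two notions, the dot product of a vector with itself should equal the sum of the element-wise product. A minimal sketch (the variable names are illustrative):

```python
import numpy

a = numpy.arange(10, dtype=numpy.float32)

elementwise = a * a          # array of 10 products, one per element
dot = numpy.dot(a, a)        # single scalar: sum of those products

print(elementwise.shape)             # (10,)
print(dot == elementwise.sum())      # True
```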
## Using **Inline** module to integrate C/C++ code in python ##
**Inline** is a module that allows in-line inclusion of C/C++ code in Python, using code generation and compilation on the fly.
See https://github.com/GuillermoAndrade/inline for more information.
The reference Python interpreter is written in C, so it is possible to call functions from a shared library through a C interface. For C++ functions, Python needs to go through a C wrapper function that handles the conversion between pure C parameters and the C++ class objects used by the real C++ functions.
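The underlying mechanism can be tried without Inline, using only the standard `ctypes` module on a library that is already installed. The sketch below assumes a Linux or macOS system where the C standard library can be located by `ctypes.util.find_library`:

```python
import ctypes
import ctypes.util

# Locate and load the C standard library (the file name is platform dependent)
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# abs() takes and returns a C int, which matches ctypes' default conversions
print(libc.abs(-42))   # 42
```

Inline automates the steps that come before this: writing the source to a file, compiling it into a shared library, and loading the result.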
### Hello world with Inline ###
Imagine a C function that takes a string parameter `text` and prints a message :
```c
#include <stdio.h>
void Hello( char * text)
{
printf("Hello %s \n", text);
}
```
We can call a similar function in python with this code:
```python
In [1]: code=r'''
...: #include <stdio.h>
...: void Hello( char * text)
...: {
...: printf("Hello %s \n", text);
...: }
...: '''
In [2]: import inline
In [3]: lib=inline.c(code)
In [4]: lib.Hello(b'world')
Hello world
Out[4]: 13
```
After defining the C code as a Python raw string, we compile it with the `inline.c` function, which produces an object representing a shared library. Then we can call a function from this library like this : `lib.Hello(b'world')`.
The text argument passed to this function must be a string of bytes; that is why we use `b'world'`, which tells Python that the string is a bytes array.
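When the text comes from a regular Python string, it can be converted with `str.encode`. The following sketch illustrates this with the C library function `strlen` rather than with Inline; it assumes a Linux or macOS system where `ctypes.util.find_library` can locate the C library:

```python
import ctypes
import ctypes.util

libc = ctypes.CDLL(ctypes.util.find_library("c"))
libc.strlen.argtypes = [ctypes.c_char_p]   # C signature: size_t strlen(const char *)
libc.strlen.restype = ctypes.c_size_t

name = "world"
print(libc.strlen(name.encode("utf-8")))   # 5

try:
    libc.strlen(name)   # a str is rejected when a char * is expected
except ctypes.ArgumentError as error:
    print("rejected:", error)
```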
### Function with two arguments and a return value ###
Consider the next C example:
```c
#include<math.h>
float norm( float x, float y)
{
return sqrt(x*x+y*y);
}
```
We can compile this code inline as before :
```python
In [7]: code=r'''
...: #include<math.h>
...: float norm( float x, float y)
...: {
...: return sqrt(x*x+y*y);
...: }
...: '''
In [8]: lib=inline.c(code)
```
But to call the `norm` function, we need to tell the Python interface object of this function that the arguments are floats. We do that with types defined in the `ctypes` module. There are two ways to do it:
**First way** : use an object that holds a float number for the C call :
```python
In [6]: import ctypes
In [7]: lib.norm( ctypes.c_float(3.0), ctypes.c_float(5.0))
```
**Second way** : define the argument types for the `norm` function :
```python
In [8]: lib.norm.argtypes = [ctypes.c_float, ctypes.c_float]
In [9]: lib.norm(3.0,5.0)
```
For the value returned by the function call (float type), we also need to define the conversion type to Python:
```python
In [11]: lib.norm.restype = ctypes.c_float
In [12]: lib.norm(3.0,5.0)
Out[12]: 5.830951690673828
```
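The same `argtypes` and `restype` declarations apply to any C function, not only to code compiled with Inline. As an illustration, the sketch below declares the standard math function `hypotf`, which computes the same quantity as `norm`. It assumes a Linux or macOS system where the math library can be found by `ctypes.util.find_library`:

```python
import ctypes
import ctypes.util

libm = ctypes.CDLL(ctypes.util.find_library("m"))

# C signature: float hypotf(float x, float y), i.e. sqrt(x*x + y*y)
libm.hypotf.argtypes = [ctypes.c_float, ctypes.c_float]
libm.hypotf.restype = ctypes.c_float

print(libm.hypotf(3.0, 4.0))   # 5.0
```

Without the `restype` line, `ctypes` would interpret the return value as a C `int`, and the printed number would be meaningless.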
### Passing NumPy Arrays as arguments in read/write mode ###
Passing a NumPy array to inline C code is equivalent to passing array pointers in C, with a specific type converter :
```python
import inline
import ctypes
import numpy
size=10
a=numpy.arange(size, dtype=numpy.float32)
b=numpy.arange(size, dtype=numpy.float32)
c=numpy.empty_like(a)
code = '''
void sum( int size, float *a, float *b, float *c)
{
for(unsigned int i=0; i < size ; i++)
{
c[i]=a[i]+b[i];
}
}
'''
lib = inline.c(code)
p_float= numpy.ctypeslib.ndpointer(dtype=numpy.float32)
lib.sum.argtypes = [ctypes.c_int, p_float, p_float, p_float]
```
The result is :
```python
In [18]: lib.sum(len(a), a, b, c)
Out[18]: 10
In [19]: c
Out[19]: array([ 0., 2., 4., 6., 8., 10., 12., 14., 16., 18.], dtype=float32)
```
We can compare `c` array with the sum computed by numpy `a+b` :
```python
In [20]: c - (a+b)
Out[20]: array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0.], dtype=float32)
```
## Optimization of a C++ code with SIMD instructions set ##
It's time to combine NumPy and Inline into a benchmark program for optimization tests.
### Benchmark code ###
This code defines a benchmark that tests a classical procedure from linear algebra packages : SAXPY, in a particular case.
This code uses other optional arguments in the call to Inline :
* ```compiler_extra_args``` : compilation arguments that enable SIMD instructions and OpenMP
* ```link_extra_args``` : link arguments that enable SIMD instructions and OpenMP
```python
import numpy
import ctypes
from time import time
import inline
sizeX = 1000000
numberIterations =1000
X = numpy.random.rand(sizeX).astype(numpy.float32)
Y = numpy.empty(sizeX).astype(numpy.float32)
def BenchmarkCode(name, code,X,Y):
    # init
    Y[:]=0.5
    X[:]=1.0
    # compile the code
    lib=inline.cxx(code, compiler_extra_args=['-march=native','-fopenmp'], link_extra_args= ['-march=native','-fopenmp'])
    p_float= numpy.ctypeslib.ndpointer(dtype=numpy.float32)
    lib.compute.argtypes = [ctypes.c_int, ctypes.c_int, p_float, p_float]
    # start chronometer
    start_time = time()
    # run the code
    lib.compute(numberIterations, sizeX, X, Y)
    # stop chronometer
    stop_time = time()
    execution_time= stop_time - start_time
    print("execution time for "+name+" code = "+ str(execution_time))
    return execution_time
# C++ reference code
referenceCode="""
#line 33 "saxpy.py" // helpful for debug
extern "C" {
#ifdef __SSE2__
#include<xmmintrin.h>
#else
#include <arm_neon.h>
#endif
void saxpy(int n, float alpha, float *X, float *Y)
{
int i;
for (i=0; i<n; i++)
Y[i] += alpha * X[i];
}
void compute(int numberIterations, int sizeX, float *X, float *Y )
{
for(int j=0; j< numberIterations;j++)
saxpy(sizeX, 0.001f, X, Y);
return ;
}
}
"""
referenceTime=BenchmarkCode('Reference', referenceCode,X,Y)
SIMDCode="""
#line 58 "saxpy.py" // helpful for debug
extern "C" {
#ifdef __SSE2__
#include<xmmintrin.h>
#else
#include <arm_neon.h>
#endif
void saxpy(int n, float alpha, float *X, float *Y)
{
int i;
for (i=0; i<n; i++)
Y[i] += alpha * X[i];
}
void compute(int numberIterations, int sizeX, float *X, float *Y )
{
for(int j=0; j< numberIterations;j++)
saxpy(sizeX, 0.001f, X, Y);
return ;
}
}
"""
SIMDTime=BenchmarkCode('SIMD', SIMDCode,X,Y)
print("speed up for SIMD = " + str(referenceTime/SIMDTime))
```
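The benchmark above takes a single measurement with `time()`. A single run can be noisy, so one possible refinement (an assumption of this sketch, not something the tutorial's script does) is to repeat the call and keep the shortest duration, measured with the standard library's `time.perf_counter`. The helper name `best_time` is hypothetical:

```python
from time import perf_counter

def best_time(function, repeats=5):
    """Run `function` several times and return the shortest wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = perf_counter()
        function()
        durations.append(perf_counter() - start)
    return min(durations)

# Stand-in workload; in the benchmark it would be the call to lib.compute
elapsed = best_time(lambda: sum(range(100000)))
print("best of 5 runs: %.6f s" % elapsed)
```

If you apply this to the SAXPY benchmark, remember that each run keeps accumulating into `Y`, so `Y` should be reinitialized before the values are compared.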
### SAXPY ###
SAXPY is a classical linear algebra routine that takes arrays of floats `X` and `Y` as parameters and produces the output `Y = Y + alpha * X`, where `alpha` is a scalar input parameter.
In this tutorial we are interested in the computational performance of SAXPY when we need to accumulate 1000 successive calls of this function:
```c
for(int j=0; j< numberIterations;j++)
saxpy(sizeX, 0.001f, X, Y);
```
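For reference, the same accumulation can be written in NumPy. This is only a sketch to make the arithmetic concrete (the function name `saxpy_numpy` and the small array size are choices made here, not part of the benchmark). With `X` filled with 1.0 and `Y` starting at 0.5, as in the benchmark initialization, every element should end close to 0.5 + 1000 * 0.001 = 1.5:

```python
import numpy

def saxpy_numpy(alpha, X, Y):
    """In-place, element-wise Y = Y + alpha * X in single precision."""
    Y += numpy.float32(alpha) * X

X = numpy.ones(8, dtype=numpy.float32)
Y = numpy.full(8, 0.5, dtype=numpy.float32)
for _ in range(1000):
    saxpy_numpy(0.001, X, Y)

# each element accumulated 1000 * 0.001 on top of 0.5, up to float32 rounding
print(Y[0])
print(numpy.allclose(Y, 1.5, atol=1e-3))   # True
```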
### SIMD version of SAXPY ###
In the benchmark, the reference code of the SAXPY function is in the ```referenceCode``` string, and the code to change is in ```SIMDCode```.
1. Modify SIMDCode to compute SAXPY using either Intel SSE or Arm NEON instructions (depending on your machine architecture). You can find the NEON intrinsics reference here : https://developer.arm.com/architectures/instruction-sets/intrinsics/#f:@navigationhierarchiessimdisa=[Neon]&f:@navigationhierarchiesreturnbasetype=[float] and the Intel SSE reference here : https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#techs=SSE,SSE2
1. Test this version and compare the speed ratio between the reference code and SIMDCode.
1. Draw a diagram of the memory transfers at every call of the SIMD code and over the global calls in the main code.
1. Modify the main code to integrate the loop `for(int j=0; j< numberIterations;j++)` inside the SAXPY implementation.
1. Change the order of the loops to determine the best memory access pattern.
1. Measure the corresponding acceleration (speed-up) relative to the reference code.
1. Verify that the results are identical for the reference function and your code.
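For the last step, one way to compare results is sketched below. It assumes you keep a copy of `Y` (for example with `Y.copy()`) after each benchmark run, since `BenchmarkCode` reinitializes `Y` every time. The helper name `compare_results` and the sample arrays are hypothetical:

```python
import numpy

def compare_results(reference, candidate, rtol=1e-5):
    """Report whether two result arrays agree, and their largest absolute difference."""
    max_diff = float(numpy.max(numpy.abs(reference - candidate)))
    return numpy.allclose(reference, candidate, rtol=rtol), max_diff

# Stand-in data; in practice these would be the saved copies of Y
Y_reference = numpy.full(4, 1.5, dtype=numpy.float32)
Y_candidate = Y_reference.copy()
print(compare_results(Y_reference, Y_candidate))   # (True, 0.0)
```

A tolerance is used here instead of strict equality because results may differ in the last bits between builds, for example if the compiler fuses the multiplication and addition into a single instruction.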
------------------
Go back to [Parallel Computing Tutorials](../README.md)
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import atexit
import ctypes
import distutils.ccompiler
import os.path
import platform
import shutil
import sys
import tempfile
__version__ = '0.0.1'
def c(source, libraries=[], compiler_extra_args=[], link_extra_args=[]):
    r"""
    >>> c('int add(int a, int b) {return a + b;}').add(40, 2)
    42
    >>> sqrt = c('''
    ... #include <math.h>
    ... double _sqrt(double x) {return sqrt(x);}
    ... ''', ['m'])._sqrt
    >>> sqrt.restype = ctypes.c_double
    >>> sqrt(ctypes.c_double(400.0))
    20.0
    """
    path = _cc_build_shared_lib(source, '.c', libraries,
                                compiler_extra_args, link_extra_args)
    return ctypes.cdll.LoadLibrary(path)
def cxx(source, libraries=[], compiler_extra_args=[], link_extra_args=[]):
    r"""
    >>> cxx('extern "C" { int add(int a, int b) {return a + b;} }').add(40, 2)
    42
    """
    path = _cc_build_shared_lib(source, '.cc', libraries,
                                compiler_extra_args, link_extra_args)
    return ctypes.cdll.LoadLibrary(path)
cpp = cxx # alias
def cxx2asm(source, compiler_extra_args=[]):
    return _cc_get_assembly_code(source, '.cc', compiler_extra_args)
def c2asm(source, compiler_extra_args=[]):
    return _cc_get_assembly_code(source, '.c', compiler_extra_args)
def python(source):
    r"""
    >>> python('def add(a, b): return a + b').add(40, 2)
    42
    """
    obj = type('', (object,), {})()
    _exec(source, obj.__dict__, obj.__dict__)
    return obj
def _cc_get_assembly_code(source, suffix, compiler_extra_args):
    assembly_code=None
    tempdir = tempfile.mkdtemp()
    atexit.register(lambda: shutil.rmtree(tempdir))
    with tempfile.NamedTemporaryFile('w+', suffix=suffix, dir=tempdir) as f:
        f.write(source)
        f.seek(0)
        name=f.name
        cc = distutils.ccompiler.new_compiler()
        args = [] + compiler_extra_args
        if platform.system() == 'Linux':
            args.append('-fPIC')
        assembly_args = ['-S']
        # compile while the temporary source file still exists
        assembly_file = cc.compile((name,), tempdir, extra_postargs=args+assembly_args)
    with open(assembly_file[0],'r') as fa:
        assembly_code=fa.read()
    return assembly_code
def _cc_build_shared_lib(source, suffix, libraries,
                         compiler_extra_args, link_extra_args, return_assembly_code=False):
    tempdir = tempfile.mkdtemp()
    atexit.register(lambda: shutil.rmtree(tempdir))
    with tempfile.NamedTemporaryFile('w+', suffix=suffix, dir=tempdir) as f:
        f.write(source)
        f.seek(0)
        name=f.name
        cc = distutils.ccompiler.new_compiler()
        args = [] + compiler_extra_args
        if platform.system() == 'Linux':
            args.append('-fPIC')
        # compile while the temporary source file still exists
        obj = cc.compile((name,), tempdir, extra_postargs=args)
    for library in libraries:
        cc.add_library(library)
    cc.link_shared_lib(obj, name, tempdir, extra_postargs=link_extra_args)
    filename = cc.library_filename(name, 'shared')
    return os.path.join(tempdir, filename)
def _exec(object, globals, locals):
    r"""
    >>> d = {}
    >>> exec('a = 0', d, d)
    >>> d['a']
    0
    """
    if sys.version_info < (3,):
        exec('exec object in globals, locals')
    else:
        exec(object, globals, locals)
import numpy
import ctypes
from time import time
import inline
sizeX = 1000000
numberIterations =1000
X = numpy.random.rand(sizeX).astype(numpy.float32)
Y = numpy.empty(sizeX).astype(numpy.float32)
def get_assembly(code):
    return inline.cxx2asm(code, compiler_extra_args=['-march=native','-fopenmp', '-lstdc++'])
def BenchmarkCode(name, code, X, Y, SIMD=True, OMP=True):
    # init
    Y[:]=0.5
    X[:]=1.0
    # compile the code
    compiler_extra_args = ['-g0','-lstdc++']
    link_extra_args = ['-lstdc++']
    if SIMD :
        compiler_extra_args+=['-march=native']
        link_extra_args += ['-march=native']
    if OMP:
        compiler_extra_args+=['-fopenmp']
        link_extra_args += ['-fopenmp']
    lib=inline.cxx(code, compiler_extra_args= compiler_extra_args, link_extra_args= link_extra_args)
    p_float= numpy.ctypeslib.ndpointer(dtype=numpy.float32)
    lib.compute.argtypes = [ctypes.c_int, ctypes.c_int, p_float, p_float]
    # write the assembler code to a file named after the benchmark
    asm_code=inline.cxx2asm(code, compiler_extra_args= compiler_extra_args)
    with open(name+'.s','w') as f:
        f.write(asm_code)
    # start chronometer
    start_time = time()
    # run the code
    lib.compute(numberIterations, sizeX, X, Y)
    # stop chronometer
    stop_time = time()
    execution_time= stop_time - start_time
    print("execution time for "+name+" code = "+ str(execution_time))
    return execution_time
# C++ reference code
referenceCode="""
#line 52 "saxpy.py" // helpful for debug
extern "C" {
#ifdef __SSE2__
#include<xmmintrin.h>
#else
#include <arm_neon.h>
#endif
void saxpy(int n, float alpha, float *X, float *Y)
{
int i;
for (i=0; i<n; i++)
Y[i] += alpha * X[i];
}
void compute(int numberIterations, int sizeX, float *X, float *Y )
{
for(int j=0; j< numberIterations;j++)
saxpy(sizeX, 0.001f, X, Y);
return ;
}
}
"""
referenceTime=BenchmarkCode('Reference', referenceCode, X, Y)
SIMDCode="""
#line 82 "saxpy.py" // helpful for debug
extern "C" {
#ifdef __SSE2__
#include<xmmintrin.h>
#else
#include <arm_neon.h>
#endif
void saxpy(int n, float alpha, float *X, float *Y)
{
int i;
for (i=0; i<n; i++)
Y[i] += alpha * X[i];
}
void compute(int numberIterations, int sizeX, float *X, float *Y )
{
for(int j=0; j< numberIterations;j++)
saxpy(sizeX, 0.001f, X, Y);
return ;
}
}
"""
SIMDTime=BenchmarkCode('SIMD', SIMDCode,X,Y)
print("speed up for SIMD = " + str(referenceTime/SIMDTime))