sse


bill

I am working on a project with a lot of array manipulations (sin, cos, mult).
Does anyone know of a package that uses SIMD (SSE or MMX) to increase
processing speed? (particularly for the sin/cos)

Gotta go fast!
Thanks,
Bill
 

Andre

Just wanted to know: what exactly is SSE, and how can it improve code
performance? Thanks

-Andre
 

Carl Daniel [VC++ MVP]

bill said:
I am working on a project with a lot of array manipulations
(sin, cos, mult). Does anyone know of a package that uses SIMD (SSE or
MMX) to increase processing speed? (particularly for the
sin/cos)

IIRC, MMX/SSE/SSE2 won't help with the trig functions, but can definitely be
used to optimize matrix multiplication.

-cd
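To illustrate the matrix-multiplication point, here is a minimal sketch of a 4x4 matrix times vector product using the SSE intrinsics from xmmintrin.h (available in Visual C++ and in GCC/Clang on x86). The function name and the column-major storage are assumptions made for the example: each column is scaled by one vector element and accumulated, so all four rows are computed by each multiply/add pair.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_mul_ps, _mm_add_ps, ... */

/* Hypothetical helper: out = m * v for a 4x4 single-precision matrix.
   Assumption: m is column-major, so m[c*4 + r] is row r of column c.
   The result is the sum of each column scaled by the matching v[c]. */
void mat4_mul_vec4_sse(const float m[16], const float v[4], float out[4])
{
    __m128 acc = _mm_setzero_ps();
    for (int c = 0; c < 4; ++c) {
        __m128 col = _mm_loadu_ps(&m[c * 4]); /* load column c (4 floats) */
        __m128 s   = _mm_set1_ps(v[c]);       /* broadcast v[c] to all 4 lanes */
        acc = _mm_add_ps(acc, _mm_mul_ps(col, s));
    }
    _mm_storeu_ps(out, acc);                  /* unaligned store of 4 results */
}
```

A full matrix-matrix product can be built by calling this once per column of the right-hand matrix.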
 

Teis Draiby

I've used SSE2 (Streaming SIMD Extensions 2) instructions for performance-
critical loops in inline assembly blocks directly in my VC++ code.
I'm not an experienced SSE2 programmer, but I'll share my knowledge anyway.
There are other ways to make use of SSE2; see below.

If you want to use SSE2 instructions directly, Intel's 'IA-32 Architecture
Software Developer's Manual' tells you just about everything you need to know
when it comes to MMX/SSE/SSE2. You can get it at
ftp://download.intel.com/design/Pentium4/manuals/24547012.pdf . It is
very low-level, though, and has no code examples. It takes a lot of coffee.


SSE2 is an extension to SSE and MMX. They all use the SIMD ('single
instruction, multiple data') model: a single instruction performs the same
operation on several values packed into one register, for example four
32-bit floating point numbers simultaneously. That is why SIMD can increase
speed significantly in cases where you need to perform the same operation on
a large amount of similar data, as in video or 3D applications.
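A minimal sketch of the idea applied to the element-wise multiply from the original question, assuming the SSE intrinsics in xmmintrin.h; the name mul_arrays_sse is hypothetical. The main loop handles four floats per MULPS instruction, and a scalar loop finishes whatever is left over.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: out[i] = a[i] * b[i] for i in [0, n).
   Four floats are multiplied per MULPS; a scalar loop handles the last
   0-3 elements when n is not a multiple of four. Unaligned loads/stores
   are used, so the arrays need no 16-byte alignment. */
void mul_arrays_sse(const float *a, const float *b, float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_mul_ps(va, vb));
    }
    for (; i < n; ++i)   /* scalar tail */
        out[i] = a[i] * b[i];
}
```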

MMX: an instruction set that operates on 64-bit registers, each holding for
example two 32-bit integers (or four 16-bit or eight 8-bit ones) that are
processed simultaneously. Unfortunately, MMX only includes instructions that
operate on integers.
The eight MMX registers are called MM0 - MM7 (an easy way to identify the
use of MMX in assembly code).
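A small MMX sketch using the mmintrin.h intrinsics (provided by GCC/Clang and by 32-bit Visual C++; the 64-bit Visual C++ compiler does not offer MMX intrinsics). The function name is hypothetical. Because the MMX registers share storage with the x87 floating-point registers, _mm_empty() (the EMMS instruction) is called before any floating-point code runs again.

```c
#include <mmintrin.h>  /* MMX intrinsics: __m64, _mm_add_pi32, _mm_empty */
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: adds two pairs of 32-bit integers with one PADDD.
   Each lane wraps around on overflow (no saturation). */
void add_pairs_mmx(const int32_t a[2], const int32_t b[2], int32_t out[2])
{
    __m64 va, vb, vsum;
    memcpy(&va, a, sizeof va);    /* pack a[0], a[1] into one 64-bit value */
    memcpy(&vb, b, sizeof vb);
    vsum = _mm_add_pi32(va, vb);  /* out[i] = a[i] + b[i] for both lanes */
    memcpy(out, &vsum, sizeof vsum);
    _mm_empty();                  /* EMMS: release the x87/MMX registers */
}
```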

SSE: extends MMX so that you can also do floating point operations, on new
128-bit registers. With SSE you can operate on four 32-bit single-precision
floating point numbers simultaneously. Came with the Intel P3 processors.
The eight SSE registers are called XMM0 - XMM7.

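SSE2: yet another extension. It includes all SSE operations and adds
double-precision operations (two 64-bit doubles per register) as well as
integer operations on the XMM registers, so you can now work with, for
example, four 32-bit integers at once. Introduced with the P4 processors.
SSE2 uses the same registers as SSE.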
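With SSE2 the same XMM registers can hold integers. A minimal sketch (the function name is hypothetical) adding four 32-bit integers with the emmintrin.h intrinsic _mm_add_epi32; unaligned loads are used so the arrays need no 16-byte alignment.

```c
#include <emmintrin.h>  /* SSE2 intrinsics: __m128i, _mm_add_epi32, ... */
#include <stdint.h>

/* Hypothetical helper: adds four pairs of 32-bit integers with one PADDD
   on an XMM register. Lanes wrap around on overflow, as with MMX. */
void add4_int32_sse2(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}
```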

SSE2/SSE are processor specific. SSE instructions require an Intel P3 or
later (or, from AMD, an Athlon XP or later); SSE2 instructions require an
Intel P4 or, from AMD, an Athlon 64/Opteron. AMD's own, older SIMD extension
is called '3DNow!'.
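Because support varies by processor, code that uses these instructions usually checks the CPU first. A sketch assuming GCC or Clang, whose cpuid.h provides __get_cpuid and the bit_SSE/bit_SSE2 masks (Visual C++ offers __cpuid in intrin.h instead); the function names are hypothetical. CPUID leaf 1 reports SSE in EDX bit 25 and SSE2 in EDX bit 26.

```c
#include <cpuid.h>  /* GCC/Clang only: __get_cpuid, bit_SSE, bit_SSE2 */

/* Hypothetical helpers: return 1 if the running CPU reports the feature,
   0 otherwise (including when CPUID leaf 1 is unavailable). */
int cpu_has_sse(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx & bit_SSE) != 0;   /* EDX bit 25 */
}

int cpu_has_sse2(void)
{
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return (edx & bit_SSE2) != 0;  /* EDX bit 26 */
}
```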


You can use SSE2 instructions directly in Visual Studio in an inline
assembly '__asm' block, as I have done in performance-critical loops.

As I said, there are also other ways to use the SSE2 registers, which I
have no experience with:
I think you can allow the compiler to use SIMD instructions when
compiling your C++ code. I don't know how.
You can also use SIMD 'intrinsics', which are C/C++ functions that map
directly to specific SIMD instructions.
Possibly others?

regards, Teis
 

Brandon Bray [MSFT]

Teis said:
You can use SSE2 instructions directly in Visual Studio in an inline
assembly '__asm' block, as I have done in performance-critical
loops.

Of course, inline assembly isn't always necessary. Often it is better to use
intrinsic functions (which are also documented with the SSE/SSE2 support in
Visual C++). The compiler handles intrinsic functions better because it can
apply optimizations beyond what you could do with inline assembly, and
intrinsic code is more portable between different architectures.

Just my two cents. Cheerio!
 

Brandon Bray [MSFT]

Carl said:
IIRC, MMX/SSE/SSE2 won't help with the trig functions, but can
definitely be used to optimize matrix multiplication.

You're right. Still, there are some benefits to using the SSE/SSE2
registers for the trig functions rather than the x87 FP stack. Over
time, processors will be optimized for register architectures. Already,
compilers handle register architectures better than stack architectures (as
evidenced by much better optimization of integer code). Anyway, some
trig routines supplied by Visual C++ use SSE/SSE2 instructions after
first checking the CPU ID.
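SSE has no sine or cosine instruction, so SIMD trig code typically evaluates a polynomial approximation on four values at once. The sketch below is not the routine Visual C++ ships; it is a hypothetical sin4_sse using a degree-9 Taylor polynomial, and it assumes the inputs are already reduced to [-pi/2, pi/2] (production libraries also perform range reduction and use better-fitted coefficients).

```c
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: approximates sin(x) in all four lanes using
     sin x ~= x * (1 + x2*(-1/6 + x2*(1/120 + x2*(-1/5040 + x2/362880))))
   with x2 = x*x (Horner form). Assumption: |x| <= pi/2, where the
   truncation error is below about 4e-6. */
__m128 sin4_sse(__m128 x)
{
    const __m128 c3  = _mm_set1_ps(-1.0f / 6.0f);
    const __m128 c5  = _mm_set1_ps(1.0f / 120.0f);
    const __m128 c7  = _mm_set1_ps(-1.0f / 5040.0f);
    const __m128 c9  = _mm_set1_ps(1.0f / 362880.0f);
    const __m128 one = _mm_set1_ps(1.0f);

    __m128 x2 = _mm_mul_ps(x, x);
    __m128 p  = _mm_add_ps(c7, _mm_mul_ps(x2, c9));
    p = _mm_add_ps(c5, _mm_mul_ps(x2, p));
    p = _mm_add_ps(c3, _mm_mul_ps(x2, p));
    p = _mm_add_ps(one, _mm_mul_ps(x2, p));
    return _mm_mul_ps(x, p);   /* x + c3*x^3 + c5*x^5 + c7*x^7 + c9*x^9 */
}
```

The same approach gives cosine with an even polynomial (or via cos x = sin(x + pi/2) after range reduction).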

Don't forget that the compiler can also generate SSE/SSE2 instructions when
given either the /arch:SSE or /arch:SSE2 switches.
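For comparison, the compiler-switch route needs no special source code. In this sketch (the function name is hypothetical), ordinary C may be compiled to SSE2 instructions when built with cl /O2 /arch:SSE2, or with gcc -O2 -msse2 -mfpmath=sse on 32-bit x86 (64-bit x86 compilers use SSE2 for floating point by default). Whether the loop is additionally vectorized into packed instructions depends on the compiler and its version.

```c
#include <stddef.h>

/* Hypothetical helper, plain C with no intrinsics or inline assembly:
   y[i] += a * x[i] ("axpy"). With the switches above, the compiler may use
   scalar SSE2 (MULSD/ADDSD) or packed forms instead of x87 stack code. */
void scale_add(double *y, const double *x, double a, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```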

Hope that helps. Cheerio!
 

Carl Daniel [VC++ MVP]

Andre said:
Thanks a lot, Teis, that helped :)

Just wondering... can we somehow use this in C# (or does the JIT
compiler have SSE support?) Thanks.

No direct support in C# at this time. Maybe in the future, though. The only
way you could get at SSE-type technology from C# would be by calling native
C++ code (through COM interop, P/Invoke, or managed C++ IJW).

-cd
 

Teis Draiby

Don't forget that the compiler can also generate SSE/SSE2 instructions
when given either the /arch:SSE or /arch:SSE2 switches.

I got "Command line warning D4002 : ignoring unknown option
'/arch:SSE2'".
Is Visual Studio C++ .NET 2000 able to recognize this switch?


Thanks, Teis
 

Niall

Ahh, I was under the impression that you could do four doubles at once with
SSE2. It sticks in my mind, but it only comes from reading reviews and other
material about the processor. I've never done SSE2 coding on a P4, so your
explanation that only two doubles fit in one register seems likely to me.

Niall
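To settle the two-versus-four question: an XMM register is 128 bits wide, so it holds four 32-bit floats but only two 64-bit doubles. A minimal sketch with the SSE2 type __m128d and _mm_mul_pd; the function and constant names are hypothetical.

```c
#include <emmintrin.h>  /* SSE2: __m128d (also pulls in SSE's __m128) */

/* Lanes per 128-bit XMM register. */
enum {
    FLOATS_PER_XMM  = sizeof(__m128)  / sizeof(float),   /* 4 */
    DOUBLES_PER_XMM = sizeof(__m128d) / sizeof(double)   /* 2 */
};

/* Hypothetical helper: multiplies two pairs of doubles with one MULPD. */
void mul2_double_sse2(const double a[2], const double b[2], double out[2])
{
    _mm_storeu_pd(out, _mm_mul_pd(_mm_loadu_pd(a), _mm_loadu_pd(b)));
}
```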
 
