PC Review



Additional registers in x86-64

 
 
Grumble
 
      13th Aug 2004
A few weeks ago, AMD published the SPECint2000 score for the FX-53:
http://www.spec.org/cpu2000/results/...628-03181.html

SPECint2000_peak = 1700
SPECint2000_base = 1601

I see that they used Intel's compiler on Windows XP Professional. Please
correct me if I am wrong. Windows XP is a 32-bit OS, thus the benchmarks
did not use the 8 additional general purpose registers defined in the
x86-64 instruction set, right?

I imagine that, even with 8 more registers available, gcc cannot
outperform Intel's compiler and Microsoft libraries on integer code?

I also noticed Sun's recent SPECfp2000 submission for the Opteron 150:
http://www.spec.org/cpu2000/results/...712-03241.html

SPECfp2000_peak = 1787
SPECfp2000_base = 1637

Sun did use a 64-bit OS, and it seems they compiled most benchmarks as
64-bit applications. I imagine the compiler (most often PathScale)
produced SIMD code to use the XMM registers?

In short, I am wondering how much improvement the 8 additional GPRs and
8 additional media registers bring...

--
Regards, Grumble
 
 
 
 
 
Tony Hill
 
      14th Aug 2004
On Fri, 13 Aug 2004 11:03:06 +0200, Grumble <(E-Mail Removed)> wrote:
>
>A few weeks ago, AMD published the SPECint2000 score for the FX-53:
>http://www.spec.org/cpu2000/results/...628-03181.html
>
>SPECint2000_peak = 1700
>SPECint2000_base = 1601
>
>I see that they used Intel's compiler on Windows XP Professional. Please
>correct me if I am wrong. Windows XP is a 32-bit OS, thus the benchmarks
>did not use the 8 additional general purpose registers defined in the
>x86-64 instruction set, right?


That is correct.

>I imagine that, even with 8 more registers available, gcc cannot
>outperform Intel's compiler and Microsoft libraries on integer code?


Correct again. The optimizations in GCC are not as good as those in
Intel's compiler, though the difference is generally not huge. Take a
look at the results AMD published for their 'A4800' systems. These
are a bunch of Opteron 144 (1.8GHz) processors running under a variety
of different OSes and using different compilers. The fastest result
they achieved was 1095 using Win2K3 (32-bit OS) + Intel's (32-bit)
compiler. For comparison, with SuSE 8 for AMD64 (64-bit OS) + GCC 3.3
(64-bit) they managed 1045, and with SuSE 8 for x86 (32-bit OS) + GCC
3.3 for x86 (32-bit compiler) they turned in a score of 960.

So, in the end AMD showed an 8.8% improvement by going from 32-bit to
64-bit code, but they saw a 14% improvement going from Linux + GCC
(32-bit) to Windows + Intel C (also 32-bit).
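
For anyone who wants to check the arithmetic, here is a minimal Python sketch that recomputes the two percentages from the three scores quoted above. The dictionary keys and the `percent_gain` helper are names made up for illustration, and the scores are taken from this post rather than re-read from spec.org. The results come out at roughly 8.9% and 14.1%, close to the figures quoted.

```python
# Scores as quoted in this post for AMD's 'A4800' (Opteron 144, 1.8GHz) runs.
# The keys are hypothetical labels chosen for this sketch.
scores = {
    "win2k3_intel_32": 1095,  # Win2K3 (32-bit OS) + Intel 32-bit compiler
    "suse_gcc_64": 1045,      # SuSE 8 for AMD64 + GCC 3.3 (64-bit)
    "suse_gcc_32": 960,       # SuSE 8 for x86 + GCC 3.3 (32-bit)
}


def percent_gain(new, old):
    """Relative improvement of `new` over `old`, in percent."""
    return (new - old) / old * 100.0


# Same compiler family and OS family, 32-bit versus 64-bit code.
gain_64bit = percent_gain(scores["suse_gcc_64"], scores["suse_gcc_32"])
# Both 32-bit, Linux + GCC versus Windows + Intel C.
gain_intel = percent_gain(scores["win2k3_intel_32"], scores["suse_gcc_32"])

print(f"32-bit GCC -> 64-bit GCC:     {gain_64bit:.1f}%")
print(f"32-bit GCC -> 32-bit Intel C: {gain_intel:.1f}%")
```

Note that the two comparisons change different things (code width in one, OS plus compiler in the other), so the percentages are not directly additive.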

>I also noticed Sun's recent SPECfp2000 submission for the Opteron 150:
>http://www.spec.org/cpu2000/results/...712-03241.html
>
>SPECfp2000_peak = 1787
>SPECfp2000_base = 1637
>
>Sun did use a 64-bit OS, and it seems they compiled most benchmarks as
>64-bit applications. I imagine the compiler (most often PathScale)
>produced SIMD code to use the XMM registers?


Presumably yes, it would use SIMD code, the XMM registers and the
extra 8 integer registers (even with FP code you still need some
integer registers).

>In short, I am wondering how much improvement the 8 additional GPRs and
>8 additional media registers bring...


Usually more than enough to make up for the performance loss you would
expect with 64-bit code. Normally, if all else is equal, 64-bit code
is about 5-10% slower than 32-bit code (pointers are twice as big, so
caches and memory bandwidth don't go as far) until you blow your memory
limits, at which point 32-bit code just completely breaks down.
That's why most bi-arch systems, e.g. Sun's Solaris, still use lots of
32-bit applications if they can.

With AMD64 the extra registers have managed to improve the performance
enough that they not only negate this performance loss, but turn it
into a 5-10% performance gain on average. Not bad at all for a fairly
small cost in die space and virtually no changes to the instruction
set. FWIW the reason why AMD only went to 16 registers (still a
pretty low number as compared to most modern processors) is that this
is the most that they could squeeze into the x86 instruction set
without making fairly major changes (they did a pretty damn good job
of this, obviously they actually put some thought into how to extend
x86 to 64-bits as naturally as possible).
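
Some background that the post does not spell out: in 64-bit mode the extra registers are reached through an optional one-byte REX prefix (0x40 to 0x4F), which supplies one extra bit for each of the existing 3-bit register fields. One extra bit per field gives 4 bits, hence 16 registers. The Python sketch below decodes one such instruction by hand; the function names are made up for illustration, and it handles only this simple register-to-register case, not a general x86 decoder.

```python
def decode_rex(byte):
    """Split a REX prefix byte (0x40-0x4F) into its W, R, X and B bits."""
    if (byte & 0xF0) != 0x40:
        raise ValueError("not a REX prefix")
    return {
        "W": (byte >> 3) & 1,  # 1 selects 64-bit operand size
        "R": (byte >> 2) & 1,  # 4th bit for the ModRM reg field
        "X": (byte >> 1) & 1,  # 4th bit for the SIB index field
        "B": byte & 1,         # 4th bit for ModRM rm / SIB base
    }


def full_register_number(ext_bit, field3):
    """Combine one REX extension bit with a 3-bit register field (0-15)."""
    return (ext_bit << 3) | (field3 & 0b111)


# The bytes 4D 89 C8 encode `mov r8, r9` (Intel syntax) in 64-bit mode.
rex, opcode, modrm = 0x4D, 0x89, 0xC8
bits = decode_rex(rex)
src = full_register_number(bits["R"], (modrm >> 3) & 0b111)  # reg field
dst = full_register_number(bits["B"], modrm & 0b111)         # rm field
print(f"mov r{dst}, r{src}")
```

Without the prefix the same opcode and ModRM bytes can only name registers 0 to 7, which is why existing 32-bit encodings keep working unchanged.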

-------------
Tony Hill
hilla <underscore> 20 <at> yahoo <dot> ca
 
 
Yousuf Khan
 
      15th Aug 2004
Tony Hill wrote:
> Correct again. The optimizations in GCC are not as good as those in
> Intel's compiler, though the difference is generally not huge. Take a
> look at the results AMD published for their 'A4800' systems. These
> are a bunch of Opteron 144 (1.8GHz) processors running under a variety
> of different OSes and using different compilers. The fastest result
> they achieved was 1095 using Win2K3 (32-bit OS) + Intel's (32-bit)
> compiler. For comparison, with SuSE 8 for AMD64 (64-bit OS) + GCC 3.3
> (64-bit) they managed 1045, and with SuSE 8 for x86 (32-bit OS) + GCC
> 3.3 for x86 (32-bit compiler) they turned in a score of 960.
>
> So, in the end AMD showed an 8.8% improvement by going from 32-bit to
> 64-bit code, but they saw a 14% improvement going from Linux + GCC
> (32-bit) to Windows + Intel C (also 32-bit).


Cool, but I wonder why AMD submitted the scores with the Intel 32-bit
compiler and a 32-bit OS, rather than a 64-bit OS with the 64-bit Pathscale
or PGI compilers? These two companies seem to have designed themselves
completely for AMD64, which I'm completely certain the Intel compilers
aren't.

> With AMD64 the extra registers have managed to improve the performance
> enough that they not only negate this performance loss, but turn it
> into a 5-10% performance gain on average. Not bad at all for a fairly
> small cost in die space and virtually no changes to the instruction
> set. FWIW the reason why AMD only went to 16 registers (still a
> pretty low number as compared to most modern processors) is that this
> is the most that they could squeeze into the x86 instruction set
> without making fairly major changes (they did a pretty damn good job
> of this, obviously they actually put some thought into how to extend
> x86 to 64-bits as naturally as possible).


How do we know that the extra performance isn't due to built-in memory
controller and branch prediction?

Yousuf Khan


 
 
RusH
 
      15th Aug 2004
Grumble <(E-Mail Removed)> wrote :

> In short, I am wondering how much improvement the 8 additional
> GPRs and 8 additional media registers bring...


Not too much in Intel's design, I guess:

http://www.anandtech.com/linux/showdoc.aspx?i=2163


Regards.
--
RusH //
http://randki.o2.pl/profil.php?id_r=352019
Like ninjas, true hackers are shrouded in secrecy and mystery.
You may never know -- UNTIL IT'S TOO LATE.
 
 
Tony Hill
 
      16th Aug 2004
On Sun, 15 Aug 2004 07:46:17 GMT, "Yousuf Khan" <(E-Mail Removed)>
wrote:
>Tony Hill wrote:
>> So, in the end AMD showed an 8.8% improvement by going from 32-bit to
>> 64-bit code, but they saw a 14% improvement going from Linux + GCC
>> (32-bit) to Windows + Intel C (also 32-bit).

>
>Cool, but I wonder why AMD submitted the scores with the Intel 32-bit
>compiler and a 32-bit OS, rather than a 64-bit OS with the 64-bit Pathscale
>or PGI compilers? These two companies seem to have designed themselves
>completely for AMD64, which I'm completely certain the Intel compilers
>aren't.


Both of these compilers are still fairly new and they still are not as
fast as Intel's x86 compilers for integer code. Sun submitted some
SPEC CINT results using the PathScale compiler, and they only managed
a score of 1437/1584 (base/peak) with an Opteron 250 while AMD managed
a score of 1566/1655 with an Opteron 150 using Intel's compiler.
What's more, Sun still had to resort to using GCC for one of their
tests as it was 20% faster on that test than PathCC.

On the floating point side of things though, it's a different story.
Sun's Opteron system turns in a VERY respectable 1637/1787
(base/peak) score using a combination of GCC, PGI and PathScale's
compilers. This puts it just about on par with IBM's Power4 chip,
not bad for a processor that sells for about 1/10th the cost.

>> With AMD64 the extra registers have managed to improve the performance
>> enough that they not only negate this performance loss, but turn it
>> into a 5-10% performance gain on average. Not bad at all for a fairly
>> small cost in die space and virtually no changes to the instruction
>> set. FWIW the reason why AMD only went to 16 registers (still a
>> pretty low number as compared to most modern processors) is that this
>> is the most that they could squeeze into the x86 instruction set
>> without making fairly major changes (they did a pretty damn good job
>> of this, obviously they actually put some thought into how to extend
>> x86 to 64-bits as naturally as possible).

>
>How do we know that the extra performance isn't due to built-in memory
>controller and branch prediction?


Err.. it's not like AMD turns those features off in 32-bit mode on
their Athlon64 and Opteron chips!

-------------
Tony Hill
hilla <underscore> 20 <at> yahoo <dot> ca
 