The coming of the Pentium 4 600-series


Yousuf Khan

The new microprocessors will be based on the Prescott 2M core that brings 2MB L2 cache, Intel EM64T, Enhanced Intel SpeedStep Technology (EIST) as well as Execute Disable Bit (EDB) capability. The chips will be clocked at 3.20GHz, 3.40GHz, 3.60GHz and 3.80GHz and will be intended for infrastructure supporting 800MHz Quad Pumped Bus and TDP of up to 115W.

X-bit labs - Hardware news - Intel Preps Onslaught with New Pentium 4
Processors 600
http://www.xbitlabs.com/news/cpu/display/20050103132921.html
 

Felger Carbon

Yousuf Khan said:
Seems like adding more L2 cache is starting to come to its proverbial point of
diminishing returns.

Yousuf, the problem is that Prescott's L2 _doubled_ the L2 latency
over the previous 130nm generation. I've never heard an explanation
for this disaster. Yes, _doubled_. ;-(
 

keith

Felger said:
Yousuf, the problem is that Prescott's L2 _doubled_ the L2 latency
over the previous 130nm generation. I've never heard an explanation
for this disaster. Yes, _doubled_. ;-(

Did you ever find out with certainty whether or not they added back in the
FXU multiplier and barrel-shifter? That issue still seems to be up in the
air.
 

Yousuf Khan

Felger said:
Yousuf, the problem is that Prescott's L2 _doubled_ the L2 latency
over the previous 130nm generation. I've never heard an explanation
for this disaster. Yes, _doubled_. ;-(

I wonder if it's got something to do with the doubled transistor count?
Twice the transistors to go through, twice the distance to travel
through. Even with a die shrink, it's still twice the distance per
transistor.

Yousuf Khan
 

Grumble

Felger said:
Yousuf, the problem is that Prescott's L2 _doubled_ the L2 latency
over the previous 130nm generation. I've never heard an explanation
for this disaster. Yes, _doubled_. ;-(

Not quite.

Northwood = ~19 cycles
Prescott = ~28 cycles

L1 latency, however, I believe went from 2 to 4 cycles.
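
For what it's worth, here is a quick back-of-envelope check of those L2 figures. The cycle counts are the approximate ones quoted above; the 3.2 GHz clock is only an assumption, picked to show the conversion to nanoseconds.

```python
# Approximate L2 latencies quoted in this thread, in core clocks.
northwood_l2 = 19
prescott_l2 = 28

# How much worse is Prescott, as a ratio?
ratio = prescott_l2 / northwood_l2
print(f"L2 latency ratio: {ratio:.2f}x")

def cycles_to_ns(cycles, clock_ghz):
    """Convert a latency in core clocks to nanoseconds at a given clock."""
    return cycles / clock_ghz

# Assumed clock for illustration only: both parts running at 3.2 GHz.
print(f"Northwood: {cycles_to_ns(northwood_l2, 3.2):.2f} ns")
print(f"Prescott:  {cycles_to_ns(prescott_l2, 3.2):.2f} ns")
```

So roughly a 1.5x increase in clocks rather than a doubling, if those cycle counts are right.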
 

chrisv

(Outhouse-induced mangling fixed)
Felger said:
Yousuf, the problem is that Prescott's L2 _doubled_ the L2 latency
over the previous 130nm generation. I've never heard an explanation
for this disaster. Yes, _doubled_. ;-(

Well, I think Yousuf is right about the diminishing returns of larger
cache size, as well. Seems to me the "paltry" 256k of a P3 serves
quite well for the job. The cost/performance trade-off of the huge
caches seems suspect, even if you don't factor in things like latency
increases created by the larger cache size.
 

Felger Carbon

keith said:
Did you ever find out with certainty whether or not they added back in the
FXU multiplier and barrel-shifter? That issue still seems to be up in the
air.

Yes, I did find out (honest), but I quickly lost interest in Prescott
when I discovered it did not increase performance over the previous
generation. So I am no longer certain, but I seem to remember that it
did include those improvements. But the performance improvement
provided by those two items is completely swamped by the lousy L2
latency.

Keith, if you ever discover _why_ the lousy L2 latency, please ping
me?
 

Felger Carbon

Yousuf Khan said:
I wonder if it's got something to do with the doubled transistor count?
Twice the transistors to go through, twice the distance to travel
through. Even with a die shrink, it's still twice the distance per
transistor.

Yousuf, if the cache transistor count is doubled the shrink will
result in L2 cache _area_ that's exactly the same as the old
generation, and hence the same distances. Sorry.
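
To put rough numbers on that, here is a sketch assuming ideal scaling from 130 nm to 90 nm. Real SRAM cells rarely shrink this cleanly, so treat the result as a first-order estimate only.

```python
# Ideal (textbook) scaling: linear dimensions shrink by new_node / old_node,
# so the area of one transistor shrinks by the square of that factor.
old_node_nm = 130
new_node_nm = 90

linear_shrink = new_node_nm / old_node_nm   # about 0.69
area_per_transistor = linear_shrink ** 2    # about 0.48

# Doubling the transistor count on the new process:
relative_area = 2 * area_per_transistor
relative_edge = relative_area ** 0.5        # edge length of a square block

print(f"relative cache area:  {relative_area:.2f}")
print(f"relative edge length: {relative_edge:.2f}")
```

Under that assumption the doubled cache lands at roughly the same area, and so roughly the same edge-to-edge distance, as the old one.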
 

Rob Stow

Felger said:
Yousuf, if the cache transistor count is doubled the shrink will
result in L2 cache _area_ that's exactly the same as the old
generation, and hence the same distances. Sorry.

And it must be mentioned that with a 130 nm process AMD went
from: 256 KB L2  (pre-Barton Athlon XP), L2 latency = 28 clocks
to:   512 KB L2  (Barton Athlon XP),     L2 latency = 23 clocks
to:   1024 KB L2 (AMD64),                L2 latency = 20 clocks

No idea what is going to happen to the L2 latency on the 90 nm AMD64 chips.
I've heard that both cache management and the memory controller have
been tweaked, but I haven't read any latency numbers.
 

keith

Felger said:
Yes, I did find out (honest), but I quickly lost interest in Prescott
when I discovered it did not increase performance over the previous
generation. So I am no longer certain, but I seem to remember that it
did include those improvements. But the performance improvement
provided by those two items is completely swamped by the lousy L2
latency.

Keith, if you ever discover _why_ the lousy L2 latency, please ping
me?

You keep your ear to the same rumors I do. I certainly am not likely to
see any such information "officially". Since Andy gave me a copy of his
book, we haven't talked much. ;-)
 

Bill Davidsen

chrisv said:
(Outhouse-induced mangling fixed)

Well, I think Yousuf is right about the diminishing returns of larger
cache size, as well. Seems to me the "paltry" 256k of a P3 serves
quite well for the job. The cost/performance trade-off of the huge
caches seems suspect, even if you don't factor in things like latency
increases created by the larger cache size.

The large cache size really helps reduce bus contention in SMP
configurations. It feels as though lower latency would be better than
size at these low speeds, but I bet Intel did simulations and chose more
over faster. Why they had to choose I can't even guess!
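
The size-versus-latency trade-off can be sketched with the usual average-access-time formula (hit time plus miss rate times miss penalty). The 19 and 28 cycle hit times are the approximate figures quoted earlier in the thread; the memory penalty and both miss rates are made-up illustrative values, not measurements.

```python
def avg_l2_access_cycles(l2_hit_cycles, l2_miss_rate, memory_penalty_cycles):
    """Average cost of an access that reaches L2: hit time + miss rate * penalty."""
    return l2_hit_cycles + l2_miss_rate * memory_penalty_cycles

MEMORY_PENALTY = 200  # assumed cycles to main memory, illustrative only

# Smaller, faster L2 versus larger, slower L2 (miss rates are hypothetical).
small_fast = avg_l2_access_cycles(19, 0.10, MEMORY_PENALTY)
large_slow = avg_l2_access_cycles(28, 0.05, MEMORY_PENALTY)

# Miss-rate reduction the larger cache needs just to break even
# on the extra 9 cycles of hit latency.
break_even_drop = (28 - 19) / MEMORY_PENALTY

print(f"small/fast L2: {small_fast:.1f} cycles")
print(f"large/slow L2: {large_slow:.1f} cycles")
print(f"break-even miss-rate drop: {break_even_drop:.3f}")
```

With these assumed numbers the bigger cache only wins if it removes more than about 4.5 percentage points of L2 misses, which depends entirely on the workload.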
 

Felger Carbon

Bill Davidsen said:
The large cache size really helps reduce bus contention in SMP
configurations. It feels as though lower latency would be better than
size at these low speeds, but I bet Intel did simulations and chose more
over faster. Why they had to choose I can't even guess!

Amen, bro! ;-)
 

chrisv

Felger said:
Yousuf, if the cache transistor count is doubled the shrink will
result in L2 cache _area_ that's exactly the same as the old
generation, and hence the same distances. Sorry.

But skinnier wires and thus inferior RC characteristics, possibly?
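
A crude first-order model of that, as a sketch only: it assumes wire width and thickness both scale with the process, resistivity is unchanged, and capacitance per unit length stays roughly constant. Real interconnect stacks and repeater insertion complicate all of this.

```python
def relative_wire_rc(shrink, relative_length=1.0):
    """First-order RC of a wire after a shrink, relative to the old process.

    Assumes width and thickness both scale by `shrink`, resistivity is
    unchanged, and capacitance per unit length stays roughly constant.
    """
    relative_r = relative_length / (shrink * shrink)  # R = rho * L / (W * H)
    relative_c = relative_length                      # C proportional to L
    return relative_r * relative_c

shrink = 90 / 130
same_length = relative_wire_rc(shrink)            # wire spans the same distance
scaled_length = relative_wire_rc(shrink, shrink)  # wire shrinks with the layout

print(f"same-length wire RC:   {same_length:.2f}x")
print(f"scaled-length wire RC: {scaled_length:.2f}x")
```

In this model a wire that shrinks with the layout keeps the same RC, but a wire that still has to cross the same distance gets roughly twice the RC.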
 

Keith R. Williams

chrisv said:
But skinnier wires and thus inferior RC characteristics, possibly?

Or they took the old macros, plopped down twice as many and wired them
up with whatever glue they needed to get it to work. That would
account for perhaps two clocks (maybe more with the tags), but the
rest???
 

James Boswell

Felger said:
Yes, I did find out (honest), but I quickly lost interest in Prescott
when I discovered it did not increase performance over the previous
generation. So I am no longer certain, but I seem to remember that it
did include those improvements. But the performance improvement
provided by those two items is completely swamped by the lousy L2
latency.

Keith, if you ever discover _why_ the lousy L2 latency, please ping
me?

Remember that the L1 latency in Prescott is stupidly high as well.

If they could tweak the L1/L2 latencies back down to Northwood levels, it'd
probably start being a very, very impressive chip, even with its stupidly
high power needs.


-JB
 
