Pentium M desktops ???

KR Williams

Bitstring, from the


Actually it does rather well (twice as fast as an Opteron or Athlon64)
running Prime95 LL tests, but only because AMD's SSE2 has some sort of
implementation glitch that means you can't process at the speed you
ought to be able to.

Ok, let me modify my statement a little: If I were doing only
video encoding or esoteric benchmarking, I'd buy a P4. For
anything else, forget it. ;-)
 
KR Williams

Exactly what I was trying to say. I see the Pentium4 mostly
as a throwback to the original Pentium (P5 core).

I don't see how the P4 is anywhere close to either the P5 or P6.
The P5 was more CISCy than either the P6 or P4 (interesting
numbers ;-).
Oddly, the Pentium4 may not perform too badly because compiler
technology lags horribly and a lot of apps (MS-Office?) are
still compiled on older compilers that optimize only for two
exec pipelines.

Sheer MHz showing through the ugliness. ;-)
 
GSV Three Minds in a Can

from the said:
Ok, let me modify my statement a little: If I were doing only
video encoding or esoteric benchmarking, I'd buy a P4. For
anything else, forget it. ;-)

Yeah, I'd agree with that.

Actually the P4 isn't all that =bad= compared to an AMD processor (pick
any one), until you plug 'price' into the equation. It's just that with
the memory bandwidth and the core clock rates, it really ought to be
better, and it isn't.
 
KR Williams

People keep talking about "bandwidth" as if "latency" didn't matter.
AMD's integration of the memory controller is so obvious I could
spit. Yet people still just don't get it. The North Bridge is
dead; why are we still doing the stupid?
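To see why latency can matter as much as bandwidth, here is a toy Python model (a sketch, not a measurement): independent cache-line fetches can overlap, so their total time is set by bandwidth, while a chain of dependent loads (pointer chasing) has to wait out the full memory latency on every step. The 64-byte line size and the bandwidth and latency figures are hypothetical round numbers, not measured values for any real chipset.

```python
LINE_BYTES = 64  # assumed cache-line size in bytes (hypothetical)

def streaming_time_ns(n_lines: int, bandwidth_gb_s: float) -> float:
    """Independent fetches overlap, so the bus bandwidth sets the pace.
    1 GB/s is 1 byte per nanosecond, so bytes / (GB/s) gives nanoseconds."""
    return n_lines * LINE_BYTES / bandwidth_gb_s

def pointer_chase_time_ns(n_lines: int, latency_ns: float) -> float:
    """Each load needs the previous load's result, so every fetch pays full latency."""
    return n_lines * latency_ns

# Hypothetical systems: one with more bandwidth but a slower path to memory,
# one with half the bandwidth but half the latency.
N = 1_000_000
for label, bw_gb_s, lat_ns in (("more bandwidth, more latency", 6.4, 120.0),
                               ("less bandwidth, less latency", 3.2, 60.0)):
    print(f"{label}: streaming {streaming_time_ns(N, bw_gb_s) / 1e6:.0f} ms, "
          f"pointer chasing {pointer_chase_time_ns(N, lat_ns) / 1e6:.0f} ms")
```

In this toy model the low-latency system loses the streaming case (20 ms vs 10 ms) but wins the pointer-chasing case (60 ms vs 120 ms). Real memory systems overlap some dependent work, so treat it as an illustration of the trend only.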
 
flekso

Robert Redelmeier said:
Actually, I believe the Pentium4 was done as a rush job,
on the cheap, after it became apparent that ia64 (aka
Itanium) would not take over. IMHO it's an original
Pentium plus SSE2, deeply pipelined for inflated clocks.

Everybody's using this theory of longer pipelines = more Hz, but can someone
please shed some physical light on this topic?
What I know about pipelines is from the AoA book and the web (instruction stages =
fetch, decode, ip++ ...), but I really can't relate that to any kind of
speed increase. Plus, where do they come up with 20+ stages in Prescott? I
mean, there's only so much one instruction needs to (or can) do.
 
flekso

Anonymous Joe said:
Robert Redelmeier said:
Actually, I believe the Pentium4 was done as a rush job,
on the cheap, after it became apparent that ia64 (aka
Itanium) would not take over. IMHO it's an original
Pentium plus SSE2, deeply pipelined for inflated clocks.

The Pentium-M is little more than the venerable P6 core,
tweaked and clocked higher on smaller processes.

-- Robert

It does seem as though the P4 is just that: a P3 with SSE2 and very long
pipelines (extended further in Prescott) for a hyper-inflated clock
speed. Performance eventually gets pretty decent, but only at a
ridiculous clock speed, and some of that is due to the quad-pumped
front-side bus combined with the memory speeds (dual channel, anyway).
As the P4's clock speed increases, so does the speed of its L1/L2 caches,
which can only help performance further.

What strikes me is that a P4 @ 3GHz is generally on par with an Athlon XP
3000+, which, depending on the bus speed chosen, runs at either 2.16GHz (333MHz
bus) or 2.1GHz (400MHz bus). I forget offhand how deep the P4 pipeline is,
but it's something like 24, isn't it? The Athlon's is something like 12, or 15.
Either way, the numbers don't line up, yet performance is still rather close. The P4 has
L1 & L2 cache running at 3GHz, while the Athlon's runs at about 70% of that, but
is more plentiful. The P4's bus bandwidth is going to be either
4.27GB/sec or 6.4GB/sec (533 or 800MHz bus, though the actual clock is 133 or
200MHz). The Athlon, meanwhile, uses a 3.2GB/sec bus. As for memory, the most
you can get out of the Athlon is that 3.2GB/sec (whether you use PC3200 RAM
or dual-channel RAM of any speed), but with a P4 you have a shot at
a theoretical 6.4GB/sec (dual-channel PC3200).
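The bandwidth figures above follow from a simple rule: peak bandwidth = transfers per second × bus width. A minimal Python sketch, assuming a 64-bit (8-byte) data path for both the front-side bus and each DDR channel. The 533/800 "MHz" P4 buses are quad-pumped 133/200 MHz clocks, and the Athlon XP's 400 "MHz" bus is a double-pumped 200 MHz clock. These are theoretical peaks, not measured throughput.

```python
BUS_WIDTH_BYTES = 8  # 64-bit data path (front-side bus, or one DDR channel)

def peak_gb_s(mega_transfers_per_s: float, channels: int = 1) -> float:
    """Theoretical peak bandwidth in decimal GB/s (10**9 bytes per second)."""
    return mega_transfers_per_s * BUS_WIDTH_BYTES * channels / 1000

examples = {
    "P4 533 FSB (133 MHz x 4)": peak_gb_s(533.33),
    "P4 800 FSB (200 MHz x 4)": peak_gb_s(800),
    "Athlon XP 400 FSB (200 MHz x 2)": peak_gb_s(400),
    "Dual-channel PC3200 (DDR400)": peak_gb_s(400, channels=2),
}
for name, gb_s in examples.items():
    print(f"{name}: {gb_s:.2f} GB/s")
```

This prints 4.27, 6.40, 3.20 and 6.40 GB/s. It also shows why dual-channel memory can't lift an Athlon XP's peak: the CPU still can't pull more than its 3.2 GB/s front-side bus allows.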

All this combined, things sure look favorable for the P4. It has so much more
bandwidth in every area: cache, bus, and RAM. So how come, with a 900MHz
core-clock lead, it is only able to tie the Athlon? It seems to be
using all that bandwidth and wasting it. If AMD could get the sort of
bandwidth that Intel has, I would imagine the P4 would need a
1200MHz or greater head start to be comparable.
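The "tie despite a 900MHz lead" observation can be turned into a per-clock comparison with the usual first-order model, performance ≈ IPC × clock frequency: if two chips score the same, their IPC ratio is the inverse of their clock ratio. A sketch, assuming (as the post does) that a 3GHz P4 and an Athlon XP 3000+ perform equally; real benchmarks vary by workload.

```python
def relative_ipc(clock_a_ghz: float, clock_b_ghz: float) -> float:
    """IPC of chip A relative to chip B, assuming both deliver equal performance.
    From IPC_A * f_A == IPC_B * f_B it follows that IPC_A / IPC_B == f_B / f_A."""
    return clock_b_ghz / clock_a_ghz

P4_GHZ = 3.0
for athlon_ghz in (2.1, 2.167):  # Athlon XP 3000+ on the 400 and 333 MHz buses
    ratio = relative_ipc(P4_GHZ, athlon_ghz)
    print(f"P4 @ {P4_GHZ} GHz vs Athlon XP @ {athlon_ghz} GHz: "
          f"P4 does about {ratio:.0%} of the Athlon's work per clock")
```

Under that equal-performance assumption the P4 gets through roughly 70-72% as much work per clock as the Athlon.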

For anybody who cares, I do use AMD, so if you want to say I'm promoting AMD
unfairly or whatever, you're wrong; I'm simply showing that Intel isn't
efficient.

In my opinion (2 cents' worth), those numbers are exactly the opposite of
bragging rights. Given the problems constantly brought up about Moore's law,
heat dissipation, and all that stuff related to miniaturization reaching its
final barrier, every FLOP/Hz, B/s and transistor saved should mean more than
ever, and the P4 design looks the problem right in the eye and smugly says:
yes, but my pipe is longer...
 
KR Williams

Everybody's using this theory of longer pipelines = more Hz, but can someone
please shed some physical light on this topic?
What I know about pipelines is from the AoA book and the web (instruction stages =
fetch, decode, ip++ ...), but I really can't relate that to any kind of
speed increase. Plus, where do they come up with 20+ stages in Prescott? I
mean, there's only so much one instruction needs to (or can) do.

It's really rather simple; the less work done in each clock
cycle, the faster that clock can be run. The Intel developer
site has the descriptions and clock counts (length) of each pipe.
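KR's point can be made concrete with the textbook model of pipelining: split a fixed amount of logic delay across more stages and each stage gets shorter, but every stage also pays a fixed latch (register) overhead. The Python sketch below uses made-up delays (6 ns of logic, 0.1 ns of latch overhead per stage), chosen only to show the shape of the trade-off: clock rate climbs with depth, while the time for any single instruction to get all the way through grows.

```python
def cycle_time_ns(logic_ns: float, stages: int, latch_ns: float) -> float:
    """Clock period when `logic_ns` of work is split evenly over `stages`,
    plus a fixed per-stage latch/register overhead."""
    return logic_ns / stages + latch_ns

LOGIC_NS = 6.0  # hypothetical total logic delay for one instruction
LATCH_NS = 0.1  # hypothetical per-stage latch overhead

for stages in (10, 20, 31):
    period = cycle_time_ns(LOGIC_NS, stages, LATCH_NS)
    # 1 / ns gives GHz; stages * period is the single-instruction latency.
    print(f"{stages:2d} stages: {1 / period:.2f} GHz, "
          f"one instruction takes {stages * period:.1f} ns start to finish")
```

With these numbers the clock goes 1.43, 2.50, 3.41 GHz while single-instruction latency goes 7.0, 8.0, 9.1 ns. Note the diminishing return: going from 20 to 31 stages adds 55% more stages for about 36% more clock, because the latch overhead doesn't shrink.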
 
GSV Three Minds in a Can

A real simple (too simple, but WTF) analogy... it takes, what, maybe 30
seconds to run round a baseball diamond. However, just getting from base
to base you can do in ~10 seconds. If you had 16 or 32 bases instead of
the 4 (including home plate), you could get from base to base in maybe a
second or two.

You wouldn't get runs completed any faster (if you decided to =stop= at
each base, it'd actually take way longer), but you sure could brag about
an amazing clock speed, getting from one base to the next. 8>.

Which just demonstrates the stupidity of trying to measure MPH or BHP
using the rev counter.
 
KR Williams

Or nine women can have nine babies in nine months, but each one
still takes nine months. ;-)
 
Felger Carbon

That's the difference between throughput (bandwidth) and latency,
Keith. ;-)
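Felger's distinction maps directly onto an ideal pipeline: the first item still needs one cycle per stage (latency), but once the pipe is full, one item completes every cycle (throughput). A minimal Python sketch, assuming no stalls or flushes, and counting the unpipelined case as one item occupying all stages before the next may start:

```python
def pipelined_cycles(n_items: int, stages: int) -> int:
    """Ideal pipeline: `stages` cycles to fill, then one completion per cycle."""
    return 0 if n_items == 0 else stages + n_items - 1

def unpipelined_cycles(n_items: int, stages: int) -> int:
    """Each item walks through every stage before the next one may start."""
    return n_items * stages

STAGES = 20
for n in (1, 1000):
    print(f"{n:4d} items: pipelined {pipelined_cycles(n, STAGES)} cycles, "
          f"unpipelined {unpipelined_cycles(n, STAGES)} cycles")
```

A single item takes 20 cycles either way (nine months per baby); only the stream of items benefits, 1019 cycles versus 20000 for a thousand of them.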
 
KR Williams

Sorta like base-runners, eh? ...though I know you don't care a
whit about stick-ball (nor do I, actually). When's New England
going to kick SF's sorry butt again, eh? ...gotta be soon now.
;-)
 
daytripper

The Pats STILL haven't lost a game since last October ;-)
 
KR Williams

...and they won't again for three more months. Maybe they can make it
a complete year! ...or two. ;-)
 
G

GSV Three Minds in a Can said:
A real simple (too simple, but WTF) analogy... it takes, what, maybe 30
seconds to run round a baseball diamond. However, just getting from base
to base you can do in ~10 seconds. If you had 16 or 32 bases instead of
the 4 (including home plate), you could get from base to base in maybe a
second or two.

You wouldn't get runs completed any faster (if you decided to =stop= at
each base, it'd actually take way longer), but you sure could brag about
an amazing clock speed, getting from one base to the next. 8>.

Which just demonstrates the stupidity of trying to measure MPH or BHP
using the rev counter.


I think that's a very good analogy. It also points out the theoretical
*BENEFIT* to long pipelines. If you can keep a runner on every base
all the time, you've got a person crossing the plate more frequently,
which represents more work getting done.

I don't think the higher frequency, or even the higher heat generated,
is the real problem. As long as you can cool it reasonably well, why
not design right up to the thermal threshold? As long as you're not
talking blades, notebooks, or making the room shake with excess fan
noise, I'd rather have the PC doing more work when it's running. I
don't even consider high IPC a strict definition of "efficiency". IMO,
the real problems are:

1) The P4 *DOESN'T* keep a runner on every base continually. Maybe
this is just Intel's implementation. Maybe they just didn't do as good
a job as they could have. Or maybe it's truly an intractable
problem with long pipelines. My guess is that it's probably both. But
I don't think it was WRONG to go that route when they decided to. I
also think Hyperthreading didn't help as much as they had initially
hoped it would.

2) If it takes more transistors to implement, you have to question
whether those transistors could be spent in other ways that would
increase performance without the heat penalty. But I also firmly
believe that you don't get something for nothing. Short of a temporary
performance advantage from a good idea until it gets copied by
everyone (like the on-board memory controller), the chip designers are
all working with the same transistor budget.
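Point 1 is where pipeline depth bites: a mispredicted branch throws away the work in flight, and the deeper the pipe, the more cycles it takes to refill. The standard first-order estimate is effective CPI = base CPI + (branch fraction × misprediction rate × penalty). The Python sketch below assumes, as a rough simplification, that the penalty equals the pipeline depth and that doubling the depth doubles the clock; the workload numbers are hypothetical.

```python
def effective_cpi(base_cpi: float, branch_fraction: float,
                  mispredict_rate: float, penalty_cycles: float) -> float:
    """Average cycles per instruction once pipeline flushes are included."""
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles

# Hypothetical workload: 20% branches, 5% of them mispredicted,
# on a core that could otherwise finish 2 instructions per clock (CPI 0.5).
BASE_CPI, BRANCHES, MISS_RATE = 0.5, 0.20, 0.05

for depth, clock_ghz in ((10, 1.5), (20, 3.0)):
    # Simplifying assumption: the flush penalty is roughly the pipeline depth.
    cpi = effective_cpi(BASE_CPI, BRANCHES, MISS_RATE, penalty_cycles=depth)
    print(f"{depth} stages @ {clock_ghz} GHz: CPI {cpi:.2f}, "
          f"{clock_ghz / cpi:.2f} billion instructions/s")
```

Doubling the clock buys about 1.71x here (2.50 vs 4.29 billion instructions/s), not 2x; a worse misprediction rate widens the gap. In the baseball terms used in this thread, every flush is a runner stranded on base.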

Put another way, I think if a Prescott successor:

- Had an on-board memory controller
- Used the Pentium M's micro-op fusion
- Was optimized to run just a little bit cooler
- Had the 64-bit extension enabled

It would be a great desktop CPU. This isn't to say that the Athlon64
isn't a BETTER CPU right now (it is). But I'd rather have a Prescott
like THAT available in late 2004 than have to wait for the dual-core
64-bit desktop 2GHz Pentium M in late 2005 (or even later) to get the
same performance.
 
Nate Edel

Heck, how about just a 2.2GHz or so single Pentium M? Given the numbers I've
seen for the 1.7, a 2.2GHz Banias would be very competitive with
anything Intel or AMD is selling now.
 
Rob Stow

G said:
I think that's a very good analogy. It also points out the theoretical
*BENEFIT* to long pipelines. If you can keep a runner on every base
all the time, you've got a person crossing the plate more frequently,
which represents more work getting done.

It is a good analogy in the sense that the P4 puts lots of runners
on base - and actually provides extra bases for those base runners
to stand on. However, the analogy falls apart in other ways - you
could say that the AthlonXP and AMD64 processors put fewer men on
base but do a *much* better job when it comes to actually driving in some
of those baserunners. In the Intel vs AMD baseball game, Intel
loses because their batting average with men on base is pretty shitty.
It is the number of runs scored that counts at the end of the game,
not the number of men you had on base. A stranded base runner
counts for nothing.
 
