Intel COO signals willingness to go with AMD64!!

Rmyers1400

Subject: Re: Intel COO signals willingness to go with AMD64!!
From: Keith R. Williams (e-mail address removed)
Date: 2/8/2004 12:28 PM Eastern Standard Time
Message-id: <[email protected]>




Ah, we shouldn't be using one argument to obfuscate another then,
eh?

You are trying to turn one argument into two separate arguments to obscure the
fact that the way you, as an engineer who has supposedly seen and done it all,
want to remember history doesn't correspond to reality.
Nope.

Problem: Memory wall (latency)
Solution: Caches (many of 'em), Branch prediction, speculative
execution, speculative loads, swamp with bandwidth, pray.
Did you *read* my post or the reference it contained? Circa 1994 people _knew_
about cache and they were still worried about the memory wall because of these
things called obligatory cache misses. Now I know that none of these people
were probably real engineers because they didn't work on what you regard as
real computers. Nevertheless, they wrote papers and managed to get them
published in refereed journals. I'm sure that if only there had been a real
(mainframe) engineer in their midst, he or she could have straightened things
out for those poor, misguided slobs, but history didn't go down that way. They
knew about all the tricks you mentioned, and they still expected execution time
to be dominated by first reference misses.

Problem: How to reduce pipeline stalls
Solution: OoO
And what is it, exactly, that you think causes pipeline stalls? Contention for
execution resources is not a big contributor as far as I know. If that were
the big issue, it would be a very easy problem to fix.

Pipelines get stalled because of branch mispredicts and because of cache misses
that can't be hidden. Branch misprediction really is a separate issue. The
primary technique for hiding so-called compulsory misses is...OoO.
The "existence theorem" is in my favor. It was, thus that is the
way it is.
The only existence theorem I see being proved here is of a particularly
tiresome brand of engineer that I generally try to avoid if I can help it. I
don't even know why I tried.
TP is all mainframes do? I don't *think* so. When your theory
starts out with a false premise, it's time to junk the theory.
Transaction processing doesn't have to be *all* that mainframes do for people
to have figured out well ahead of the 1996 paper that OoO wouldn't be a big
help with OLTP.

The real reason they didn't figure it out is because, prior to the nineties, no
one, including mainframe manufacturers, could imagine throwing the number of
transistors at the problem that is necessary to make OoO really work.

"Everything has already been thought of" probably is true in your case, so you
should probably go on believing that it's true for everybody. Life will be
more emotionally satisfying for you that way.

RM
 
Keith R. Williams

You are trying to turn one argument into two separate arguments to obscure the
fact that the way you, as an engineer who has supposedly seen and done it all,
want to remember history doesn't correspond to reality.

Nope, every time you discuss an issue you bring in extraneous
issues, not under discussion. Certainly you can win any
argument, in your mind anyway, by shifting the issues at hand.
Did you *read* my post or the reference it contained? Circa 1994 people _knew_
about cache and they were still worried about the memory wall because of these
things called obligatory cache misses.

Sure, they didn't expect caches the size of main memory, at the
time. Sure, misses cannot be avoided, so what's your point? The
fact is that mainframes had a memory wall in the '70s, at least.
Their main memory was across the room! Caches, my son, caches!
Now I know that none of these people
were probably real engineers because they didn't work on what you regard as
real computers.

Engineers, sure. You really don't think that they were going to
acknowledge something already *well* known in '94 (please) in an
academic paper? Good grief! "Publish or perish" doesn't mean
what's published is unknown.
Nevertheless, they wrote papers and managed to get them
published in refereed journals. I'm sure that if only there had been a real
(mainframe) engineer in their midst, he or she could have straightened things
out for those poor, misguided slobs, but history didn't go down that way. They
knew about all the tricks you mentioned, and they still expected execution time
to be dominated by first reference misses.

No, most mainframers weren't bothering with what's being
published. More ... "that's nice".

Are you *really* telling me that you believe the "memory wall" is
a post '94 artifact? ...and no one used caches before to try to
alleviate this constriction?

Robert, Robert, Robert, I had a PDP-11 in 1980 that had a cache
(plugged into the UniBus). ...and the PDP-11 was by no stretch
of the imagination bleeding edge then. Memory latency was a big
issue *long* before 1994, as amazing as that apparently is to
you.
And what is it, exactly, that you think causes pipeline stalls? Contention for
execution resources is not a big contributor as far as I know. If that were
the big issue, it would be a very easy problem to fix.
Pipelines get stalled because of branch mispredicts and because of cache misses
that can't be hidden. Branch misprediction really is a separate issue. The
primary technique for hiding so-called compulsory misses is...OoO.

Lucy! Please 'splain register renaming!
The only existence theorem I see being proved here is of a particularly
tiresome brand of engineer that I generally try to avoid if I can help it. I
don't even know why I tried.

Nope. ...been there. You're a particularly tiring brand of
wannabe who pretends to know all. ...particularly what went
before you were born, apparently.
Transaction processing doesn't have to be *all* that mainframes do for people
to have figured out well ahead of the 1996 paper that OoO wouldn't be a big
help with OLTP.

Then there is no gain from OoO because one application class
doesn't benefit (which I won't concede, since I'm not a systems
programmer)? The fact is that OoO isn't new, and has been with us
for some time. Certainly OoO has progressed tremendously with
infinite transistor budgets, but it's been around for a long time.
The real reason they didn't figure it out is because, prior to the nineties, no
one, including mainframe manufacturers, could imagine throwing the number of
transistors at the problem that is necessary to make OoO really work.

You'd be wrong to assume this. Certainly transistor budgets were
constrained, but there were all sorts of "tricks" played to
increase performance.
"Everything has already been thought of" probably is true in your case, so you
should probably go on believing that it's true for everybody. Life will be
more emotionally satisfying for you that way.

Ohh, an ad hominem! I knew you could do it!
 
Rmyers1400

Subject: Re: Intel COO signals willingness to go with AMD64!!
From: Keith R. Williams (e-mail address removed)
Date: 2/8/2004 10:32 PM Eastern Standard Time
Message-id: <[email protected]>



Nope, every time you discuss an issue you bring in extraneous
issues, not under discussion. Certainly you can win any
argument, in your mind anyway, by shifting the issues at hand.
Okay. It may be that you want to use the term "memory wall" in one way and I
want to use it in another. Maybe in some other lifetime, people used the
phrase memory wall to describe a related but _very_different_ phenomenon. If
so, it is entirely an accident. The way that the term "memory wall" is
currently used, and the way it was discussed in the Wulf paper, is to describe
an exponential divergence between the cycle time of processors and memory
latency. This is not a phenomenon that can be related to "the memory being in
a different cabinet" any more than it can be related to the fact that it takes
forever to get information from a hard disk, or that it takes longer to get
information from non-local memory in a ccNUMA setup than it does from local
memory. It is related to the fact that processor speed has been growing much
faster than memory speed. Experience with mainframes, other examples of memory
hierarchies, etc., are suggestive, but they do not explain the phenomenon that
has come to dominate processor architecture for the last decade.
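
(Aside: to make that divergence concrete, here's a minimal back-of-the-
envelope sketch in C along the lines of the average-access-time argument
in the Wulf/McKee paper. The growth rates, starting latencies, and the
99% hit rate are illustrative assumptions, not measurements; the only
point is that when processor speed compounds faster than memory speed,
the miss term eventually dominates.)

  #include <stdio.h>

  int main(void)
  {
      /* Illustrative assumptions, not data: CPU speed grows ~60%/yr,
         DRAM latency improves ~7%/yr, cache hit rate fixed at 99%.  */
      double cpu_cycle_ns   = 5.0;    /* hypothetical starting cycle time   */
      double mem_latency_ns = 100.0;  /* hypothetical starting DRAM latency */
      double hit_rate       = 0.99;

      for (int year = 0; year <= 15; year++) {
          double miss_cycles = mem_latency_ns / cpu_cycle_ns;
          /* average access time in CPU cycles: t_avg = p*1 + (1-p)*t_miss */
          double t_avg = hit_rate * 1.0 + (1.0 - hit_rate) * miss_cycles;
          printf("year %2d: miss = %6.0f cycles, avg access = %6.1f cycles\n",
                 year, miss_cycles, t_avg);
          cpu_cycle_ns   /= 1.60;   /* processors get faster quickly... */
          mem_latency_ns /= 1.07;   /* ...memory improves slowly        */
      }
      return 0;
  }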

*Nor* can you make any sense of what actually happened (not what people thought
would happen) without invoking OoO. You might want it to be a separate topic,
but it isn't.

Engineers, sure. You really don't think that they were going to
acknowledge something already *well* known in '94 (please) in an
academic paper? Good grief! "Publish or perish" doesn't mean
what's published is unknown.
When I want to prove my point, I go to the written record and produce
citations. When you want to prove your point, you just repeat, "Take my word
for it. We knew all that stuff." I don't know how to argue against that kind
of logic, except to say that "we knew it all along" is a phrase that sets off my
bull-felgerkarb detector.
No, most mainframers weren't bothering with what's being
published. More ... "that's nice".
Yumph. I haven't done very much publishing myself. I worked in a field where
publishing was, um, problematical. I'm kind of embarrassed at the stuff that's
out there that has my name on it because it's there only because someone other
than me felt that a paper just had to be produced. Only in one case did I
write the paper myself and then only under intense pressure, and I'm not proud
of the result.

Other people haven't felt quite the way I have about publishing and I can find
papers that describe more or less the true state of the art at a particular
time. You *would* respond "That's nice," to those papers because it wouldn't
be at all obvious why the subject being talked about was even of interest, and
I wouldn't be free to explain.

The fact remains, though, that there is a written record that approximates what
people actually knew at any given time, and I could find it if pressed. I
don't think that building mainframes has a corner on people who are too busy
getting things done to bother writing papers about it. The papers get written,
anyway, even if not by the people doing the actual work and even if not with
the most insightful information.
Are you *really* telling me that you believe the "memory wall" is
a post '94 artifact?

Plainly, some degree of discussion had been going on for some time, because
there is a kind of _mea_culpa_ in the Wulf paper to the effect that, "I'm sorry
I didn't get this sooner." But do I think what was being talked about was a
natural extension of your experience with the PDP-11? No, I do not.
...and no one used caches before to try to
alleviate this constriction?
What is very clear is that caches were thought of in terms of data re-use, not
in terms of hiding latency. Now, the idea of _pre-fetching_ has a history all
its own. It's not a history I know much about and not one that I want to take
the time to explore right now.

The _new_ idea is to use the code itself as a pre-fetch mechanism, which is
what OoO and speculative slices do. If you want to tell me that that's an old,
old, old idea, too, please do educate me, but please come equipped with
evidence other than your apparently all-encompassing memory.

Lucy! Please 'splain register renaming!
To quote Mike Haertel (late of Intel, now of AMD), OoO doesn't get you far
without register renaming. It allows partially-completed execution paths to
use the same register name without using the same physical register.
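
(Aside, for anyone following along: a toy sketch in C of the bookkeeping
involved. Architectural register names get mapped onto a larger pool of
physical registers, so two in-flight uses of the same name don't collide.
The table sizes and the free-list handling here are made up for
illustration; this is not any particular machine's implementation.)

  #include <stdio.h>

  #define ARCH_REGS 8     /* register names the ISA exposes (assumed) */
  #define PHYS_REGS 32    /* larger physical register file (assumed)  */

  static int map[ARCH_REGS];          /* current arch -> phys mapping */
  static int next_free = ARCH_REGS;   /* toy allocator; real hardware
                                         recycles freed registers     */

  /* Renaming a destination: allocate a fresh physical register, so
     older in-flight instructions still see the old physical copy.    */
  static int rename_dest(int arch)
  {
      int phys = next_free++ % PHYS_REGS;
      map[arch] = phys;
      return phys;
  }

  /* A source operand reads whatever physical register the name
     currently points at.                                             */
  static int rename_src(int arch) { return map[arch]; }

  int main(void)
  {
      for (int i = 0; i < ARCH_REGS; i++) map[i] = i;

      /* Two back-to-back writes to "r3" get different physical homes,
         so the second write need not wait for readers of the first
         (the WAW/WAR hazards disappear).                              */
      int p1 = rename_dest(3);
      int r  = rename_src(3);      /* a reader in between sees p1 */
      int p2 = rename_dest(3);
      printf("first r3 -> p%d, reader saw p%d, second r3 -> p%d\n",
             p1, r, p2);
      return 0;
  }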

Nope. ...been there. You're a particularly tiring brand of
wannabe who pretends to know all. ...particularly what went
before you were born, apparently.
I think I was born before you were, and there isn't a _great_ deal that
happened with computers before I was born. What _did_ happen is very well
known.

As to my pretending to know all, as I have said in this very forum, in a post
addressed directly to you, that suggestion makes the hair stand up on the back
of my neck. One thing that has grown without fail as I have gotten on in years
is my awareness of the volume of knowledge that I have not mastered and never
will master.

As to my being a wannabe, aside from the demeaning way in which you use the
term, is there anything wrong with wanting to grow?

Then there is no gain from OoO because one application class
doesn't benefit (which I won't concede, since I'm not a systems
programmer)?

Processors of all classes and all applications could gain at least a little bit
from OoO. Okay, with a statement like that, there's bound to be an exception.
I just can't think of any at the moment.
The fact is that OoO isn't new, and has been with us
for some time. Certainly OoO has progressed tremendously with
infinite transistor budgets, but it's been around for a long time.

I don't think I have ever made a claim as to when OoO was first introduced,
because I don't have a clue.
You'd be wrong to assume this. Certainly transistor budgets were
constrained, but there were all sorts of "tricks" played to
increase performance.

Then Patterson should have known what he claims is a surprise in his 1996
paper.
Ohh, an ad hominem! I knew you could do it!

Yes, and shame on me, but you have royally fried me, and frankly, I think I
have offered you sound advice. You think and talk like someone whose career is
completely behind him.

RM
 
George Macdonald

A spreadsheet, of course, is nothing more than an interpreted
functional programming language. I have *zero* knowledge of how well
Itanium does with interpreters, but you just added to my list of
things to do. :-/.

Sure but the Solve is where things get interesting.
*Compiled* compute-intensive applications can benefit spectacularly
from training and re-compilation. That's not a big market, but it
happens to be the business I'm in. Thus, my interest, at least
initially. People who want to do that kind of work are going to have
to own a compiler.

There *are* compiled compute-intensive commercial packages. IME most
corporations with such an option do not want to pay for/employ the level of
expertise for the nuts and bolts of numerically intensive computing.
FFTW seems to do pretty well without help from a compiler.

Intel and Microsoft both have teams of compiler experts working on
this problem full time. My read is that Microsoft loves Itanium in a
way they are never going to love x86-64, and they are certainly not
going to let Intel allow the whole world to write better software with
a compiler they don't own and control. Itanium has done wonders for
the world of compiler development, as I read it. In some ways, the
compiler horse race is at least as interesting as the processor horse
race.

The fact that M$ has developed an x86-64 WinXP seems evidence to me of,
extraordinary for M$, enthusiasm... even further accentuated by the trail
of destruction left by abandoned Risc WinXXs.
I think the world is moving toward BLAS as a lingua franca for
compute-intensive applications written by non-specialists, and it's
not hard to imagine building a pretty smart BLAS and LAPACK for
Itanium. Two FFT packages out there, FFTW and UHFFT, have a lot of
adaptability built-in, so you don't have to do a lot of weenie work when
you move your FFT-based application to a new machine. I can see that
general approach working pretty well for Itanium and frequently-used
compute-intensive software.

Different stuff. I'm talking about things like CPLEX, Xpress, OSL etc...
MPSX on IBM mainframes. They go to a level of performance which the
libraries can never match and their users do not want to get involved in
low-level tool-kits. There are other specialist packages in other domains.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
 
Rmyers1400

Subject: Re: Intel COO signals willingness to go with AMD64!!
From: George Macdonald fammacd=!SPAM^[email protected]
Date: 2/9/2004 1:20 AM Eastern Standard Time
Message-id: <[email protected]>
Sure but the Solve is where things get interesting.
Ayup. That's why I think Itanium is potentially a godsend for a company like
M$. Smart=Superfast. Naive=Might as well have used a Via C5. Commercial
vendors who figure that out will do well in general (assuming the processor
survives, of course).
There *are* compiled compute-intensive commercial packages. IME most
corporations with such an option do not want to pay for/employ the level of
expertise for the nuts and bolts of numerically intensive computing.

Nobody seems to want to. It's a little scary, because computer programs become
exactly what people visualized when IBM's white shirt and tie computer centers
first made an impression on the public consciousness: the computer as
inscrutable oracle.

I keep thinking that people will wake up and realize that the only thing left
that potentially *isn't* a commodity is extracting meaning from raw
information. If your competitor can do it just as well as you can with the
same plug and play program, then it's just a commodity again.

The fact that M$ has developed an x86-64 WinXP seems evidence to me of,
extraordinary for M$, enthusiasm... even further accentuated by the trail
of destruction left by abandoned Risc WinXXs.

M$ is in no position to be choosy, but there is potential margin in Itanium
software the same way there is potential margin in Itanium hardware that just
isn't going to be there with x86 of any flavor. Don't expect anybody (except
Intel) to blurt this out in a press release.

Different stuff. I'm talking about things like CPLEX, Xpress, OSL etc...
MPSX on IBM mainframes. They go to a level of performance which the
libraries can never match and their users do not want to get involved in
low-level tool-kits. There are other specialist packages in other domains.

Seems to fit right into the Mandarin culture of mainframes. Dummies and
amateurs need not apply. Perfect market for Itanium.

RM
 
George Macdonald

Ayup. That's why I think Itanium is potentially a godsend for a company like
M$. Smart=Superfast. Naive=Might as well have used a Via C5. Commercial
vendors who figure that out will do well in general (assuming the processor
survives, of course).

You lost me there. Do you know what a Solve is? "Commercial vendors" have
all been badly burned by the low volume Risc RS/6000, Alpha experience -
they won't get caught again. If it can be done on an x86-64 with 80%
performance of Itanium, and in many cases even better, there's no
discussion.
Nobody seems to want to. It's a little scary, because computer programs become
exactly what people visualized when IBM's white shirt and tie computer centers
first made an impression on the public consciousness: the computer as
inscrutable oracle.

I keep thinking that people will wake up and realize that the only thing left
that potentially *isn't* a commodity is extracting meaning from raw
information. If your competitor can do it just as well as you can with the
same plug and play program, then it's just a commodity again.

There are valuable skills involved in using the commercial software - you
can just as easily build a lousy model as a good one. You don't need to
employ a good auto designer to win races - just a very good driver.
M$ is in no position to be choosy, but there is potential margin in Itanium
software the same way there is potential margin in Itanium hardware that just
isn't going to be there with x86 of any flavor. Don't expect anybody (except
Intel) to blurt this out in a press release.

x86 is not going away though and its natural progression to x86-64 or AMD64
is assured - like it or not. If the volume is not there, which it wasn't
with the Alpha, MIPs, Risc/6000(PowerX) etc., M$ will get soured very
quickly in either case. There's no reason, that I see, to suppose that
Itanium will have any different a fate in the WinWorld - another part of
Intel's blunder... expecting that the PC CPU model of distribution, where
others take responsibility for the infrastructure, extrapolates to the high
end. IOW if you want to challenge IBM or even Sun or EMC, you have to go
toe to toe with them.
Seems to fit right into the Mandarin culture of mainframes. Dummies and
amateurs need not apply. Perfect market for Itanium.

You should not presume knowledge you don't have - none of the xFFT things
you mentioned are Mathematical Programming, not even vaguely. Heavy-handed
derision based on ignorance only reflects badly on you... and places in
doubt any respect you have gained here.

The thing is, you see, that so far, it doesn't... fit into Itanium -
results have been mediocre at best. Oh and those are serious optimization
packages for those with the skills to use them - they do things which
cannot be achieved by dabbling with libraries. There is no free lunch I'm
afraid.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
 
Keith R. Williams

Okay. It may be that you want to use the term "memory wall" in one way and I
want to use it in another. Maybe in some other lifetime, people used the
phrase memory wall to describe a related but _very_different_ phenomenon. If
so, it is entirely an accident. The way that the term "memory wall" is
currently used, and the way it was discussed in the Wulf paper, is to describe
an exponential divergence between the cycle time of processors and memory
latency.

You really *are* dense, aren't you Robert? Memory in a frame
across the room *WAS* latency. Sheesh! You know, the speed of
light, and all that rot? It was many hundreds of microseconds to
milliseconds away! Yes, caches were indeed used to mitigate the
"memory wall".
This is not a phenomenon that can be related to "the memory being in
a different cabinet" any more than it can be related to the fact that it takes
forever to get information from a hard disk, or that it takes longer to get
information from non-local memory in a ccNUMA setup than it does from local
memory.

You're a fool, as I always expected. Memory across the room
isn't a "wall"? Sheesh, grow up and learn something from
history.
It is related to the fact that processor speed has been growing much
faster than memory speed. Experience with mainframes, other examples of memory
hierarchies, etc., are suggestive, but they do not explain the phenomenon that
has come to dominate processor architecture for the last decade.

Sure it has, but the issues are *old*. Memory has never been as
fast as the processor, except for some early toys called
"microprocessors". Of course not everything is a microprocessor,
even "mainframes". The more things change, the more they stay the
same.
*Nor* can you make any sense of what actually happened (not what people thought
would happen) without invoking OoO. You might want it to be a separate topic,
but it isn't.

...want to shift more goal-posts? I've noticed you're the
master.
When I want to prove my point, I go to the written record and produce
citations. When you want to prove your point, you just repeat, "Take my word
for it. We knew all that stuff." I don't know how to argue against that kind
of logic, except to say that "we knew it all along" is a phrase that sets off my
bull-felgerkarb detector.

So be it. I've noted you don't like the old farts on CA or AFC
correcting your myopic view of history either.
Yumph. I haven't done very much publishing myself. I worked in a field where
publishing was, um, problematical.

Umm, as were most mainframers. You might want to search the
patent archives though (I haven't, but it sounds like a challenge
;-).

Plainly, some degree of discussion had been going on for some time, because
there is a kind of _mea_culpa_ in the Wulf paper to the effect that, "I'm sorry
I didn't get this sooner." But do I think what was being talked about was a
natural extension of your experience with the PDP-11? No, I do not.

I cannot believe even you're this lame. The PDP11 was not a
mainframe. It was an example to show that caches were even known
(and useful) to slugs like in-order, simple instruction-at-a-
time minicomputers. Mainframes were fully pipelined long
before, with multiple threading, even.
What is very clear is that caches were thought of in terms of data re-use, not
in terms of hiding latency.

You're a fool! The difference is? Add in branch-prediction, and
the difference is? Add in multiple threads and the difference
is?
Now, the idea of _pre-fetching_ has a history all
its own. It's not a history I know much about and not one that I want to take
the time to explore right now.

Translation: I'm wrong and don't have the time to figure out the
tough-stuff. What about branch-prediction (the analog to pre-
fetching)? What about BTICs, RAW and all the rest of the caching
that's been going on for some time. Good grief!
The _new_ idea is to use the code itself as a pre-fetch mechanism, which is
what OoO and speculative slices do. If you want to tell me that that's an old,
old, old idea, too, please do educate me, but please come equipped with
evidence other than your apparently all-encompassing memory.

Nonsense! Branch hints are as old as you are. Hints for data
don't seem to be a big stretch, considering a von Neumann
architecture. ;-)
To quote Mike Haertel (late of Intel, now of AMD), OoO doesn't get you far
without register renaming. It allows partially-completed execution paths to
use the same register name without using the same physical register.

Well, you've been paying attention to someone! Follow along a
little longer!
I think I was born before you were, and there isn't a _great_ deal that
happened with computers before I was born. What _did_ happen is very well
known.

You really need to get out more then. This stuff isn't new, and
wasn't to those who taught me. Sure there are twists, but it's
all caching that's important (i.e., weak).
As to my pretending to know all, as I have said in this very forum, in a post
addressed directly to you, that suggestion makes the hair stand up on the back
of my neck. One thing that has grown without fail as I have gotten on in years
is my awareness of the volume of knowledge that I have not mastered and never
will master.

Your attitude on CA speaks for itself. You do the same on
.chips.
As to my being a wannabe, aside from the demeaning way in which you use the
term, is there anything wrong with wanting to grow?

By proposing that you know it all and challenging? Perhaps. Asking for
information and clarification, certainly not. Perhaps it's just
your abrasive know-it-all style and "prove to me I'm wrong" that
turns some off. Ok, I can do that game too, but I am willing to
be wrong (as was pointed out in another thread here).
Processors of all classes and all applications could gain at least a little bit
from OoO. Okay, with a statement like that, there's bound to be an exception.
I just can't think of any at the moment.


I don't think I have ever made a claim as to when OoO was first introduced,
because I don't have a clue.

...but you claim it's somehow "new"? You continue to lose me.
... or maybe I just get bored.
Then Patterson should have known what he claims is a surprise in his 1996
paper.

I haven't a clue what he knew. I'd be more impressed with your
position if Gene Amdahl wrote a paper stating such.
Yes, and shame on me, but you have royally fried me, and frankly, I think I
have offered you sound advice. You think and talk like someone whose career is
completely behind him.

More ad hominems.

Career, no. ...much too young (33 ;). First career, perhaps.
 
Rmyers1400

Subject: Re: Intel COO signals willingness to go with AMD64!!
From: George Macdonald fammacd=!SPAM^[email protected]
Date: 2/9/2004 5:18 PM Eastern Standard Time
Message-id: <[email protected]>



You lost me there. Do you know what a Solve is?

I have never actually used a spreadsheet in the fashion you are suggesting. I
know people who do. If I wanted to solve a constraint or optimization problem,
linear or non-linear programming, or anything like it, I'd do it in Fortran or
C using a toolkit. I might use Mathematica or Matlab. If I had to do a lot of
it, I might learn Haskell. I would, of course, do whatever I do when doing
problems I don't ordinarily do, which is to download the ton of software that
can be had on the internet just for the trouble of looking.

As to the circumstances that would lead people to use a spreadsheet as anything
other than an input medium for other than the simplest of
computationally-intensive operations research problems, I'll admit that you've
got me. Most of the people I knew who did it either didn't know what they were
doing and/or couldn't afford anything more sophisticated.

The fact that people actually do such a thing strikes me as an opportunity for
a company like Microsoft, which, all external appearances notwithstanding, does
understand that when you come to math you can't fake it.

As to losing you, I'm not sure where the disconnect is. Itanium is one of the
crankiest processors ever to see daylight. When it's good, and it can be, it
can be very, very good. When it's bad, you would have been better off with a
far less expensive processor that gets somewhere near to its maximum
performance with far less effort than Itanium requires.
"Commercial vendors" have
all been badly burned by the low volume Risc RS/6000, Alpha experience -
they won't get caught again. If it can be done on an x86-64 with 80%
performance of Itanium, and in many cases even better, there's no
discussion.

That's an entirely different issue. I'll come back to this point later in your
post.

There are valuable skills involved in using the commercial software - you
can just as easily build a lousy model as a good one. You don't need to
employ a good auto designer to win races - just a very good driver.

This is a point we are not going to agree on. It's a sore subject with me and
one that, in the current context, is only going to raise the temperature of the
discussion. I'll be glad to talk with you about the ways in which I think
computers and especially computer models are misused on or offline, but I don't
think this is the appropriate time or place for that discussion.

x86 is not going away though and its natural progression to x86-64 or AMD64
is assured - like it or not. If the volume is not there, which it wasn't
with the Alpha, MIPs, Risc/6000(PowerX) etc., M$ will get soured very
quickly in either case. There's no reason, that I see, to suppose that
Itanium will have any different a fate in the WinWorld - another part of
Intel's blunder... expecting that the PC CPU model of distribution, where
others take responsibility for the infrastructure, extrapolates to the high
end. IOW if you want to challenge IBM or even Sun or EMC, you have to go
toe to toe with them.
M$ wants to invade the high-end world just the way Intel does. M$ wants its
products to be competing with IBM's high margin products in just exactly the
way that Intel wants its products to be built into high margin products that
compete with IBM's high-margin products.

Contrary to what you seem to think, I have a pretty good idea of when I'm in
over my head, and engaging in anything more than the most casual speculation
about where things are headed at the moment puts me in over my head. The
computer business has been stunningly unforgiving of corporate missteps, and
quick to punish corporations that guess wrong. No need even to state examples.

It's easy to figure out what the markets think. Just head over to
www.bloomberg.com and build yourself a chart for IBM, INTC, and AMD. x86-64
has done wonders for AMD's stock price. No big surprise there. If you factor
out the one-time bump for AMD, though, Intel and AMD are pretty much tracking
each other--Prescott roll-out problems and all. IBM, OTOH, doesn't seem to be
going anywhere, and you would think that all the chaos would be good news for
them.

Intel has been running a very successful, very profitable business for what in
this industry is a very long time. You may feel comfortable patronizing me
without any real knowledge of my skill set, but I don't feel comfortable
second-guessing Intel's management team, or HP's for that matter, even though
Wall Street is plainly not enthralled with HP. I will let the fact that you
seem to feel comfortable stating that Intel's management doesn't understand the
basics of the business it is in speak for itself.

As to the fate of Itanium, my working assumption continues to be that if there
is a serious challenge to Power for high-end machines, it will be from Itanium
and not from x86-64. I'd love to be a fly on the wall at meetings at Intel
right now, but I'm not, and if I have to choose where I'm going to place my
bet, I'll stick with Intel. If Itanium goes, it's because Intel decides that
it goes. And remember what I said: contrary to areas that involve mathematics
and computer modelling, in this instance I know that I'm in over my head. All
I can do is guess.
You should not presume knowledge you don't have - none of the xFFT things
you mentioned are Mathematical Programming, not even vaguely. Heavy-handed
derision based on ignorance only reflects badly on you... and places in
doubt any respect you have gained here.
Oh boy. Another taxonomy freak. If I were interested in taxonomy, I would
have been a botanist (my standard response to taxonomy freaks). The xFFT
packages involve most of the issues that would be involved in building any kind
of linear algebra toolset and they present an example of how you can create a
software package that adapts to its environment.

So that we can get out of the taxonomy box, I would prefer to talk about
decision support software, which I believe includes all the things you wanted
to talk about and more. Itanium can deliver stunning floating point
performance, given proper care and feeding, but that isn't really the issue.

If the customers you want to pursue wind up on Itanium boxes, then you will
have to deal with Itanium. If things go the way I expect them to, a great deal
of decision support software is going to wind up on Itanium boxes.

If you're sure that your customers are going to wind up on x86-64, why waste
your time talking to me about Itanium?

I mean what's your real agenda here? You want to talk down Itanium? You want
to talk me down? You're whistling past the graveyard because Itanium's been a
bitch, and oh by the way you're looking for somebody to take it out on?
The thing is, you see, that so far, it doesn't... fit into Itanium -
results have been mediocre at best. Oh and those are serious optimization
packages for those with the skills to use them - they do things which
cannot be achieved by dabbling with libraries. There is no free lunch I'm
afraid.
Including no free ass to kick on the internet. You want to talk, talk. You
want to kick ass, go to a bar and pick a fight.

RM
 
George Macdonald

I have never actually used a spreadsheet in the fashion you are suggesting. I
know people who do. If I wanted to solve a constraint or optimization problem,
linear or non-linear programming, or anything like it, I'd do it in Fortran or
C using a toolkit. I might use Mathematica or Matlab. If I had to do a lot of
it, I might learn Haskell. I would, of course, do whatever I do when doing
problems I don't ordinarily do, which is to download the ton of software that
can be had on the internet just for the trouble of looking.

If you have a real-world constraint optimization problem, of a size which
has any real meaning to its purpose, I'd hope your toolkit would be
something of substance like CPLEX or OSL and include an appropriate model
management system. Anything else, you're wasting your time and your
computer's. OTOH if you've never done any LP or Mixed Integer Programming
and you need results, it'd be a good idea to get help/advice from someone
who has.
As to the circumstances that would lead people to use a spreadsheet as anything
other than an input medium for other than the simplest of
computationally-intensive operations research problems, I'll admit that you've
got me. Most of the people I knew who did it either didn't know what they were
doing and/or couldn't afford anything more sophisticated.

<shrug> People have done it and people have sold the tools to do it.
Granted most such models are too complex for such a simplistic approach and
need a modeling language. End users, however, like the spreadsheet idiom
for viewing and manipulating their raw data. There are even people who
believe that the failure of Lotus Improv was a terrible disaster.
The fact that people actually do such a thing strikes me as an opportunity for
a company like Microsoft, which, all external appearances notwithstanding, does
understand that when you come to math you can't fake it.

M$ is the last company which is going to be able to supply optimization
software - as is often the case when things get complex, the Solver in
Excel was not done by them.
That's an entirely different issue. I'll come back to this point later in your
post.



This is a point we are not going to agree on. It's a sore subject with me and
one that, in the current context, is only going to raise the temperature of the
discussion. I'll be glad to talk with you about the ways in which I think
computers and especially computer models are misused on or offline, but I don't
think this is the appropriate time or place for that discussion.

You are just a naif about optimization and model management then. I don't
care what temperature that raises with you. There are many "models" which
are so bad they are laughable - usually written by the grant hunters for
the usual nefarious purposes; the optimization models used in business are
usually good, used rather well and perform an extremely valuable function
in the realm of strategic and tactical planning... IME.
<snip>
M$ wants to invade the high-end world just the way Intel does. M$ wants its
products to be competing with IBM's high margin products in just exactly the
way that Intel wants its products to be built into high margin products that
compete with IBM's high-margin products.

It's called shooting at the moon. M$ does not have the depth of experience
to even realize how far short of the mark they really stand. Basically,
for both M$ and Intel, 3rd party OEMing/distribution will not gain entry to
that market. That the PC allowed M$/Intel some penetration there was
simply inertia in the computer marketplace - it does not mean that they are
now in a position of guiding that market forward.

Oh boy. Another taxonomy freak. If I were interested in taxonomy, I would
have been a botanist (my standard response to taxonomy freaks). The xFFT
packages involve most of the issues that would be involved in building any kind
of linear algebra toolset and they present an example of how you can create a
software package that adapts to its environment.

Sorry but the toolset you're going to build is not even in the starting
blocks when the race has already been run. You have no conception of the
effort involved. You can't just collect a bunch of feel-right data and
throw it at a "toolset". If you compare xFFT tools against a Mathematical
Programming package, you may find that the nucleus of the core of the
kernel of the system contains some code which bears a resemblance; other
than a few loops which do matrix operations it's not even close.
So that we can get out of the taxonomy box, I would prefer to talk about
decision support software, which I believe includes all the things you wanted
to talk about and more. Itanium can deliver stunning floating point
performance, given proper care and feeding, but that isn't really the issue.

There are established algorithms for the things which have to be done in
decision support, optimization in particular. For practical models, they
involve fairly intensive spurts of FP but, as I'm sure you're aware, matrix
sparsity is an important consideration. The results obtained so far for
Itanium are not favorable.
If the customers you want to pursue wind up on Itanium boxes, then you will
have to deal with Itanium. If things go the way I expect them to, a great deal
of decision support software is going to wind up on Itanium boxes.

Considering what you've revealed about what you know about decision support
software....<shrug>
If you're sure that your customers are going to wind up on x86-64, why waste
your time talking to me about Itanium?

I mean what's your real agenda here? You want to talk down Itanium? You want
to talk me down? You're whistling past the graveyard because Itanium's been a
bitch, and oh by the way you're looking for somebody to take it out on?

I don't have an agenda. This started out as a discussion about the
practicality of retrain/feedback compilation for software in general. I
still maintain that it does not fit the distribution model for commercial
software and if you expect ILOG/CPLEX et al. to supply their source code to
end users so they can retrain the performance for the various model types
they have, it ain't gonna happen - it's umm, trade secret. If you think
that a static retrain at system build time is going to work - well it'll
work but err, I thought the whole point was to get the best out of the
hardware for any given code/dataset... enter OoO!! IOW, for me,
retrain/feedback is a commercial red herring, which may OTOH, interest a
few researchers who develop their own code. They don't live in the
business/commercial world which pays the bills for Intel.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
 
Robert Myers

If you have a real-world constraint optimization problem, of a size which
has any real meaning to its purpose, I'd hope your toolkit would be
something of substance like CPLEX or OSL and include an appropriate model
management system. Anything else, you're wasting your time and your
computer's. OTOH if you've never done any LP or Mixed Integer Programming
and you need results, it'd be a good idea to get help/advice from someone
who has.
Constraint-based programming and functional languages, and in fact
spreadsheets in general, are a possible paradigm for more general
types of parallel programming.

Since I'm not in a group filled with theoreticians, I won't go to the
web to check that I'm using all the language in the right way that
would get me a red check on my final exam, okay?

Spreadsheets have the appealing property that in non-iterative
programming (not what happens in a solve), memory locations are used
just once. That's a magic door in terms of the theoretical properties
of the programming model.

If you have a model with the correct properties (and that fact is
checkable in an automated way), you can just start anywhere and start
computing. Thread creation becomes trivial. When you run into a
piece of data you have that isn't available, you just put that thread
to sleep and wait for the data to become available.

People _have_ thought about this property of non-iterative
spreadsheets. They've also done work on ways that could be used
to get around the write-once limitation, but I personally think that's
a bad idea, and I think it will be more productive to think in terms
of a conceptually infinite spreadsheet.
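
(Aside: a toy illustration in C of the write-once property. Cells are
computed on demand, each at most once, so you can "start anywhere" and
the dependencies pull in whatever they need. The cells and formulas are
invented, and this single-threaded sketch recurses where a concurrent
version would put a waiting thread to sleep; it's only meant to show why
scheduling gets easy when nothing is ever overwritten.)

  #include <stdio.h>

  #define NCELLS 5

  static double value[NCELLS];
  static int    ready[NCELLS];   /* has this cell been computed yet? */

  static double cell(int i);     /* demand-driven evaluation */

  /* Formulas: cells 0 and 1 are raw inputs; the rest reference them. */
  static double formula(int i)
  {
      switch (i) {
      case 0:  return 10.0;                 /* input       */
      case 1:  return 4.0;                  /* input       */
      case 2:  return cell(0) + cell(1);    /* like =A1+A2 */
      case 3:  return cell(2) * 2.0;        /* like =2*A3  */
      default: return cell(2) + cell(3);    /* like =A3+A4 */
      }
  }

  static double cell(int i)
  {
      if (!ready[i]) {           /* computed at most once: write-once */
          value[i] = formula(i);
          ready[i] = 1;
      }
      return value[i];           /* afterwards it's just a read */
  }

  int main(void)
  {
      /* "Start anywhere": asking for the last cell pulls in the rest. */
      printf("cell 4 = %g\n", cell(4));
      return 0;
  }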

I'm happy to admit that I haven't done any actual work in trying to
use the methods of OR as an approach to concurrent programming. I
stumbled upon the idea quite by accident because the acronym CSP can
be construed in two common meanings, and it turns out that both of
them are applicable to concurrent programming.

Because this is a problem of great commercial interest, I'd be willing
to bet that the theoretical foundations for this kind of work are more
solid than they are for most other models of concurrent programming.
It's probably a very good place to go prospecting for good work that's
already been done, and I'm always happy to find good work that's
already done, at least if I can understand it.

It may be that what I'm talking about is all just old hat to someone
who does this for a living. If so, maybe this is my lucky day.

If I got lucky, I might find a package out there in a language I can
understand that illustrates all the nearly-magical properties that an
infinite spreadsheet would have. If you know of such a thing, I'd be
really interested in hearing about it. If I don't find such a thing,
I will be pursuing the idea from the other end of the telescope,
probably starting nearly from scratch.
<shrug> People have done it and people have sold the tools to do it.
Granted most such models are too complex for such a simplistic approach and
need a modeling language. End users, however, like the spreadsheet idiom
for viewing and manipulating their raw data. There are even people who
believe that the failure of Lotus Improv was a terrible disaster.

The spreadsheet occupies nearly a unique place in the history of the
development of the computer, as far as I'm concerned. Aside from
providing the killer app for PC's, it made mathematics that would
otherwise have been inexplicable to the average user immediately
transparent. The appeal as a conceptual/pedagogical model is
undeniable. It just isn't a very efficient way to do computation.
M$ is the last company which is going to be able to supply optimization
software - as is often the case when things get complex, the Solver in
Excel was not done by them.

No big surprise. Neither did they do the speech-recognition software
in Word. For all the scathing criticism I've levelled at Microsoft,
they know how to shop for quality. What they do with it when they get
it is another matter.

You are just a naif about optimization and model management then. I don't
care what temperature that raises with you. There are many "models" which
are so bad they are laughable - usually written by the grant hunters for
the usual nefarious purposes; the optimization models used in business are
usually good, used rather well and perform an extremely valuable function
in the realm of strategic and tactical planning... IME.
I know of at least one example where models are routinely used in
business and I would have a hard time imagining how people would do
without them, which is that businesses use them in price setting. How
much skill is involved in actually using one, I wouldn't have a clue,
but I can imagine it being used in the way that you seem to
imply: as a routine tool for solving day-to-day problems, where the
user of the model would get a lot of practice and would probably
benefit very little from understanding all that much about how the
model works.

The level at which I have seen models used that simply horrifies
me (and it's a horror that extends to practically every realm in which
computer models are used) is that people who don't understand either
the model, the mathematics, or the limitations of either, use a
computer model to make sophisticated, non-routine decisions. I've
seen it in business, and I've seen it in science and engineering. You
can call me a naif if you want, but I've seen people in action, and I
haven't liked what I've seen in most cases. The power of the
spreadsheet as a visualization tool is also its weakness. It gives
people the illusion that they understand something and that they've
thought things through when they haven't.
It's called shooting at the moon. M$ does not have the depth of experience
to even realize how far short of the mark they really stand. Basically,
for both M$ and Intel, 3rd party OEMing/distribution will not gain entry to
that market. That the PC allowed M$/Intel some penetration there was
simply inertia in the computer marketplace - it does not mean that they are
now in a position of guiding that market forward.
Both companies are in a very odd position. They both manifestly
understand some aspects of running a business because they actually
run one themselves. The fact that they have been so successful at
those businesses may lead to a certain myopia (of the kind that I
think was plainly the undoing of DEC).

I'm willing to give both of them more credit than you are. The
players strike me as hard-nosed people who are as unsparing of
themselves as they are of others. The absence of complacent arrogance
doesn't rule out the possibility that they will be unable to see beyond
the limits of their own success, but I would hesitate to bet against
either player in any enterprise they undertake. That is to say I
would _hesitate_, not that I would be completely unwilling.
Sorry but the toolset you're going to build is not even in the starting
blocks when the race has already been run. You have no conception of the
effort involved. You can't just collect a bunch of feel-right data and
throw it at a "toolset". If you compare xFFT tools against a Mathematical
Programming package, you may find that the nucleus of the core of the
kernel of the system contains some code which bears a resemblance; other
than a few loops which do matrix operations it's not even close.
I was groping around for examples of software that takes account of
its environment (cache size, the actual nature of the parallel
environment it is to run in, etc). In that respect the FFT packages I
mentioned are pretty advanced in comparison to other software I've
seen. They may not come close to what you need to solve your
problems, but they are an example of an approach you can take to make
software adapt to a hardware environment that isn't known ahead of
time.
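
(Aside: for the curious, this is roughly what that machine-adaptation
looks like from the user's side of FFTW. The planner times candidate
algorithms on the hardware you're actually running on and picks a
winner; you then reuse the plan. A minimal sketch: the size and input
are placeholders, error handling is omitted, and you'd link against
the FFTW library (-lfftw3).)

  #include <fftw3.h>

  int main(void)
  {
      const int N = 1024;                  /* placeholder problem size */
      fftw_complex *in  = fftw_malloc(sizeof(fftw_complex) * N);
      fftw_complex *out = fftw_malloc(sizeof(fftw_complex) * N);

      /* FFTW_MEASURE makes the planner benchmark alternatives on this
         machine -- cache sizes, pipeline quirks and all -- at run time. */
      fftw_plan p = fftw_plan_dft_1d(N, in, out, FFTW_FORWARD,
                                     FFTW_MEASURE);

      for (int i = 0; i < N; i++) { in[i][0] = i; in[i][1] = 0.0; }
      fftw_execute(p);                     /* reuse the plan many times */

      fftw_destroy_plan(p);
      fftw_free(in);
      fftw_free(out);
      return 0;
  }
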
There are established algorithms for the things which have to be done in
decision support, optimization in particular. For practical models, they
involve fairly intensive spurts of FP but, as I'm sure you're aware, matrix
sparsity is an important consideration. The results obtained so far for
Itanium are not favorable.

Itanium and sparse matrix or matrices only turns up a handful of
papers in the ACM digital library. Google, though, finds eleven pages
of results. Maybe I find your attitude toward Itanium programming as
naive as you find my attitude toward decision support software.

It would be foolish of me to say that there's anything there or not
there, because I haven't looked. Sparse matrices almost inevitably
involve indirect addressing. Just exploring and documenting the
possible strategies that could be used to solve the indirect
addressing (and the related pointer-chasing) problem with Itanium in
its current incarnation and possible future incarnations could keep an
entire team of researchers busy full time.

The place this grabbed me is that I know how expensive that kind of
research is. Suppose you could invent a feedback algorithm that you
could just turn loose with a random set of training data, and it would
find the best approach automagically. People have done such work with
algorithms in general, but it's a long way from being competitive with
talented human programmers at the moment.
Considering what you've revealed about what you know about decision support
software....<shrug>

Do put-downs help communication?
I don't have an agenda. This started out as a discussion about the
practicality of retrain/feedback compilation for software in general. I
still maintain that it does not fit the distribution model for commercial
software and if you expect ILOG/CPLEX et al. to supply their source code to
end users so they can retrain the performance for the various model types
they have, it ain't gonna happen - it's umm, trade secret.

This subject has been thrashed through on comp.arch. Another
possibility is to distribute code in some kind of intermediate
representation and have a compiler that could handle the intermediate
representation. The intermediate representation can even be
obfuscated, if necessary. Apparently, such things have already been
done with commercial software, although not with Itanium, as far as I
know.
If you think
that a static retrain at system build time is going to work - well it'll
work but err, I thought the whole point was to get the best out of the
hardware for any given code/dataset... enter OoO!! IOW, for me,
retrain/feedback is a commercial red herring, which may OTOH, interest a
few researchers who develop their own code. They don't live in the
business/commercial world which pays the bills for Intel.
If you're going to run a whole bunch of iterations on the same model,
changing only parameters, I'd be fascinated to see how a binary would
evolve. As a practical matter, you'd have to worry about the overhead
of the time and effort involved in rebuilding, although that can be
completely automated. It is possible, even with non-open source
software as described above. If you're running on dedicated iron,
then there isn't anything much that OoO can do for you that constant
retraining wouldn't. It's the unpredictability of the environment
that OoO can handle and even constant retraining can't.

If Itanium survives, I expect speculative threading to solve a big
slice of the problems that full-up OoO would solve and much earlier
than even limited OoO will be available.

RM
 
Robert Myers

You really *are* dense, aren't you Robert? Memory in a frame
across the room *WAS* latency. Sheesh! You know, the speed of
light, and all that rot? It was many hundreds of microseconds to
milliseconds away! Yes, caches were indeed used to mitigate the
"memory wall".
Your style of argumentation is so off-the-wall, so abusive, and so
presumptuous, that it really doesn't deserve a response. I also know,
from previous exchanges with you, that it could go on forever and that
you will simply keep raising the level of bluster and abuse, no matter
how long it takes and no matter how over-the-top you have to become.

Usenet is so crazy, though, that someone like you, who will just keep
shouting more loudly, no matter how loud you have to yell, might be
taken seriously. Not ever at any time in my life has anyone ever had
the gall to suggest that I don't understand that light travels at a
finite speed and that that has consequences in electrical engineering.

Yes, Keith, I got right away what you meant by the memory being in a
different cabinet.
You're a fool, as I always expected. Memory across the room
isn't a "wall"? Sheesh, grow up and learn something from
history.
No Keith, it isn't. It's a permanent, fixed delay that designers of
the time had to design around. The term "memory wall" as it came into
use in the Wulf paper was in the sense of, "If things keep up the way
they are going, we are going to hit a wall." The wall in the sense
that you would like to use it would have already existed because there
was already a substantial memory latency problem. In fact, as people
used the term, the "wall" had not yet been hit and wasn't expected to
be hit for some number of years in the future. When the "wall" as
used at the time was finally hit, and people fully expected it to
happen, computer performance would finally be determined *completely*
by first use, or so-called compulsory misses.

That day never arrived. It didn't arrive because people have been
able to keep hundreds of instructions in flight using OoO. Tony
understands what is being talked about and says with a straight face
he saw it all coming. I believe that Tony understands what is being
talked about, although I believe he wildly overestimates his ability
to predict the future. I don't believe that you even understand what
is being talked about.
Sure it has, but the issues are *old*. Memory has never been as
fast as the processor, except for some early toys called
"microprocessors". Of course not everything is a microprocessor,
even "mainframes". The more things change, the more they stay the
same.
You're stuck on some kind of simple-minded, "Oh, this is all just
about latency. Memory latency has always been a problem." What you
think is true so far as it goes, but it doesn't come to grips with
what was widely expected to happen in the mid-nineties...
...want to shift more goal-posts? I've noticed you're the
master.
Or with why it didn't happen.
So be it. I've noted you don't like the old farts on CA or AFC
correcting your myopic view of history either.
How does what happens in another newsgroup relate to the fact that
your attitude is:

1. Take my word for it.
2. I don't care what kind of published evidence you present. I know
better.
3. I don't care how widely the problem was perceived (conferences have
been held around the memory wall) or how others perceive it or what
they took it to mean. The only thing that counts is what I remember
and what I think.

I don't know how I'm supposed to argue with someone like that.

I cannot believe even you're this lame. The PDP11 was not a
mainframe. It was an example to show that caches were even known
(and useful) to slugs like in-order, simple instruction-at-a-
time, minicomputers. Mainframes were fully pipelined long
before, with multiple threading, even.
When did I say I thought a PDP-11 was a mainframe? You were the one
who brought a PDP-11 into the discussion. And, yes, Keith, I am
telling you that the "memory wall" as it was being discussed c. 1994
was a previously unrecognized phenomenon, in the sense that there was
not a fixed delay (the cabinet is on the other side of the room;
ferrite core memory is as slow as molasses, etc.), but a delay that
was growing exponentially in time. That's what the Wulf paper was
about, not about the issue of memory latency in general. And I'm just
going to keep saying it: what was predicted to happen never came to
pass.

Keith's version: The memory wall is about latency. We've always had
memory latency, so we've always had a memory wall.

Reality: Somewhere in the early nineties, people recognized a
disturbing trend: the slopes of memory latency and processor cycle
time on a semi-log plot were different. They *extrapolated* into the
future and *predicted* a problem, which they chose to call a memory
wall.
You're a fool! The difference is? Add in branch-prediction, and
the difference is? Add in multiple threads and the difference
is?
If you can hide latency, there is no reason to talk about compulsory
misses. You experience the compulsory miss, but you somehow manage to
hide its effects. As of the Wulf paper, people could not foresee a way
to hide the latency of "compulsory misses".

I have no idea at all what branch predictions and multiple threading
have to do with this discussion. At the level the problem was being
talked about in the Wulf paper, a compulsory miss is a compulsory
miss, no matter whether it was the result of a branch misprediction or
not.
Translation: I'm wrong and don't have the time to figure out the
tough-stuff. What about branch-prediction (the analog to pre-
fetching)? What about BTICs, RAW and all the rest of the caching
that's been going on for some time. Good grief!

Daytripper has said in this forum that prefetching doesn't fix the
latency problem. When it works, it reduces the frequency of cache
misses, but it didn't work well enough in 1994 to keep people from
anticipating eventually, in the future, hitting a memory wall.
Prefetching, as a heuristic operating on the CPU front-end, has gotten
ever more sophisticated, but it is not a substitute for out of order.
Nonsense! Branch hints are as old as you are. Hints for data don't
seem to be a big stretch, considering a von Neumann architecture. ;-)
You really don't get this. You don't insert hints into the code. You
don't use heuristic pre-fetching algorithms. You just go ahead and
execute the code. If the data aren't in cache you suspend that
execution path and wait for the data to arrive. Meanwhile dozens of
other things that got started are having their data finally arrive and
keep the pipeline from being stalled. The raw code--no heuristics, no
hints, no branch predictions--is its own pre-fetch mechanism.
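
To make that concrete, here is a minimal sketch -- my own invention for
illustration, not anything anyone in this thread wrote: two loops that
take the same compulsory misses. In the first, every load depends on the
one before it, so nothing can overlap the misses. In the second, the
addresses are all independent, and an OoO core can keep many misses in
flight at once with no hints and no prefetch heuristics.

#include <stddef.h>

struct node { struct node *next; long payload; };

/* Dependent loads: the address of each load comes out of the previous
 * load, so the compulsory misses serialize no matter how wide or how
 * speculative the machine is. */
long sum_list(const struct node *p)
{
    long sum = 0;
    while (p) {
        sum += p->payload;
        p = p->next;
    }
    return sum;
}

/* Independent loads: every address is known up front, so an OoO core
 * can issue many of the misses concurrently while earlier ones are
 * still outstanding -- the code acts as its own prefetch. */
long sum_array(const long *a, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += a[i];
    return sum;
}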

You really need to get out more, then. This stuff isn't new, and
wasn't to those who taught me. Sure there are twists, but it's all
the caching that's important (i.e. weak).
What you are referring to as a "twist" is the substance of the matter.

I haven't a clue what he knew. I'd be more impressed with your
position if Gene Amdahl had written a paper stating such.

Are you using some kind of psychoactive substance when you write this
stuff? Oh, that's right, I forgot. You've patronized Seymour Cray,
too.

RM
 
G

George Macdonald

On Tue, 10 Feb 2004 18:45:07 -0500, George Macdonald

Constraint-based programming and functional languages, and in fact
spread-sheets in general are a possible paradigm for more general
types of parallel programming.

Since I'm not in a group filled with theoreticians, I won't go to the
web to check that I'm using all the language in the right way that
would get me a red check on my final exam, okay?

Disclaimer aside, I hope you're not equating the calc phase, dynamic or
static, of a spreadsheet with the compute task in what I think you mean by
constraint-based programming. Personally I'm not that interested in the
spreadsheet as an optimization model prep/expression tool. I only brought
up spreadsheet due to its *possibility* as a precursor to some heavy duty
computation since it has a built-in Solve... and there have been a few
"plug-in" independently supplied solvers.
I know of at least one example where models are routinely used in
business and I would have a hard time imagining how people would do
without them, which is that businesses use them in price setting. How
much skill is involved in actually using one, I wouldn't have a clue,
but I can imagine it being used in the way that you seem to
imply: as a routine tool for solving day-to-day problems, where the
user of the model would get a lot of practice and would probably
benefit very little from understanding all that much about how the
model works.

Hmmm, not sure but you seem to be missing the fact that there are three
disciplines involved here. There can be some overlap of course but in
broad terms, you have 1) the algorithm design/programming; 2) the model
designer/builder and 3) the economist/engineer user. They all have their
skills - actually, with a good model, you might be surprised at how much
the economist/engineer gets a grasp of how things work and what the
benefits and limitations are. There have certainly been many outstanding
successes in business operation planning.
The level at which I have seen models used that just simply horrifies
me (and it's a horror that extends to practically every realm in which
computer models are used) is that people who don't understand either
the model, the mathematics, or the limitations of either, use a
computer model to make sophisticated, non-routine decisions. I've
seen it in business, and I've seen it in science and engineering. You
can call me a naif if you want, but I've seen people in action, and I
haven't liked what I've seen in most cases. The power of the
spreadsheet as a visualization tool is also its weakness. It gives
people the illusion that they understand something and that they've
thought things through when they haven't.

Everything gets abused. Look at the "statistics" which get dished up daily
as "proof" of whatever crackpot agenda needs to be satisfied for whatever
political or social target ends. No doubt the spreadsheet has done a lot
of damage.
Itanium and sparse matrix or matrices only turns up a handful of
papers in the ACM digital library. Google, though, finds eleven pages
of results. Maybe I find your attitude toward Itanium programming as
naive as you find my attitude toward decision support software.

ACM and OR/math programming have never seemed to mix well - there was a
SIGMAP years ago but its members got very little encouragement or
satisfaction there and preferred to assemble in their own, more focused
societies, like ORSA and TIMS (now merged as INFORMS), SIAM and the
Mathematical Programming Society.

If you're going to run a whole bunch of iterations on the same model,
changing only parameters, I'd be fascinated to see how a binary would
evolve.

I thought I'd made it clear that was not what I meant. Many companies and
organizations use single tools for different model *types*, e.g. in LP,
from models which are near unit matrices and/or ultra sparse to the other
end of the spectrum where density and compute complexity are much higher.
Yes there *may* sometimes be separate paths through the code but in many
cases no.
As a practical matter, you'd have to worry about the overhead
of the time and effort involved in rebuilding, although that can be
completely automated. It is possible, even with non-open source
software as described above. If you're running on dedicated iron,
then there isn't anything much that OoO can do for you that constant
retraining wouldn't. It's the unpredictability of the environment that
OoO can cope with and that even constant retraining can't.

If Itanium survives, I expect speculative threading to solve a big
slice of the problems that full up OoO would solve and much earlier
than even limited OoO will be available.

I dunno if we're in a spiral (up or down?) or just running in circles
here.:)

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
 
R

Robert Myers

Disclaimer aside, I hope you're not equating the calc phase, dynamic or
static, of a spreadsheet with the compute task in what I think you mean by
constraint-based programming. Personally I'm not that interested in the
spreadsheet as an optimization model prep/expression tool. I only brought
up spreadsheet due to its *possibility* as a precursor to some heavy duty
computation since it has a built-in Solve... and there have been a few
"plug-in" independently supplied solvers.

I have always considered one of my strengths that I confuse things--at
least as perceived by others. I don't think you run across people
like me *that* often: I need to see the path from the actual, concrete,
real-world example, all the way down to

{{0,1},{{0.1},1}, {}}

or whatever its set-theoretic representation is, if possible. If I
don't ever work out the details, I need to believe that I could have
worked out the details if forced to, given enough time and perhaps the help
of a computer.

The abstract math would be incomprehensible and completely unmotivated
to me without the real-world connection, and I would have made a
terrible botanist. I can't remember very many different unconnected
things, so I have to get everything reduced to as few arbitrary
specifics as possible. I'm a reductionist through necessity, not
snobbery.

It is often hard to tell the difference between someone who is
genuinely confused and is conflating things that are truly different
and someone who really perceives a deeper connection and is sweeping
away irrelevant detail. There is also the reality that one person's
irrelevant detail may be another person's livelihood, and that may be
part of the disconnect here.

At the level at which I currently understand things, though, the
spreadsheet provides a nice concrete example of a programming system
that has some very interesting properties. If nothing else, while our
exchange may have done nothing more than to waste your time and annoy
you, it has led me off looking to see the ways in which people may or
may not have connected garden-variety spreadsheets with more abstract,
general, or powerful programming systems, most of which currently find
their application in OR, but which are also closely related to formal
models for concurrent programming.

But, no, I don't think that spreadsheets are an adequate
representation of the business you are in. They are merely a
manageable, concrete example for purposes of the present discussion.

Hmmm, not sure but you seem to be missing the fact that there are three
disciplines involved here. There can be some overlap of course but in
broad terms, you have 1) the algorithm design/programming; 2) the model
designer/builder and 3) the economist/engineer user. They all have their
skills - actually, with a good model, you might be surprised at how much
the economist/engineer gets a grasp of how things work and what the
benefits and limitations are. There have certainly been many outstanding
successes in business operation planning.

So I've been told. Maybe I've just had an unusually bad run of luck.
The instant I turn my back, an otherwise competent programmer will
make an unjustified mathematical assumption with real (and disastrous)
consequences, or I'm fixing somebody else's mistakes, or I'm
discovering that people have been running the same idiotic model
literally for years without stopping to ask the most obvious questions
about what the output really means.

Everything gets abused. Look at the "statistics" which get dished up daily
as "proof" of whatever crackpot agenda needs to be satisfied for whatever
political or social target ends. No doubt the spreadsheet has done a lot
of damage.

So I apologize for taking a cheap shot at your livelihood, which is
what I apparently did. I've just seen so much bad stuff.

ACM and OR/math programming have never seemed to mix well - there was a
SIGMAP years ago but its members got very little encouragement or
satisfaction there and preferred to assemble in their own, more focused
societies, like ORSA and TIMS (now merged as INFORMS), SIAM and the
Mathematical Programming Society.
That's really a shame. Part of why you've seen me bristle is that I
am so impatient when people break mathematics up into little fiefdoms
with special little names and codewords to identify you as in or out,
when the mathematics is just exactly the same. Dumping the
metaphysics of quantum mechanics and moving into a classical
discipline, only to find that the mathematics (renormalization and
all) was exactly the same, was a great breakthrough for me.

...Many companies and
organizations use single tools for different model *types*, e.g. in LP,
from models which are near unit matrices and/or ultra sparse to the other
end of the spectrum where density and compute complexity are much higher.
Yes there *may* sometimes be separate paths through the code but in many
cases no.

I've read this paragraph several times, and I'm having a hard time
parsing it. Are you saying that the code goes through the same motions
(follows the same execution path or nearly the same execution path)
regardless of the model?

If so, that should be an ideal situation for Itanium. I can easily
imagine there would be problems with naively coding for packed, sparse
matrices, but it is hard for me to believe that the problem hasn't
been attacked by someone. If the code path is predictable, Itanium
should be able to eat the problem for breakfast. That a compiler
couldn't arrange things for you nicely without human intervention
doesn't come as a big surprise.

So what do I tell you to make this discussion of any value to you? If
you genuinely think that Itanium is dead in the water for your line of
business, then don't waste your time. There is no free lunch, just
exactly as you said. On the other hand, I suspect that a fairly nice
lunch can be had for a reasonable price if Itanium is worth your time
at all, at least from how I parse what you've told me.

RM
 
G

George Macdonald

On Thu, 12 Feb 2004 07:16:07 -0500, George Macdonald



I've read this paragraph several times, and I'm having a hard time
parsing it. Are you saying that the code goes through the same motions
(follows the same execution path or nearly the same execution path)
regardless of the model?

There is often some attempt to classify any given problem so that separate
paths can be arranged. e.g. a TLP (Transportation LP) has a very different
matrix representation from a manufacturing production planning model, where
you could have a huge range of values in the coefficients of a matrix which
would also be considerably more dense. Most models are going to be a
mixture and automatic classification is not possible. So what I'm saying
is that the same code path is often going to be used for different model
types, with a resulting shift in the compute complexity to different parts
of the algorithm and changes in the characteristics of the optimal code
path through those "parts". I have trouble seeing how a retrain/feedback
method can avoid having different retrains which apply to different models
even in the same organization.
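
To make the "same code path, different behaviour" point concrete,
here's a rough sketch -- my own illustration with invented names, not
any vendor's code -- of the compressed-sparse-column storage such tools
typically share across model types. A transportation LP puts two or
three +/-1 entries in each column; a production planning model packs in
long columns with wildly scaled coefficients. Same structure, same
loop, very different branch and cache behaviour inside it.

#include <stddef.h>

/* Compressed sparse column storage for the constraint matrix. */
struct sparse_matrix {
    size_t  ncols;
    size_t *col_start;  /* ncols+1 entries; column j occupies            */
    size_t *row_index;  /*   row_index[col_start[j]..col_start[j+1]-1]   */
    double *value;      /*   with the matching coefficient values        */
};

/* y += A*x, the sort of inner kernel every model type runs through.
 * For a TLP the inner loop is a couple of +/-1 terms per column; for
 * a dense production model it is long and numerically wide-ranging. */
void spmv_accumulate(const struct sparse_matrix *A, const double *x,
                     double *y)
{
    for (size_t j = 0; j < A->ncols; j++)
        for (size_t k = A->col_start[j]; k < A->col_start[j + 1]; k++)
            y[A->row_index[k]] += A->value[k] * x[j];
}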
If so, that should be an ideal situation for Itanium. I can easily
imagine there would be problems with naively coding for packed, sparse
matrices, but it is hard for me to believe that the problem hasn't
been attacked by someone. If the code path is predictable, Itanium
should be able to eat the problem for breakfast. That a compiler
couldn't arrange things for you nicely without human intervention
doesn't come as a big surprise.

So what do I tell you to make this discussion of any value to you? If
you genuinely think that Itanium is dead in the water for your line of
business, then don't waste your time. There is no free lunch, just
exactly as you said. On the other hand, I suspect that a fairly nice
lunch can be had for a reasonable price if Itanium is worth your time
at all, at least from how I parse what you've told me.

I wouldn't write Itanium off completely as a technical solution. Right
now, my feeling is that it's not going to fit the Intel manufacturing model
in terms of volume for various reasons already discussed. BTW I was a big
fan of the CDC 6600 and derivatives - still have my little plastic
instruction set card :) - but it was always a concern to me back then that
it was obviously not going to make it as a general-purpose computer.

Rgds, George Macdonald

"Just because they're paranoid doesn't mean you're not psychotic" - Who, me??
 
R

Robert Myers

There is often some attempt to classify any given problem so that separate
paths can be arranged. e.g. a TLP (Transportation LP) has a very different
matrix representation from a manufacturing production planning model, where
you could have a huge range of values in the coefficients of a matrix which
would also be considerably more dense. Most models are going to be a
mixture and automatic classification is not possible. So what I'm saying
is that the same code path is often going to be used for different model
types, with a resulting shift in the compute complexity to different parts
of the algorithm and changes in the characteristics of the optimal code
path through those "parts". I have trouble seeing how a retrain/feedback
method can avoid having different retrains which apply to different models
even in the same organization.

Now I think I see what you mean. If I can invent language, models
with very different characteristics are going to invoke the same code
modules. If I can guess at a more precise way of saying it, the
coarse grain execution path (a global flow chart from module to
module, say) is going to look quite similar for very different models,
but if you look at the details, you find that things like branch
predictions and the likelihood of a speculative strategy succeeding
within the modules might be quite different from model to model.

That suggests a different exercise from the one I had imagined, but
the results would be equally interesting to me. If you did your best
to pick two data sets that you thought would produce different
binaries, what would happen to performance if you trained the program
on the datasets and then swapped the binaries?

My suspicion is that the effect might not be as great as you think.
The kind of feedback training that is done in commercial compilers is
really pretty crude, and the fact that they work at all is a measure
of how far we have to go in terms of compiler development. Private
correspondence I have from someone who should know suggests to me that
the training runs at the moment aren't capturing much more than could be
accomplished by more sophisticated static analysis.
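
For anyone who hasn't watched a training run, the whole feedback cycle
amounts to roughly the following toy of my own invention. I'm using
gcc's profile-guided-optimization flags only as a stand-in for whatever
the Itanium compilers actually offer, and the file names are made up.

/*
 *   cc -O2 -fprofile-generate -o kernel kernel.c   # instrumented build
 *   ./kernel < model_a_columns.txt                 # training run
 *   cc -O2 -fprofile-use      -o kernel kernel.c   # re-optimized build
 *
 * Training on a different model type mostly changes how often branches
 * like the one below are taken, which is what the "swap the binaries"
 * experiment would be probing.
 */
#include <stdio.h>

int main(void)
{
    double len, long_cols = 0, short_cols = 0;

    /* One "column length" per line of the training input. */
    while (scanf("%lf", &len) == 1) {
        if (len > 8.0)      /* dense, production-planning-like column */
            long_cols += len;
        else                /* short, transportation-like column */
            short_cols += len;
    }
    printf("long: %.0f  short: %.0f\n", long_cols, short_cols);
    return 0;
}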

I'm not a professional compiler developer, and my interests are
broader and more long-range, but the kinds of aggressive attempts to
determine the absolute limits of code predictability that I'd like to
see done haven't really been undertaken. Or if they have, I've seen
not a whiff of evidence in the literature.

That's a long-winded way of saying that I think that if Itanium
compilers were as aggressive as they might be in capturing and
exploiting information, you might see a significant effect. As things
stand right now, I don't think you'd see a big effect because
compilers aren't capturing enough information to produce a big effect
at the microscale I'm imagining you're thinking about.

If the compiler won't do it for you then what? Go at it with vtune,
find out what the problems are, and fix them.

I wouldn't write Itanium off completely as a technical solution. Right
now, my feeling is that it's not going to fit the Intel manufacturing model
in terms of volume for various reasons already discussed.

I truly don't see a future dominated by x86 for high-end applications.
If not x86, then what? The answer is that I have not a clue, but
there are just so many non-technical reasons why a PC processor is
unacceptable for a premium box that I truly believe an Opteron with a
Transmeta front end to hide its true identity, actually operating at a
reduced performance level, is more likely as a mainframe competitor
than an actual Opteron.
BTW I was a big
fan of the CDC 6600 and derivatives - still have my little plastic
instruction set card :) - but it was always a concern to me back then that
it was obviously not going to make it as a general-purpose computer.

Well, at least you have some sympathy for where my odd-seeming biases
come from. I just happen to have done the bulk of my hands-on work
with big machines at a time when no one in his right mind used an IBM
mainframe for anything but business applications--"sorting W-2 forms,"
as I habitually and derisively referred to it. It just seems so odd
that IBM may inherit that legacy in the end, and circumstances have
changed in a way that may make technical computing a much more
profitable business than it has been in the past.

RM
 
K

Keith R. Williams

Hmm, I have no idea; I don't have shell access to kernel.org to grep the
sources :)


http://www.well.com/~aleks/CompOsPlan9/0004.html

Here's something about SMP PPros not synchronizing correctly while
doing OoO work. Maybe that's the bug.

I don't understand what the author of that tidbit is up to, but I
suspect there is something missing about the understanding of OoO here.
OoO doesn't mean the processor reorders instructions at will. They (at
least the ones I'm familiar with) still dispatch and complete
instructions in order, but do what they please in between (execution).
Thus, to the external observer, the program executes as if it were run
on an in-order processor. Perhaps there is a PPro bug that prevents
this? If so it's a *huge* bug.

If the author is referring to the moaning that it's impossible to
predict the execution time of a random hunk of code, I agree with him,
TDB. One shouldn't be writing critical timing-dependent code on such
processors. There are too simply many variables and it is subject to
change across sibling processors, to say nothing of third and fourth
cousins.

As far as synchronizing two identical processors, it's known as running
them in "lock-step" and is done. While it's not easy (nor
realistically possible) to predict exactly how long a given instruction
stream will take to execute, two identical processors will take the
same time to do it (since all variables are the same for both).

The author brings up PPC's EIEIO instruction. This instruction,
"Enforce In-order Execution of I/O", doesn't tell the execution unit to
go in order; rather it tells the bus unit to enforce in-order I/O. The
PPC will try to give priority to reads over writes and reads under
cache "misses" (or misses under misses, or...). If the "memory" in
question is really an I/O device things can get all bollixed up. The
EIEIO instruction is intended to enforce in-order I/O operation to
avoid this problem. The actual instructions still execute OoO. The
sync and isync instructions will force in-order execution (of differing
levels), but are rarely needed by a user (needed for processor state
altering sorts of things).
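
A minimal sketch of the usual idiom, for the curious -- my own
illustration with made-up register addresses, not anything from a real
driver:

#include <stdint.h>

#define DEV_DATA ((volatile uint32_t *)0xF0001000)  /* hypothetical MMIO */
#define DEV_CTRL ((volatile uint32_t *)0xF0001004)  /* hypothetical MMIO */

static inline void eieio(void)
{
    __asm__ __volatile__("eieio" : : : "memory");
}

void start_transfer(uint32_t payload)
{
    *DEV_DATA = payload;  /* 1: load the data register...             */
    eieio();              /*    ...and force it onto the bus first... */
    *DEV_CTRL = 1;        /* 2: ...before the "go" bit is written     */
}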
 
D

David Schwartz

I don't understand what the author of that tidbit is up to, but I
suspect there is something missing about the understanding of OoO here.
OoO doesn't mean the processor reorders instructions at will. They (at
least the ones I'm familiar with) still dispatch and complete
instructions in order, but do what they please in between (execution).
Thus, to the external observer, the program executes as if it were run
on an in-order processor. Perhaps there is a PPro bug that prevents
this? If so it's a *huge* bug.


It depends what you mean by "external observer". An external observer
looking at the memory bus could definitely see fetches being done out of
order.

If the author is referring to the moaning that it's impossible to
predict the execution time of a random hunk of code, I agree with him,
TDB. One shouldn't be writing critical timing-dependent code on such
processors. There are simply too many variables and it is subject to
change across sibling processors, to say nothing of third and fourth
cousins.


No, he's talking about a specific problem with SMP PPro systems that
requires you to use a LOCK prefix to make your unlock instructions atomic
even though 32-bit aligned writes are supposed to be atomic anyway.

The author brings up PPC's EIEIO instruction. This instruction,
"Enforce In-order Execution of I/O", doesn't tell the execution unit to
go in order; rather it tells the bus unit to enforce in-order I/O. The
PPC will try to give priority to reads over writes and reads under
cache "misses" (or misses under misses, or...). If the "memory" in
question is really an I/O device things can get all bollixed up. The
EIEIO instruction is intended to enforce in-order I/O operation to
avoid this problem. The actual instructions still execute OoO. The
sync and isync instructions will force in-order execution (of differing
levels), but are rarely needed by a user (needed for processor state
altering sorts of things).


They are needed all the time by users who write synchronization code.

Most programmers, including myself, ignore the PPro errata and simply
state that they no longer support SMP PPro systems. This is because the cost
of a LOCK prefix on a P4 machine is just too high, and it's more logical to
assume that 32-bit aligned writes will occur atomically.

I've forgotten the specifics of the PPro errata, but the net effect is
that to release a spinlock, you have to use an exchange or locked move
instruction rather than a regular move.
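
For concreteness, the two styles of release look roughly like this -- a
sketch of my own, not anything out of a real kernel; on x86 an exchange
with a memory operand is implicitly locked:

#include <stdint.h>

typedef volatile uint32_t spinlock_t;   /* 0 = free, 1 = held */

/* The normal release: a plain 32-bit aligned store. */
static inline void unlock_plain(spinlock_t *lock)
{
    *lock = 0;
}

/* The conservative release the erratum forces on PPro SMP: xchg is
 * implicitly locked, so the release is atomic, at the cost of a full
 * bus-locked operation. */
static inline void unlock_xchg(spinlock_t *lock)
{
    uint32_t zero = 0;
    __asm__ __volatile__("xchgl %0, %1"
                         : "+r"(zero), "+m"(*lock)
                         :
                         : "memory");
}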

DS
 
D

David Schwartz

Grumble said:

This is the relevant section; it's a bug in the cache coherency logic
that can result in two processors each having modified cache lines for the
same memory area!


There exists a narrow timing window when, if P0 wins the external bus
invalidation race and gains ownership rights to line A due to the
sequence of bus invalidation traffic, P1 may not have completed the
pending invalidation of its own, currently valid and shared copy of
line A. During this window, it is possible for a P1 internal
opportunistic write to a portion of line A (while awaiting ownership
rights) to occur with the original shared copy of line A still resident
in P1's L2 cache. Such internal modification is permissible subject to
delaying the broadcast of such changes until line ownership has
actually been gained. However, the processor must ensure that any
internal re-read by P1 of line A returns with data in the order
actually written; in this case, this should be the data written by P0.
In the case of this erratum, the internal re-read uses the data which
was written by P1.

DS
 
