Is Itanium the first 64-bit casualty?

Greg Lindahl

daytripper said:
Someone obviously never heard about 32b PCI's 64b "Dual Address Cycle"...

Hush! He was sounding very convincing, let's not burst his bubble.

-- greg
 
Rupert Pigott

Bengt said:
In comp.arch, Rupert Pigott



Actually, there are only two systems-vendors for SPARC systems (Sun
and Fujitsu) but more for Itanium systems (IBM, HP, Dell, SGI,
NEC...). Raw processors is not what is bought by most people. So the
argument isn't totally stupid.

It is still totally stupid IMO.

Dual sourcing is there for a reason. It's not a fetish. It can ensure
that there is competition which keeps prices reasonable. It can help
reduce the risk of common mode failures. It can ensure that if your
supplier becomes unwilling to supply you with critical components,
you have an alternative.

How much would it cost to fab say 100 units of an IA-64 chip that run
within 90% of the speed of a full blown IA-64 (ignore the cost of losing
your shirt to Intel in a courtroom) ? Answer : Lots. Designing and
fabbing 100 IA-64 boards on the other hand is a *lot* cheaper.

Like I said : It's not just a fetish, it's a sane way of doing business.
In theory and legally you can clone SPARC, build Solaris systems to
compete directly with Sun. In practice, forget it. Sun has too much
momentum (Fujitsu has managed by building bigger systems than Sun).

The point remains : Would Fujitsu be able to do that vs HP/Intel ?
Unlikely, because they couldn't produce a chip of their own that gave
them the characteristics and product differentiation required. HP/
Intel would just sue the pants off them - or make it prohibitively
expensive. ;)
"In Theory, there is no difference between Theory and Practice, in
Practice, there is."

Indeed. Practice is : Intel has you over a barrel if you buy IA-64,
whereas Sun does *not* have you over a barrel if you buy SPARC.


Cheers,
Rupert
 
Alexander Grigoriev

In Windows, you can do it in different ways:

1. A driver can allocate a contiguous non-pageable buffer and do DMA
transfers to/from it. The buffer can be requested to be in lower 4 GB, or
even in lower 16 MB, if you need to deal with legacy DMA.
2. A buffer may originate from an application. In this case, the MapTransfer
function may move the data to/from a buffer in low memory (the bounce buffer
you've mentioned) if a device is unable to do DMA above 4 GB. It's done by the
HAL, and drivers don't need to bother. If a device supports scatter-gather,
only some parts of the transfer may need bounce buffers.
 
Alexander Grigoriev

With the use of an IOMMU, bounce buffers are not necessary in Windows.

For a DMA transfer, the NT HAL maps a virtual address to a device-visible
memory address. The device-visible address is not necessarily the same as
the physical address, even though in most cases it is.

If the IOMMU can map a 32-bit PCI bus-master address to any 64-bit memory
address, the HAL can set up the mapping and return the corresponding address,
which the driver should tell the PCI device.

I don't know if the HAL supplied with XP 64 does that, but it fits the NT
architecture very well, and doesn't require changing drivers.
 
Stephen Sprunk

Alexander Grigoriev said:
In Windows, you can do different ways:

1. A driver can allocate a contiguous non-pageable buffer and do DMA
transfers to/from it. The buffer can be requested to be in lower 4 GB, or
even in lower 16 MB, if you need to deal with legacy DMA.
2. A buffer may be originated from an application. In this case, MapTransfer
function may move the data from/to a buffer in low memory (bounce buffer
you've mentioned), if a device is unable to do DMA above 4 GB. It's done by
HAL, and drivers don't need to bother. If a device supports scatter-gather,
only some parts of the transfer may need bounce buffers.

Linux has the same two cases. I think the bounce buffers are handled in
generic code but the drivers have to explicitly support them; it's been a
while since I read the docs on that, so the kernel may handle it
automagically now.

I'd never heard of "Dual Address Cycle" (thanks daytripper), so I did some
reading. NT does use it if every PCI device in the system supports it. If
a single PCI device is non-DAC-capable (which includes most consumer hw),
then bounce buffers are used for all DMAs above 4GB. I don't know if Linux
takes the same approach or just assumes bounce buffers are always needed
above 4GB.

I still note that the same mess is present with 24-bit ISA DMA, and there are
even cases where some DMA buffers have to live in the first 1MB for obscure
backwards-compatibility reasons. These hacks will remain in amd64, so it's
not much more of a mess to do the same for 32-bit PCI. I'm not claiming
it's ideal, but it's already standard practice.

Yousuf Khan

Bengt Larsson said:
In comp.arch, Rupert Pigott


Actually, there are only two systems-vendors for SPARC systems (Sun
and Fujitsu) but more for Itanium systems (IBM, HP, Dell, SGI,
NEC...). Raw processors is not what is bought by most people. So the
argument isn't totally stupid.

Peruse the members list for Sparc International Inc., the consortium
entrusted with maintaining the Sparc standards:

http://www.sparc.com/

It's considerably more than just Sun and Fujitsu. Some people are actually
building Sparcs for embedded applications.

Yousuf Khan
 
Scott Moore

Nick said:
|>
|> > What is actually wanted is the ability to have multiple segments,
|> > with application-specified properties, where each application
|> > segment is inherently separate and integral. That is how some
|> > systems (especially capability machines) have worked.
|>
|> Thats what paging is for, and, IMHO, a vastly superior system that
|> gives you memory attributing while still resulting in a linear
|> address space.
|>
|> Having segmentation return would be to me like seeing the Third
|> Reich make a comeback. Segmentation was a horrible, destructive
|> design atrocity that was inflicted on x86 users because it locked
|> x86 users into the architecture.
|>
|> All I can do is hope the next generation does not ignore the
|> past to the point where the nightmare of segmentation does not
|> happen again.
|>
|> Never again !

I suggest that you read my postings before responding. It is very
clear that you do not understand the issues. I suggest that you
read up about capability machines before continuing.

You even missed my point about the read-only and no-execute bits,
which are in common use today. Modern address spaces ARE segmented,
but only slightly.


Regards,
Nick Maclaren.

I have heard the arguments over and over and over (and over) again.

Obviously you didn't live through the bad old days of segmentation,
or you would not be advocating it.

--
Samiam is Scott A. Moore

Personal web site: http://www.moorecad.com/scott
My electronics engineering consulting site: http://www.moorecad.com
ISO 7185 Standard Pascal web site: http://www.moorecad.com/standardpascal
Classic Basic Games web site: http://www.moorecad.com/classicbasic
The IP Pascal web site, a high performance, highly portable ISO 7185 Pascal
compiler system: http://www.moorecad.com/ippas

Being right is more powerful than large corporations or governments.
The right argument may not be pervasive, but the facts eventually are.
 
Scott Moore

Sander said:
What does file size have to do with 32 vs 64bit? The OS I run on my desktop
has been supporting file sizes in excess of 4GB since at least 1994 when I
switched to it, *including* on plain vanilla x86 hardware.

It enables you to memory map files.

 
Rob Warnock

+---------------
| Stephen Sprunk wrote:
| > IIRC, DMA writes to physical memory addresses
|
| DMA does but DVMA doesn't. Sun, and presumably all the other RISC
| vendors', systems have IOMMUs which translate bus addresses to physical
| addresses for DMA transfers. The AMD K8 has an on-chip IOMMU.
+---------------

Not "all the other RISC vendors' systems": In SGI's Origin 2000/3000 XIO
32-bit DMA addressing -- that is, PCI "A32" cycles (whether D32 or D64) --
is mapped through an IOMMU[1], but 64-bit DMA addressing (PCI "A64" cycles)
is mapped one-to-one with physical memory[2], which is a *tremendous*
performance advantage for devices that can do a few "smart" things such
as select which of multiple sets of pre-staged buffers to use based on
the size of the transfer, etc.


-Rob

[1] Well, not always. For each PCI bus, there is a 2 GB window that
can be mapped to any 2GB (2GB-aligned) portion of main memory.
But normally, A32 devices use the 1 GB window that's page-mapped
through the IOMMU for that bus.

[2] Well, almost. The low 48/52/56 bits are one-to-one with physical
memory, but for A64 transfers a few upper bits are used to convey
side-channel information, such as BARRIER write or not, PREFETCH
read or not, etc. Again, using A64 transfers provides a tremendous
performance advantage for devices that can dynamically decide when
to set BARRIER, say. (With A32 & the IOMMU, those attributes are
a fixed part of each page's mapping. Ugh.)
 
Anton Ertl

Yousuf Khan said:
The advantage of memmaps is that after you've finished the
initial setup of the call, you no longer have to make any more OS calls to
get further pieces of the file, they are just demand-paged in just like
virtual memory. Actually, not _just like_ virtual memory, it _is_ actual
virtual memory. Saves many stages of context switching in between this way.

What do you believe happens on a page fault? Yes, typically a context
switch.

So mmap does not necessarily save context switches; it may save system
calls, though (typically one uses fewer, larger mmaps than reads,
often just one).

The main benefit of mmap (compared to read), however, is that it
usually eliminates a kernel-to-user-space copy without eliminating
disk caching by the kernel. It also allows writing programs in a
slurping style more efficiently.

Mmap also has some disadvantages, though, which is why it is not the
most popular I/O method; in particular, it does not work on all things
that can be opened, and it has no atomicity guarantees (except those
given by the hardware).

Followups to comp.arch

- anton
 
Paul Repacholi

daytripper said:
Someone obviously never heard about 32b PCI's 64b "Dual Address
Cycle"...

Someone ranted on the PCI sig list years ago to say it should be
REQUIRED, and that the idea of a 32bit counter plus a 32bit static
high address register should be shot at birth.

Welcome to the pit. Again...

--
Paul Repacholi 1 Crescent Rd.,
+61 (08) 9257-1001 Kalamunda.
West Australia 6076
comp.os.vms,- The Older, Grumpier Slashdot
Raw, Cooked or Well-done, it's all half baked.
EPIC, The Architecture of the future, always has been, always will be.
 
Paul Repacholi

Scott Moore said:
Obviously you didn't live through the bad old days of segmentation,
or you would not be avocating it.

Perhaps you should raise your eyes and notice that there is more to
the world than misbegotten barnacles from a chip vendor. The problem is not
segmentation, it is 80x86s.

 
Nick Maclaren

I have heard the arguments over and over and over (and over) again.

Obviously you didn't live through the bad old days of segmentation,
or you would not be advocating it.

I like it! Please collect your wooden spoon as you go out.


Regards,
Nick Maclaren.
 
Peter Dickerson

Scott Moore said:
It enables you to memory map files.

You mean memory map whole files? There isn't any reason why a big file can't
be mmapped into a 32-bit address space as a window. It's a sane way, too.
Otherwise you can't mmap a file that is bigger than the virtual memory of
the system. Most 64-bit addressing CPUs don't yet have full 64-bit virtual
address translation.
 
Bengt Larsson

Yousuf Khan said:
Peruse the members list for Sparc International Inc., the consortium
entrusted with maintaining the Sparc standards:

http://www.sparc.com/

It's considerably more than just Sun and Fujitsu. Some people are actually
building Sparcs for embedded applications.

I meant general-purpose systems, like those built by Dell, IBM, HP.
What defines a platform there is operating-system-on-hardware and
application compatibility with it. For example, Windows-on-x86 is what
we can thank (or blame) for still having x86 in the way we do.

Compare for example the following:

Solaris-on-SPARC
Linux-on-IA64
AIX-on-PowerPC

and which is the more open platform? I think it's clear that
HPUX-on-IA64 is more closed than any of these and Linux-on-x86_64 is
more open.
 
Bengt Larsson

Yousuf Khan said:
Peruse the members list for Sparc International Inc., the consortium
entrusted with maintaining the Sparc standards:

http://www.sparc.com/

It's considerably more than just Sun and Fujitsu. Some people are actually
building Sparcs for embedded applications.

I'm aware of Sparc International, BTW.
 
Rupert Pigott

Bengt Larsson wrote:

[SNIP]
I meant general-purpose systems, like those built by Dell, IBM, HP.
What defines a platform there is operating-system-on-hardware and
application compatibility with it. For example, Windows-on-x86 is what
we can thank (or blame) for still having x86 in the way we do.

Compare for example the following:

Solaris-on-SPARC Linux-on-SPARC
OpenBSD-on-SPARC
NetBSD-on-SPARC
FreeBSD-on-SPARC

Linux-on-IA64

and which is the more open platform? I think it's clear that
HPUX-on-IA64 is more closed than any of these and Linux-on-x86_64 is
more open.

IA-64 is just limited. You're stuck with a single source who can yank
your chain anytime they feel like it, high migration cost and bugger
all native applications. Take a look at how swiftly HPQ dropped Alpha.

They can use that same volume/market place justification to axe IA-64.
Arguably it would make even more sense when you compare IA-64 volumes
to Alpha volumes at the point Alpha was axed, but that's another rant
that's already happened and it's dull.

My personal thought on the matter is that a few of the things that have
kept IA-64 safe from the bean counters so far are :

1) Minimal competition (MIPS, Alpha, SPARC sidelined on performance
or support)
2) Executive face saving.
3) Maintaining a small edge in the performance stakes.

AMD have addressed 1) very aggressively, forcing Intel to respond in
kind with a catch-up product. That same catch-up product will be
filling the market segments that IA-64 was meant to descend into in
order to drive down its price and broaden its application base.

As for 2), people retiring/changing employers/getting fired could
change that in the blink of an eye.

Now for point 3). The margin of superiority over the competition has
not been stunning and the range of *commercial* software available
for it remains limited. So the customers are not necessarily being
attracted to the product, instead they are being kicked towards it
by vendors killing off lines they were quite happy with. Vendors can
get away with that to a degree, but if a viable alternative shows up
(AMD64) with lower migration cost (AMD64), wider software base (AMD64),
and lower system cost (AMD64), then they will probably lose those
same customers.

I still haven't decided whether St. Pfister leaping overboard is a
reliable indicator for the future of the good ship "Itanic".

Cheers,
Rupert
 
Sander Vesik

In comp.arch Bengt Larsson said:
In comp.arch, Rupert Pigott


Actually, there are only two systems-vendors for SPARC systems (Sun
and Fujitsu) but more for Itanium systems (IBM, HP, Dell, SGI,
NEC...). Raw processors is not what is bought by most people. So the
argument isn't totally stupid.

You are basically wrong on all counts:
* there are far more than 2 system vendors for sparc[1]
* there are even more board vendors than just Sun & Fujitsu
* IBM, HP, Dell, SGI, NEC, ... are not good examples as
none of these have or even *could* have a second source
for Itanium CPUs, so it's not clear what advantage you really
get from multiple repackagers

[1] Sun, Fujitsu, Tadpole, Naturetech all make their own original boards
from which a whole bunch of companies make systems.
In theory and legally you can clone SPARC, build Solaris systems to
compete directly with Sun. In practice, forget it. Sun has too much
momentum (Fujitsu has managed by building bigger systems than Sun).

"In Theory, there is no difference between Theory and Practice, in
Practice, there is."

In practice, there is also a difference between reality and getting
your facts wrong.
 
Tony Hill

As far as I recall, FX32 came out a long time after the P6 core was introduced.
P6's first generation, PPro, was already obsolete, and they were already
into the second generation, PII. PPro was introduced in late 1995. I don't
think FX32 came out till sometime in 1997.

FX!32 actually made it out sometime in '96, and for a brief moment it
may well have been faster at running x86 code on Alpha systems than
the fastest x86 systems out there. At that time, all Intel had was
the PPro 200MHz. In fact, Intel was in a bit of a lull, with the PPro
having been released in late '95 at 200MHz and never really been
surpassed until the release of the 233-300MHz PII chips in May of '97.
Meanwhile DEC had managed to push their Alpha chips up to 500MHz or
maybe even 600MHz.

At this time I'm pretty certain that there were at least a few
applications that would have run faster on the Alpha and FX!32 than
the 200MHz PPro. Mind you, with translation/emulation you have a lot
of variation in performance and it was probably only some sort of
best-case type of programs that were faster. All in all though it
probably would have made for a pretty respectable x86 workstation,
albeit a waste of money if that was all you were going to do with it.
Seems a shame to cripple a chip with such incredible (at the time)
native performance by translating and emulating code all the time.
 
