Okay, I get it. It isn't really a 64-bit CPU if it is 32-bit in and out. At
least not to me.
Yup, I'm familiar with the 386SX chips. In fact, I own an old system that
was a gift from my sister-in-law. A bit of nostalgia. They (the 386SX)
always seemed to be a waste of time and effort to me. An interim, stop-gap
measure at best and a waste of Intel's resources. Not to mention their
customers' money. Customers who were fooled into buying something that
did not provide the benefit that was promised.
Ed Cregger
Excuse me, but there seems to be some misunderstanding regarding these
bits. Maybe I can clear that up.
(You are correct about the '386SX having only a 16-bit bus, and it
very likely had an obvious effect on performance vs the '386DX. But it
has nothing to do with the '386's 32-bitness in terms of software
capability. It's just a performance feature, like the original
Pentium's 66MHz 64-bit bus, the PCI architecture, etc.)
First of all, there is a widespread temptation to interpret
"cpu-bitness" as some width of computing that has directly to do with
performance. That is not correct. It has to do with the capability of
the CPU to do things. - At all.
With regard to what is generally called "64-bit class computing", and
to 16-, 32- and 64-bit computing/CPU/software/OS on the PC in
particular, the bits refer to the length of the address field that a
machine instruction uses to refer to any code or data. This is the
only kind of "CPU-bitness" that is meaningful today.
One very old, but common, convention was to refer to the width of data
registers. That quickly had to be qualified as "general purpose
registers", due to developments.
(Well, both the AMD K8 line (A64), and Intel's EM64T enabled Prescott
line, have 64-bit GPRs.)
This "bitness" is still greatly favored by old men, who have been in
the industry for a long while. They sometimes also vehemently claim
that it represents some kind of "definition" of cpu-"bitness".
Well, I believe that "definition" talk is rubbish. But enough
nonsense, because it really doesn't matter anyway. The main problem
with using register length, any register length, for understanding
"cpu-bitness", is that it is completely meaningless. It doesn't
directly say anything about the CPU's properties, in the sense of what
class of computing the CPU is capable of, or suitable for.
The annoying thing for all rigid 'systematizists', is that there isn't
any single bit-thing, that is consistent through the ages and CPU
developments, in saying anything meaningful, without also being
misleading, about the class of a CPU.
CPU-BITNESS:
- It is a property of the machine instructions provided in the ISA!
The only thing that is interesting today, in any context of "64-bit
class computing", is the length of an address, that a native binary
instruction can use.
Similarly, "16-bit" and "32-bit" computing, in terms of PC use, _ALSO_
refer to the instruction's address length. If you believe it to be
anything different, you've been assuming wrongly.
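To put numbers on what the address length buys you, here is a small
sketch. It assumes byte addressing and that every bit of the address
field is usable, which real chips do not always implement in full. The
function name is just made up for illustration.

```python
# Sketch: how many bytes an address field of a given width can reach.
# Assumes byte addressing and that all bits of the field are usable.

def addressable_bytes(address_bits: int) -> int:
    """Number of distinct byte addresses an address field can name."""
    return 2 ** address_bits

for bits in (16, 32, 64):
    print(f"{bits}-bit address field: {addressable_bytes(bits):,} bytes")
```

Each step up squares the reach of the previous one: 64 KB, then 4 GB,
then 16 EB.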
The '286 (16-bit) added a new processor personality to the 8086 (also
16-bit). The new one featured protected mode addressing and a
different segmented addressing that made a larger space available.
That means nothing for old 8086 software, which runs with the same
limitations on the old personality, kept for compatibility reasons.
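For reference, the old 8086 personality forms its addresses roughly as
sketched below: a 16-bit segment and a 16-bit offset are combined into
a 20-bit physical address. This is the well-known real-mode scheme, not
something spelled out above, and the function name is my own. The
'286 protected mode uses descriptor tables instead and is not modelled
here.

```python
# Sketch of 8086 real-mode addressing: a 16-bit segment and a 16-bit
# offset combine into a 20-bit physical address.

def real_mode_address(segment: int, offset: int) -> int:
    """Physical address on an 8086: segment * 16 + offset, wrapped to 20 bits."""
    assert 0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF
    return ((segment << 4) + offset) & 0xFFFFF

# Segment B800h, offset 0: the start of colour text video memory on a PC.
print(hex(real_mode_address(0xB800, 0x0000)))

# The highest segment plus a small offset runs past 1 MB and wraps around.
print(hex(real_mode_address(0xFFFF, 0x0010)))
```

Note that the instruction itself only ever carries 16-bit quantities.
The extra four bits come from the shift, which is why the reachable
space tops out at 1 MB.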
The '386 (32-bit) added a new processor personality to the '286. It,
again, means nothing for old software, which still has to run on the
old personalities. But new software, written and compiled for the
'386, can do 32-bit addressing, and can also have a flat virtual
space, instead of a segmented one. It wasn't put to much use before
Windows 95, though.
That is really what happened when Win95 came. We started to use a
'processor' that actually was already introduced with the '386.
Same again with AMD64, originally x86-64 (Intel calls it EM64T, but
that's not the proper name). It effectively adds a 64-bit processor to
the collection of old processor personalities already available. The
A64 basically represents four different CPUs in the same package:
8086, '286, '386 and AMD64.
Again, we're not using the 64-bit part yet. I suppose we'll have to
wait for Longhorn.
WIDTH OF COMPUTING AND WIDTH OF DATA
It is true that increased computing width can be exploited for
increased performance. But that has already been going on for quite a
while, and doesn't affect the fundamental 32-bitness of our earlier
Pentiums and Athlons.
For instance, the original Pentium already introduced a 64-bit data
bus. And P4 systems have a 128-bit memory path in two 64-bit channels.
The Pentium Pro, Pentium II/III, AMD K6 and Athlon also continued to
aggressively pursue increased width of internal execution as a means
to increase performance.
Taking the modern K8 (A64, X2, Opteron) as an example, the logical
execution width of the core is 192 bits. That is the number of
computed data bits that can be committed every cycle.
Does that make the K8 a 192-bit CPU?
Well, let me put it this way: for those who think that 32-bit is
twice as fast as 16-bit and 64-bit twice as fast as 32-bit, and who
ask for a "128-bit processor", or expect one soon as the next logical
evolutionary step: yes, it does.
3 execution pipelines, 64 bit wide = 192 bits.
In hardware terms it is even considerably wider, which is often
observed in schematics of these CPUs and leads to misunderstandings.
But the number of hardware execution units is of secondary importance.
These are connected to the pipeline, and used by it. It's the pipeline
that controls execution.
Intel's next major core technology, 'Conroe' (Pentium5?, it will
probably debut as a mobile cpu, 'Merom') will be even wider, 256-bit.
4 pipelines. Observe that this is per core. A four core CPU will be
1024 bits wide, in the sense of data width computed.
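The width figures above are plain multiplication. A trivial sketch,
using only the pipeline counts and the 64-bit pipeline width quoted in
this post (the function name is made up):

```python
# Sketch: the "execution width" arithmetic used in this post.
# Pipeline counts and the 64-bit pipeline width are the post's figures.

def execution_width(pipelines: int, bits_per_pipeline: int = 64, cores: int = 1) -> int:
    """Data bits that can be committed per cycle, summed over all cores."""
    return pipelines * bits_per_pipeline * cores

k8 = execution_width(3)                    # K8: 3 pipelines
conroe = execution_width(4)                # Conroe: 4 pipelines
quad_conroe = execution_width(4, cores=4)  # four such cores
print(k8, conroe, quad_conroe)
```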
With regard to what kind of software it can run, it is 64-bit!
Which is the kind of bitness we are discussing.
WIDTH OF REGISTERS
Now let's discuss the width of registers for a while.
The length of a register is for accommodating the data type that is to
be processed. Any excess length is wasted.
Floating point numbers are 32 or 64 (double precision) bits long. And
guess what, - we have had FP registers wide enough for 64-bit doubles
(the x87 registers are actually 80 bits) ever since the FPU was
integrated in the '486DX.
Characters are 8 or 16 (Unicode) bits long.
Counters, indexes, identifiers, flags, masks, and other integers can
be 16, 32, or even 64 bit.
But there are few reasons to use integers longer than 32 bit. They
exist, but the opportunities to increase performance by fitting longer
integers into a single register are fairly rare.
However: What about processing several pieces of data collectively?
Right, again we have been doing exactly that for quite a while
already. We have 128-bit and 64-bit wide SIMD registers. These are
used to handle two 64-bit, four 32-bit, four 16-bit or eight 8-bit
data items simultaneously. This also didn't make the 32-bit processor
either 64-bit or 128-bit.
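To illustrate what "processing several pieces of data collectively"
means, here is a pure-Python sketch of a packed add on four 16-bit
lanes held in one 64-bit value. It only mimics the behaviour of a
wrapping packed-add instruction. The helper names are invented, and
real SIMD hardware of course does this in one instruction rather than
in a loop.

```python
# Sketch: one 64-bit value treated as four independent 16-bit lanes,
# the way a wrapping packed-add SIMD instruction treats it.

def pack(lanes, lane_bits):
    """Pack integers into one wide integer, lane 0 in the lowest bits."""
    mask = (1 << lane_bits) - 1
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & mask) << (i * lane_bits)
    return word

def unpack(word, lane_bits, count):
    """Split a wide integer back into `count` lanes of `lane_bits` each."""
    mask = (1 << lane_bits) - 1
    return [(word >> (i * lane_bits)) & mask for i in range(count)]

def packed_add(a, b, lane_bits, total_bits):
    """Add lane by lane; a lane wraps on overflow and never carries
    into its neighbour."""
    count = total_bits // lane_bits
    sums = [x + y for x, y in zip(unpack(a, lane_bits, count),
                                  unpack(b, lane_bits, count))]
    return pack(sums, lane_bits)

a = pack([1, 2, 3, 0xFFFF], 16)
b = pack([10, 20, 30, 1], 16)
result = unpack(packed_add(a, b, 16, 64), 16, 4)
print(result)
```

The last lane overflows and wraps to zero without disturbing the
others, which is the whole point: the register is wide, but the data
items in it are not.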
However: Pointers are also integers. And here we have the real reason
for expanding integer register width to 64 bits in AMD64: we want to
use 64-bit pointers! 64-bit software uses 64-bit pointers.
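If you want to see which kind of software you are actually running,
the pointer size gives it away. A small sketch using Python's standard
struct and sys modules. It reports how the interpreter itself was
built, which is not necessarily what the CPU underneath is capable of.

```python
import struct
import sys

# "P" is the struct format code for a native pointer (void *).
pointer_bits = struct.calcsize("P") * 8
print(f"This interpreter was built with {pointer_bits}-bit pointers")

# The commonly used cross-check: sys.maxsize exceeds 2**32 only on
# 64-bit builds.
is_64bit_build = sys.maxsize > 2**32
print("64-bit build" if is_64bit_build else "32-bit build")
```

A 32-bit build running on an A64 will still report 32 here, because it
runs on the old '386 personality.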
One further thing about "registers" that needs to be mentioned is that
they are, more and more, mainly something that the instruction set
makes "visible", as part of the CPU's software interface and virtual,
functional model, and something that, less and less, has any direct
counterpart in the actual hardware.
In terms of data storage and accumulation devices that are available
to execution units, both P4 and K8 have hundreds. (And yes again, at
least in the K8, all of them are generalized to at least 64 bits.)
These are not visible in the ISA.
The ones the ISA defines relate to the formal function model, the
program interface. Not to how the hardware actually goes about doing
its thing.
For instance the 128-bit SIMD registers. This is not how the hardware
handles it. For the hardware, a vector instruction (SIMD) expresses
explicit parallelism, that is more expedient and simple to schedule in
its multiple parallel pipelines. It will however handle the data in
separate 32- or 64-bit chunks, and reassemble the 128-bit result at
the end. So the 128-bit registers are to a degree only 'virtual'.
But this doesn't represent any cheating, or compromised performance.
Quite the contrary. It's part of a decoupling that makes the CPU
scalable.
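The chunking can be sketched in a few lines. The following is only an
illustration of the principle, not a model of how the P4 or K8
actually schedule anything: a 128-bit packed add of four 32-bit lanes
is carried out as two independent 64-bit operations and stitched back
together, and the result is identical to doing it in one go. All names
are made up.

```python
# Sketch: a 128-bit packed add (four 32-bit lanes) done as two
# independent 64-bit chunk operations, then reassembled.

MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1

def add_2x32(a64, b64):
    """Add the two 32-bit lanes of a 64-bit chunk; each lane wraps on its own."""
    lo = ((a64 & MASK32) + (b64 & MASK32)) & MASK32
    hi = ((a64 >> 32) + (b64 >> 32)) & MASK32
    return (hi << 32) | lo

def add_4x32_chunked(a128, b128):
    """Process the low and high 64-bit halves separately, then reassemble."""
    low = add_2x32(a128 & MASK64, b128 & MASK64)
    high = add_2x32(a128 >> 64, b128 >> 64)
    return (high << 64) | low

def add_4x32_direct(a128, b128):
    """Reference version: walk all four lanes of the 128-bit value."""
    out = 0
    for lane in range(4):
        shift = 32 * lane
        out |= (((a128 >> shift) + (b128 >> shift)) & MASK32) << shift
    return out

a = 0x00000001_FFFFFFFF_00000003_FFFFFFFE
b = 0x00000001_00000001_00000001_00000003
chunked = add_4x32_chunked(a, b)
direct = add_4x32_direct(a, b)
print(f"{chunked:032x}", chunked == direct)
```

Since the lanes never carry into each other, the two halves have no
dependency between them, so they can go down separate pipelines in any
order.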
64-BIT AND PERFORMANCE
Much of the performance improvement from 32-bit code vs 16-bit code
came from the fact that we abandoned a segmented virtual memory model,
and adopted a flat, linear virtual memory model instead.
The idea that 32- vs 16-bit is about computing things in larger chunks
is essentially false. There is that too, to some extent. But it's not
the main thing.
This time, we are retaining the flat virtual memory model. 64-bit
makes it bigger (which is essential, and highly needed), but that's
all. So actually, we would have no direct reason to expect a
performance improvement from going to 64-bit software.
We will, however. But that's not so much from going to 64-bit, as
actually switching to a new ISA, that provides twice as many
registers, and makes the integer registers all general purpose.
Both the 600-, 501-, 800- series Pentium4/PentiumD and A64/Opteron/X2
must be considered "true" 64-bit CPUs, IMO.
However, Intel modified a 32-bit design already underway, while AMD
did a 64-bit design from scratch. Perhaps that is the reason why Intel
doesn't see the same performance improvement from 64-bit code vs
32-bit code?
But it could just as well be due to compiler optimizations in gcc. I
don't know.
I believe Intel also has an issue with memory mapping beyond 4GB,
which at least appears to be due to 64-bit mapping being strapped onto
old 32-bit mapping. Other evidence of that is Intel's 36-bit limit.
It doesn't matter much, as AMD is so much faster, cooler, more stable
today, and cheaper as well.
I'm sure that will change. But clearly, people who purchase Intel
right now, do so for other reasons than technology, economy or
performance.