howto: port x86 .asm to ia64

Anatoly Greenblatt

Hi,

I'm porting a device driver, part of which is written in .asm files.
Apparently ml64 does not support ia64, and IAS (the Intel assembler) does not
understand my .asm files. Has anyone resolved this problem?

Thanks,
Anatoly.
 
Don Burn

First, are you porting to x64 (the AMD / Intel extensions to x86 for 64-bit)
or to ia64 (Itanium)? If you are really trying to port to Itanium, then
try to eliminate as much of the assembler as humanly possible, since this
architecture is a mess and a PITA to program.
 
Don Burn

Anatoly,

I know Intel has an assembler for IA64; I think you will have to
investigate that. This really will be a pain, since the instruction set has
parallelism built directly into it. Really, move as much as you can
to C code and use intrinsics.
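As a rough illustration of the intrinsics route, here is a minimal sketch of replacing a hand-written `lock`-prefixed increment with a compiler intrinsic. `_InterlockedIncrement` from `<intrin.h>` is the MSVC intrinsic; the `__sync_add_and_fetch` branch is a GCC/Clang builtin included only so the sketch builds with other compilers. The helper name `bump_refcount` is hypothetical and not taken from the original driver.

```c
#if defined(_MSC_VER)
#include <intrin.h>   /* MSVC compiler intrinsics */
#endif

/* Hypothetical helper: atomically increment a counter and return the
 * new value. In an x86 .asm file this might have been a hand-written
 * "lock xadd" sequence. */
static long bump_refcount(volatile long *counter)
{
#if defined(_MSC_VER)
    return _InterlockedIncrement(counter);     /* MSVC intrinsic */
#else
    return __sync_add_and_fetch(counter, 1L);  /* GCC/Clang builtin */
#endif
}
```

The same C source then compiles for x86, x64 and ia64, and the compiler picks the instruction sequence for each target.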

I used to be a compiler code generation guy and looked at the
architecture in its early days for a large PC firm; I told them to run away
as fast as they could.


--
Don Burn (MVP, Windows DDK)
Windows 2k/XP/2k3 Filesystem and Driver Consulting
http://www.windrvr.com
Remove StopSpam from the email to reply
 
Anatoly Greenblatt

Hello Don,
I think I'm heading in the wrong direction. I noticed that the instruction set for ia64
is different, but when I connect windbg to a 64-bit target, I see the same good
old instruction set. So what I actually need is to port my .asm files to
x64 and not to ia64?! But the DDK files have amd64 and ia64 switches, and the binaries
in the DDK folders are split into amd64 and ia64, so which one of them is x64?

Thanks,
Anatoly.
 
Don Burn

AMD64 is x64. It is a superset of the Pentium instruction set and a lot
easier to port to. x64 is also selling well, while IA64 is not selling
enough in the Windows market to be noticed.
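For code that has to build for both targets, the distinction can be checked at compile time. The sketch below is a minimal example that uses only the compilers' predefined macros (`_M_AMD64`, `_M_IA64`, `_M_IX86` for MSVC; `__x86_64__`, `__ia64__`, `__i386__` for GCC/Clang). The function name `target_arch` is made up for illustration.

```c
/* Returns a short name for the architecture this file was compiled for.
 * The _M_* macros are predefined by MSVC, the __*__ forms by GCC/Clang. */
static const char *target_arch(void)
{
#if defined(_M_AMD64) || defined(__x86_64__)
    return "amd64";   /* x64: the 64-bit extension of x86 */
#elif defined(_M_IA64) || defined(__ia64__)
    return "ia64";    /* Itanium: a different instruction set */
#elif defined(_M_IX86) || defined(__i386__)
    return "x86";     /* 32-bit x86 */
#else
    return "other";
#endif
}
```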


 
Guest

Don Burn said:
I know Intel has an assembler for IA64; I think you will have to
investigate that. This really will be a pain, since the instruction set has
parallelism built directly into it. Really, move as much as you can
to C code and use intrinsics.

I used to be a compiler code generation guy and looked at the
architecture in its early days for a large PC firm; I told them to run away
as fast as they could.

Why?
If parallelism is built into the assembly language itself, wouldn't it be a
huge advantage for writing a compiler that makes use of this inherent
parallelism?
 
Don Burn

Bruno van Dooren said:
Why?
If parallelism is built into the assembly language itself, wouldn't it be a
huge advantage for writing a compiler that makes use of this inherent
parallelism?

The compiler has to do all the work, and this is no small task. Second, every
decent study of parallelism in regular programs indicates that 4 parallel
ops is about the best you can do, so of course Itanic started with 6
parallel operations. It was funny looking at the first Itanium compilers'
output: it was almost always 1 real instruction and 5 NOPs. Of course, in
some cases this turned a 1-byte x86 instruction into a 32-byte instruction!

For things that are parallel the CPU is OK, but the bottom line is that
after more than 40 years of research, most programs cannot be made
parallel. As Dave Kuck, an expert in parallelism, said: "if you have
infinite parallelism and a program is 50% parallel you can double the speed
of the program".
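The quoted remark is the limiting case of what is usually called Amdahl's law. A minimal sketch of the arithmetic follows; the function names `amdahl_speedup` and `amdahl_limit` are chosen here for illustration, with `p` as the parallel fraction of the run time and `n` as the number of parallel units.

```c
#include <assert.h>

/* Amdahl's law: speedup when a fraction p of the run time is spread
 * across n parallel units and the remaining (1 - p) stays serial. */
static double amdahl_speedup(double p, double n)
{
    assert(p >= 0.0 && p <= 1.0 && n >= 1.0);
    return 1.0 / ((1.0 - p) + p / n);
}

/* Limit as n grows without bound: only the serial part remains. */
static double amdahl_limit(double p)
{
    assert(p >= 0.0 && p < 1.0);
    return 1.0 / (1.0 - p);
}
```

With `p = 0.5`, `amdahl_limit` gives 2.0, matching the quote, and `amdahl_speedup(0.5, 6.0)` gives 12/7, roughly 1.71, so six-wide issue on such a program buys well under a factor of two.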
 
Guest

Don Burn said:
The compiler has to do all the work, and this is no small task. Second, every
decent study of parallelism in regular programs indicates that 4 parallel
ops is about the best you can do, so of course Itanic started with 6
parallel operations. It was funny looking at the first Itanium compilers'
output: it was almost always 1 real instruction and 5 NOPs. Of course, in
some cases this turned a 1-byte x86 instruction into a 32-byte instruction!

For things that are parallel the CPU is OK, but the bottom line is that
after more than 40 years of research, most programs cannot be made
parallel. As Dave Kuck, an expert in parallelism, said: "if you have
infinite parallelism and a program is 50% parallel you can double the speed
of the program".

I think I get what you're saying.
The parallelism in an ia64 is per thread. 1 thread can have several
instructions executing in parallel, but there can be only 1 active thread per
instruction core.

As a result, you can only parallelize simple localized instructions (like
different add instructions on independent variables).

The C / C++ language does not lend itself well to this, because the compiler
has a very hard time figuring out if the ordering of statements is required
or not.
With this in mind, I think the only real use for Itanium at this moment is
to run special hand-crafted algorithms for number crunching and things like
that.
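One concrete case of the ordering problem (assuming pointer aliasing is the kind of thing meant here) can be sketched in C99. Both function names are hypothetical. The two loops compute the same thing when the arrays are distinct, but only the second tells the compiler that it may reorder the loads and stores.

```c
#include <stddef.h>

/* Without further information the compiler must assume dst and src may
 * overlap, so a store to dst[i] could change a later src[j]. That
 * forces it to keep the loads and stores in program order. */
static void scale_may_alias(float *dst, const float *src, size_t n, float k)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}

/* C99 'restrict' is a promise from the programmer that the two arrays
 * do not overlap, which leaves the compiler free to reorder or batch
 * the iterations. Breaking the promise is undefined behavior. */
static void scale_no_alias(float *restrict dst, const float *restrict src,
                           size_t n, float k)
{
    for (size_t i = 0; i < n; ++i)
        dst[i] = src[i] * k;
}
```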

Is this understanding correct?

Hm, a language like LabVIEW would be perfectly suited for this architecture.
Sadly, the cost of porting the compiler, as well as the cost of the platform
itself, makes this prohibitive.
 
Don Burn

Bruno,

Yes, the hardware is designed to issue 6 instructions at a time, and the
compiler lays down all 6, so yes, it needs to be a single thread. It is not
only a problem of C/C++; none of the common languages do parallelism well.
But part of this is that people do not do parallelism well. Yes, we can think
about a few actions at once, and yes, for things like manipulating an array
there is inherent parallelism, but we normally think of things as step 1,
step 2, etc. Try taking those steps and arranging them so 6 actions are
always active.
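The array case can be made concrete with a small sketch. The function names are invented for illustration. The first loop is the "step 1, step 2" form, where each add waits on the previous one. The second is rearranged by hand so that four adds per iteration are independent, which is the kind of work a wide-issue machine needs the programmer or compiler to find.

```c
#include <stddef.h>

/* One accumulator: every add depends on the previous one, so the adds
 * form a single serial chain. */
static long sum_chain(const int *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; ++i)
        s += v[i];
    return s;
}

/* Four accumulators: the four adds in each iteration are independent
 * of one another, so a wide machine could in principle issue them
 * together. */
static long sum_split4(const int *v, size_t n)
{
    long s0 = 0, s1 = 0, s2 = 0, s3 = 0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += v[i];
        s1 += v[i + 1];
        s2 += v[i + 2];
        s3 += v[i + 3];
    }
    for (; i < n; ++i)   /* leftover elements */
        s0 += v[i];
    return s0 + s1 + s2 + s3;
}
```

Even this simple loop only exposes four independent operations per iteration, plus the loop bookkeeping, which gives a feel for how hard it is to keep six slots busy in ordinary code.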


 
Tim Roberts

Don Burn said:
Yes, the hardware is designed to issue 6 instructions at a time, and the
compiler lays down all 6, so yes, it needs to be a single thread. It is not
only a problem of C/C++; none of the common languages do parallelism well.
But part of this is that people do not do parallelism well. Yes, we can think
about a few actions at once, and yes, for things like manipulating an array
there is inherent parallelism, but we normally think of things as step 1,
step 2, etc. Try taking those steps and arranging them so 6 actions are
always active.

It's a real problem. The Trimedia processor, used in many set-top boxes,
is a VLIW processor that can issue 5 operations per instruction, with a
large register set. They have poured vast amounts of resources into their
compiler over the last 10 years or so, and the typical program averages
about 2.4 ops per instruction.

On the other hand, you can do some very impressive things if you
concentrate on the inner loops. The Trimedia does MPEG like a bat out of
hell, but they haven't been afraid to introduce custom instructions to fit
the need.
 
