Why was Intel a no-show on No Execute?

Yousuf Khan · May 26, 2004

This has been discussed at quite some length in these newsgroups, but now it
looks like the mainstream press are starting to hear about it too. Intel had
to be embarrassed into including NX into its AMD64 implementation.

http://story.news.yahoo.com/news?tmpl=story&cid=1738&ncid=1209&e=7&u=/zd/20040525/tc_zd/127930

There's a few things that this article writer has gotten wrong, but a few
things were right.

One thing he got partially wrong was his statement about Intel having no
execute protection in the 16-bit segments. The feature was still there in
the 32-bit segments, Intel never got rid of them. It was stupid OS designers
who decided to ignore the feature that caused this problem.

Yousuf Khan

Grumble · May 26, 2004

Yousuf said:
One thing he got partially wrong was his statement about Intel
having no execute protection in the 16-bit segments. The feature
was still there in the 32-bit segments, Intel never got rid of
them. It was stupid OS designers who decided to ignore the
feature that caused this problem.

Are you calling them "stupid" because they opted for paging
instead of segmentation, in an effort to write a portable OS?

Do you think there should be an x86-specific Linux branch,
using segmentation instead of paging?

glen herrmannsfeldt · May 26, 2004

Grumble said:
Are you calling them "stupid" because they opted for paging
instead of segmentation, in an effort to write a portable OS?

Do you think there should be an x86-specific Linux branch,
using segmentation instead of paging?

I don't think it would be so hard to put all the data in a
data segment, and the code in a code segment, without overlapping
them. It requires the CS: prefix on any loads from the code
segment. Self modifying code is out of style these days,
so that shouldn't be much of a problem.

Now, for things like JIT where code is constantly being
written while running some arrangement would need to be made.

-- glen
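
A rough sketch of the layout described above, assuming a simplified model of a protected-mode segment rather than the real descriptor format. The `segment` struct, `seg_access`, and the two example segments are hypothetical names used only for illustration. On real x86 hardware an instruction fetch always goes through CS, so "executing from the data segment" cannot even be expressed; the model simply rejects it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a segment; NOT the real x86 descriptor layout. */
typedef enum { ACCESS_READ, ACCESS_WRITE, ACCESS_EXEC } access_kind;

typedef struct {
    uint32_t base;      /* linear address where the segment starts */
    uint32_t limit;     /* highest valid offset inside the segment */
    bool     is_code;   /* code segment (executable) or data segment */
    bool     readable;  /* for code segments: may loads use a CS: override? */
    bool     writable;  /* for data segments; code segments are never writable */
} segment;

/* Returns true and stores the linear address if the access is allowed.
   Returns false where real hardware would raise a protection fault. */
bool seg_access(const segment *s, uint32_t offset, access_kind kind,
                uint32_t *linear)
{
    if (offset > s->limit)
        return false;                       /* beyond the segment limit */

    switch (kind) {
    case ACCESS_EXEC:
        if (!s->is_code) return false;      /* data is not executable */
        break;
    case ACCESS_WRITE:
        if (s->is_code || !s->writable) return false;
        break;
    case ACCESS_READ:
        if (s->is_code && !s->readable) return false;
        break;
    }
    *linear = s->base + offset;
    return true;
}

/* Two non-overlapping segments (addresses chosen arbitrarily). */
static const segment code_seg = { 0x00000000u, 0x0FFFu, true,  true, false };
static const segment data_seg = { 0x00001000u, 0x0FFFu, false, true, true  };
```

With these two segments, a write through `code_seg` and an execute through `data_seg` are both refused, while a read from `code_seg` (the CS: prefixed load mentioned above) is allowed.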

Sander Vesik · May 26, 2004

In comp.arch Grumble said:
Are you calling them "stupid" because they opted for paging
instead of segmentation, in an effort to write a portable OS?

Do you think there should be an x86-specific Linux branch,
using segmentation instead of paging?

There was one for quite a while for pre-386 modes/machines.

Robert Redelmeier · May 26, 2004

In comp.sys.ibm.pc.hardware.chips glen herrmannsfeldt said:
I don't think it would be so hard to put all the data in a
data segment, and the code in a code segment, without overlapping
them. It requires the CS: prefix on any loads from the code
segment. Self modifying code is out of style these days,
so that shouldn't be much of a problem.

That _still_ won't help (never mind interpreted or JIT).

If an attacker can redirect execution by modifying the
return address on the stack, s/he doesn't need their own
executable code. Just point to data like "/bin/sh" and
return to an `exec` syscall.

-- Robert

Robert Redelmeier · May 26, 2004

In comp.sys.ibm.pc.hardware.chips Robert Redelmeier said:
That _still_ won't help (never mind interpreted or JIT).

If an attacker can redirect execution by modifying the
return address on the stack, s/he doesn't need their own
executable code. Just point to data like "/bin/sh" and
return to an `exec` syscall.

Ah, but you make me think -- all current CPUs have an internal
hardware call/return stack to speed up branch [mis]prediction.

It would be relatively simple to check this hw stack against
the memory stack and generate a fault if return addresses
don't match.

This could be enabled by a bit in the MSR if the OS has support
to handle/log "return addr faults". Most pgms should never
generate a return fault, but a mechanism could be made to
except those few that do.

A slightly bigger problem is the hw stacks are of limited
depth (6?) and it might be possible to flood them out.
But variable stack entry pointers would become more effective.

-- Robert
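
The check proposed above can be modelled in software. This is only a simulation under stated assumptions: a fixed-depth circular return stack (the depth of 6 is the guess from the post, not a documented figure), and hypothetical names `return_stack`, `rs_call`, and `rs_ret`. It also shows the flooding concern: once old entries are lost, there is nothing left to compare against.

```c
#include <stddef.h>
#include <stdint.h>

#define RS_DEPTH 6   /* assumed depth; the post above only guesses at 6 */

typedef enum { RS_MATCH, RS_MISMATCH, RS_UNKNOWN } rs_result;

typedef struct {
    uint32_t entry[RS_DEPTH];
    size_t   top;    /* index of the next free slot, modulo RS_DEPTH */
    size_t   valid;  /* number of live entries, at most RS_DEPTH */
} return_stack;

/* CALL: record the return address. When full, the oldest entry is lost. */
void rs_call(return_stack *rs, uint32_t ret_addr)
{
    rs->entry[rs->top] = ret_addr;
    rs->top = (rs->top + 1) % RS_DEPTH;
    if (rs->valid < RS_DEPTH)
        rs->valid++;
}

/* RET: compare the address popped from the memory stack with the
   recorded one. RS_MISMATCH is where the proposed fault would fire. */
rs_result rs_ret(return_stack *rs, uint32_t addr_from_memory)
{
    if (rs->valid == 0)
        return RS_UNKNOWN;              /* flooded out: nothing to check */

    rs->top = (rs->top + RS_DEPTH - 1) % RS_DEPTH;
    rs->valid--;
    return rs->entry[rs->top] == addr_from_memory ? RS_MATCH : RS_MISMATCH;
}
```

Starting from a zero-initialised `return_stack`, a call followed by a return to the same address yields `RS_MATCH`, a return to a tampered address yields `RS_MISMATCH`, and any return after more than `RS_DEPTH` nested calls have unwound yields `RS_UNKNOWN`.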

Stefan Monnier · May 26, 2004

It would be relatively simple to check this hw stack against
the memory stack and generate a fault if return addresses
don't match.

Lookup "call-with-current-continuation" to see why this is not a good idea.
Or maybe just think of how to implement exception handling.

Stefan

Yousuf Khan · May 26, 2004

Grumble said:
Are you calling them "stupid" because they opted for paging
instead of segmentation, in an effort to write a portable OS?

No, for not opting to use both. There was no mutual exclusivity between
paging and segmentation. Both could be used and complement each other.

I think the original OS designers in their haste to port Unix to the new
32-bit Intel chip did a simple cross-compile, and then didn't bother to make
use of any of the Intel-specific features of their architecture. They just
left it at "good enough". Of course, using Intel features would've made them
non-portable, but a lot of stuff gets non-portable at the lowest levels of
the kernel anyways.

Do you think there should be an x86-specific Linux branch,
using segmentation instead of paging?

There already was. The original pre-1.0 Linux kernels were using segments
*and* paging. I think with addition of new people into the development team,
Linux's original purpose got changed from being the ultimate Intel OS (Unix
or otherwise), to being a free version of portable Unix.

Yousuf Khan

Yousuf Khan · May 27, 2004

Robert Redelmeier said:
That _still_ won't help (never mind interpreted or JIT).

If an attacker can redirect execution by modifying the
return address on the stack, s/he doesn't need their own
executable code. Just point to data like "/bin/sh" and
return to an `exec` syscall.

How's an attacker to do that, when the code, the stack and the heap
don't even share the same memory addresses?

Yousuf Khan

Robert Redelmeier · May 27, 2004

In comp.sys.ibm.pc.hardware.chips Yousuf Khan said:
How's an attacker to do that, when the code, the stack and the heap
don't even share the same memory addresses?

Easy. Overwrite the stack with crafted input to an unrestricted
input call (getch() is a frequent culprit). This is the basic
buffer overflow.

In the location for the return address (where EBP is usually
pointing), put in a return address that points to a suitably
dangerous part of the existing code. Like an `exec` syscall.
Above this return address, put in data to make that syscall
nefarious.

-- Robert
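
The overwrite described above can be illustrated without touching a real stack. The sketch below is a harmless simulation: `frame` is a hypothetical model of one stack frame (real layouts depend on compiler and ABI), `unbounded_read` stands in for an unchecked input call, and the addresses used with it are made up.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BUF_SIZE 16

/* Simulated stack frame, lowest address first. Only a model. */
typedef struct {
    char     buf[BUF_SIZE];  /* local input buffer */
    uint32_t saved_ebp;      /* caller's frame pointer */
    uint32_t ret_addr;       /* where RET will jump */
} frame;

/* Stand-in for an unbounded read: copies len bytes starting at buf with
   no check against BUF_SIZE. The length is clamped to the size of the
   simulated frame so the simulation itself never leaves the struct. */
void unbounded_read(frame *f, const void *input, size_t len)
{
    if (len > sizeof *f)
        len = sizeof *f;
    memcpy((unsigned char *)f, input, len);
}

/* Build input that fills buf, runs over saved_ebp, and then plants a
   new return address. out must hold at least sizeof(frame) bytes.
   Returns the number of bytes written. */
size_t craft_input(unsigned char *out, uint32_t new_ret)
{
    size_t ret_off = offsetof(frame, ret_addr);

    memset(out, 'A', ret_off);
    memcpy(out + ret_off, &new_ret, sizeof new_ret);
    return ret_off + sizeof new_ret;
}
```

Feeding the output of `craft_input` to `unbounded_read` leaves `ret_addr` holding the planted value, even though the caller only ever meant to fill `buf`.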

Robert Redelmeier · May 27, 2004

In comp.sys.ibm.pc.hardware.chips Stefan Monnier said:
Lookup "call-with-current-continuation" to see why this is not a good idea.
Or maybe just think of how to implement exception handling.

Exception handling is easy -- mismatch produces an MC interrupt.
The kernelspace ISR checks the MSRs which tell it that a return
addr mismatch occurred. Kernel decides what to do -- abort proc,
log, or proceed.

Sure it'll be slow, but how often are calls not paired with
returns? call jtable[eax*4] is the standard syntax for a
jump table, not `push eax/ret`

-- Robert

Yousuf Khan · May 27, 2004

Robert Redelmeier said:
Easy. Overwrite the stack with crafted input to an unrestricted
input call (getch() is a frequent culprit). This is the basic
buffer overflow.

In the location for the return address (where EBP is usually
pointing), put in a return address that points to a suitably
dangerous part of the existing code. Like an `exec` syscall.
Above this return address, put in data to make that syscall
nefarious.

Nope, won't work. Segmentation would protect it completely. There is no way
for data written to the heap to touch the data in the stack. Stack segment
and data segment are separate. It's like as if the stack had its own
container, the code has its own, and the data heap its own. What happens in
one container won't even reach the other containers.

Face it, segments were the perfect security mechanism, and systems
developers completely ignored it!

Yousuf Khan

Yousuf Khan · May 27, 2004

Sander Vesik said:
There was one for quite a while for pre-386 modes/machines.

That was Minix. Linux has always been for 386 and later machines only.

Yousuf Khan

Grumble · May 27, 2004

Robert said:
Overwrite the stack with crafted input to an unrestricted
input call (getch() is a frequent culprit).

There is no getch() in ISO C.

fgetc(), getc(), and getchar() return a single character.

Perhaps you meant gets().
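
For contrast with gets(), which takes no buffer size at all, here is a minimal sketch of a bounded read built on the ISO C function fgets(). The helper name `read_line` is hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: read one line into buf, never storing more than
   size bytes including the terminating '\0'.
   Returns 1 on success, 0 on end of file, error, or a zero-sized buffer. */
int read_line(FILE *in, char *buf, size_t size)
{
    if (size == 0 || fgets(buf, (int)size, in) == NULL)
        return 0;

    buf[strcspn(buf, "\n")] = '\0';  /* drop the trailing newline, if any */
    return 1;
}
```

With an 8-byte buffer, a longer input line is cut off after 7 characters, and the rest stays in the stream for the next call instead of spilling past the array.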

Grumble · May 27, 2004

Robert said:
Ah, but you make me think -- all current CPUs have an internal
hardware call/return stack to speed up branch [mis]prediction.

e.g. the Athlon implements a 12-entry return address stack to
predict return addresses from a near or far call. As CALLs are
fetched, the next EIP is pushed onto the return stack. Subsequent
RETs pop a predicted return address off the top of the stack.

It would be relatively simple to check this hw stack against
the memory stack and generate a fault if return addresses
don't match.

I think you've just killed the performance of recursive functions.

This could be enabled by a bit in the MSR if the OS has support
to handle/log "return addr faults". Most pgms should never
generate a return fault

This is where I think you are wrong.

The K8 has a counter to measure this event:

88h IC Return stack hit
89h IC Return stack overflow

It would be interesting to take, say, SPEC CPU2000, and count
the number of overflows for each benchmark. I might try.

Benny Amorsen · May 27, 2004

YK> That was Minix. Linux has always been for 386 and later machines
YK> only.

I think the ELKS people will be saddened to hear that.

/Benny

Casper H.S. Dik · May 27, 2004

I think you've just killed the performance of recursive functions.

And possibly longjmp()/setcontext() and the like; quite a bit of
additional work is needed to fix all such things (and if you want to
throw in binary compatibility, it's going to be harder still).

Casper
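
A small sketch of why longjmp() is awkward for a scheme that pairs every call with a return. The function and counter names are hypothetical. Three nested calls are made, then longjmp() unwinds straight to the setjmp() point, so none of the three functions ever executes its return.

```c
#include <setjmp.h>

static jmp_buf recovery_point;
static volatile int calls_made;
static volatile int returns_made;

/* Each level counts its own call; the caller counts the return. */
static void level3(void) { calls_made++; longjmp(recovery_point, 1); }
static void level2(void) { calls_made++; level3(); returns_made++; }
static void level1(void) { calls_made++; level2(); returns_made++; }

/* Runs the nested calls and reports how many calls never returned. */
int unmatched_calls(void)
{
    calls_made = 0;
    returns_made = 0;

    if (setjmp(recovery_point) == 0) {
        level1();
        returns_made++;   /* only reached if level1 returns normally */
    }
    return calls_made - returns_made;
}
```

Here `unmatched_calls()` reports 3. A hardware return stack that recorded those three calls would presumably still hold their entries at the next genuine return, which is the kind of mismatch a legitimate program could trigger.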

Robert Redelmeier · May 27, 2004

In comp.sys.ibm.pc.hardware.chips Grumble said:
I think you've just killed the performance of recursive functions.

I don't think so. For a recursive function there are many
calls, possibly flooding out the hw return stack. But every
call has a return, and that address _is_ correct on both the
hw and memory stacks.

88h IC Return stack hit
89h IC Return stack overflow

It would be interesting to take, say, SPEC CPU2000, and count
the number of overflows for each benchmark. I might try.

Excellent! I do not suggest trapping out overflows.
They're to occur on deep recursion which should not contain
evil getch() calls. Just trap misses.

-- Robert

Robert Redelmeier · May 27, 2004

In comp.sys.ibm.pc.hardware.chips Grumble said:
There is no getch() in ISO C.
Perhaps you meant gets().

Thank you for the correction. I do mean gets().
I apologize for any confusion.

-- Robert

Robert Redelmeier · May 27, 2004

In comp.sys.ibm.pc.hardware.chips Yousuf Khan said:
Nope, won't work. Segmentation would protect it completely. There is no way
for data written to the heap to touch the data in the stack. Stack segment
and data segment are separate. It's like as if the stack had its own
container, the code has its own, and the data heap its own. What happens in
one container won't even reach the other containers.

True in a literal sense.

But C compilers have this habit of allocating local variable
space on the stack. So when `char input[80];` is coded in a
routine, ESP gets decreased by 80 and that array is sitting
just below the return address!

I don't think it's _required_ by any standard that local vars are
allocated on the stack, but it sure makes memory management easy.

AFAIK, only global vars and large malloc()s are put on the heap.

-- Robert
