+---------------
| > I don't think it's _required_ by any standard that local vars are
| > allocated on the stack, but it sure makes memory management easy.
|
| It also facilitates recursion and re-entrancy. But it needn't be the
| same stack as the return linkage pointer.
+---------------
But if you *don't* do it, then you have trouble with stack fragmentation
and/or collisions with your "argument stack" expanding at a different rate
than your "linkage stack", resulting in one or the other bumping into
arbitrary limits at inconvenient times. As a result, one or the other
of the stacks gets pushed off into the heap (usually the argument stack)
as a linked list of "malloc()"-allocated stack segments [optimized by
allocating a bunch at a time], which puts a lot of stress on "malloc()",
or gets pushed into a separately-managed segment of address space, which
puts pressure on memory allocation in general and the dynamic loader in
particular.
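
For illustration, here is a minimal C sketch of a heap-chunked argument stack of the kind described above. It is a hypothetical example, not code from any real runtime: the "arg_stack" type and the "arg_push()"/"arg_pop()" helpers are invented names, and the 4096-byte chunk size is an arbitrary assumption. Grabbing a large chunk with "malloc()" instead of one block per call is the "bunch at a time" optimization, but every time the stack crosses a chunk boundary it still costs a "malloc()"/"free()" pair.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch: an "argument stack" kept on the heap as a linked
   list of malloc()'d chunks, separate from the linkage (return) stack. */

#define CHUNK_BYTES 4096   /* assumption: arbitrary chunk size */

typedef struct chunk {
    struct chunk *prev;    /* older chunk, or NULL for the first one */
    size_t used;           /* bytes of data[] currently in use */
    size_t cap;            /* capacity of data[] in bytes */
    unsigned char data[];  /* flexible array member holding the frames */
} chunk;

typedef struct {
    chunk *top;            /* newest chunk, or NULL when empty */
} arg_stack;

/* Round sizes up to a multiple of 8 so a pop always matches its push. */
static size_t round8(size_t n) {
    return (n + 7) & ~(size_t)7;
}

/* Reserve n bytes on the argument stack, malloc()ing a new chunk when
   the current one cannot hold them. Returns NULL if malloc() fails. */
static void *arg_push(arg_stack *s, size_t n) {
    n = round8(n);
    if (s->top == NULL || s->top->cap - s->top->used < n) {
        size_t cap = n > CHUNK_BYTES ? n : CHUNK_BYTES;
        chunk *c = malloc(sizeof *c + cap);
        if (c == NULL)
            return NULL;
        c->prev = s->top;
        c->used = 0;
        c->cap = cap;
        s->top = c;
    }
    void *p = s->top->data + s->top->used;
    s->top->used += n;
    return p;
}

/* Release the most recent n bytes (must match the last arg_push).
   An emptied chunk is handed straight back to free(). */
static void arg_pop(arg_stack *s, size_t n) {
    n = round8(n);
    assert(s->top != NULL && s->top->used >= n);
    s->top->used -= n;
    if (s->top->used == 0) {
        chunk *old = s->top;
        s->top = old->prev;
        free(old);
    }
}
```

A call chain that keeps pushing and popping right at a chunk boundary will repeatedly allocate and free the same-sized chunk, which is exactly the "stress on malloc()" mentioned above; one possible mitigation is to cache a spare chunk rather than freeing it immediately.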

We had some of these issues with the Am29000 Subroutine Calling Standard
(circa 1987), which had both a "register cache" stack for linkage
information and "small" arguments (which were passed in registers)
and a "memory" stack for "large" arguments (as well as *any* argument,
regardless of size, that the called subroutine referenced by address).[1]
Had the 29k CPU family ever made it into the 32-bit Unix[2] workstation
market, where, as we know, address-space layout has become an issue
(especially with an ever-larger number of DLLs or DSOs competing for space),
the two-stack calling sequence could have become quite problematic.
[As it was, in the embedded-processor space it was pretty much a non-issue.]
-Rob
[1] Actually, the rule was that the first 16 *words* of arguments got
passed in registers and any further words of arguments got passed
on the memory stack, except that if the called routine referenced
any of the first 16 words by address (e.g., "&foo") then that word
and all subsequent words of the register args would get copied into
the memory stack at subroutine entry. Yes, this meant that whenever
the memory stack got used at all there was a 64-byte area at the
front reserved in case the first 16 words needed to be manifested
in memory. (*Ugh*)
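
The placement rule in this footnote can be modeled in a few lines of C. This is a hypothetical sketch, not code from the 29k calling standard: the helper names are invented, and it assumes 32-bit words (consistent with 16 words filling the 64-byte area). Reserving the front 64 bytes gives every argument word k a fixed home at byte offset 4*k, whether it arrived on the memory stack or was copied there at entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the rule in footnote [1]; names are illustrative.
   Assumes 4-byte words, since 16 words occupy the 64-byte reserved area. */

#define REG_ARG_WORDS 16
#define WORD_BYTES    4
#define NO_ADDR_TAKEN (-1)

/* Is argument word k in the memory stack after the callee's entry code?
   first_addr_taken is the lowest index among the first 16 words whose
   address the callee takes, or NO_ADDR_TAKEN. */
static bool in_memory(int k, int first_addr_taken) {
    if (k >= REG_ARG_WORDS)
        return true;                      /* overflow words: always memory */
    return first_addr_taken != NO_ADDR_TAKEN && k >= first_addr_taken;
}

/* Byte offset of word k in the memory-stack argument area. Words 0..15
   live in the reserved 64-byte front, so every word has offset 4*k. */
static size_t home_offset(int k) {
    assert(k >= 0);
    return (size_t)k * WORD_BYTES;
}

/* Bytes of memory-stack argument area for a call with nwords words:
   zero if memory is not touched at all, otherwise the 64-byte reserved
   front plus any overflow words. */
static size_t mem_area_bytes(int nwords, int first_addr_taken) {
    bool used = nwords > REG_ARG_WORDS ||
                (first_addr_taken != NO_ADDR_TAKEN && first_addr_taken < nwords);
    if (!used)
        return 0;
    int overflow = nwords > REG_ARG_WORDS ? nwords - REG_ARG_WORDS : 0;
    return (size_t)(REG_ARG_WORDS + overflow) * WORD_BYTES;
}
```

For example, a 20-word call with no address-taken arguments needs 64 + 4*4 = 80 bytes, even though only the last four words are actually passed in memory.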
[2] Both BSD and System-V ports were done to the Am29000 -- both were
quite straightforward since the 29k was a friendly target environment --
but shortly after both were up & running AMD chose not to promote
the 29k as a Unix engine, and they were abandoned.