Advantages of Parallel Hz

joseph2k

MooseFET said:
It's a matter of personal preference.

Yes but what is the basis of this preference? Did a ROM maker say
something rude to you? :> That would be a good reason.

[....]

Let's try a simpler example: I want to make a box that finds the
length of the third side of a triangle when two sides and the
included angle are known. This is the equation:

Z = SQRT( X*X + Y*Y - 2*X*Y*COS(Angle))

Let's imagine that we are going to use Newton's method to find a
square root.

Newton's method works like this:

INPUT Y
TryValue = SomeGuessMethod(Y)

Loop till good enough
NewValue = Y / TryValue
TryValue = (TryValue + NewValue)/2
end loop
PRINT "The squareroot is ", TryValue

One of the very nice things about this way of finding the square root
is that the accuracy of TryValue grows very quickly. The number of
correct digits roughly doubles every time around the loop. The
problem is that you need to come up with an initial guess. Always
guessing 1.0 initially does work, but it takes a large number of loops
to find the square root of a million if you start that way.

Adding a fairly simple look-up table will get the answer much more quickly.
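
To make that concrete, here is a minimal C sketch of the loop (my
own illustration, not from the post); the frexp/ldexp seed is just
one stand-in for "SomeGuessMethod":

#include <stdio.h>
#include <math.h>   /* frexp, ldexp, fabs */

/* Newton's (Babylonian) square root as described above.  The
   exponent-halving seed is one illustrative guess method. */
static double newton_sqrt(double y)
{
    int e;
    double try_value, new_value;

    if (y <= 0.0)
        return 0.0;

    frexp(y, &e);                   /* y = m * 2^e, 0.5 <= m < 1 */
    try_value = ldexp(1.0, e / 2);  /* initial guess: 2^(e/2) */

    do {                            /* correct digits roughly double */
        new_value = y / try_value;
        try_value = (try_value + new_value) / 2.0;
    } while (fabs(try_value - new_value) > 1e-12 * try_value);

    return try_value;
}

int main(void)
{
    printf("The square root is %f\n", newton_sqrt(1000000.0));
    return 0;
}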

Another issue pops up for the example equation: it needs the constant
2.0. Since 2.0 is 1.0 + 1.0, the value can be generated, but this
costs more time.

There is also the method for finding the cos() to consider. Most
cosine routines use a look-up table to find the exact cosine of a
nearby angle, along with how the curve is changing in that area. In
just a few steps you can have quite an exact cosine if you do this.
There are other ways of finding the cosine that don't need any
tables. I bet you've already guessed why those aren't used.
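
A rough C sketch of that table-plus-slope idea (again my own
illustration; the 64-entry table and first-order correction are
arbitrary choices, and range reduction outside 0..pi/2 is omitted):

#include <stdio.h>
#include <math.h>

#define STEPS 64
static const double HALF_PI = 1.5707963267948966;
static double cos_tab[STEPS + 1], sin_tab[STEPS + 1];

/* Fill the table with exact values at coarse steps over 0..pi/2. */
static void init_tabs(void)
{
    for (int i = 0; i <= STEPS; i++) {
        cos_tab[i] = cos(HALF_PI * i / STEPS);
        sin_tab[i] = sin(HALF_PI * i / STEPS);
    }
}

/* cos(a + d) ~= cos(a) - d*sin(a): nearest table entry below x,
   corrected by the local slope.  Valid for 0 <= x <= pi/2. */
static double table_cos(double x)
{
    double pos = x / HALF_PI * STEPS;
    int i = (int)pos;
    double d = (pos - i) * HALF_PI / STEPS;  /* radians past entry i */
    return cos_tab[i] - d * sin_tab[i];
}

int main(void)
{
    init_tabs();
    printf("table: %.6f  libm: %.6f\n", table_cos(0.5), cos(0.5));
    return 0;
}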

[....]
I don't use zip, unless I really need to. So far, I've never had to.

Winzip is a pain in the @$$.

Yes it is. gzip, pkzip, and UncleBobsZip will all do better than
WinZip. They all take time to run, however.

Is the ROM built into the keyboard?

There is a ROM in the keyboard and one in the PC. Inside the keyboard
is a little microcontroller with a ROM. It keeps track of which
buttons you push, etc. This information is sent to the PC. Back in
the days of DOS, it was then the ROM in the PC that assigned the keys
their ASCII codes.

It may sound funny to take two steps to assign the ASCII codes, but
there is a very good reason for it. The basic keyboard design could
be used all around the world, and the BIOS could be made different
from country to country to handle the differences in what the keys
mean.
Close on the keyboard thing: actually sending key-number
closures/openings (which is what the original IBM PC did) is absurdly
handy for handling the shift, control, and alt keys and other multiple
keypresses (control-alt-delete), as well as simplifying keyboard design.
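
A toy C sketch of why make/break codes are handy (the left-shift
codes 0x2A/0xAA follow the old set-1 convention as I remember it,
but the little translation table is entirely invented):

#include <stdio.h>
#include <stdint.h>

#define BREAK_BIT 0x80      /* high bit set = key released */
#define SC_LSHIFT 0x2A      /* left-shift make code */

static int shift_down;      /* modifier state kept by the host */

/* Hypothetical scan-code-to-ASCII step, standing in for the BIOS
   table that varied from country to country. */
static char translate(uint8_t code)
{
    static const char plain[4]   = { 0, 'a', 'b', 'c' };
    static const char shifted[4] = { 0, 'A', 'B', 'C' };

    if ((code & ~BREAK_BIT) == SC_LSHIFT) {
        shift_down = !(code & BREAK_BIT);  /* track press/release */
        return 0;
    }
    if (code & BREAK_BIT)   /* ignore other key releases */
        return 0;
    if (code < 4)
        return shift_down ? shifted[code] : plain[code];
    return 0;
}

int main(void)
{
    /* 'a' down, shift down, 'b' down, shift up, 'c' down */
    uint8_t stream[] = { 1, 0x2A, 2, 0xAA, 3 };
    for (unsigned i = 0; i < sizeof stream; i++) {
        char ch = translate(stream[i]);
        if (ch)
            putchar(ch);
    }
    putchar('\n');          /* prints "aBc" */
    return 0;
}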
 
joseph2k

Radium said:
Confidential information remains no matter how much you try to format
it.

The stored magnetic information always exists to some extent no matter
how many times it is "erased" or overwritten.

Not at all true: overwrite with different garbage each of several
times and the original is no longer recoverable. Perhaps you should
study some of the physics of how various storage media actually work.
Please remember that writing all ones or all zeros (or both),
regardless of media, is relatively ineffective.

A device sensitive to extremely weak magnetic signals can still
recover the information.

With an electric chip it is much easier to permanently remove
confidential data.

Any kind of "stored" field is equally recoverable.
I like things to be lively. Discrete logic is far more effervescent
than ROM.
Please explain how hardwired logic can be more effervescent (ephemeral) than
ram.
 
joseph2k

Radium said:
In that case, yes, some amount of info is stored, however, it is the
minimum required.

My dream PC stores this "ROM" as discrete logic.

What a curious idea. Too bad the inter-chip interconnect time losses
wipe out any possible advantage.

Discrete logic is faster than ROM.

My dream PC is as hardware, real-time, and digital as possible.

As if anything about a PC other than the power supply and some sensors
were anything else.

In addition, it uses the least amount of buffering required [hopefully
none]

What are you rambling on about?

and experiences the least amount of latency possible [again,
hopefully none].

Latency is a fact of nature. Interactivity is a design issue, which is
seriously degraded with graphical interfaces. The masses have spoken:
eye-candy is king.
 
Andrew Reilly

That is very true, but to Radium they are seen as different in some
other way. I think the problem is that he doesn't understand the
technology well enough to be comfortable with the idea that a PROM, a
CPLD, an FPGA, and discrete logic that solve a given problem must be
encoding the same information. Exactly how it is encoded will be very
different, as you say, but it nonetheless must be encoded.

Having seen Radium go on about sound cards on comp.dsp some time ago, I
think (now) that I understand his point, and it's a subtle one, but it's
actually (I think) becoming a more significant one as the performance
relationship between memory and processors changes.

It's the difference between using the 2D video accelerator in your
graphics card to render your desktop icons from SVG, rather than copying
some bitmaps that someone has rendered (once) with Photoshop.

It's TrueType or PostScript fonts, rather than bitmap fonts.

It's on-the-fly synthesis rather than wave-table or sample synthesis.

It's the wheel of reincarnation coming up behind Terje's "all computing
can be viewed as an exercise in caching" aphorism.

Pretty obviously there are serious limits to how far one can "generate on
the fly" and still do anything useful, but I don't think that we've
returned to the limit yet, either.

Cheers,
 
MooseFET

Having seen Radium go on about sound cards on comp.dsp some time ago, I
think (now) that I understand his point, and it's a subtle one, but it's
actually (I think) becoming a more significant one as the performance
relationship between memory and processors changes.

It's the difference between using the 2D video accelerator in your
graphics card to render your desktop icons from SVG, rather than copying
some bitmaps that someone has rendered (once) with Photoshop.

It's TrueType or PostScript fonts, rather than bitmap fonts.

It's on-the-fly synthesis rather than wave-table or sample synthesis.

It's the wheel of reincarnation coming up behind Terje's "all computing
can be viewed as an exercise in caching" aphorism.

Pretty obviously there are serious limits to how far one can "generate on
the fly" and still do anything useful, but I don't think that we've
returned to the limit yet, either.


He has included the conflicting requirement that it be fast and
low-powered. All of these methods require more time and/or power. You
can use fractals to generate a picture of a mountain, but a ROM picture
is a much quicker and lower-powered way to go.
 
Andrew Reilly

He has included the conflicting requirement that it be fast and
low-powered. All of these methods require more time and/or power. You
can use fractals to generate a picture of a mountain, but a ROM picture
is a much quicker and lower-powered way to go.

Once upon a time, that was a given. It isn't necessarily the case, now
that FUs consume less power than pin drivers and memory controllers. It
can be faster, too, if latency to access the picture is an issue. I'm
not saying that all caching and storage will go away, by any means, but
it's worth noticing that some of the trade-offs are changing.

For example, OpenVG (for 2D) and OpenGL ES seem to be becoming popular
in mobile phone GUI development because it's *cheaper*, in terms of
both storage space and power consumption, to render on the spot rather
than carry around pre-rendered bitmaps.

Cheers,
 
MooseFET

On May 13, 7:57 pm, Andrew Reilly <[email protected] [....]
He has included the conflicting requirement that it be fast and
low-powered. All of these methods require more time and/or power. You
can use fractals to generate a picture of a mountain, but a ROM picture
is a much quicker and lower-powered way to go.

Once upon a time, that was a given. It isn't necessarily the case, now
that FUs consume less power than pin drivers and memory controllers. It
can be faster, too, if latency to access the picture is an issue. I'm
not saying that all caching and storage will go away, by any means, but
it's worth noticing that some of the trade-offs are changing.

It is still true enough today to be a good assumption. Remember that
Radium is also ruling out the methods by which the software can be
stored. He isn't just against the images being stored; he has also
stated that he wants to hold the amount of ROM to the mathematical
minimum. This would mean that a lot of the methods used to render
things in real time would run afoul of his requirements.
 
Radium

It is still true enough today to be a good assumption. Remember that
Radium is also ruling out the methods by which the software can be
stored. He isn't just against the images being stored; he has also
stated that he wants to hold the amount of ROM to the mathematical
minimum. This would mean that a lot of the methods used to render
things in real time would run afoul of his requirements.

Well, as for digital images, I don't mind storing them as long as they
are fully uncompressed and have enough pixel-by-pixel resolution so
that I don't notice any "squares" that result from inadequate image
resolution.
 
Bob Myers

Well, as for digital images, I don't mind storing them as long as they
are fully uncompressed and have enough pixel-by-pixel resolution so
that I don't notice any "squares" that result from inadequate image
resolution.

Squares have nothing to do with image resolution; they
have to do with improper reconstruction filtering being
imposed by the display. Pixels, in the sense of those things
that make up images (as opposed to the physical elements
of a fixed-format display), are not little squares.

For that matter, something in the form of "X by Y pixels"
is NOT a statement of resolution.

Bob M.
 
Winfield

ChrisQuayle said:
Sounds a great idea, but to clarify, I guess the instruction
stream, which of course wouldn't originate from parallel
programming techniques, languages, or compilers, just utilises
simple hardware serial-to-parallel converters to dish out
instructions to all the CPUs, radially?

Amazing that no one has thought of this before...

Chris

Greenfield Designs Ltd
Electronic and Embedded System Design
Oxford, England
(44) 1865 750 681

They haven't, because it's wrong.

For CMOS, P = f C V^2, which tells us that for N
processors running at f' = f/N the power will be
P = N f' C V^2 = N (f/N) C V^2 = f C V^2, which is
no improvement at all, even if N = 1 billion.
 
krw

They haven't, because it's wrong.

For CMOS, P = f C V^2, which tells us that for N
processors running at f' = f/N the power will be
P = N f' C V^2 = N (f/N) C V^2 = f C V^2, which is
no improvement at all, even if N = 1 billion.

I'm not sure how this plays out when you throw in leakage (it's quite
significant). If 'f' can be kept to a minimum, one can do all sorts
of things to reduce leakage. It doesn't solve the parallel
programming issue, though.
 
MooseFET

They haven't, because it's wrong.

For CMOS, P = f C V^2, which tells us that for N
processors running at f' = f/N the power will be
P = N f' C V^2 = N (f/N) C V^2 = f C V^2, which is
no improvement at all, even if N = 1 billion.


Actually it is P = f C V^2 + MoreStuff.

With f = 0, each gate still loses a little power due to leakage. This
means that 1 processor doing 1 million instructions per second will
draw less power than 1 million 1 Hz processors.
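
To put toy numbers on that (the capacitance, voltage, and per-CPU
leakage below are invented just to show the shape of the trade-off):

#include <stdio.h>

/* Compare P = f*C*V^2 + leakage for one fast CPU versus a million
   slow ones.  All three constants are made up for illustration. */
int main(void)
{
    const double Cap   = 1e-9;  /* switched capacitance per CPU, F */
    const double V     = 1.0;   /* supply voltage */
    const double Pleak = 1e-6;  /* leakage per CPU, W */

    double p_one  = 1e6 * Cap * V * V + 1.0 * Pleak;
    double p_many = 1e6 * (1.0 * Cap * V * V) + 1e6 * Pleak;

    printf("1 CPU at 1 MHz:    %g W\n", p_one);   /* ~0.001 W */
    printf("1e6 CPUs at 1 Hz:  %g W\n", p_many);  /* ~1.001 W */
    return 0;
}

The dynamic term comes out the same either way; the million leakage
terms are what kill you.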
 
krw

Actually it is P = f C V^2 + MoreStuff.

With f = 0, each gate still loses a little power due to leakage. This
means that 1 processor doing 1 million instructions per second will
draw less power than 1 million 1 Hz processors.

Not necessarily. At 1 Hz, one can lower the voltage pretty low and
build gates pretty thick. Both will minimize leakage.
 
MooseFET

Not necessarily. At 1 Hz, one can lower the voltage pretty low and
build gates pretty thick. Both will minimize leakage.

Good point.

I think you run into a lower limit. Vcc must be more than Vth, and
the OP wants to make a billion processors in a reasonable amount of
space. I'm fairly sure that something would set a lower limit on
leakage before you get down to 1 Hz.
 
krw

Good point.

I think you run into a lower limit.

That limit is quite low, as evidenced by a digital watch.

Vcc must be more than Vth and the OP wants to make a billion
processors in a reasonable amount of space.

Not necessarily. Sub-threshold operation is possible and Vth can be
very low. There is a lot of work in this area because leakage is
such a big problem.

I'm fairly sure that something would set a lower limit on leakage
before you get down to 1 Hz.

Probably the wires from the battery. ;-)

However, 1 Hz x 1 billion is an absurdity to show an example. How
about a MHz by a thousand? If the parallelism problem can be cracked,
there are all sorts of games that can be played with all those free
transistors.
 
MooseFET

That limit is quite low, as evidenced by a digital watch.


Not necessarily. Sub-threshold operation is possible and Vth can be
very low. There is a lot of work in this area because leakage is
such a big problem.

I thought that when you made Vth low, you always got higher leakage.
It seems that this must be true if the gm is to remain finite at the
extreme of Vth = 0.

Probably the wires from the battery. ;-)

It is likely the leakage in the supply capacitors would be more than
in the wires. Also, if the supply has Schottky diodes in it, the
leakage in them will be high. If we are considering sub-volt Vdd,
then the rectifier would likely be a MOSFET.

However, 1 Hz x 1 billion is an absurdity to show an example.

I see it more as an extreme case to make the argument clearer, but
yes, it is absurd.

How about a MHz by a thousand? If the parallelism problem can be
cracked, there are all sorts of games that can be played with all
those free transistors.

Parallelism has been cracked for special-case problems. The very
large CPU-time users also seem to be the places where the problem is
parallel in nature. Modeling the flow of fluids, explosions, and the
propagation of waves through nonuniform media are the things that
come to mind quickly.
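
As a tiny illustration of why those problems parallelize (mine, not
from the thread): in a diffusion-style update, every point in the
next time step depends only on its neighbours in the current one, so
the inner loop can be split across any number of processors:

#include <stdio.h>

#define N 16

int main(void)
{
    double cur[N] = { 0 }, nxt[N] = { 0 };
    cur[N / 2] = 1.0;                    /* an initial disturbance */

    for (int step = 0; step < 8; step++) {
        for (int i = 1; i < N - 1; i++)  /* each i is independent */
            nxt[i] = cur[i] + 0.25 * (cur[i-1] - 2*cur[i] + cur[i+1]);
        for (int i = 0; i < N; i++)      /* advance to next step */
            cur[i] = nxt[i];
    }
    for (int i = 0; i < N; i++)
        printf("%.3f ", cur[i]);
    printf("\n");
    return 0;
}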
 
Radium

However, 1 Hz x 1 billion is an absurdity to show an example. How
about a MHz by a thousand? If the parallelism problem can be cracked,
there are all sorts of games that can be played with all those free
transistors.

Well, "parallel Hz" is meant for problems that are serial.

"Parallel Hz" actually doesn't have anything to do with whether the
task is parallelizable or not.

The parallelism you are describing has to do with the bits being
parallel [such as in a parallel printer].

"Parallel Hz" is a different story.
 
