Advantages of Parallel Hz

  • Thread starter: Radium
Multiplies usually take more than one cycle.

Typically they do but no law of the universe enforces this. You can
have a fast multiplier that does the job in one cycle. This is a lot
easier to do if the word length is small. As a result we can't hold
him to it taking more cycles.
The "THEN" (a conditional branch) also takes a cycle (unless you do
E = 1 as a conditional move).

There are a few processors out there that have other conditionals. The
ATT DSP chip had several such instructions. So we can't hold him to
that either.

Naturally, if both of these things were included in the processors he
was repeating a billion times, the power would certainly be much
higher.


[....]
It's 5 with the branch (1 cycle if not-taken), and if you assume all variables were
allocated to registers and a multiply take just one cycle. But remember that the
OP was proposing a simple bit-serial CPU, so if registers are 32 bits wide, and
the multiply is done bit-serially too, it would take 1152 seconds to complete!

Actually, if it is very simple, you need 32*65 shifts. The 64-bit output
area needs to go around one full cycle to do the add and then one
extra to align for the next bit. Only if you use some extra logic do
you get it down to a lower value.
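
The 32*65 figure can be checked with a rough count. This is only a sketch: it assumes 32-bit operands, a 64-bit product register that recirculates one full turn per multiplier bit for the add, and one extra shift per bit to realign. The function name is made up for illustration.

```python
def naive_bit_serial_multiply_shifts(word_bits=32):
    """Shift count for a very simple bit-serial multiplier (illustrative model)."""
    product_bits = 2 * word_bits                   # 64-bit output area for 32-bit operands
    shifts_per_multiplier_bit = product_bits + 1   # one full circulation plus one to realign
    return word_bits * shifts_per_multiplier_bit


print(naive_bit_serial_multiply_shifts(32))  # 32 * 65 = 2080
```

Extra logic (for example, adding only into the upper half of the product) would bring the count down, as noted above.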
 
I mentioned this earlier to him albeit without detailed example. But
Radium seems to have conveniently overlooked it so I figured he's probably
not going to answer even if I repeated it a second time. Or maybe he
didn't understand the implications of my statement but let's see if he
does with your detailed example :ppPp

I think it is likely he didn't understand your point. He appears to
be intelligent but ignorant. He didn't know where the leakage current
happened for example.
 
krw said:
Multiplies usually take more than one cycle.

Usually, but not necessarily. Some processors do a fused multiply-add
too, saving another cycle ( [(A+B)] * [C-D] > 0). That wasn't
the issue though.
The "THEN" (a conditional branch) also takes a cycle (unless you do
E = 1 as a conditional move).

Nope. That's in the compare.
So E = 1; F = 2; G = 3; H = 4; I = 5; also takes zero cycles in total???

I don't see F, G, H, and I in the problem. Who said anything about
total cycles being zero?
Moves don't take zero cycles.

They certainly can. A processor with register renaming can certainly
do a move in zero cycles. In this case, all the MOVE does is change
pointers in the register file. BTW, some processors even do
exchanges in zero cycles.
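
A toy model of the idea, for anyone following along. It is a sketch, not any real processor's renamer: the class and method names are invented, physical registers are never reclaimed, and only register-to-register moves are covered.

```python
class RenameTable:
    """Toy rename map: architectural register name -> physical register index."""

    def __init__(self, arch_regs, num_physical=16):
        self.physical = [0] * num_physical
        # Initially each architectural register maps to its own physical register.
        self.map = {name: i for i, name in enumerate(arch_regs)}
        # Remaining physical registers are free (this toy never reclaims them).
        self.free = list(range(len(arch_regs), num_physical))

    def read(self, name):
        return self.physical[self.map[name]]

    def write(self, name, value):
        # Every new result gets a fresh physical register.
        preg = self.free.pop(0)
        self.physical[preg] = value
        self.map[name] = preg

    def move(self, dst, src):
        # Register-to-register move: no data is copied, only the pointer changes.
        self.map[dst] = self.map[src]

    def exchange(self, a, b):
        # Exchange: swap the two pointers, again without touching the data.
        self.map[a], self.map[b] = self.map[b], self.map[a]


rt = RenameTable(["A", "B", "E"])
rt.write("A", 7)
rt.write("B", 3)
rt.move("E", "A")      # E now names the same physical register as A
rt.write("A", 9)       # A gets a new physical register; E still sees 7
rt.exchange("A", "B")  # A and B trade mappings
print(rt.read("A"), rt.read("B"), rt.read("E"))
```

Neither move() nor exchange() touches the data array or an ALU, which is the sense in which they can be hidden.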
Although they are pretty simple, they
need resources such as a constant decoder

WTF do I need a "constant decoder"?
and a register port to write the result to.

Nope. Just change the register name(s).
So moves take 1 cycle just like any other basic
ALU instructions.

Nope. They can be hidden since they need not use the ALU at all,
just a change to the rename file. Note that this move isn't an
independent instruction, so it's even easier to hide the move (a
rename is required at the completion of every arithmetic instruction
anyway).
It's 5 with the branch (1 cycle if not-taken), and if you assume all variables were
allocated to registers and a multiply take just one cycle. But remember that the
OP was proposing a simple bit-serial CPU, so if registers are 32 bits wide, and
the multiply is done bit-serially too, it would take 1152 seconds to complete!

Branches can be predicted or speculatively executed with the
rename/completion contingent on the results. There are all sorts of
games that can be played to save cycles.
 
krw said:
Multiplies usually take more than one cycle.

Usually, but not necessarily. Some processors do a fused multiply-add
too, saving another cycle ( [(A+B)] * [C-D] > 0). That wasn't
the issue though.

Sure. I was just pointing out that the example will take a lot longer than you
might think. There are a lot of unstated assumptions here (register based
architecture etc)...
Nope. That's in the compare.

Few architectures combine a compare and branch. MIPS is one of the few
that can, but even MIPS cannot do the above comparison and branch
in one instruction.
I don't see F, G, H, and I in the problem. Who said anything about
total cycles being zero?

If you claim that E = 1 takes zero cycles then it follows that any similar
assignment takes zero cycles and thus my example would take zero
cycles. Instructions do not take zero cycles.
They certainly can. A processor with register renaming can certainly
do a move in zero cycles. In this case, all the MOVE does is change
pointers in the register file. BTW, some processors even do
exchanges in zero cycles.

Yes it is possible for a move to be implemented using register renaming
in an advanced CPU rather than using the ALU. However this only applies
to register-register moves, not constant moves. And it still doesn't mean
the move takes zero cycles. An advanced implementation could make
moves have zero cycle *latency*, but no CPU I know of does that.
WTF do I need a "constant decoder"?

Constants need to be extracted from the instructions, the bits grouped
together if not adjacent already, and zero- or sign-extended to the register
width. On some architectures constants can include a shift or expand into
a pattern like AB00AB00. So you always need to decode the constant
before it can be used.
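
As a rough illustration of those decode steps, here is a sketch in Python. The field layout and function names are hypothetical and not taken from any particular instruction set.

```python
def sign_extend(value, bits, width=32):
    """Sign-extend a `bits`-wide field to `width` bits (result is unsigned)."""
    sign_bit = 1 << (bits - 1)
    value &= (1 << bits) - 1
    extended = (value ^ sign_bit) - sign_bit   # two's-complement trick
    return extended & ((1 << width) - 1)


def zero_extend(value, bits):
    """Zero-extend: keep the low `bits` bits, upper bits are zero."""
    return value & ((1 << bits) - 1)


def gather_split_immediate(instr, fields):
    """Group immediate bits that are not adjacent in the instruction word.

    `fields` is a list of (shift, bits) pairs, most significant field first.
    """
    imm = 0
    for shift, bits in fields:
        imm = (imm << bits) | ((instr >> shift) & ((1 << bits) - 1))
    return imm


def expand_pattern_ab00ab00(byte):
    """Expand an 8-bit constant AB into the 32-bit pattern AB00AB00."""
    b = byte & 0xFF
    return (b << 24) | (b << 8)


# Hypothetical encoding: a 12-bit immediate split into bits 31..25 and 11..7.
instr = (0x7F << 25) | (0x1F << 7)
imm12 = gather_split_immediate(instr, [(25, 7), (7, 5)])
print(hex(imm12), hex(sign_extend(imm12, 12)), hex(expand_pattern_ab00ab00(0xAB)))
```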
Nope. Just change the register name(s).

Change the name to what exactly? Register renaming can only rename
registers to other registers, not constants to registers...
Nope. They can be hidden since they need not use the ALU at all,
just a change to the rename file. Note that this move isn't an
independent instruction, so it's even easier to hide the move (a
rename is required at the completion of every arithmetic instruction
anyway).

Which particular CPU are you talking about? An advanced CPU would
have executed E = 1 long before it had done any of the other instructions,
without using any register renaming tricks.
Branches can be predicted or speculatively executed with the
rename/completion contingent on the results. There are all sorts of
games that can be played to save cycles.

Sure. But the issue was how long a simple CPU like the OP proposed would take.
You're thinking of Pentium-class CPUs, but we're talking about far far simpler CPUs
executing at best 1 instruction per cycle (or 1 every N cycles for N-bit serial).

Wilco
 
How does SB16 ISA's FM synth freshly generate its instructions?

SB16 ISA's FM synth doesn't freshly generate its instructions.

It freshly synthesizes (electrical representations of) sounds based on
instructions it receives from the device that owns, and is controlling,
the ISA bus.
I want my CPU to freshly generate instructions in a similar manner
instead of playing them back from ROM.

Go ahead! Build one and show us how it's done! If you're just trying
to convince somebody to do the work for you, then offer money. ;-)

If you're looking for someone to show you how to implement your
ideas, then please learn how to visit the astral plane.

Good Luck!
Rich
 
SB16 ISA's FM synth doesn't freshly generate its instructions.
It freshly synthesizes (electrical representations of) sounds based on
instructions it receives from the device that owns, and is controlling,
the ISA bus.

Okay. That is what I meant. Sorry for the misunderstanding.

I want my CPU to freshly generate electronic signals [instead of
playing them back from ROM] "based on instructions it receives from
the device that owns, and is controlling" it.

Thanks for clearing this up.

Sometimes, I need to be told what I want.
 
Radium said:
SB16 ISA's FM synth doesn't freshly generate its instructions.
It freshly synthesizes (electrical representations of) sounds based on
instructions it receives from the device that owns, and is controlling,
the ISA bus.

Okay. That is what I meant. Sorry for the misunderstanding.

I want my CPU to freshly generate electronic signals [instead of
playing them back from ROM] "based on instructions it receives from
the device that owns, and is controlling" it.

So you want micro code :-) But where do you get the instructions that
control the micro code from? Or are you generating those instructions
too? If so, where do those instructions come from? And so on...

You always need to store and play back instructions one way or another,
they can't generate themselves automatically you know. Lots of people
would be out of a job if computers could write their own software...
Thanks for clearing this up.

Sometimes, I need to be told what I want.

Thanks for making my day!

Wilco
 
I want my CPU to freshly generate electronic signals [instead of
playing them back from ROM] "based on instructions it receives from
the device that owns, and is controlling" it.


So you want to pervert the CPU into a device that doesn't
process. Once you have that new non-CPU device, all you'd
have to do is then add a CPU to the system in addition to
this new device so you can regain the functions you stripped
away.
 
Wilco said:
On May 5, 9:57 pm, "Bob Myers" wrote:
You have yet to ask a serious question.
How does SB16 ISA's FM synth freshly generate its instructions?
SB16 ISA's FM synth doesn't freshly generate its instructions.
It freshly synthesizes (electrical representations of) sounds based on
instructions it receives from the device that owns, and is controlling,
the ISA bus.

Okay. That is what I meant. Sorry for the misunderstanding.

I want my CPU to freshly generate electronic signals [instead of
playing them back from ROM] "based on instructions it receives from
the device that owns, and is controlling" it.


So you want micro code :-) But where do you get the instructions that
control the micro code from? Or are you generating those instructions
too? If so, where do those instructions come from? And so on...

You always need to store and play back instructions one way or another,
they can't generate themselves automatically you know. Lots of people
would be out of a job if computers could write their own software...

Thanks for clearing this up.

Sometimes, I need to be told what I want.


Thanks for making my day!

Wilco
Of course one can generate the instructions to the sound card
algorithmically rather than using stored sequences. Likewise with Just
in Time compilation or interpretation, one can generate sequences of
instructions algorithmically rather than use a classic stored object
program. Just think, a digital theremin.
 
How does it not mean anything?

By not meaning anything.

Here's how to find out:

Get a piece of paper and a pencil or pen, and write down exactly what
it _does_ mean, in plain English. That should help to understand the
phrase "doesn't mean anything."

Good Luck!
Rich
 
Del Cecchi said:
Wilco said:
On Sun, 06 May 2007 01:08:43 -0700, Radium wrote:


You have yet to ask a serious question.

How does SB16 ISA's FM synth freshly generate its instructions?

SB16 ISA's FM synth doesn't freshly generate its instructions.

It freshly synthesizes (electrical representations of) sounds based on
instructions it receives from the device that owns, and is controlling,
the ISA bus.

Okay. That is what I meant. Sorry for the misunderstanding.

I want my CPU to freshly generate electronic signals [instead of
playing them back from ROM] "based on instructions it receives from
the device that owns, and is controlling" it.


So you want micro code :-) But where do you get the instructions that
control the micro code from? Or are you generating those instructions
too? If so, where do those instructions come from? And so on...

You always need to store and play back instructions one way or another,
they can't generate themselves automatically you know. Lots of people
would be out of a job if computers could write their own software...

Thanks for clearing this up.

Sometimes, I need to be told what I want.


Thanks for making my day!

Wilco
Of course one can generate the instructions to the sound card algorithmically rather
than using stored sequences. Likewise with Just in Time compilation or interpretation,
one can generate sequences of instructions algorithmically rather than use a classic
stored object program. Just think, a digital theremin.

But in these cases there is another program that does the generation/translation/
interpretation/decompression. That program itself must be stored and played back
from memory somehow.

In any case, the data used to generate the instructions is a stored program
as well, one that is simply played back from memory. Does it matter whether the same
binary can be interpreted, JIT'd, executed as micro code sequences or directly
executed by different implementations? In many cases you can't tell, e.g. Transmeta.
I call that binary a stored program played back from memory irrespective of how
it is executed.

Wilco
 
Rich Grise said:
By not meaning anything.

Exactly....welcome to Radium's Department
of Redundancy Department.

"When I use a word, it means just what I choose it
to mean - neither more nor less."

- Humpty Dumpty, to Alice,
"Through the Looking-Glass"

Bob M.
 
Okay. That is what I meant. Sorry for the misunderstanding.
I want my CPU to freshly generate electronic signals [instead of
playing them back from ROM] "based on instructions it receives from
the device that owns, and is controlling" it.
So you want micro code :-) But where do you get the instructions that
control the micro code from?

No microcode.
Or are you generating those instructions
too?
Yes.

If so, where do those instructions come from?

Hardware logic.

If you read the wikipedia links I posted and quoted, you'll find that
there is a real-time, hardware-based alternative to ROM and microcode.
 
one can generate sequences of
instructions algorithmically rather than use a classic stored object
program.

Obviously, I prefer the former over the latter. I like real-time
hardware. I dislike latency and buffering and want as little of them
as possible. In order to have the least amount of latency and
buffering, all parts of the PC must be fully-hardware with as little
software as possible.
 
Obviously, I prefer the former over the latter. I like real-time
hardware. I dislike latency and buffering and want as little of them
as possible. In order to have the least amount of latency and
buffering, all parts of the PC must be fully-hardware with as little
software as possible.

OK, then just go ahead and design the hardware algorithms that generate
the programs that do what you want them to do, and show us.

Even a block diagram would be fine. :-)

BTW, Radium, are you, by any chance, autistic?

Thanks!
Rich
 
OK, then just go ahead and design the hardware algorithms that generate
the programs that do what you want them to do, and show us.

I wish commercial PCs were made that way. Sadly, my wish is way too
good to ever be true.
Even a block diagram would be fine. :-)
LOL.

BTW, Radium, are you, by any chance, autistic?

I am Aspergered.
 
Robert said:
Thanks. Very nice. There are some decent thinclients available.
Mostly looks like for internet cafes, classrooms, workrooms.

A bigger market might be "standalone" thin clients. Home internet
appliances with built-in browsers, but otherwise no programmability
(state retained). Certainly more than the 32 MB LTSP minimum, but
probably not more than 128 MB RAM. US$50 plus monitor (or S-video
out for exhibitionists/remotecontrol-hogs who want to surf on the TV!)

Such a device would be attractive to non-computer experts
(nothing to go wrong) or as second PCs in a household.

-- Robert
Add an HDMI output to that for all the large wide-screen types. The added
resolution is quite worth it.
 
I am Aspergered.

Thanks for this.

Would you like special kid-glove treatment, or is your condition under
control enough that our feedback is bearable?

Thanks,
Rich
 
I wish commercial PCs were made that way. Sadly, my wish is way too
good to ever be true.

Not getting what you want can be frustrating but the real disasters
happen when you do get what you want but didn't fully think through
what you wanted. This may be that sort of case. The current line of
processor evolution is from a simple concept that the CPU does what it
is told. They are very slavish in this in that if you tell them to
add zero to a number, they will perform the add without noticing that
it is a silly thing to do.

Some processors are "microcoded". I suggest you look this up and study
it a bit. It is a concept that is heading in the direction you seem
to be thinking. The microprocessor translates a single instruction
into a short list of more basic operations that do the operation step
by step.
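
To make that concrete, here is a toy sketch of the translation step. The instruction names, the micro-operations and the single-accumulator machine are all invented for illustration and do not describe any real processor.

```python
# Hypothetical macro-instruction -> list of more basic micro-operations.
MICROCODE = {
    "ADD_MEM": ["load_tmp", "add_acc_tmp"],
    "INC_MEM": ["load_tmp", "inc_tmp", "store_tmp"],
}


def run(program, memory, acc=0):
    """Execute macro-instructions by expanding each into its micro-ops."""
    tmp = 0
    micro_steps = 0
    for opcode, addr in program:
        for uop in MICROCODE[opcode]:
            if uop == "load_tmp":
                tmp = memory[addr]
            elif uop == "add_acc_tmp":
                acc += tmp
            elif uop == "inc_tmp":
                tmp += 1
            elif uop == "store_tmp":
                memory[addr] = tmp
            micro_steps += 1
    return acc, micro_steps


memory = [5, 10]
program = [("INC_MEM", 0), ("ADD_MEM", 0), ("ADD_MEM", 1)]
acc, steps = run(program, memory)
print(acc, steps, memory)
```

Three instructions turn into seven micro-steps here, and the microcode table itself still has to be stored somewhere.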

"microcode" tends to make for a slower processor. In order for it to
show any speed advantage, more microcode operations must be done per
second than the native instruction speed. If the internals of the
micro run at many GHz but the I/O is only 100 MHz, there can be
advantages.
 