Tilera to Introduce 64-Core Processor

Qu0ll

Is it YAXP (Yet Another X86 Processor) or is it introducing a new
instruction set?

--
And loving it,

-Q
_________________________________________________
(e-mail address removed)
(Replace the "SixFour" with numbers to email me)
 
krw

a?n?g?e? said:
So it'll just be a real mess isn't it?

That's why they pay the big bux. ;-)
And they can't just make more cores using the same blue print can
they?

They do, more or less. The designs are identical, the layouts may or
may not be the same. Sometimes a mirror or rotated image is wanted.
If the cache, memory controller, or bus interface unit is in the
center a better layout can be had by tailoring the cores to the
specific needs. They are not photographic images of each other
though.
Since changing the design
from 64 core to say 128 core would require a whole new design with
like double the number of interconnects no?

Double the number, or 2^n. They're spread over a larger area, but
wiring congestion and lengths get to be problems. I've never
been involved with 128 processors, though. I'm sure there are other
gotchas in there.
Or are they actually just saying every core is identical and only
directly connected to their immediate neighbours? Since this seems to
be the logical way to me.

Logical, simple, wrong. ;-) How do you get to the outside? You're
strangling the inner processors of memory and swamping the outer ones
with data.
Wouldn't this be like getting close to the same problem as whowasit's
parallel hz idea?

Seems to me like this 64core thingy is only going to
be good for certain problems since most stuff just isn't going to be
really parallelizable to such an extent. While many things will just
suffer from the low clockspeed/high latency. In other words, not going
to be a mass market product?

Some day it may. That someday isn't today, at least in the commodity
market.
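
The parallelizability worry quoted above is usually put in numbers with Amdahl's law. A minimal sketch follows, assuming an idealized model with no communication or memory-contention cost; the function name and the sample fractions are illustrative and do not come from Tilera or the article.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Best-case speedup over one core of the same chip (Amdahl's law).

    Assumes the parallel part scales perfectly and ignores communication,
    so real results can only be worse than this bound.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Hypothetical workloads: half, 90% and 99% of the work parallelizable.
    for fraction in (0.50, 0.90, 0.99):
        print(f"{fraction:.0%} parallel on 64 cores: "
              f"{amdahl_speedup(fraction, 64):.1f}x at best")
```

The bound is relative to a single core of the same chip, so a lower per-core clock has to be paid for out of that speedup before the part comes out ahead of a faster conventional processor.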
 
krw

Like L'Angel, I'm a bit puzzled, but in a different direction:
Lay the cores out in a flat checkerboard. Bus each row together
to a shared x8 section of L2 cache. One/two layers max. Those
cores better each have their own L1s!

My point was that as transistors get smaller, more layers are needed
simply to support the local interconnect. We were at ten layers
three years ago. More doesn't come cheap.

One layer doesn't do much since you don't get to put resistors on top
to jump over wires. ;-) Two aren't going to support much density of
wiring either.

And power/gd current handled. Also to avoid overloading
L2 and main RAM busses.

Yes, and think about the number of L2 ports needed. <shudder>
 
Wes Newell

Is it YAXP (Yet Another X86 Processor) or is it introducing a new
instruction set?

It's a new cpu. Right now only supported with Linux. Which is fine with
me. I don't use Microsloth anyway.
 
Guy Macon

Alex said:
That's the point where it becomes interesting. Having only a few of some
resource makes it hard to manage. Having one is easy - there's no choice.
Having lots is easy - use what you can. Having a few means you have to
choose carefully.

That one is going into my Quote File:

"[This is] where it becomes interesting. Having only a few
of some resource makes it hard to manage. Having one is
easy - there's no choice. Having lots is easy - use what
you can. Having a few means you have to choose carefully."
-Alex Colvin
 
Guy Macon

Nate said:
I doubt it will work very well as a game platform CPU per se, although for
GPUs, NVidia is going in sort of that direction with their generalized
stream processors on the GF8 series.

It does make you wonder about it being used as a *chess* game
playing platform though. 64 cores, 64 squares... Maybe there
is some clever optimization waiting to be discovered. :)
 
Guy Macon

Qu0ll said:
Is it YAXP (Yet Another X86 Processor) or is it introducing a new
instruction set?

Tilera isn't saying. And they *would* be saying if it was x86.
 
Guy Macon

Wes said:
It's a new cpu. Right now only supported with Linux. Which is fine with
me. I don't use Microsloth anyway.

It seems strange to me that they say the following things:

[1] It runs Linux.

[2] It has a new instruction set and architecture

[3] The C/C++ compiler is still in the future.

What's wrong with this picture?
 
John M

I find all of the discussions interesting; most of them were analogous to
the ones we had in the 80's when DARPA sponsored the Supercomputing Program.
A number of multiprocessor chips were built and used to implement massively
parallel computers. One example was the Thinking Machines 64,000-processor
computer, which was later extended beyond 64K processors.
Communications between processors and fast memory accessibility to all the
processors was and still is the problem with massively parallel computers.
It became obvious that these computers were very useful for specific types
of problems which fit their architectures. They ended up not being very
useful as general purpose machines, unless you were running many
simultaneous applications and even then they usually were not cost
effective. Don't get me wrong: they were very cost effective for some very
specific problems.

There is not enough information in this article to show me that they have
come up with a substantially better solution to the high speed general
purpose computing problem in a cost effective manner. If anything, the
market they are going after coincides with the ones that the 80's machines
were useful in.

Regards

John




AirRaid said:
Tilera to Introduce 64-Core Processor
By Andy Patrizio

An MIT-inspired startup will introduce a new multi-core chip today at
the annual Hot Chips conference at Stanford University. The TILE64
boasts a "clean sheet" design, unencumbered by any legacy
compatibility concerns, that Tilera says will provide a huge leap in
multithreaded performance.

Tilera was founded in 2004 to bring to market the multi-core processor
designs of MIT researcher Anant Agarwal. Agarwal created what he
called a "mesh" multi-core architecture, where the cores are all
interconnected rather than going through a frontside bus, as Intel's
multi-core chips do.

Agarwal first created this multi-core architecture in 1996, long
before Intel and AMD were anywhere close to doing it. The project
received funding from the Defense Advanced Research Projects Agency
(DARPA) and the National Science Foundation, the agency that managed
the Internet for decades.

Tilera holds 40-plus patents for its multi-core design. TILE64 will be
the first in a series of processors built around massively multi-core
chips. The TILE64 processor contains 64 full-featured, programmable
cores that Tilera claims can perform 500 billion operations per second
and delivers ten times the performance and thirty times the
performance-per-watt of the Intel dual-core Xeon.

Agarwal said the company can make these performance leaps because it
doesn't use any legacy technologies or designs.

"The real problem with scale is existing multi-core architectures use
a bus. In that architecture, the bus is a central switch and all the
cores are connected to the single central switch. A packet has to go
through it no matter what, which is fine for one, two or four cores,
but it does not scale," he told internetnews.com.

Tilera uses a mesh architecture, where the cores are laid out in a
checkerboard-like grid, all connected through high-speed
interconnects. "In architectures of this sort, you can keep growing
and you won't have any serious congestion," said Agarwal.

Intel has promised to dispense with the frontside bus with the Nehalem
architecture, due late next year. AMD does not have a frontside bus in
the Opteron, but it's also using four cores at the most, while Tilera
is at 64.

The TILE family can scale up to even more, or down to a two-core
design for the smallest of designs, such as a cell phone. Its power
consumption is a few hundred milliwatts per core, Agarwal said. Its
clock speed will range from 600MHz to 1GHz.

But there's a lot more on the chip than just cores. It has a pair of
10 gigabit Ethernet ports directly on the chip for high speed
networking, as well as on-board I/O and peripheral controllers. Its
integrated memory controllers allow for up to 200 gigabits of memory
bandwidth within the chip.

That's what made the TILE64 chip so appealing to Top Layer, developer
of network security and intrusion detection appliances. The company had
built its own processors but now plans to switch to Tilera's chips,
according to Chief Strategy Officer Mike Paquette.

"Our software is a multi-core design, and we were able to map out
functionality almost 1 for 1 for each process to a core in a Tilera
chip," he said. "The performance we expect in our estimates exceeds
what we could have gotten from any silicon providers."

Top Layer decided to license processors for future products rather
than the expense of building any more, and no other processors had the
scalability. "Because the movement of data is so much of what we do,
we needed a multi-core chip that was optimized for what we were doing
rather than something optimized for general purpose computing. Tilera
has capabilities for network capabilities that are far ahead of what
you can get from [x86] processors," said Paquette.

Tilera will ship a full development toolkit, called the Multicore
Development Environment (MDE), for building applications. It's an
Eclipse-based Integrated Development Environment (IDE) with an ANSI
standard C compiler, an application level library and tools for
debugging and profiling multi-core processors.

Wisely, Tilera is not taking on Intel and AMD right out of the gate,
as Transmeta did. It's going for the embedded market.

"We're focused on embedded because we are a startup and want to go
into a space where there is massive demand for performance like ours.
We can focus on a couple of markets and do really well in those
markets by addressing customer demands squarely and don't have to go
up against a dominant competitor," said Agarwal.

Tilera expects to sell the TILE64 processor for $435 in lots of 10,000
units. The company is also planning a 36-core and 120-core processor
for the near future.


http://www.internetnews.com/ent-news/article.php/3695116
 
Robert Redelmeier

In comp.sys.ibm.pc.hardware.chips krw said:
My point was that as transistors get smaller, more layers
are needed simply to support the local interconnect. We were
at ten layers three years ago. More doesn't come cheap.

A good point, but I wonder if those extra layers are
intrinsically necessary, or a result of trying to alleviate
timing problems from pushing clock. As you point out, heavy
MP could not push clock.
One layer doesn't do much since you don't get to put
resistors on top to jump over wires. ;-) Two aren't going
to support much density of wiring either.

Sorry. I guess you count insulating layers on chips,
unlike PCBs.
Yes, and think about the number of L2 ports needed. <shudder>

Maybe not. Just bus access by row and separate L2s for each row.
Let MOESI snooping pick it up.

-- Robert
 
Del Cecchi

Robert Redelmeier said:
A good point, but I wonder if those extra layers are
intrinsically necessary, or a result of trying to alleviate
timing problems from pushing clock. As you point out, heavy
MP could not push clock.


Sorry. I guess you count insulating layers on chips,
unlike PCBs.


Maybe not. Just bus access by row and separate L2s for each row.
Let MOESI snooping pick it up.

-- Robert

I think current state-of-the-art processes are at or above 10 levels of
metal, plus insulator and vias.
 
Nate Edel

In comp.sys.ibm.pc.hardware.chips Guy Macon said:
It does make you wonder about it being used as a *chess* game
playing platform though. 64 cores, 64 squares... Maybe there
is some clever optimization waiting to be discovered. :)

Perhaps so! I'm sure some of the AI guys are going to love these things.
 
krw

A good point, but I wonder if those extra layers are
intrinsically necessary, or a result of trying to alleviate
timing problems from pushing clock. As you point out, heavy
MP could not push clock.
The layers are necessary because the transistors are more densely
packed than wires can be. Wires don't scale like silicon. Copper
helped quite a bit, but there is nowhere to go from there.
Sorry. I guess you count insulating layers on chips,
unlike PCBs.

No, just counting metal. Wires can't be infinitely thin, nor with a
zero pitch.
Maybe not. Just bus access by row and separate L2s for each row.
Let MOESI snooping pick it up.

You'd still have at least 18 ports to each L2. <yikes!>
 
Bernd Paysan

krw said:
The layers are necessary because the transistors are more densely
packed than wires can be. Wires don't scale like silicon.

Actually, it's a bit different. If you just shrink down a design, you'll see
that R remains constant (supposing the height of the wire doesn't change), C
up and down is reduced, and C to the sides increases. Overall, you
can say the RC constant of the wire hasn't changed. So it doesn't scale
like the transistors, where the RC constant improves. That's why wire speed
dominates recent processes, and shrinks don't give much performance
improvement.

However, for the number of layers, there's a completely different
consideration, and that depends on how many gates you want to connect on a
chip, and how long the average connection is. With more gates, the average
wire length goes up. So you approximately need one additional metal layer
when you double the gate numbers on a chip.

This rule of thumb doesn't apply for tiled chips. There, each tile is a
separate entity, and since it only communicates with the neighbors, no
longer wires are needed. You can create your tile, route it as dense as
possible (e.g. with 6 metal layers), and then replicate it - you won't need
more layers, as this is not a hierarchical connection, but a flat one.
Copper helped quite a bit, but there is nowhere to go from there.

There is: carbon nanotubes.
 
aku ankka

f(n) = (n-1)*n/2

f(64) = 2016

It's most likely not implemented as P2P; I didn't read the specs, but
that's a high probability.
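
For comparison, here is a small sketch that evaluates the formula above next to the link count of a nearest-neighbour grid. The 8x8 layout is an assumption (the article only says the 64 cores sit in a checkerboard-like grid), and both function names are made up for illustration.

```python
def point_to_point_links(cores: int) -> int:
    """Links needed if every core is wired directly to every other core."""
    return cores * (cores - 1) // 2


def mesh_links(rows: int, cols: int) -> int:
    """Links in a 2D grid where each core connects only to its
    horizontal and vertical neighbours (no wrap-around)."""
    horizontal = rows * (cols - 1)
    vertical = cols * (rows - 1)
    return horizontal + vertical


if __name__ == "__main__":
    # Assumed layout: 64 cores arranged as an 8x8 grid.
    print("full point-to-point:", point_to_point_links(64))
    print("8x8 neighbour mesh: ", mesh_links(8, 8))
```

Under that assumption the grid needs 112 links against 2016 for full point-to-point, which is why a mesh is the more plausible reading of "all interconnected".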
 
Jan Vorbrüggen

True. Most systems are running more than one program. But most programs
are single-threaded. And the cost of thread initiation and switching
doesn't justify a second core.

Turn on the thread-number display in your task manager process view on
Winwoes, and tell me that column contains mostly 1. Not true.

Jan
 
Sebastian Kaliszewski

Jan said:
Turn on the thread-number display in your task manager process view on
Winwoes, and tell me that column contains mostly 1. Not true.

But those threads are sleeping 99.99% of the time. That's the reality.

rgds
\SK
 
