PC Review



Clarifications about AMD TLB L3 bug

 
 
Robert Myers
Guest
Posts: n/a
 
      14th Dec 2007
A number of assertions have been made here about the AMD TLB L3 Bug:

1. Only affects virtualization.

2. Is fixed in 64-bit Linux without a significant performance hit.

1. TRUTH: AMD, which knew about the bug before the chip was released,
falsely made this claim. The bug apparently affects all workloads,
potentially resulting in a system freeze.

2. TRUTH: A fix is available under NDA for RHEL 4 and not otherwise
apparently.

http://techreport.com/discussions.x/13721

http://techreport.com/discussions.x/13724

Robert.
 
 
 
 
 
Yousuf Khan
Guest
Posts: n/a
 
      15th Dec 2007
Robert Myers wrote:
> A number of assertions have been made here about the AMD TLB L3 Bug:
>
> 1. Only affects virtualization.
>
> 2. Is fixed in 64-bit Linux without a significant performance hit.
>
> 1. TRUTH: AMD, which knew about the bug before the chip was released,
> falsely made this claim. The bug apparently affects all workloads,
> potentially resulting in a system freeze.



The truth actually is that it only affects virtualized workloads,
because the problem occurs when nested page tables are used. Nested page
tables are used only in virtualization, never otherwise. AMD never made
the claim that it only affects virtualization; it is actually trying to keep
that hushed up, I assume because it does not want a virtualization bug
to be associated with its products, since that kind of reputation would
be hard to shake off, even if fixed.

> 2. TRUTH: A fix is available under NDA for RHEL 4 and not otherwise
> apparently.
>
> http://techreport.com/discussions.x/13721
>
> http://techreport.com/discussions.x/13724


How secret can it be if it's open-source?

Yousuf Khan
 
 
David Kanter
Guest
Posts: n/a
 
      15th Dec 2007
On Dec 14, 4:02 pm, Yousuf Khan <bbb...@yahoo.com> wrote:
> Robert Myers wrote:
> > A number of assertions have been made here about the AMD TLB L3 Bug:
> >
> > 1. Only affects virtualization.
> >
> > 2. Is fixed in 64-bit Linux without a significant performance hit.
> >
> > 1. TRUTH: AMD, which knew about the bug before the chip was released,
> > falsely made this claim. The bug apparently affects all workloads,
> > potentially resulting in a system freeze.
>
> The truth actually is that it only affects virtualized workloads,
> because the problem occurs when nested page tables are used. Nested page
> tables are used only in virtualization, never otherwise. AMD never made
> the claim that it only affects virtualization; it is actually trying to keep
> that hushed up, I assume because it does not want a virtualization bug
> to be associated with its products, since that kind of reputation would
> be hard to shake off, even if fixed.


It's not clear to me whether that is true or not. Here's the bug:

"The processor operation to change the accessed or dirty bits of a
page translation table entry in the L2 from 0b to 1b may not be
atomic. A small window of time exists where other cached operations
may cause the stale page translation table entry to be installed in
the L3 before the modified copy is returned to the L2. In addition, if
a probe for this cache line occurs during this window of time, the
processor may not set the accessed or dirty bit and may corrupt data
for an unrelated cached operation. The system may experience a machine
check event reporting an L3 protocol error has occurred. In this case,
the MC4 status register (MSR 0000_0410) will be equal to
B2000000_000B0C0F or BA000000_000B0C0F. The MC4 address register (MSR
0000_0412) will be equal to 26h."
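
For anyone who wants to see what those two MC4 status values have in common, the generic flag bits at the top of an x86 machine-check status register can be pulled apart with a few lines of Python. This is only a sketch: the helper names (`decode_mc_status`, `differing_flags`) are made up for illustration, and it decodes just the architectural flag bits and the low 16-bit error code, not any AMD model-specific fields.

```python
# Architectural flag bits in the upper part of an x86 MCi_STATUS register.
FLAG_BITS = {
    "VAL": 63,    # register holds a valid error record
    "OVER": 62,   # an earlier error record was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting was enabled for this error
    "MISCV": 59,  # the matching MISC register holds valid data
    "ADDRV": 58,  # the matching ADDR register holds a valid address
    "PCC": 57,    # processor context may be corrupt
}

# The two status values quoted in the erratum text.
STATUS_A = 0xB2000000_000B0C0F
STATUS_B = 0xBA000000_000B0C0F


def decode_mc_status(value):
    """Hypothetical helper: split a 64-bit status value into flags and error code."""
    flags = {name: bool((value >> bit) & 1) for name, bit in FLAG_BITS.items()}
    return {"flags": flags, "error_code": value & 0xFFFF}


def differing_flags(a, b):
    """Hypothetical helper: names of the flag bits that differ between two values."""
    fa = decode_mc_status(a)["flags"]
    fb = decode_mc_status(b)["flags"]
    return sorted(name for name in FLAG_BITS if fa[name] != fb[name])


if __name__ == "__main__":
    for status in (STATUS_A, STATUS_B):
        decoded = decode_mc_status(status)
        set_flags = [name for name, on in decoded["flags"].items() if on]
        print(f"{status:#018x}: flags={set_flags} code={decoded['error_code']:#06x}")
    print("differs in:", differing_flags(STATUS_A, STATUS_B))
```

Run on the two quoted values, `differing_flags(STATUS_A, STATUS_B)` gives `['MISCV']`, so the two signatures differ only in whether the MISC register is flagged as valid. Both have VAL, UC, EN and PCC set.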

I know what a Page Table Entry is, but I'm not sure what a PTTE
is...it sort of sounds like the nested page table. Perhaps someone
who is intimately familiar with the architecture could comment?

> > 2. TRUTH: A fix is available under NDA for RHEL 4 and not otherwise
> > apparently.
> >
> > http://techreport.com/discussions.x/13721
> >
> > http://techreport.com/discussions.x/13724
>
> How secret can it be if it's open-source?


Really easy, nobody cares enough to sue AMD/RH to get it. It's not
like there are more than 10-20 end users for Barcelona at the moment.

DK
 
 
Robert Myers
Guest
Posts: n/a
 
      15th Dec 2007
On Dec 14, 7:02 pm, Yousuf Khan <bbb...@yahoo.com> wrote:
> Robert Myers wrote:
> > A number of assertions have been made here about the AMD TLB L3 Bug:
> >
> > 1. Only affects virtualization.
> >
> > 2. Is fixed in 64-bit Linux without a significant performance hit.
> >
> > 1. TRUTH: AMD, which knew about the bug before the chip was released,
> > falsely made this claim. The bug apparently affects all workloads,
> > potentially resulting in a system freeze.
>
> The truth actually is that it only affects virtualized workloads,
> because the problem occurs when nested page tables are used. Nested page
> tables are used only in virtualization, never otherwise. AMD never made
> the claim that it only affects virtualization; it is actually trying to keep
> that hushed up, I assume because it does not want a virtualization bug
> to be associated with its products, since that kind of reputation would
> be hard to shake off, even if fixed.


Discussing AMD with you can be an interesting undertaking:

"In order to better understand this problem, TR spoke with Michael
Saucier, Desktop Product Marketing Manager at AMD. Saucier confirmed
that the TLB erratum can cause the system to hang when the chip is
experiencing high utilization. AMD has stated previously that
virtualization workloads can lead to this problem, but Saucier
clarified that other workloads can trigger system hangs, as well. He
characterized the issue as a race condition in the TLB logic "where
the other guy wins who isn't supposed to win," and said the likelihood
of the erratum causing a system hang is extremely rare."
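
The "other guy wins" description can be pictured with a toy model of a read-modify-write that is not atomic. The Python sketch below is an illustration under loud assumptions, not a model of Barcelona's actual cache protocol: the two-slot `caches` dict, the function names and the example entry value are all hypothetical. Only the Accessed bit position (bit 5) comes from the x86 page-table entry format.

```python
ACCESSED = 1 << 5  # Accessed bit in an x86 page-table entry


def set_bit_non_atomically(caches, bit, during_window=None):
    """Toy read-modify-write on the L2 copy, with a gap between read and write."""
    stale = caches["l2"]              # step 1: read the current entry
    if during_window is not None:
        during_window(caches)         # another cached operation sneaks in here
    caches["l2"] = stale | bit        # step 2: modified copy is returned to L2


def install_l2_copy_in_l3(caches):
    """Hypothetical interfering operation: copy whatever L2 holds into L3."""
    caches["l3"] = caches["l2"]


def run(with_interference):
    # Made-up entry value: some frame address with the Present bit (bit 0) set.
    caches = {"l2": 0x1000 | 1, "l3": None}
    hook = install_l2_copy_in_l3 if with_interference else None
    set_bit_non_atomically(caches, ACCESSED, hook)
    return caches


if __name__ == "__main__":
    for interfere in (False, True):
        result = run(interfere)
        l3 = result["l3"]
        l3_state = "empty" if l3 is None else f"accessed={bool(l3 & ACCESSED)}"
        print(f"interference={interfere}: "
              f"L2 accessed={bool(result['l2'] & ACCESSED)}, L3 {l3_state}")
```

With the interfering operation in the window, the toy L3 ends up holding a copy of the entry without the Accessed bit while L2 holds the updated one. That mismatch is the kind of stale copy the erratum text describes. Making the read and the write a single indivisible step would close the window.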

The report could be factually incorrect, but since I cited something
other than my own impression to support my statement, I'd expect you
to do the same.

You know that I'm not an admirer of AMD, so you won't be surprised
that I think AMD may be mortally wounded. Between the ATI fiasco and
this, AMD is a company with products that no one is going to want to
buy and seems unlikely to survive until it will have products that
someone does want to buy. That AMD is publicly whining about the
pounding its stock price has taken should tell you something. Vendors
who *finally* took a chance on AMD after years of hanging back have
been fried. First there was the lame roadmap. Now this.

What's the difference between this and Intel's botched FDIV bug?
Very, very simple. At the time of the FDIV bug, x86 was for
"peecees," and no one cared if Intel made mistakes that IBM (or DEC or
Sun) never would. Now they do.

> > 2. TRUTH: A fix is available under NDA for RHEL 4 and not otherwise
> > apparently.
> >
> > http://techreport.com/discussions.x/13721
> >
> > http://techreport.com/discussions.x/13724
>
> How secret can it be if it's open-source?

How is part of SUSE kept proprietary?

Robert.
 
 
The little lost angel
Guest
Posts: n/a
 
      15th Dec 2007
On Thu, 13 Dec 2007 16:35:26 -0800 (PST), Robert Myers
<(E-Mail Removed)> wrote:

>A number of assertions have been made here about the AMD TLB L3 Bug:
>
>1. Only affects virtualization.
>
>2. Is fixed in 64-bit Linux without a significant performance hit.
>
>1. TRUTH: AMD, which knew about the bug before the chip was released,
>falsely made this claim. The bug apparently affects all workloads,
>potentially resulting in a system freeze.
>
>2. TRUTH: A fix is available under NDA for RHEL 4 and not otherwise
>apparently.


A number of assertions have been made here by Mr Myers about the AMD
TLB L3 Bug:

1. That a fix is available under NDA for RHEL 4 and not otherwise
apparently.

Truth: Mr Myers, which knew about the openly released fix before the
post was released, falsely made this claim. The fix apparently is
available for all, not requiring an NDA that could potentially result
in an information freeze.

Truth: AMD released the fix publicly without an NDA requirement on 5
Dec, documented on the same day by the same website used by Mr Myers
to cite the two truths above, 8 days before Mr Myers's posting on 13
Dec...

http://www.techreport.com/discussions.x/13742
https://www.x86-64.org/pipermail/dis...er/010260.html

=P

--
A Lost Angel, fallen from heaven
Lost in dreams, Lost in aspirations,
Lost to the world, Lost to myself
 
 
Robert Myers
Guest
Posts: n/a
 
      15th Dec 2007
On Dec 15, 9:12 am, a?n?g?...@lovergirl.lrigrevol.moc.com (The little
lost angel) wrote:
> On Thu, 13 Dec 2007 16:35:26 -0800 (PST), Robert Myers
> <rbmyers...@gmail.com> wrote:
> >A number of assertions have been made here about the AMD TLB L3 Bug:
> >
> >1. Only affects virtualization.
> >
> >2. Is fixed in 64-bit Linux without a significant performance hit.
> >
> >1. TRUTH: AMD, which knew about the bug before the chip was released,
> >falsely made this claim. The bug apparently affects all workloads,
> >potentially resulting in a system freeze.
> >
> >2. TRUTH: A fix is available under NDA for RHEL 4 and not otherwise
> >apparently.
>
> A number of assertions have been made here by Mr Myers about the AMD
> TLB L3 Bug:
>
> 1. That a fix is available under NDA for RHEL 4 and not otherwise
> apparently.
>
> Truth: Mr Myers, which knew about the openly released fix before the
> post was released, falsely made this claim. The fix apparently is
> available for all, not requiring an NDA that could potentially result
> in an information freeze.
>
> Truth: AMD released the fix publicly without an NDA requirement on 5
> Dec, documented on the same day by the same website used by Mr Myers
> to cite the two truths above, 8 days before Mr Myers's posting on 13
> Dec...
>
> http://www.techreport.com/discussion...er/010260.html

As I'm sure you know, I wasn't aware of the follow-up article.
Somewhere, there might be a customer who matters who would apply such
an "invasive" patch without support. Who or where that customer might
be is beyond my imagining, except that someone important must have a
bunch of these AMD chips installed somewhere and has no choice but to
take the chance. So,

1. We rushed a chip into production and missed an infrequently-
occurring but potentially disastrous bug.

2. We are now rushing out a patch that purports to fix the bug without
a serious penalty. We told you to trust us about the chip, and it
turns out you shouldn't have. Now we're telling you *not* to trust us
about the patch. Why, exactly, would anyone install the unsupported
patch? Presumably there is a handful of important customers whose
hands are being held. For everyone else, it's just PR.

Robert.
 
 
Sebastian Kaliszewski
Guest
Posts: n/a
 
      18th Dec 2007
Robert Myers wrote:
> As I'm sure you know, I wasn't aware of the follow-up article.


Don't use such lame excuses.

> Somewhere, there might be a customer who matters who would apply such
> an "invasive" patch without support. Who or where that customer might
> be is beyond my imagining, except that someone important must have a
> bunch of these AMD chips installed somewhere and has no choice but to
> take the chance. So,
>
> 1. We rushed a chip into production and missed an infrequently-
> occurring but potentially disastrous bug.
>
> 2. We are now rushing out a patch that purports to fix the bug without
> a serious penalty. We told you to trust us about the chip, and it
> turns out you shouldn't have. Now we're telling you *not* to trust us
> about the patch. Why, exactly, would anyone install the unsupported
> patch? Presumably there is a handful of important customers whose
> hands are being held. For everyone else, it's just PR.


Nonsense. Go check how many errata there were in the Core Duo. Just see the
example from the same site, in the comments on the article you quoted...

http://techreport.com/forums/viewtop...d97652f7222e65


rgds
\SK
 
 
Sebastian Kaliszewski
Guest
Posts: n/a
 
      18th Dec 2007
Robert Myers wrote:
> What's the difference between this and Intel's botched FDIV bug?
> Very, very simple. At the time of the FDIV bug, x86 was for
> "peecees," and no one cared if Intel made mistakes that IBM (or DEC or
> Sun) never would. Now they do.


What nonsense!

You know what the difference is?
There is a workaround for this AMD bug, just as there are for Intel's TLB bugs in
their Core 2 Duos. Both AMD & Intel fixes reduce the perofrmance a bit.

You know what the difference is? There was no fix for the FDIV bug at all,
whether it reduced performance slightly or not. The buggy stuff was hard-coded and not
bypassable. Intel has learned from that disaster and AMD has too.

>>>2. TRUTH: A fix is available under NDA for RHEL 4 and not otherwise
>>>apparently.
>>
>>>http://techreport.com/discussions.x/13721
>>
>>>http://techreport.com/discussions.x/13724
>>
>>How secret can it be if it's open-source?
>
> How is part of SUSE kept proprietary?


Go buy a little clue and read how the GPL works. Then you'll know that parts
which are not derived works of the GPL'd Linux kernel can be proprietary, and
that those which are derived works (as such a patch has to be) cannot.

BTW. The patch is public, so the point is moot, you're just spreading
unfounded FUD.

rgds
\SK
 
 
Robert Myers
Guest
Posts: n/a
 
      19th Dec 2007
On Dec 18, 1:29 pm, Sebastian Kaliszewski
<s...@get.it.off.to.reply.informa.pl> wrote:
> Robert Myers wrote:
> > What's the difference between this and Intel's botched FDIV bug?
> > Very, very simple. At the time of the FDIV bug, x86 was for
> > "peecees," and no one cared if Intel made mistakes that IBM (or DEC or
> > Sun) never would. Now they do.
>
> What nonsense!
>
> You know what the difference is?
> There is a workaround for this AMD bug, just as there are for Intel's TLB bugs in
> their Core 2 Duos. Both AMD & Intel fixes reduce the perofrmance a bit.

The workaround costs anywhere from 5% (one of AMD's numbers) to 50%
(others' numbers, naturally) in performance. You think that's
acceptable? AMD bought it on this one. Perhaps AMD should have had
you go out and address investors. You'd have been a big hit.

Your comment that "both AMD & Intel fixes reduce the perofrmance [sic]
a bit" is like Yousuf coming out with the item about Intel's bug right
after the AMD bug, as if they canceled one another out. Go look at
the financial press, and see if anyone but AMDroids (or anyone that
matters) reads it that way.

> You know what the difference is? There was no fix for the FDIV bug at all,
> whether it reduced performance slightly or not. The buggy stuff was hard-coded and not
> bypassable. Intel has learned from that disaster and AMD has too.
>
> >>>2. TRUTH: A fix is available under NDA for RHEL 4 and not otherwise
> >>>apparently.
> >>
> >>>http://techreport.com/discussions.x/13721
> >>
> >>>http://techreport.com/discussions.x/13724
> >>
> >>How secret can it be if it's open-source?
> >
> > How is part of SUSE kept proprietary?
>
> Go buy a little clue and read how the GPL works. Then you'll know that parts
> which are not derived works of the GPL'd Linux kernel can be proprietary, and
> that those which are derived works (as such a patch has to be) cannot.
>
> BTW. The patch is public, so the point is moot, you're just spreading
> unfounded FUD.

If you can't be bothered to read the entire thread, then I can't be
bothered to respond.

Robert.
 
 
Robert Myers
Guest
Posts: n/a
 
      19th Dec 2007
On Dec 18, 1:22 pm, Sebastian Kaliszewski
<s...@get.it.off.to.reply.informa.pl> wrote:
> Robert Myers wrote:
> > As I'm sure you know, I wasn't aware of the follow-up article.
>
> Don't use such lame excuses.

When you've grown up, you'll know better than to talk to people that
way, especially to people you don't know.

> > Somewhere, there might be a customer who matters who would apply such
> > an "invasive" patch without support. Who or where that customer might
> > be is beyond my imagining, except that someone important must have a
> > bunch of these AMD chips installed somewhere and has no choice but to
> > take the chance. So,
> >
> > 1. We rushed a chip into production and missed an infrequently-
> > occurring but potentially disastrous bug.
> >
> > 2. We are now rushing out a patch that purports to fix the bug without
> > a serious penalty. We told you to trust us about the chip, and it
> > turns out you shouldn't have. Now we're telling you *not* to trust us
> > about the patch. Why, exactly, would anyone install the unsupported
> > patch? Presumably there is a handful of important customers whose
> > hands are being held. For everyone else, it's just PR.
>
> Nonsense. Go check how many errata there were in the Core Duo. Just see the
> example from the same site, in the comments on the article you quoted...
>
> http://techreport.com/forums/viewtop...=next&sid=a3a9...

There are mistakes, and there are mistakes. This mistake is one that
AMD could not afford. Your idea that "errata happen" and that they're
all equivalent is interesting. I suggest that you buy some AMD
stock. It's a bargain right now.

Robert.
 
 
 
 