SATA read error

Torben Zick

Hi out there!

The following problem is driving me mad:

A Fujitsu-Siemens Econel 50 is to be equipped with a (working) backup
solution.

That's what has happened so far:
The Econel has a built-in SATA RAID to which two 160 GB Seagate drives are
connected (mirrored). The SATA chip is from LSI and its firmware version is
5.4.02141659r. An additional Adaptec 29160 is installed in PCI slot 5;
there is no IRQ sharing with the SATA controller. The tape drive is a
Quantum DLT VS160, attached to the LVD/SE connector on the 29160.
Termination is correct: an active terminator is attached to the very end
of the SCSI cable. The OS is W2K Server SP4.

So far so good.

I can make a good backup using W2K Backup; this backup is readable and I
can do a restore. If I try any other backup solution I get corrupted
data on the tape, verify fails and no restore is possible. It doesn't
matter which program I use; I've tried several (e.g. Backup Exec 8.x,
9.x, 10.x, ARCserve...).

I've cross-checked everything you can think of and replaced the
SCSI controller, cable, terminator, even the drive. A Quantum V4 does not
work either. Both tape drives work fine in other systems,
even on the same 29160 with the same cables, terminators, software, OS...

The only thing left I can think of is the SATA controller itself. On
LSI's website I found a release note saying that a firmware update for a
PCI SATA expansion card fixed a problem with read errors
during heavy I/O activity.

Could that be my problem? And how do I fix it?

I called FSC, but apart from the usual support blah-blah, nothing. Just
that my drives aren't certified for use in the Econel. Big deal.
The same with LSI. They don't even know which chip is integrated in the
Econel 50 (btw: neither does FSC). Do they sell them to FSC? Their website
says so. But perhaps the only one who knows is the guy who wrote the
press announcement.

OK, is there anybody out there who has suffered the same issue and found a
solution? Or just a few hints? Or at least a few cosy words to comfort
me?

thanks for your help

regards
torben
 
Peter

Can you temporarily install Windows 2003 SP1 and test again?
If it passes, then hardware is OK and you will need to look into existing
W2K SP4 for answers.
(48-bit LBA fix?)
 
Torben Zick

Peter said:
Can you temporarily install Windows 2003 SP1 and test again?
If it passes, then hardware is OK and you will need to look into existing
W2K SP4 for answers.
(48-bit LBA fix?)

Hmmm, not so easy. To get equal prerequisites I'd have to install two
SATA drives, build a mirror set and install W2K3. Sounds like a lot of work
and, of course, downtime. I see no way to prepare that in advance unless
I buy another one of those boxes.
I'd rather go for something else, but I'll consider that as a 'last option'.
Thanks!
 
Peter

Torben Zick said:
Hmmm, not so easy. To get equal prerequisites I'd have to install two
SATA drives, build a mirror set and install W2K3. Sounds like a lot of work
and, of course, downtime. I see no way to prepare that in advance unless
I buy another one of those boxes.
I'd rather go for something else, but I'll consider that as a 'last option'.
Thanks!

Go back to your vendor and say that you have a production problem on your
only box. Ask for a short-term eval unit for testing. That shouldn't be
difficult since those servers are in the "economy" line.

What puzzles me is that you said there is an LSI RAID, but the manual lists
ICH7R, which is definitely Intel's. What kind of RAID animal is that?
 
Odie

I'd be inclined to purchase a separate SATA RAID controller and use
that, rather than the motherboard's offering.

Should the worst come to the worst, it would be easier to recover as
well.


Odie
 
Arno Wagner

Not quite what you had, but maybe you have bit corruption in main memory
or on the mainboard bus? If the main memory is ECC, that still leaves the
bus. Main memory can be tested very well with memtest86. For the bus,
the only thing I have found that works to some degree is to slow it down
and see whether the problem disappears.

It would also certainly help to find out what bit/byte pattern the
corruption follows. If it is always in the same position relative to
32-bit boundaries, then it is likely memory or bus. If it is in the
same position relative to the full byte, then it could be the SATA
controller. Other patterns would make other parts suspect.
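To make the pattern check above concrete, here is a minimal Python sketch. It assumes the original data and the corrupted copy are already available as byte strings of equal length; the function name `corruption_pattern` and the sample data are made up for illustration. For every differing byte it counts the offset within a 32-bit word and which bits flipped.

```python
from collections import Counter


def corruption_pattern(original: bytes, corrupted: bytes):
    """Histogram differing bytes by offset within a 32-bit word and by flipped bit."""
    word_pos = Counter()  # offset % 4 -> number of corrupted bytes
    bit_pos = Counter()   # bit index within the byte (0 = LSB) -> number of flips
    for offset, (a, b) in enumerate(zip(original, corrupted)):
        diff = a ^ b
        if diff == 0:
            continue
        word_pos[offset % 4] += 1
        for bit in range(8):
            if diff & (1 << bit):
                bit_pos[bit] += 1
    return word_pos, bit_pos


# Made-up sample: bit 3 flipped at offsets 2 and 10 (same place in each 32-bit word).
good = bytes(16)
bad = bytearray(good)
bad[2] ^= 0x08
bad[10] ^= 0x08

words, bits = corruption_pattern(good, bytes(bad))
print(dict(words))  # {2: 2}
print(dict(bits))   # {3: 2}
```

If nearly all hits land on one word offset, that would point toward memory or bus as described above; if the same bit flips regardless of word offset, the controller becomes the more likely suspect. This is only a heuristic, not a diagnosis.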

Arno
 
Torben Zick

Arno Wagner said:
Not quite what you had, but maybe you have bit corruption in main memory
or on the mainboard bus? If the main memory is ECC, that still leaves the
bus. Main memory can be tested very well with memtest86. For the bus,
the only thing I have found that works to some degree is to slow it down
and see whether the problem disappears.

I see no chance of slowing down the bus speed, since the BIOS offers no
such option. I don't think it's a good idea to patch the BIOS and/or to
tweak the BIOS parameters (keep in mind that the box is in production...).

Arno Wagner said:
It would also certainly help to find out what bit/byte pattern the
corruption follows. If it is always in the same position relative to
32-bit boundaries, then it is likely memory or bus. If it is in the
same position relative to the full byte, then it could be the SATA
controller. Other patterns would make other parts suspect.

Interesting approach. Do I see the patterns at file level? Or do I have to
compare at FS level?

Well, I'm getting more and more convinced that the whole mess has
something to do with the transfer speed.
The next step is obviously to build a similar system and test the various
options. But if it is the SATA, I will
probably fail at creating an image :-(
OK, then I will know...

thank you, Arno
Torben
 
Arno Wagner

Torben Zick said:
I see no chance of slowing down the bus speed, since the BIOS offers no
such option. I don't think it's a good idea to patch the BIOS and/or to
tweak the BIOS parameters (keep in mind that the box is in production...).

I understand. One thing that might work is reducing the CPU FSB speed.
Another is an "overclocking" setting, if there is one, which can sometimes
be used to "underclock". A third option might be to just load
the "safe defaults". However, I would say memory or controller are more
likely the issue, so better test those first.

Torben Zick said:
Interesting approach. Do I see the patterns at file level? Or do I have to
compare at FS level?

File level is fine. What you basically need is a file and a corrupted
copy. I usually do this by copying a large file and comparing it to
the original using "cmp" (Unix/Linux). Some replacement should be
available for Windows as well. If you tell it to give the detailed
differences, you usually get position and bit pattern. That will already
tell a lot.
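On Windows, `fc /b original copy` gives a byte-by-byte comparison similar to `cmp -l`. If a script is preferred, a rough Python equivalent might look like the sketch below. The function name `compare_files`, the chunk size and the `limit` parameter are assumptions for illustration, and the demo files are generated on the fly rather than taken from a real backup.

```python
import os
import tempfile


def compare_files(path_a, path_b, chunk_size=1 << 20, limit=20):
    """Return up to `limit` (offset, byte_a, byte_b) tuples where the files differ."""
    diffs = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while len(diffs) < limit:
            a, b = fa.read(chunk_size), fb.read(chunk_size)
            if not a and not b:
                break  # both files fully read
            if a != b:
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        diffs.append((offset + i, x, y))
                        if len(diffs) >= limit:
                            break
                if len(a) != len(b):
                    # One file ended early; record where the shorter one stops.
                    diffs.append((offset + min(len(a), len(b)), None, None))
                    break
            offset += len(a)
    return diffs


# Demo with generated data: one bit flipped at offset 1000.
with tempfile.TemporaryDirectory() as tmp:
    original = os.path.join(tmp, "original.bin")
    copy = os.path.join(tmp, "copy.bin")
    data = bytes(range(256)) * 16
    damaged = bytearray(data)
    damaged[1000] ^= 0x04
    with open(original, "wb") as f:
        f.write(data)
    with open(copy, "wb") as f:
        f.write(damaged)
    for off, x, y in compare_files(original, copy):
        print(f"offset {off}: {x:08b} -> {y:08b}")
```

Printing both bytes in binary makes it easy to see which bit flipped and at which offset, which is the information the pattern analysis needs.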
Torben Zick said:
Well, I'm getting more and more convinced that the whole mess has
something to do with the transfer speed.

May well be. Slower is always less error-prone, and unfortunately PC
hardware is often run at the limit today.

Torben Zick said:
The next step is obviously to build a similar system and test the various
options. But if it is the SATA, I will
probably fail at creating an image :-(
OK, then I will know...

You will know more, at least...

Arno
 
