Windows RAID

Rod Speed

David Brown wrote
Rod Speed wrote
dd can do many things, including offline disk imaging to a file. It won't do COW snapshotting

Why are you repeating what I just said ?
- to do that you need to either use a filesystem that supports COW snapshots (like btrfs), or use LVM. Then
you can snapshot your partition and use dd to make a copy.
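
(For reference, a minimal sketch of that LVM-snapshot-plus-dd approach; the volume group "vg0", logical volume "data" and backup path are only example names:)

    # create a temporary copy-on-write snapshot of the live volume
    lvcreate --snapshot --size 5G --name data_snap /dev/vg0/data

    # image the frozen snapshot while the original stays in use
    dd if=/dev/vg0/data_snap of=/backup/data.img bs=1M

    # drop the snapshot once the image is complete
    lvremove /dev/vg0/data_snap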

And you cant do it with dd. You can with Win backup thats supplied with the OS.

Even Norton's didnt. Havent bothered to keep track of where that
steaming turd is up to on that now.
Nope.

(or "yes in spades", as you might say - the concepts are completely different).

Nope.

Win auto treats it as read only if it decides that its damaged.

Thats even safer.
You've told me many times that it's possible - but still not mentioned how.

Just did.
 
Yousuf Khan

Well, until it gets to be a supported feature, I would count
it as basically not present. Incidentally, Linux has supported
booting off RAID1 for a long time, since a RAID1 member looks like
an ordinary drive with the 0.9 RAID superblock format, which
is stored at the end of the disk. Unless you start writing,
you do not need the RAID to be running, and the bootloader does
not write.
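
(A sketch of what that looks like with mdadm; the old 0.90 metadata format keeps the superblock at the end of each member, so a bootloader sees what looks like a plain partition. Device and array names are only examples:)

    mdadm --create /dev/md0 --level=1 --raid-devices=2 --metadata=0.90 \
          /dev/sda1 /dev/sdb1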

Yeah, the RAID1 should be pretty trivial, I'd gather even Windows 95 or
even DOS for that matter could've been booted off of a RAID1 partition.
One half of a RAID1 partition is just the mirror of the other, so it holds
exactly the same data and is organized exactly as if it were an
unmirrored, unmodified partition. So any kernel should be able
to boot off of that one without even knowing about the mirroring. That
is unless the RAID software messes with the boot sector or partition
tables. There may have been a few that did that in the past, but I think
for the most part that's safely over and done with these days.

When I used to run Solaris boxes, the boot process simply involved the
kernel booting off of one of the two mirrors first and then once the
kernel loaded the RAID drivers, then stuff could be loaded from either
mirrored drive. If the mirrored drive that died was the primary mirror,
then it was simply a matter of making the other the primary mirror
through the BIOS (OpenBoot in Sun parlance), and then you could
replace the failed mirror at your leisure.

Yousuf Khan
 
Arno

Yeah, the RAID1 should be pretty trivial, I'd gather even Windows 95 or
even DOS for that matter could've been booted off of a RAID1 partition.
One half of a RAID1 partition is just the mirror of the other, so it holds
exactly the same data and is organized exactly as if it were an
unmirrored, unmodified partition. So any kernel should be able
to boot off of that one without even knowing about the mirroring. That
is unless the RAID software messes with the boot sector or partition
tables. There may have been a few that did that in the past, but I think
for the most part that's safely over and done with these days.
When I used to run Solaris boxes, the boot process simply involved the
kernel booting off of one of the two mirrors first and then once the
kernel loaded the RAID drivers, then stuff could be loaded from either
mirrored drive. If the mirrored drive that died was the primary mirror,
then it was simply a matter of making the other the primary mirror
through the BIOS (OpenBoot in Sun parlance), and then you could
replace the failed mirror at your leisure.

Pretty much the same with Linux. You just need to make sure to
install grub to each raw disk, not to the RAID. The other thing
you can do is to use RAID at the partition level, or just have
two un-raided boot partitions of 100MB or so.
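
(Roughly, assuming GRUB and two example disks sda and sdb:)

    # put the boot loader on both underlying disks, not on the md device,
    # so the box can still boot with either disk missing
    grub-install /dev/sda
    grub-install /dev/sdb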

Arno
 
Rod Speed

David Brown wrote
Rod Speed wrote
The difference is, you choose it when you decide which edition of windows to buy.

And with Linux, you choose which distribution to use. So no difference in practice.
If you want a desktop, you buy Win7. If you want a server, you buy Win Server.

Its nothing like that black and white.
If you want a Fedora desktop, you download and install Fedora -
choosing mostly desktop-style packages for installation. If you want
a Fedora server, you download and install Fedora - choosing mostly
server-style packages for installation. If you want a Debian
desktop, you download and install Debian - choosing mostly
desktop-style packages for installation. If you want a Debian
server, you download and install Debian - choosing mostly
server-style packages for installation.

And plenty of the desktop oriented distributions arent that great to start with if you want a server.
Spot the difference?

There is no important difference.
If you want paid-for commercial support, it's a different matter - you buy a support package aimed at desktop use or
server use. This may or may not come with a distribution. The price and the level of support are very different
there.

Corse it is with something thats free.
Yes, but with Win you choose it before buying and installing

Wrong, again.
- with Linux, you can change as you want on a full system.

You can with Win too. Thats what in place upgrades are about.

Vista too.
OK. There was a difference in earlier versions, but I'll take your
word for it that the Win7 kernel and the Win Server 2008 R2 kernel (that's
the current server Windows version, I believe) are the same.

Doesnt say that all of the versions have different limits.

Doesnt say that all of the versions have different limits on the number of cores.
The limitations are in the number of sockets, rather than the number of cores.

So you were wrong, again.
 
Arno

To me, Linux and Unix are the same thing. Just get used to it, when I
say Unix, I'm also talking about Linux.

You are, of course, wrong, as Linux is GNU and GNU = "GNU is Not Unix"!

You are also right, as the Linux API and philosophy are derived
from both BSD and SYSV.

The correct term for Linux is "Unix-like". A "real" Unix has
its source code derived from the original BSD or System V
source code.

However, it would also be fair to say Linux is an improved
Unix reimplementation, that offers both the BSD and SYSV API.

So for the religious ones, you are a heretic, to be burned at the
stake ASAP. For all people that just care about what a thing
can do, you are perfectly correct ;-)=)
I've seen situations where there's been enough damage to a filesystem
that I've been told to go into single-user mode to run a manual fsck on
a filesystem. Mind you the manual method is not that inconvenient, just
use the "-y" option to answer yes to all of the questions. It's just a
little disconcerting when it happens.
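
(The "-y" run mentioned above, as a sketch; the device name is only an example, and the filesystem should be unmounted:)

    fsck -y /dev/sda2    # answer "yes" to every repair question automatically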

The good thing, compared to some other non-Unix "OS"es is that
typically you get everything recovered undamaged, except for files
that were written when the crash happened. But there is really
nothing that can be done about that.
So far, I haven't seen much that couldn't be handled by the Linux-based
NTFS repair utils. The only instance I've seen which it couldn't handle,
I believe was an NTFS boot volume that was locked down by Windows
because it put the OS into hibernate state. The Linux NTFS repair utils
refused to handle those (probably smartly).

Good to know. I still avoid NTFS where possible, but maybe I
can change that now.
That's not any different than Windows, however, there are some
artificial locks in Windows due to licensing. There are also
sub-editions in Windows desktop and server, and very few licensing locks
exist between the sub-editions. For example most apps will run in
Windows 7 Home and Home Premium and Ultimate. They may lock themselves
out of Windows Server not because the interface is different but because
of licensing restrictions.

Indeed. One more reason to move away from an "OS" whose
primary purpose is to make money. For critical infrastructure that
is not a good basis, also because it is controlled by a single
entity (a big no-no in vendor sourcing for critical components).

I agree. I have had the same experience. The only advantage I ever
found in hardware RAID is automated hot-plugging. But if that means
I cannot query SMART status anymore, then the hardware solution
does not fulfil even basic requirements for reliable operation.
And if really needed, this could be scripted with Linux as well.
However, the only reason I see for it is really, really incompetent
admins, and they are rare on Linux.
I think the over-complicated RAID schemes that emerged in software RAID

What "over-complicated RAID schemes" are you talking about?
I have yet to encounter them. Or are you referring to LVM, a
thing that is simply not there with hardware RAID?
were as a result of performance and reliability problems with
server-controlled disks, i.e. JBODs. Hardware raid controllers could
dedicate themselves to monitoring every component inside them,
continuously.

And they often do not, with no warning in the documentation.
I tried this a couple of years ago with an Adaptec SATA RAID
controller and a nearly dead disk (no failed SMART status,
but > 1000 reallocated sectors and dog slow) and the controller
did not complain about anything. Unusable piece of trash.
But a server has other work to do, so it can't dedicate
itself to constantly monitoring its disks.

Huh? Whyever not? It takes almost no resources. On my server,
smartd has used less than 1 second of CPU time over the 14 hours it
has been running (I changed something yesterday). That is with
"constant monitoring".
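
(A sketch of what such an smartd setup can look like in /etc/smartd.conf; the device, test schedule and mail address are only examples:)

    # monitor all SMART attributes, run a short self-test nightly at 02:00
    # and a long self-test on Saturdays at 03:00, and mail root on problems
    /dev/sda -a -s (S/../.././02|L/../../6/03) -m root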
So a lot of issues could
arise inside a server that result in a false diagnosis of a failure. So
they came up with complex schemes that would minimize downtime.

I still do not understand what you are referring to.
This is absolutely not needed with hardware raid. The processors inside

Yes. Instead you get things like two undetected disks that are near
death, and when one dies and you plug in a spare, the other one
dies during the rebuild and takes the array with it. And you cannot
do SMART self-tests. And you cannot monitor disk health trends.
And you cannot monitor disk temperature. Or if you can, you have
to patch obscure vendor code into the kernel. Not acceptable at all.
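
(These are the kinds of per-disk checks being referred to; they only work when the OS can talk to the raw drives, and the device name is just an example:)

    smartctl -H /dev/sdb        # overall health verdict
    smartctl -A /dev/sdb        # attributes: reallocated sectors, temperature, ...
    smartctl -t long /dev/sdb   # start a long offline self-test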
The processors inside
hardware raid units are doing nothing else but monitoring disks.

Not in my experience.
So it
made more sense for them to simplify the raid schemes and go for greater
throughput.

I think you have an overly positive (maybe even a bit naive) view
of hardware RAID and the people that design these controllers.
In my experience, to anybody that _can_ administrate a software
RAID competently, hardware RAID is a problem, not a solution.
The others should not be let near any storage solution in the
first place.

Arno
 
Yousuf Khan

OK. Most of the time, it's fine to talk about *nix covering Unix, Linux,
the BSD's, and other related OS's. But given your experience with
Solaris (which really is UNIX), and your references to older limitations
of Unix, I thought you were making a distinction.

Most of the advanced software or hardware RAID setups that I've ever
seen were on Solaris & HP-UX systems, attached to SANs. Linux boxes were
mainly self-sufficient quick-setup boxes used for specific purposes.
Linux (and probably *nix :) won't make repairs to a filesystem without
asking first, although it will happily tidy things up based on the
journal if there's been an unclean shutdown. And if it is not happy with
the root filesystem, then that means running in single-user mode when
doing the repair.

/Any/ filesystem check which does repairs is a bit disconcerting. But
when talking about "manual" or advanced repairs, I've been thinking
about things like specifying a different superblock for ext2/3/4, or
rebuilding reiserfs trees.

This is the sort of thing that Windows' own chkdsk handles without too
many questions. It's not to say some really bad filesystem problems
don't happen to NTFS that require extra attention, but somehow the
chkdsk can ask one or two questions at the start of the operation and go
with it from that point on. It may run for hours, but it does the
repairs on its own without any further input from you.

I'm not sure if it's because NTFS's design allows for simpler questions
to be asked by the repair utility, than for other types of filesystems.
Or if it's because the repair utility itself is just designed to not ask
you too many questions beyond an initial few.

I would suspect it's the latter, as Windows chkdsk only has two
filesystems to be geared for, NTFS or FAT. Whereas Unix fsck has to be
made generic enough to handle several dozen filesystems, and several
that can be added at some future point without warning. They usually
implement fsck as simply a frontend app for several filesystem-specific
background utilities.
I haven't tried the Linux NTFS repair programs - I have only heard that
they have some limitations compared to the Windows one.

Well, I just haven't personally encountered any major issues with them
yet. I'll likely encounter something soon and totally change my mind
about it.
I don't quite agree here. There are different reasons for having
different RAID schemes, and there are advantages and disadvantages of
each. Certainly there are a few things that are useful with software
raid but not hardware raid, such as having a raid1 partition for
booting. But the ability to add or remove extra disks in a fast and
convenient way to improve the safety of a disk replacement is not an
unnecessary complication - it is a very useful feature that Linux
software raid can provide, and hardware raid systems cannot. And layered
raid setups are not over-complicated either - large scale systems
usually have layer raid, whether it be hardware, software, or a
combination.

It's not necessary; all forms of RAID (except RAID 0 striping) are
redundant by definition. Any disk should be replaceable whether it's
hardware or software RAID. And nowadays most are hot-swappable. In
software RAID, you usually have to bring up the software RAID manager
app, and go through various procedures to quiesce the failed drive to
remove it.

Usually inside hardware RAID arrays, in the worst cases, you'd have to
bring up a hardware RAID manager app, and send a command to the disk
array to quiesce the failed drive. So it's not much different than a
software RAID. But in the best cases, in hardware RAID, all you have to
do is go into a front control panel on the array itself to quiesce the
failed drive, or even better there might be a stop button right beside
the failed drive right next to a blinking light telling you which drive
has failed.

When you replace the failed drive with a new drive, the same button
might be used to resync the new drive with the rest of its volume.
Other advanced features of software raid are there if people want them,
or not if they don't want them. If you want to temporarily add an extra
disk and change your raid5 into a raid6, software raid lets you do that
in a fast and efficient manner with an asymmetrical layout. Hardware
raid requires a full rebuild to the standard raid6 layout - and another
rebuild to go back to raid5.

I see absolutely /no/ reason to suppose that software raid should be
less reliable, or show more false failures than hardware raid.

Well there are issues of i/o communications breakdowns, as well as
processors that are busy servicing other hardware interrupts or just
busy with general computing tasks. Something like that might be enough
for the software RAID to think a disk has gone offline and assume it's
bad. It's getting less of a problem with multi-core processors, but
there are certain issues that can cause even all of the cores to break
down and give up, such as a Triple Fault. The computer just core dumps
and restarts at that point.
There was a time when hardware raid meant much faster systems than
software raid, especially with raid5 - but those days are long past as
cpu power has increased much faster than disk throughput (especially
since software raid makes good use of multiple processors).

Not really, hardware raid arrays are still several orders of magnitude
faster than anything you can do inside a server. If this wasn't the
case, then companies like EMC wouldn't have any business. The storage
arrays they sell can service several dozen servers simultaneously over a
SAN. Internally, they have communication channels (often optical) that
are faster than any PCI-express bus and fully redundant. The redundancy
is used to both increase performance by load-balancing data over
multiple channels, and as fail-over. A busy server missing some of the
array's i/o interrupts won't result in the volume being falsely marked
as bad.
The processors inside hardware raid units simplify the raid schemes
because it's easier to make accelerated hardware for simple schemes, and
because the processors are wimps compared to the server's main cpu. The
server's cpu will have perhaps 4 cores running at several GHz each - the
raid card will be running at a few hundred MHz but with dedicated
hardware for doing raid5 and raid6 calculations.

I'm not talking about a RAID card, I'm talking about real storage arrays.

Yousuf Khan
 
Rod Speed

David Brown wrote
Yousuf Khan wrote
I think it is perhaps just that Windows chkdsk will do the best it can without bothering the user,

Because the user cant be of any help.
while the Linux utilities will sometimes ask the user as they /might/ know something of help.

Thats never the case with Win file systems. What
can the user know that would be any use at all ?
It's a difference of philosophy, not of filesystem design.

Wrong, as always.
I have, on a W2K machine, had an NTFS filesystem that chkdsk found
faulty but was unable to repair - but was happy to continue using. I never had any problems with using the partition,
but chkdsk always reported faults.

They were obviously very minor faults.
fsck is just a front-end that identifies the filesystem, then calls fsck.ext3, fsck.xfs, fsck.reiserfs, etc., as
appropriate. Most of these have few choices, but some (such as fsck.reiserfs) offer many options.

Then its a completely ****ed file system.
All forms of RAID (except RAID0) are redundant

Why are you repeating what he just said ?
- when everything is working.

And when it isnt too.
But if you have one-disk redundancy, such as RAID5 or a two-disk RAID1, then you are vulnerable when a disk fails

Wrong, again.
or is being replaced. The nature of a RAID5 rebuild is that you stress the disks a great deal during the rebuild -
getting a second failure is a real risk,

Wrong, again. There is no increased risk of failure with use.
and will bring down the whole array. There are countless articles and statistics available online about the risks
with large RAID5 arrays.

Pity you ****ed up so spectacularly, again.
The most obvious way to avoid that is to use RAID6 - then you still have one disk redundancy while another disk is
being replaced.

If you are seeing drives fail that frequently, you have a problem, stupid.
With software raid (at least on Linux), you can add extra redundancy
temporarily when you know you are going to do a replacement (due to a drive that is failing but not dead, or a size
upgrade).
In the best case, with the best setup, replacing disks on a hardware array may be as easy as you say - take out the
bad disk, put in a new one.

It aint just the best case, its the usual case.
If the system is configured to automatically treat the new disk
as a spare and automatically rebuild, then it does the job straight
away. In the worst case, you have to reboot your system to get to the raid bios setup (maybe the card's raid manager
software doesn't run on your choice of operating system) and fix things.
With Linux mdadm, in the worst case you have to use a few mdadm commands to remove the failed drive from the array,
add the new drive (hot plugging is fine), and resync. It's not hard.
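
(As a sketch, those "few mdadm commands" for a replacement look roughly like this; array and device names are only examples:)

    # mark the dying disk as failed and pull it out of the array
    mdadm /dev/md0 --fail /dev/sdc1
    mdadm /dev/md0 --remove /dev/sdc1

    # after hot-plugging the replacement, add it and let md resync
    mdadm /dev/md0 --add /dev/sdd1

    # watch the rebuild progress
    cat /proc/mdstat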

That assumes that there is someone there to do that.

:> But if you want to have automatic no-brainer replacements,

Or a config that will do it for itself when no one is there...
you can arrange for that too - you can set up scripts to automatically remove failed drives from the array, and to
detect a new drive and automatically add it to the list of spares.

And you shouldnt have to setup scripts to do something that basic.
With Linux software raid, these sorts of things may involve a bit more learning, and a bit more trial and error

And that is a completely ****ed approach if you want a reliable system.
(but you will want to do trial replacements of drives anyway, with hardware or software raid).

Wrong, again.
But you can do as much or as little as you want - you are not constrained by whatever the hardware raid manufacturer
thinks you should have.
Nonsense. If you are likely to get IO breakdowns or trouble from too
many hardware interrupts, then you are going to get those regardless
of whether your raid is hardware or software. It doesn't make any
difference whether the disk controller is signalling the cpu to say
it's read the data, or if it is the raid controller - you get mostly
the same data transfers. You have some more transfers with software
raid as the host must handle the raid parts, but the overhead is
certainly not going to push an otherwise stable system over to failure.

It can do.
And hardware raids have vastly more problems with falsely marking
disks as offline due to delays - that's one of the reasons why many
hardware raid card manufacturers specify special expensive "raid"
harddisks instead of ordinary disks. The main difference with these
is that a "raid" disk will give up reading a damaged sector after
only a few seconds, while a "normal" disk will try a lot harder. So
if a "normal" disk is having trouble with reading a sector, a hardware raid card will typically drop the whole drive
as bad. Linux mdadm raid, on the other hand, will give you your data from the other drives while waiting, and if the
drive recovers then it will continue to be used (the drive firmware will automatically re-locate the failing sector).

Only on writes, not on reads.
Have you ever seen, heard of or read of such an event being caused by a system accessing too many disks too quickly?
Barring hardware
faults or astoundingly bad design, no system is going to ask for data
from disks faster than it can receive it!

Thats not what he is talking about there.
There are several reasons why hardware raid is still popular, and is
sometimes the best choice. Speed, compared to a software raid
solution, is not one of them.
First, consider small systems - up to say a dozen drives, connected
to a card in the server. The most common reasons for using hardware
raid are that that's what the system manufacturer provides and
supports, that's what the system administrator knows about, and that
the system will run Windows and has no choice. The system is slower
and more expensive than software raid, but the system administrator
either doesn't know any better, or he has no practical choice.
 
Yousuf Khan

I think it is perhaps just that Windows chkdsk will do the best it can
without bothering the user, while the Linux utilities will sometimes ask
the user as they /might/ know something of help. It's a difference of
philosophy, not of filesystem design.

I find that to be bad design. What should know more about a filesystem
than the utility that's designed to fix it? I was always annoyed by the
"backup superblock" question: you had to calculate the location of the
first backup superblock, and if that didn't work, then the next
superblock, and so on. Why should a human have to do that, when the
utility could do it itself?
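
(For ext2/3, the manual dance being complained about looks roughly like this; the device and block number are only examples:)

    # list where the backup superblocks live (prints only, makes no changes)
    mke2fs -n /dev/sdb1

    # then point the checker at one of them, e.g. the copy at block 32768
    e2fsck -b 32768 /dev/sdb1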
fsck is just a front-end that identifies the filesystem, then calls
fsck.ext3, fsck.xfs, fsck.reiserfs, etc., as appropriate. Most of these
have few choices, but some (such as fsck.reiserfs) offer many options.

These things have a long way to go in terms of intelligence. Even
without using a GUI, they really should ask questions at the point they
are required, and then stop asking questions. This sort of thing can be
had even on a serial console.
All forms of RAID (except RAID0) are redundant - when everything is
working. But if you have one-disk redundancy, such as RAID5 or a
two-disk RAID1, then you are vulnerable when a disk fails or is being
replaced. The nature of a RAID5 rebuild is that you stress the disks a
great deal during the rebuild - getting a second failure is a real risk,
and will bring down the whole array. There are countless articles and
statistics available online about the risks with large RAID5 arrays. The
most obvious way to avoid that is to use RAID6 - then you still have one
disk redundancy while another disk is being replaced. With software raid
(at least on Linux), you can add extra redundancy temporarily when you
know you are going to do a replacement (due to a drive that is failing
but not dead, or a size upgrade).

That's possible, but it's just as likely that the additional stress on
the other drives is a result of having to rebuild data on the fly from
parity. So the period from the moment the drive fails until it is
discovered to have failed and eventually replaced puts a lot of load on
the remaining drives.
In the best case, with the best setup, replacing disks on a hardware
array may be as easy as you say - take out the bad disk, put in a new
one. If the system is configured to automatically treat the new disk as
a spare and automatically rebuild, then it does the job straight away.
In the worst case, you have to reboot your system to get to the raid
bios setup (maybe the card's raid manager software doesn't run on your
choice of operating system) and fix things.

Yes, if there are hot spares available, then that eliminates the wait
time between failure detection and disk replacement. The hot spare
automatically replaces the failed disk without any manual intervention.
So you just physically remove the failed drive at your leisure.
Nonsense. If you are likely to get IO breakdowns or trouble from too
many hardware interrupts, then you are going to get those regardless of
whether your raid is hardware or software. It doesn't make any
difference whether the disk controller is signalling the cpu to say it's
read the data, or if it is the raid controller - you get mostly the same
data transfers. You have some more transfers with software raid as the
host must handle the raid parts, but the overhead is certainly not going
to push an otherwise stable system over to failure.

But the point is that if the server is overloaded with i/o, you don't
have the danger of the server marking a disk bad in a hardware raid
array, since the disk management is taken care of by the array's own
processors. It'll simply show up as a communications error on the
server, but it won't result in anything more serious on the drives
themselves. The separation between the disks and the server acts much
like a firewall in networking.
And hardware raids have vastly more problems with falsely marking disks
as offline due to delays - that's one of the reasons why many hardware
raid card manufacturers specify special expensive "raid" harddisks
instead of ordinary disks. The main difference with these is that a
"raid" disk will give up reading a damaged sector after only a few
seconds, while a "normal" disk will try a lot harder. So if a "normal"
disk is having trouble with reading a sector, a hardware raid card will
typically drop the whole drive as bad. Linux mdadm raid, on the other
hand, will give you your data from the other drives while waiting, and
if the drive recovers then it will continue to be used (the drive
firmware will automatically re-locate the failing sector).

I think we're not talking about the same hardware RAID. I'm talking
about intelligent disk arrays, not add-in cards.
Have you ever seen, heard of or read of such an event being caused by a
system accessing too many disks too quickly? Barring hardware faults or
astoundingly bad design, no system is going to ask for data from disks
faster than it can receive it!

Yup, I've seen it. That's why I bring it up.
There are several reasons why hardware raid is still popular, and is
sometimes the best choice. Speed, compared to a software raid solution,
is not one of them.

First, consider small systems - up to say a dozen drives, connected to a
card in the server. The most common reasons for using hardware raid are
that that's what the system manufacturer provides and supports, that's
what the system administrator knows about, and that the system will run
Windows and has no choice. The system is slower and more expensive than
software raid, but the system administrator either doesn't know any
better, or he has no practical choice.

I personally don't see much use for add-in card based hardware raid over
software raid myself. However, storage arrays are a different matter.
Then look at big systems. There you have external boxes connected to
your server by iSCSI, Fibre Channel, etc. As far as the host server is
concerned, these boxes are "hardware raid". But what's inside these
boxes? There is often a lot of hardware to improve redundancy and speed,
and to make life more convenient for the administrator (such as LCD
panels). At the heart of the system, there are one or more chips running
the raid system. Sometimes these are dedicated raid processors -
"hardware raid". But more often than not they are general purpose
processors running software raid - it's cheaper, faster, and more
flexible. Sometimes they will be running a dedicated system, other times
they will be running Linux (that's /always/ the case for low-end SAN/NAS
boxes).

From the server administrator's viewpoint, he doesn't care what's
inside the box - as long as he can put data in and get the data out,
quickly and reliably, he is happy. So EMC (or whoever) make a big box
that does exactly that. And inside that box is what can only be
described as software raid.

You haven't got a clue then. EMC isn't making billions of dollars a year
just because there are lazy system administrators out there that can't
be bothered to learn software raid commands. These storage arrays are
shared amongst multiple servers. These days servers aren't really file
servers anymore, you can get a gadget to do that. These days servers are
more applications servers, e.g. web servers, database servers, etc. In
fact, in high-availability environments, you'll often have clusters of
servers doing the same job, and they need to have access to the exact
same data as each other. If the data were only on one of the servers,
then other servers would have to share it off of that one server and the
main server would be overburdened. You can then dedicate the main server
to serving only data to the other servers, but that itself becomes a
burden at some point. And it also becomes a single point of failure.

Storage arrays, with their specialization and internal redundancy, have
less chance of being a single point of failure.

Yousuf Khan
 
Rod Speed

David Brown wrote
Rod Speed wrote
They probably were minor,

No probably about it, they must have been if it continued to work fine with them.
but it was annoying that they couldn't be fixed.

Sure, but what matters is why they couldnt be.
Why is it such a problem that a file system repair utility has options?

A properly designed file system doesnt need them.
It fixes most common issues automatically, but gives extra options for more difficult cases.

And a properly designed file system shouldnt need to ask
the user about how the user wants the problems fixed.
The "automatic" alternative would be just to tell you the filesystem is trashed,

The other file systems obviously dont.

The other file systems obviously dont.
Are you really saying that if you take a RAID5 system, remove or
otherwise fail a disk, then you /still/ have a one-disk redundancy?
Nope.

Or is this one of these cases when you are going to later claim to have said something else,
Nope.

or that you said something stupid intentionally?

Never ever said that.

Doesnt say anything like what you said.
The main risk is for an unrecoverable read error from one of the other disks, rather than a complete disk failure.

There is no reason why either is any more likely to happen during a recovery.
Disks fail.

Hardly ever with properly designed systems and drives bought with an eye to what is reliable.
You have to be unlucky to hit a second disk death during a rebuild,

Very unlikely indeed, or just plain stupid because the fault is not in the drive.
but it happens. If you have a large RAID5 array, and are
replacing each disk to increase the size or to phase out old disks, then your risks are not insignificant.

There is no reason why a drive is any more likely to fail in that situation.

And only a fool doesnt have a backup in that situation anyway.
Let's try some numbers - with disks that are two or three years old, you have an 8% annual failure rate (Google's
numbers).

Google's numbers are nothing like that black and white
and that is nothing like the failure rate normally seen.
Suppose you have a RAID5 array with 10 disks, each 1 TB in size, that you are replacing.

In that situation you would be a fool to not have backups.
At 20 MB per second rebuild rate (since the system is in use otherwise), that's 14 hours for each disk's rebuild. The
total vulnerable time, when you don't have a full disk's redundancy, is then 140 hours. The chances of having at
least one disk die of natural causes during that 140 hour period is just over 1%.
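
(Reconstructing that estimate roughly, using the figures given above:)

    1 TB / (20 MB/s)              ~ 50,000 s ~ 14 hours per disk
    10 disks x 14 hours            = 140 hours without full redundancy
    140 h / 8,766 h (one year)    ~ 1.6% of a year
    per-disk risk in that window  ~ 8% x 1.6% ~ 0.13%
    chance at least one of the ~9 other disks fails
                                  ~ 1 - (1 - 0.0013)^9 ~ 1.1%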

In that situation you would be a fool to not have backups.

And while RAID6 can make sense TEMPORARILY in that
particular situation, that does not mean that it makes sense
to run RAID6 all the time for that relatively rare situation.
The chances of having an unrecoverable read error, rather than a full disk failure, are a lot higher.

Wrong, again.
It turns out that there is in fact a good reason why RAID6 is getting more popular.

Nope, most dont do that sort of thing very often and while it can
make sense to have a RAID6 config while doing it, that does not
mean that it makes sense to run RAID6 all the time.
Yes - but I normally assume there is someone trained and competent to do drive replacements in a raid array.

Makes a lot more sense to have the system handle that auto and
just get some monkey to replace a failed drive when one fails.

That requires that the monkey be a lot less 'trained'.
Of course, it's easy to configure a disk as a hot spare in Linux, just as with hardware raid systems.

And automatic recover on a failure is the only sensible approach.
You would prefer to learn about replacing drives when you have had a failure?

Nope. I would prefer that the system handles that automatically and just informs
you that a particular drive has died and needs to be physically replaced.
Personally, I tried it when the disks were all working and contained nothing but a basic system install - that way if
a disk fails, I know what I'm doing.

A system that handles drive failure automatically doesnt need you to know what you are doing.
Have a look here <http://en.wikipedia.org/wiki/S.M.A.R.T.> at the definition of the "Reallocated Sectors Count". If a
drive has a read failure, it re-maps that sector.

Its just plain wrong if the data cannot be read from that sector.

No news.
I'll let Yousuf tell me himself.

He already did.
 
Tom Del Rosso

Yousuf said:
That's possible, but it's just as likely that the additional stress on
the other drives is a result of having to rebuild data on the fly from
parity. So the period from the moment the drive fails until it is
discovered to have failed and eventually replaced puts a lot of load
on the remaining drives.

If you do a RAID 5 rebuild offline, with no operation other than the
rebuild, there should be a lot less stress on them, yes?
 
Arno

If you do a RAID 5 rebuild offline, with no operation other than the
rebuild, there should be a lot less stress on them, yes?

Not really. In fact the stress level may or may not be higher.
But, offline rebuild is far faster. Rebuilding a loaded RAID
can take weeks.

In fact, this is a very strong reason to use RAID6 (or
3-way RAID1) for any system that needs to rebuild under
load. If you then lose a second disk during the rebuild,
you take the array offline and rebuild that way.

RAID6 (or 3-way RAID1) is a good idea for anything with
high uptime needs today. Incidentally, that is another
good reason for software RAID: at least on Linux, software
RAID supports >=3-way RAID1 without any issues. I am
not aware of hardware controllers able to do that.

Arno
 
Rod Speed

David Brown wrote
Rod Speed wrote
Then what /are/ you saying?

That you arent vulnerable when a disk fails.
On this, I agree entirely - RAID is no substitute for decent backups.
But rebuilding a system completely from backups is not something you want to do if you can avoid it by simply
investing in an extra disk in your RAID array.

You aint established that doing that eliminates the need to do that.
I can agree that it is very rare that having RAID6 rather than RAID5
is actually going to save your array - and that it is a useful thing
to be able to temporarily turn a RAID5 into a RAID6.
Which brings us full circle - one of the nice things about Linux
software raid, which is normally hard or impossible with Windows
software raid or hardware raid, is the ability to add an extra disk to
your RAID5 array for extra redundancy without affecting the layout of
the rest of the array.

You're just plain wrong, again.
 
Arno

Offline raid rebuild will run at the maximum speed, but online raid
rebuild doesn't have to be /that/ slow. You can normally balance the
rebuild time and the impact on online usage by changing the rates for
the rebuild.
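
(On Linux md, those rates are the following kernel tunables, in KB/s per device; the exact numbers below are only examples:)

    sysctl dev.raid.speed_limit_min    # guaranteed floor, typically 1000
    sysctl dev.raid.speed_limit_max    # ceiling, typically 200000

    # let a rebuild take more bandwidth away from normal I/O
    sysctl -w dev.raid.speed_limit_min=50000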

If the controller allows you to. It can still take a lot longer
than offline rebuilt, especially when you do not have a lot of
reserves to use for it.
I've used 3-way mirrors with Linux md raid (I've also used 1-way
mirrors, which is arguably a bit odd).

I have too. The 1-disk ones are a good way to get persistent
disk IDs when you tend to move them around controller
interfaces. Quite convenient. The 3-way ones are in my
home-fileserver that uses 2.5" disks and they have a tendency to
become unresponsive about once a year. No data loss; a
disk just stops responding, and I have not been able to track
the issue down. With 3-way I can just unplug and replug the
unresponsive drive when it is convenient. And if I mess
up and unplug the wrong one (dark kitchen cabinet), I
get a second chance.
If you want high uptimes, multiple disk redundancy, and fast
low-overhead disk rebuilds, then you can build a RAID5 on top of RAID1
mirror pairs. Then a one-disk rebuild is done as a direct copy from the
other half of the mirror pair, getting close to optimal rebuild time
with only a small impact on the speed of the rest of the system. And if
the drive pairs are Linux raid10,far sets, then you get striped access
across all the drives.
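
(A sketch of such a layered setup with mdadm, assuming six example disks sda-sdf:)

    # three two-disk raid10,far pairs, giving striped reads within each pair
    mdadm --create /dev/md1 --level=10 --layout=f2 --raid-devices=2 /dev/sda1 /dev/sdb1
    mdadm --create /dev/md2 --level=10 --layout=f2 --raid-devices=2 /dev/sdc1 /dev/sdd1
    mdadm --create /dev/md3 --level=10 --layout=f2 --raid-devices=2 /dev/sde1 /dev/sdf1

    # RAID5 layered across the pairs
    mdadm --create /dev/md10 --level=5 --raid-devices=3 /dev/md1 /dev/md2 /dev/md3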

Well, expensive. But there may be applications justifying this.
And configuring something like this is unproblematic with Linux.
One application I can see is to use SSDs as main drives and
conventional HDDs (with "write mostly") as mirrors. Then you lose
SSD write speed (not so bad because of OS write buffering)
and still have full SSD read speed. And the normal HDDs are
far cheaper. I have one 3-way RAID1 with one SSD and two
normal partitions. Very nice for a maildir inbox with >1000
mail files...
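
(A sketch of such a mixed mirror; here sda1 is the SSD partition and sdb1 the conventional disk, both only example names:)

    # flag the spinning disk as write-mostly so reads come from the SSD
    mdadm --create /dev/md0 --level=1 --raid-devices=2 \
          /dev/sda1 --write-mostly /dev/sdb1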
Of course, the storage efficiency is not great...

Indeed.

Arno
 
Arno

I expect any controller /would/ allow you to change that balance, or
specify bandwidth limitations (min and max values).

I think this expectation is not met by typical hardware
controllers. It was certainly not met by an 8 port SATA
RAID controller from Adaptec I bought a few years back
(and then used as a paperweight after careful evaluation).
I hope you use write intent bitmaps - then you can re-add your
unresponsive drive without having to re-sync everything.
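
(The write-intent bitmap setup being suggested, as a sketch with example names:)

    # add an internal write-intent bitmap to an existing array
    mdadm --grow --bitmap=internal /dev/md0

    # after an unplug/replug, only the blocks dirtied in the meantime are resynced
    mdadm /dev/md0 --re-add /dev/sdc1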

Not so far. But as the system has a low load, resync is not an issue.
Have you tried using hdparm commands to put the unresponsive drive to
sleep, then wake it again? Or with the "-w" flag to reset it?
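
(The hdparm commands in question, roughly; sdc is only an example device:)

    hdparm -y /dev/sdc                    # put the drive into standby immediately
    dd if=/dev/sdc of=/dev/null count=1   # any read should wake it again
    hdparm -w /dev/sdc                    # low-level reset (flagged dangerous in the man page)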

That would not work, as the interface got kernel-disabled.
I tried the remove and scan commands from rescan-scsi-bus.sh
with no effect. If I unplug and replug the power (SATA disks),
they come up again immediately and have nothing in their
error log.

Arno
 
Rod Speed

David Brown wrote
Rod Speed wrote
I think it is fairly obvious from what I wrote that you are "vulnerable" in the sense that your array will not survive
a second disk failure.

Yes, but when even one failure is unlikely, and two is even more
unlikely, and you have a proper backup in case the very unlikely
happens, you arent actually vulnerable at all. The most you might
be is inconvenienced if you need to use the backup.
Perhaps you feel that the chances of a second disk failure are too small to worry about

Nope, that the risk of that is small and that if you have even half
a clue, the backup will handle that unlikely possibility, so you are
not in fact vulnerable at all.
- maybe that's true for most practical purposes, but you are still running with no redundancy,

But you do have the backup if the shit does hit the fan.
and therefore vulnerable, until your rebuild is complete.

Nope, because you have the backup if you need it.
Somehow, I just /knew/ you were going to say that.

I call a spade a spade and a wrong a wrong. You get to like that or lump it.
So, in what way do you think I'm wrong here?

The stupid claim that only Linux can do that.
You've said yourself that being able to temporarily turn a RAID5 into a RAID6 can make sense.

Yes, that was a comment that its hard or impossible with Windows.

That is just plain wrong, again.
 
Yousuf Khan

I was under the impression that Ry - a RAID0 stripe of RAID1 mirrors -
is more common. But I'll let you continue...

For software raid, striping followed by mirroring is the most common.
For hardware raid, i.e. storage arrays, it's hard to tell what's
most common with striping and mirroring, as that's usually proprietary
to the storage array's manufacturer. But obviously most
storage arrays wouldn't be used with striping and mirroring but most
likely with RAID5, so I would assume nobody pays too much attention to
the striping-and-mirroring design; it's more of an afterthought.
Yes, I know of the advantages of RAID10 (what you call Ry) over RAID01
(Rx). It is also much better when you are doing a re-build - you only
need to copy one disk's worth of data, not half the array.

Well even in RAID01, mirrored disks usually correspond one-to-one with
each other. So as long as they keep updating the remaining disks in the
failed mirror, you would only need to update just the one failed disk.
Of course it's more likely that they stop updating the whole mirror, and
not just the one failed disk.

Now in a striped/mirrored setup, they usually have a maximum number of
stripes. So let's say you have a maximum of a 4-way stripe, and your
striped/mirrored volume might have 8 disks per mirror (16 altogether),
so the way it would be setup is that the first 4 disks are part of one
striping set, and the next 4 are part of another. The two stripe sets
would simply be concatenated onto each other. So if a disk fails in the
first stripe set, it won't affect the second stripe set.
RAID01 may have an advantage in the topology, such as having each RAID0
set in its own cabinet with its own controller, so that you have
redundancy in that hardware too.

It's possible to do it that way. But in most cases they just have
virtual cabinets with a single storage array. Each mirror might be
assigned to a different backplane in the array.
If you've got the extra disk, why would you prefer RAID5 + hot spare
over RAID6? RAID6 has slightly more overhead for small writes, but it
also has slightly faster reads (since you have an extra spindle online).

RAID6 uses up that one whole extra disk for parity, thus taking away
from your own disk capacity. And if you want the extra speed of the
extra spindle, then just add that extra disk to RAID5, and you get both
the extra speed and the extra space.

You gotta realize that in a storage array you won't have just one big
RAID5 volume; you may have dozens of them. The hot spares can be utilized
by any of the RAID5 volumes as the need arises; they aren't dedicated
to just one RAID5 volume.

Yousuf Khan
 
John Turco

Arno said:
Indeed. One more reason to move away from an "OS" whose
primary purpose is to make money. For critical infrastructure that
is not a good basis, also because it is controlled by a single
entity (a big no-no in vendor sourcing for critical components).

<edited>

Halt, Arno! Users of "Apple" computers often claim that their
machines are superior to "Windoze" boxes, >because< the former
are "controlled by a single entity"...correct?
 
Arno

<heavily edited for brevity>

Halt, Arno! Users of "Apple" computers often claim that their
machines are superior to "Windoze" boxes, >because< the former
are "controlled by a single entity"...correct?

They do. And they do not get it either. However Apple users
have one advantage: Quite a lot of what they do can be done
on other platforms, if not as prettily. So they have sort
of an exit-strategy (and that is what the single vendor
problem is about).

Arno
 
Tom Del Rosso

Arno said:
I think you have an overly positive (maybe even a bit naive) view
of hardware RAID and the people that design these controllers.
In my experience, to anybody that _can_ administrate a software
RAID competently, hardware RAID is a problem, not a solution.
The others should not be let near any storage solution in the
first place.

It seems like you're assuming the system runs Linux. I'm not sure if you
would take this position with a Windows system where software RAID has
limitations.
 
