RAID 5 Question

Eli

I've always read that when you set up a RAID 5 array you generally
give up approximately one drive's worth of storage space. Does this hold
true no matter how many drives are in the array? For instance, in a
12 drive array?

I've also always read that the overall read performance of the array
increases with the number of spindles in the array. Is there some
point of diminishing (or negative) returns to this? Say on a 12 drive
PATA or SATA RAID 5 controller, if you don't need the expanded space
of a single 12 drive array, might there be any advantages to operating
two 6 drive arrays on the same controller?
 
Eli said:
I've always read that when you set up a RAID 5 array you generally
give up approximately one drive's worth of storage space.

True.

Does this hold
true no matter how many drives are in the array? For instance, in a
12 drive array?
Yes.

I've also always read that the overall read performance of the array
increases with the number of spindles in the array.
Yes.

Is there some
point of diminishing (or negative) returns to this? Say on a 12 drive
PATA or SATA RAID 5 controller, if you don't need the expanded space
of a single 12 drive array, might there be any advantages to operating
two 6 drive arrays on the same controller?

That depends entirely on the specifics of the RAID controller, its
configuration and the OS/drivers.

Two 6-drive arrays will have 1/12 less usable space, since you give up
two drives' worth of parity instead of one.
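
To check that arithmetic (the 250 GB drive size below is only an assumed example, not something Eli specified):

# Back-of-the-envelope RAID 5 capacity check; drive size is an assumed example.
def raid5_usable(drives, size_gb):
    # RAID 5 keeps one drive's worth of parity per array.
    return (drives - 1) * size_gb

size = 250                               # GB, purely illustrative
single = raid5_usable(12, size)          # 11 * 250 = 2750 GB usable
split = 2 * raid5_usable(6, size)        # 2 * (5 * 250) = 2500 GB usable
print(single, split)                     # 2750 2500
print((single - split) / (12 * size))    # 0.0833... = 1/12 of the raw capacity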
 
I've always read that when you set up a RAID 5 array you generally
give up approximately one drive's worth of storage space. Does this hold
true no matter how many drives are in the array? For instance, in a
12 drive array?
Yes.

I've also always read that the overall read performance of the array
increases with the number of spindles in the array. Is there some
point of diminishing (or negative) returns to this? Say on a 12 drive
PATA or SATA RAID 5 controller, if you don't need the expanded space
of a single 12 drive array, might there be any advantages to operating
two 6 drive arrays on the same controller?

Not on a PATA or SATA controller. But on a SCSI controller you might
want to use multiple SCSI channels. 12 drives on 1 channel might
saturate that channel. Some controllers will let you create a single
RAID 5 array spanning multiple SCSI channels. For some others you might
need to create multiple arrays.

PATA and SATA controllers don't have this issue since you have a
dedicated IDE channel for every disk.
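
Rough channel arithmetic for that point; the per-drive and Ultra320 figures are assumptions for illustration, not measurements from any particular controller:

# Does one shared channel become the bottleneck for 12 drives?
drive_str_mb_s = 50     # assumed sustained transfer rate of one drive
channel_mb_s = 320      # assumed bandwidth of one Ultra320 SCSI channel
drives = 12

aggregate = drives * drive_str_mb_s     # 600 MB/s of raw drive bandwidth
print(aggregate > channel_mb_s)         # True: a single shared channel saturates
print(aggregate / 2 > channel_mb_s)     # False: two channels of 6 drives each keep up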

Multiple spindles increase the performance of the array for two main
reasons:
1) The sequential transfer rate increases with more spindles. Might
be useful in some rare situations where you really need that. But in
those situations you will probably already saturate the PCI bus with 6
drives and won't gain anything more with 12 drives. (Creating two
arrays also won't help then, because the PCI bus is then the limiting
factor; see the sketch below.)

2) Average seek times decrease with more spindles. The reason for this
is that while part of the array is retrieving a file, other parts of
the array that don't contain the data for that file can retrieve
another file at the same time. When your array has more spindles the
chance of this happening becomes bigger. (It is of course also dependent
on the size of your files, the stripe size, and whether your server opens
a lot of files, or just one at a time.)
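
To put rough numbers on point 1 (the PCI ceiling and per-drive rate below are assumed round numbers, not measured values):

# Why extra spindles stop helping sequential throughput once the host bus is full.
pci_mb_s = 133          # assumed ceiling of a plain 32-bit/33 MHz PCI slot
drive_str_mb_s = 50     # assumed sustained transfer rate per drive

for drives in (3, 6, 12):
    raw = drives * drive_str_mb_s
    delivered = min(raw, pci_mb_s)
    print(drives, "drives:", raw, "MB/s from the disks, about", delivered, "MB/s past the bus")
# 6 drives already offer ~300 MB/s, so the bus caps a single 12-drive array
# and two 6-drive arrays on the same card at the same ~133 MB/s.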

Marc
 
When thinking about that 12 port RAID controller, keep in mind drive failure
and hot spares; I wouldn't plan on using ALL of the drives for the array. I
have a small RAID 5 array of 3 disks. If 1 drive fails my data is still
intact, but I would feel most comfortable with a hot spare just in case
another drive fails the same day (I do not currently have a hot spare). With
a much larger array, the chances of drive failure are greater, so I would
consider at least 1 hot spare an absolute necessity, and 2 hot spares even
better. 12 drives at 250GB each is 3 TB of data. Use them in a RAID 5
array with 2 hot spares and you have 2.25 TB of data that is pretty darn safe.
I don't even like thinking about having that much data to take care of; what
a nightmare if something goes wrong!

--Dan
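
Dan's figures, worked out as a quick sketch (the 250 GB drive size is his example, not something Eli specified):

# 12 bays, 2 kept back as hot spares, RAID 5 across the remaining 10 drives.
size_gb = 250                             # example drive size from Dan's post
bays = 12
hot_spares = 2
data_drives = bays - hot_spares           # 10 drives actually in the array
usable_gb = (data_drives - 1) * size_gb   # minus one drive's worth of parity
print(usable_gb)                          # 2250 GB, i.e. the 2.25 TB Dan mentions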
 
Marc de Vries said:
Not on a PATA or SATA controller. But on a SCSI controller you
might want to use multiple SCSI channels. 12 drives on 1 channel might
saturate that channel. Some controllers will let you create a single
RAID 5 array spanning multiple SCSI channels. For some others you might
need to create multiple arrays.

PATA and SATA controllers don't have this issue since you have a
dedicated IDE channel for every disk.

Not with PATA.
It is still *your* choice to not use a second drive on the same channel.
Multiple spindles increase the performance of the array for two main
reasons:
1) The sequential transfer rate increases with more spindles.

That is for striped arrays.
Might be useful in some rare situations where you really need that. But
in those situations you will probably already saturate the PCI bus with
6 drives and won't gain anything more with 12 drives. (Creating two
arrays also won't help then, because the PCI bus is then the limiting
factor.)

2) Average seek times decrease with more spindles.

That is for mirrored arrays.
The reason for this is that while part of the array is retrieving a file,
other parts of the array that don't contain the data for that file can
retrieve another file at the same time.

So that's not really seek time. Just another way of how transfer rate increases.
When your array has more spindles the chance of this happening becomes
bigger. (It is of course also dependent on the size of your files, the
stripe size, and whether your server opens a lot of files, or just one
at a time.)

So no.
 
Not with PATA.
It is still *your* choice to not use a second drive on the same channel.

I have never seen a PATA RAID 5 controller where each drive didn't have
a dedicated IDE channel.
But you are right in the case of some RAID 0/1/10 controllers where
you can have two drives connected on an IDE channel, although you then
lose the hot-plug capability of those cards.
That is for striped arrays.

RAID 5 is also a form of striping, so it also applies here.
That is for mirrored arrays.

Wrong. It applies to all forms of RAID, although it also depends on
the implementation. Very old and/or cheap controllers didn't support this
in the past, not even in mirrors. But nowadays almost any controller
supports it.
So that's not really seek time. Just another way of how transfer rate increases.

No. It has nothing at all to do with transfer rate. I can retrieve
multiple small files simultaneously, but since the files are small it
is not something that is determined by transfer rate, so not something
for which I need the bigger transfer rates of RAID arrays. I won't
even exceed the transfer rate of a single disk with those files.
But the perceived seek time of the array will be smaller than the
minimum seek time of a single disk. So it is exactly as I said: the
average seek time of the entire array decreases with more spindles.
Of course this only works if you open more than one file.

You will now understand that the correct answer is: So yes.

Marc
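
A toy model of the effect Marc describes, in case it helps: several small reads land on random spindles, reads that hit the same spindle are serialized, and the batch takes as long as the busiest spindle. The 10 ms seek figure and the placement model are made-up assumptions, not benchmarks:

# Toy model: with more spindles, concurrent small reads collide less often.
import random

def avg_batch_ms(spindles, reads=4, trials=20000, seek_ms=10.0):
    total = 0.0
    for _ in range(trials):
        hits = [random.randrange(spindles) for _ in range(reads)]
        busiest = max(hits.count(s) for s in set(hits))   # reads on the same spindle serialize
        total += busiest * seek_ms
    return total / trials

for n in (1, 2, 4, 12):
    print(n, "spindles:", round(avg_batch_ms(n), 1), "ms for a batch of 4 small reads")
# 1 spindle: 40 ms (fully serialized); 12 spindles: not far above a single 10 ms seek.
# That is the sense in which the average seek time of the array decreases.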
 
When thinking about that 12 port RAID controller, keep in mind drive failure
and hot spares; I wouldn't plan on using ALL of the drives for the array. I
have a small RAID 5 array of 3 disks. If 1 drive fails my data is still
intact, but I would feel most comfortable with a hot spare just in case
another drive fails the same day (I do not currently have a hot spare). With
a much larger array, the chances of drive failure are greater, so I would
consider at least 1 hot spare an absolute necessity, and 2 hot spares even
better. 12 drives at 250GB each is 3 TB of data. Use them in a RAID 5
array with 2 hot spares and you have 2.25 TB of data that is pretty darn safe.
I don't even like thinking about having that much data to take care of; what
a nightmare if something goes wrong!

Who says that he will be using 250GB disks?

It's common practice to use small disks in arrays with many spindles
to increase the performance of the array. For example: my servers have
RAID 6 arrays consisting of 14 disks of 36GB each.
 
Who says that he will be using 250GB disks?

It's common practice to use small disks in arrays with many spindles
to increase the performance of the array. For example: my servers have
RAID 6 arrays consisting of 14 disks of 36GB each.

What have you found speedwise? I seem to get about the same transfer rate off a 160GB disk as off a striped pair of 80s, as the 160 is inherently faster. Hence I only stripe if it's cheaper to do so (or impossible not to, like you cannot get a 500GB IDE drive).



--
Peter
 
What have you found speedwise? I seem to get about the same transfer rate off a 160GB disk as off a striped pair of 80s, as the 160 is inherently faster. Hence I only stripe if it's cheaper to do so (or impossible not to, like you cannot get a 500GB IDE drive).

Why would the 160 be inherently faster? Or are the 80s older drives?

Drives of the same series should have the same transfer rate no matter
the size.

I use multiple spindles to get higher IO/s, not to get higher
transfer rates. So I have never checked the performance gain in
transfer rates on my servers.

But I have a Promise SATA RAID 5 card in my desktop at home, and I
definitely get a much higher transfer rate from that array of 4 disks
than from a single drive.

There are also lots of reviews and people on newsgroups where two disks
are striped with onboard el-cheapo RAID cards and the transfer rate is
much higher than with a single disk.

So I'm surprised that you see about the same transfer rate with two
striped disks.

But very few applications benefit from higher transfer rates. When
applications benefit from RAID arrays it is usually because of higher
IO/s.

Marc
 
Why would the 160 be inherently faster? Or are the 80s older drives?

Drives of the same series should have the same transfer rate no matter
the size.

Because bigger drives are newer technology, or if they are the same technology, they have more platters, hence it's almost a RAID in itself.
I use multiple spindles to get higher IO/s, not to get higher
transfer rates. So I have never checked the performance gain in
transfer rates on my servers.

What would you recommend for cluster sizes? This is a desktop but I do a lot of stuff at once, and it's also a web server.
But I have a Promise SATA RAID 5 card in my desktop at home, and I
definitely get a much higher transfer rate from that array of 4 disks
than from a single drive.

There are also lots of reviews and people on newsgroups where two disks
are striped with onboard el-cheapo RAID cards and the transfer rate is
much higher than with a single disk.

So I'm surprised that you see about the same transfer rate with two
striped disks.

But very few applications benefit from higher transfer rates. When
applications benefit from RAID arrays it is usually because of higher
IO/s.

The non-increase in transfer rate was when I was trying to do video capture.


--
Peter
 
Who says that he will be using 250GB disks?
Nobody.

It's common practice to use small disks in arrays with many spindles
to increase the performance of the array. For example: my servers have
RAID 6 arrays consisting of 14 disks of 36GB each.

Who says that he will be using 36GB disks?

Exactly.
--Dan
 
Marc de Vries said:
Why would the 160 be inherently faster? Or are the 80s older drives?

Drives of the same series should have the same transfer rate no matter
the size.

I use multiple spindles to get higher IO/s, not to get higher transfer
rates. So I have never checked the performance gain on transferrates
on my servers.

But I have a Promise SATA RAID 5 card in my desktop at home, and I
definitely get a much higher transfer rate from that array of 4 disks
than from a single drive.

There are also lots of reviews and people on newsgroups where two disks
are striped with onboard el-cheapo RAID cards and the transfer rate is
much higher than with a single disk.

So I'm surprised that you see about the same transfer rate with two
striped disks.

But very few applications benefit from higher transfer rates.
When applications benefit from RAID arrays it is usually because of
higher IO/s.

Which of course is one and the same if you look at speed only
(MB/s is just IO/s times the IO size, so more IO/s is more MB/s).
What IO/s doesn't say is how big the IOs are and how much time is
spent in seeks relative to larger IOs, and how that affects transfer
rate. So more IO isn't necessarily faster IO when the same amount
of IO is broken into more, and thus smaller, pieces.

IOs, not IO/s, are what matter to servers: queueing and reordering them
gets the most out of the bus, and at the same time cutting potentially
larger IOs into smaller ones distributes IO fairly among users for
response continuity.
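
To put numbers on how IO size ties IO/s and MB/s together (the seek, rotational latency and media-rate figures below are assumed round numbers for a single disk, not measurements):

# Per-IO service time = seek + rotational latency + transfer time.
def iops_and_throughput(io_kb, seek_ms=8.0, rot_ms=4.2, media_mb_s=60.0):
    transfer_ms = io_kb / 1024.0 / media_mb_s * 1000.0   # time spent moving the data
    service_ms = seek_ms + rot_ms + transfer_ms          # total time per IO
    iops = 1000.0 / service_ms
    mb_s = iops * io_kb / 1024.0
    return round(iops), round(mb_s, 1)

for size_kb in (4, 64, 1024):
    print(size_kb, "KB IOs:", iops_and_throughput(size_kb))
# Small IOs: many IO/s but few MB/s. Large IOs: few IO/s but many MB/s.
# "More IO/s" only means "more MB/s" at a fixed IO size.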
 
Marc de Vries said:
Marc de Vries said:
[snip]

PATA and SATA controllers don't have this issue since you have a
dedicated IDE channel for every disk.

Not with PATA.
It is still *your* choice to not use a second drive on the same channel.

I have never seen a PATA RAID 5 controller where each drive didn't have
a dedicated IDE channel.

Which obviously is entirely firmware limited, unless the IDE chip used is
self-designed and doesn't support master/slave.
But you are right in the case of some RAID 0/1/10 controllers where
you can have two drives connected on an IDE channel, although you then
lose the hot-plug capability of those cards.

What has that got to do with hotplugging?
RAID 5 is also a form of striping, so it also applies here.

Yes, and where did I say different?
That is why I said "That is for striped arrays" and not "RAID 0".
Get it now?
Wrong. It applies to all forms of RAID, although it also depends on
the implementation. Very old and/or cheap controllers didn't support this
in the past, not even in mirrors. But nowadays almost any controller
supports it.

Supports what? What is "it"?
No. It has nothing at all to do with transfer rate. I can retrieve
multiple small files simultaneously, but since the files are small it
is not something that is determined by transfer rate, so not something
for which I need the bigger transfer rates of RAID arrays. I won't
even exceed the transfer rate of a single disk with those files.
But the perceived

Right, "perceived".
seek time of the array will be smaller than the minimum seek time of
a single disk. So it is exactly as I said:
the average seek time of the entire array decreases with more spindles.

Which obviously is false when it is only "perceived" like that.
Perceived, as the word says, means it is not really so.

If the effective transfer rate is the data moved divided by (seek time
plus actual transfer time) and you then increase the STR, you could just
as well say that this is perceived as if the seek time had decreased
compared to the old STR, simply because the total transfer time decreased.
It's not so.
Of course this only works if you open more than one file.

Thanks for confirming that this is
"Just another way of how transfer rate increases".

And by the way, you could just as well say that when seek time (total time
spent in seeks) decreases, that is "perceived" as increased transfer rate.
You will now understand

You have no say in that whatsoever.
that the correct answer is: So yes.

So no (hint: stripe size on mirrors?).
 
Peter said:
Because bigger drives are newer technology, or if they are the same
technology, they have more platters, hence it's almost a RAID in itself.

That's a common misconception. Drives with multiple platters do not in
general perform parallel reads on those platters. While there might on some
occasions be a slight performance gain if the data pattern is such that
access can be by a head switch and short seek rather than a long seek, the
gain is going to be very, very minor unless the data is carefully
structured to match the drive.
 
That's a common misconception. Drives with multiple platters do not in
general perform parallel reads on those platters. While there might on some
occasions be a slight performance gain if the data pattern is such that
access can be by a head switch and short seek rather than a long seek, the
gain is going to be very, very minor unless the data is carefully
structured to match the drive.

Well, I've generally found the bigger, the faster. Probably because when you get a bigger drive it's usually newer technology.


--
Peter
 
Peter said:
Well, I've generally found the bigger, the faster. Probably because when
you get a bigger drive it's usually newer technology.

Generally newer drives fit more data on a track, so they can read or write
more in a single revolution, and so get a higher data transfer rate.
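
The per-revolution arithmetic behind that, as a small sketch; the track sizes are assumed round numbers, not specs for any particular drive:

# Sustained transfer rate is roughly one track's worth of data per revolution.
rpm = 7200
revs_per_sec = rpm / 60.0                 # 120 revolutions per second

for label, track_kb in (("older, less dense drive", 350), ("newer, denser drive", 600)):
    str_mb_s = track_kb / 1024.0 * revs_per_sec
    print(label + ":", round(str_mb_s), "MB/s off the outer tracks")
# Same spindle speed, more data passing under the head per revolution, higher MB/s.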
 
Generally newer drives fit more data on a track, so they can read or write
more in a single revolution, and so get a higher data transfer rate.

So I stand by my argument: always buy bigger first, then RAID :-)

--
Peter
 
Peter said:
So I stand by my argument: always buy bigger first, then RAID :-)

Buy _faster_ first, not necessarily _bigger_. A 40 gig 7K250 will
outperform a 180 gig 180GXP, for example.
 
Marc de Vries said:
[snip]

PATA and SATA controllers don't have this issue since you have a
dedicated IDE channel for every disk.

Not with PATA.
It is still *your* choice to not use a second drive on the same channel.

I have never seen a PATA RAID 5 controller where each drive didn't have
a dedicated IDE channel.

Which obviously is entirely firmware limited, unless the IDE chip used is
self-designed and doesn't support master/slave.

No, I saw one which had 8 sockets for 8 disks (and a smaller model with 4 sockets for 4 disks). This is BETTER than 4 sockets for 8 disks!
What has that got to do with hotplugging?

Something to do with the drives themselves expecting the slave to still be there? I dunno.
Right, "perceived".



Which obviously is false when it is only "perceived" like that.
Perceived, as the word says, means it is not really so.

If the effective transfer rate is the data moved divided by (seek time
plus actual transfer time) and you then increase the STR, you could just
as well say that this is perceived as if the seek time had decreased
compared to the old STR, simply because the total transfer time decreased.
It's not so.

If it's perceived, it's achieved. What else do you want to do?


--
Peter
 