1.485 Gbit/s to and from HDD subsystem

Michael Daly

Al said:
Applying averages to a specific instance of the equipment is a classic
mistake.

In general, applying averages without knowing the distribution is fraught with
error. If these drives have a deep bathtub failure distribution (most either
fail relatively early in life or late in life but relatively few fail near the
average) then assuming average life is a crap shoot.

Personally, I start worrying about a "proven" drive once it gets around five
years old, regardless of what the manufacturer's claims are. My
not-very-scientific observation is that they tend to go around that age.

Mike
 
Ryan Godridge

I've read about Physical Track Positioning.
http://www.nyx.net/~sgjoen/disk1.html#ss6.8

For example, the Raptor WD1500 manages ~88 MB/s on outer tracks and
~60 MB/s on inner tracks, while the Barracuda 7200.10 starts at ~80 MB/s
on outer tracks and ends at ~40 MB/s on inner tracks.

http://anandtech.com/printarticle.aspx?i=2760

However, the WD1500 holds "only" 150 GB while the 7200.10 holds 750 GB.
If one looks at the throughput of the 7200.10 on the first 500 GB, it
never falls below 60 MB/s. And the throughput on the first 650 GB never
falls below 50 MB/s.

If it were easy to specify that one wants to use "only the outer x GB"
then one can get good performance from large disks.
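
As a rough illustration of why the worst-case (inner-track) rate matters, here is a minimal Python sketch that estimates how many striped drives would be needed to sustain the 1.485 Gbit/s from the subject line. It assumes RAID-0 throughput scales linearly with the number of drives, which is optimistic, and the helper name `drives_needed` is made up for this example.

```python
import math

# Required rate from the subject line: 1.485 Gbit/s, converted to MB/s.
REQUIRED_MB_S = 1.485e9 / 8 / 1e6  # about 185.6 MB/s

def drives_needed(required_mb_s, worst_case_per_drive_mb_s):
    """Smallest drive count whose combined worst-case rate meets the target.

    Assumes ideal linear scaling across the stripe (an optimistic assumption).
    """
    return math.ceil(required_mb_s / worst_case_per_drive_mb_s)

# Worst-case sustained rates quoted above for the Barracuda 7200.10.
for label, floor_mb_s in [
    ("first 500 GB only", 60),
    ("first 650 GB only", 50),
    ("whole disk", 40),
]:
    print(label, drives_needed(REQUIRED_MB_S, floor_mb_s))
```

Under these assumptions, restricting the 7200.10 to its outer 500 GB saves one drive compared with using the whole disk (4 instead of 5).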

Has anyone played with this at all? in Windows? in Linux?

Regards.

Hmm good points on the inner tracks.

Are these requirements burst or sustained transfer rates? If burst,
how much data at a time?

Are you going to be reading and writing concurrently?

The more I look at this, the chunkier the system required seems to
get.
 
Spoon

Ryan said:
Are these requirements burst or sustained transfer rates? If burst,
how much data at a time?

Sustained transfer rates.

E.g. capturing 1 minute of HD-SDI video means sequentially writing
186 MB/s for 1 minute, i.e. approximately 11 GB. Likewise, playing
out 1 minute means sequentially reading 11 GB.

Are you going to be reading and writing concurrently?

No. We'll capture a stream once, then play it out over and over.

The more I look at this, the chunkier the system required seems to
get.

Chunky?
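
For anyone checking the figures, the 186 MB/s and 11 GB per minute follow directly from the 1.485 Gbit/s line rate in the subject. A minimal Python sketch, using decimal units (1 MB = 10^6 bytes) and helper names invented for this example:

```python
LINK_RATE_BIT_S = 1.485e9  # HD-SDI line rate from the subject line

def bytes_per_second(bit_rate):
    """Convert a bit rate to bytes per second."""
    return bit_rate / 8

def gigabytes_for(seconds, bit_rate=LINK_RATE_BIT_S):
    """Decimal gigabytes transferred in `seconds` at the given bit rate."""
    return bytes_per_second(bit_rate) * seconds / 1e9

print(round(bytes_per_second(LINK_RATE_BIT_S) / 1e6, 1))  # 185.6 (MB/s)
print(round(gigabytes_for(60), 2))                        # 11.14 (GB per minute)
```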
 
Spoon

The said:
Actually, I think it means that if Western Digital sells 1.2 million
drives a month, one will fail every hour assuming 100% duty cycle.

I took it to mean:

Let X be the time between two failures, then P(X <= t) = 1/2 * t/MTBF

i.e. a continuous uniform distribution over [0, 2*MTBF]

With these assumptions, P(X <= 365) -- the probability for a disk to
fail within the first year -- equals 0.00365

So doing some maths, with some help, it works out to be around a 7.3%
chance of failure every year for 4 drives.

Our figures differ.

Assuming independent random variables, the probability for (at least)
one disk among 4 to fail within the first year equals 1 - 0.99635^4,
i.e. ~0.01452, i.e. roughly 4 * P(X <= 365)

How did you reach your result?

Assuming an exponential distribution,

P(X <= t) = 1 - e^(-t/MTBF)

P(X <= 365) = 0.00727

In this case, the probability for (at least) one disk among 4 to fail
within the first year equals ~0.02878.
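
The figures above can be reproduced with a few lines of Python. This is only a sketch of the two models discussed here (uniform on [0, 2*MTBF] and exponential), assuming an MTBF of 1.2 million hours and independent drives; the function names are made up for this example.

```python
import math

MTBF_DAYS = 1_200_000 / 24  # 1.2 million hours = 50,000 days

def p_fail_uniform(t_days, mtbf_days=MTBF_DAYS):
    """P(X <= t) for X uniform on [0, 2*MTBF]."""
    return min(1.0, 0.5 * t_days / mtbf_days)

def p_fail_exponential(t_days, mtbf_days=MTBF_DAYS):
    """P(X <= t) for X exponential with mean MTBF."""
    return 1.0 - math.exp(-t_days / mtbf_days)

def p_at_least_one(n_drives, p_single):
    """Probability that at least one of n independent drives fails."""
    return 1.0 - (1.0 - p_single) ** n_drives

for name, model in [("uniform", p_fail_uniform),
                    ("exponential", p_fail_exponential)]:
    p1 = model(365)
    print(name, round(p1, 5), round(p_at_least_one(4, p1), 5))
```

Neither model gets anywhere near 7.3% for 4 drives in one year.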

Regards.
 
Tony Hill

In general, applying averages without knowing the distribution is fraught with
error. If these drives have a deep bathtub failure distribution (most either
fail relatively early in life or late in life but relatively few fail near the
average) then assuming average life is a crap shoot.

Personally, I start worrying about a "proven" drive once it gets around five
years old, regardless of what the manufacturer's claims are. My
not-very-scientific observation is that they tend to go around that age.

Keep in mind that the MTBF for drives only applies within the expected
life cycle of the drive, which is normally either 3 or 5 years
(usually the same as the warranty period), so any failures beyond 5
years just aren't counted at all. A lot of manufacturers also ignore
the first 90 days when they calculate the MTBF, saying that this is
not part of the Useful Life Cycle of the drive (pretty much BS, but
serves to boost MTBF numbers quite nicely).

Long story short, MTBF means shit-all for any practical purpose.
 
Tony Hill

You're right. I also need an HD-SDI PCIe board with Linux 2.6 support.


Is it possible to use only the outer tracks of larger disks?
(As discussed in another message.)

It should be possible, though I've never done it. Disks pretty much
always work from the outside in, so the first sector will be on the
outermost section of the platters while the last sector will be at the
innermost section.

With a single disk this is easy, you just assign the partition to only
use the first x sectors of the disk and then ignore the last y
sectors, quite easy to do in Linux fdisk and I believe Windows works
more or less the same way but with megabytes and gigabytes instead of
sectors. This gets trickier with a RAID array though, since your
partitions are actually being made by the RAID controller while what
the OS sees is a sort of virtual disk. As a result it will be up to
the controller's firmware to decide how the partition is going to
reside on the physical disk. I would take a guess that it will pretty
much always just do the simplest thing and follow the same format as
for a single hard disk, putting the first sectors of the virtual disk
on the first sector of each of the physical disks.
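
To make the "first x sectors" idea concrete, here is a small Python sketch that works out where a partition limited to the outer part of a disk would have to end. It assumes 512-byte logical sectors, decimal gigabytes, and that low sector numbers really do map to the outer tracks as described above; `last_sector_within` is a hypothetical helper, not part of any partitioning tool.

```python
SECTOR_BYTES = 512  # assumption: classic 512-byte logical sectors

def last_sector_within(outer_gb, sector_bytes=SECTOR_BYTES):
    """Index of the last sector lying inside the first `outer_gb` decimal GB.

    Sectors are numbered from 0, so a partition ending at this sector
    stays entirely within the first `outer_gb` gigabytes of the disk.
    """
    sectors = (outer_gb * 10**9) // sector_bytes
    return sectors - 1

print(last_sector_within(500))  # 976562499
print(last_sector_within(650))  # 1269531249
```

The resulting number would be the upper bound for the partition's last sector when creating it by hand; the rest of the disk is simply left unpartitioned.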

I'm aiming for socket AM2 Athlon 64 X2 4600+ (dual core, 2.4 GHz).
(I'm also considering Core 2 Duo.)

Either option should be fairly good, though I would tend to favor the
Core 2 Duo myself. There is more headroom for faster processors on
the Core 2 Duo platform at the moment, including a quad-core chip at
the very top end, if it turns out that the chip you select doesn't
provide enough oomph for what you need.

Price/performance also tends to favor Core 2 Duo most of the time as
well, though of course that depends somewhat on your exact
applications, which likely haven't been very well benchmarked!

Aiming for 2-4 GB.

Good call! 2GB actually will probably be sufficient so long as your
software operates properly. Caching the video stream is going to be
pretty much pointless since you'll overfill any cache in a matter of
seconds anyway, so really your main concern is that nothing else
runs out of memory and tries to swap. In Linux it may actually be
acceptable to not have a swap partition at all, though the different
paging mechanisms of Windows might make that extremely difficult and
not necessarily intelligent on that platform.

For reference, the WD1500 dissipates 10W in use, 9W idle.
http://www.westerndigital.com/en/products/Products.asp?DriveID=189

Figure at least 4 of those plus one or two drives as
boot/OS/application drive and you're looking at a minimum of 50W for
the drives alone.

Hmm, it's unfortunate that the documents don't list peak power at
spin-up. Usually the initial spin-up is when these drives consume the
most power. Some RAID controllers allow you to specify a delay
between spinning up drives so that the whole array doesn't try to spin
up at once and overload the power supply.
 
krw

Keep in mind that the MTBF for drives only apply within the expected
life cycle of the drive, which is normally either 3 or 5 years
(usually the same as the warranty period), so any failures beyond 5
years just aren't counted at all. A lot of manufacturers also ignore
the first 90 days when they calculate the MTBF, saying that this is
not part of the Useful Life Cycle of the drive (pretty much BS, but
serves to boost MTBF numbers quite nicely).

The useful life is generally stated. Yes there is a bathtub curve
for fails, but the MTBF includes the head end. If there is a
serious issue with early fails, manufacturers use burn-in
techniques.

Long story short, MTBF means shit-all for any practical purpose.

Only to the uninformed.
 
The little lost angel

Our figures differ.

Assuming independent random variables, the probability for (at least)
one disk among 4 to fail within the first year equals 1 - 0.99635^4,
i.e. ~0.01452, i.e. roughly 4 * P(X <= 365)

How did you reach your result?

Not sure, I failed my maths consistently and so asked a maths/stats
lecturer friend for a formula to calculate the probability of a
failure and 4 failures, within 1 year if the time between two failures
is given as 1.2M hours. She gave me a formula, involving integrating
from time 0 to 1 year along with some utterances about assuming some
distribution thing. But couldn't figure it out so she calculated for
me in the end :ppPp Any errors are likely entirely my fault since she
simply gave me what I asked for.

But I do know the final part of the equations involved 1 - 0.9xxxxxx
^4 too :pPp
 
Tony Hill

The useful life is generally stated.

5 years ago, yes. Now it's like pulling teeth trying to get this
information.

Yes there is a bathtub curve
for fails, but the MTBF includes the head end. If there is a
serious issue with early fails, manufacturers use burn-in
techniques.

You're giving companies too much credit. I wouldn't say that most
companies, or even many companies, ignore the first 90 days. However
I suspect that if you were to dig deep enough, you could find some.
Keep in mind that there is no standard for MTBF measurements.

For hard drives it has also become common to define the duty cycle as
being 8x5 for desktop drives rather than 24x7. This change in
definition alone could potentially increase MTBF by a factor of 4
with no improvement in reliability.
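
One way to read the "factor of 4" is as the ratio of powered-on hours per week between the two duty cycles. A minimal Python sketch under that assumption (the helper names are invented here, and real manufacturer definitions may differ):

```python
def powered_hours_per_week(hours_per_day, days_per_week):
    """Hours per week the drive is assumed to be running."""
    return hours_per_day * days_per_week

def mtbf_inflation(from_cycle=(24, 7), to_cycle=(8, 5)):
    """Ratio of weekly running hours between two duty-cycle definitions."""
    return powered_hours_per_week(*from_cycle) / powered_hours_per_week(*to_cycle)

print(mtbf_inflation())  # 4.2
```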

Only to the uninformed.

To be useful MTBF requires that you have a statistically relevant
sample size, it requires that you know how MTBF has been defined by
the manufacturer and it requires that your sample distribution is wide
enough that it will not be affected by variations in production runs
and of course, it requires that the MTBF was accurate in the first
place! Case in point, the infamous IBM 75GXP drives were rated for an
MTBF of 1,000,000 hours, but their actual MTBF was SIGNIFICANTLY
lower.

The best-case scenario is that MTBF is an approximation of a statistic
based on using historical data to estimate real-world failures from
environmentally controlled failure rates. It's one of those numbers
that mostly works pretty well most of the time, but it's definitely
not the sort of thing that you want to take risks with important data
over.
 
Ryan Godridge

Ryan Godridge wrote:


Sustained transfer rates.

E.g. capturing 1 minute of HD-SDI video means sequentially writing
186 MB/s for 1 minute, i.e. approximately 11 GB. Likewise, playing
out 1 minute means sequentially reading 11 GB.

So you have to have guaranteed 186 MB/s write speed for the length of
the capture. This might be more problematic than the read speed.
What figures are you seeing for sustained writes?

No. We'll capture a stream once, then play it out over and over.


Chunky?

Chunky - big, powerful.
 
willbill

Spoon said:
Ryan Godridge wrote:

Sustained transfer rates.

E.g. capturing 1 minute of HD-SDI video means sequentially writing
186 MB/s for 1 minute, i.e. approximately 11 GB. Likewise, playing
out 1 minute means sequentially reading 11 GB.


11 GB per minute is a heavy
sustained throughput requirement
for even a high end raid array

i'd say forget about Raptor (and
SATA in general)

go with the fastest SCSI drives,
say 4 of them, with a high end
PCI-e x8 raid card (3Ware or Areca
to name two good companies;
figure spending at least $400+)

also forget about raid 10 and 01,
coz SCSI drives tend to be small,
and you're going to need all the
total space of all 4 drives as well
as every last ounce of speed

yes, you can plan on that 4 disk
raid0 array failing, so you better
figure out how to do backups in
some convenient way, say to a large
Seagate 7200.10 SATA drive

so you've already got 5 drives in the
machine and you're going to need/want
a server type mobo with lots of fan
connections (imo, at least 8 of them).
odds are you're looking at a $300+
mobo from the likes of SuperMicro
or Tyan

e.g. i've got a single 150GB raptor
on my faster PC (running XP Pro SP2
with full recent updates from microsoft)

the fastest sustained throughput i've seen
with it has been ~2.5GB/min

and that was only once. more typically
i see sustained speeds in the range of
1.5GB to 2.1GB/min

which makes me wonder if 11GB/min
is possible, even with the fastest
current hardware?
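
To put the GB/min figures above on the same scale as the MB/s numbers quoted elsewhere in the thread, here is a small Python conversion sketch. It uses decimal units (1 GB = 1000 MB), and the helper names are made up for this example.

```python
def gb_per_min_to_mb_per_s(gb_per_min):
    """Convert GB/min to MB/s using decimal units."""
    return gb_per_min * 1000 / 60

def mb_per_s_to_gb_per_min(mb_per_s):
    """Convert MB/s to GB/min using decimal units."""
    return mb_per_s * 60 / 1000

# Rates observed on the single Raptor above, in GB/min.
for rate in (1.5, 2.1, 2.5):
    print(rate, "GB/min =", round(gb_per_min_to_mb_per_s(rate), 1), "MB/s")

# Sustained rates quoted earlier in the thread for the WD1500, in MB/s.
for rate in (60, 88):
    print(rate, "MB/s =", round(mb_per_s_to_gb_per_min(rate), 2), "GB/min")
```

So 2.5 GB/min is roughly 42 MB/s, below the ~60 MB/s inner-track figure quoted earlier for the same drive model.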

No. We'll capture a stream once, then play it out over and over.


Chunky?

by "chunky" i suspect that he means
big, heavy, and very expensive. :)

oh btw, that raid0 review at:
http://www.hothardware.com/printarticle.aspx?articleid=776
was interesting, but real life isn't so simple

meaning that everything that i've seen about
performance improvement from raid0 arrays
suggests that real world improvements are
unlikely to hit anything close to double
that of a single disk. even with a 4 disk
raid0 array

bill
 
Spoon

Ryan said:
So you have to have guaranteed 186 MB/s write speed for the length of
the capture. This might be more problematic than the read speed.

Given a typical hard disk drive, are write sustained transfer rates
always lower than read sustained transfer rates? By how much?

The 9650SE RAID controller equipped with 12 drives achieves similar read
and write throughput in RAID-0 (around 750 MB/s).

http://www.3ware.com/products/benchmarks_9590se.asp

Regards.
 
krw

root@localhost said:
Given a typical hard disk drive, are write sustained transfer rates
always lower than read sustained transfer rates?

Yes.

By how much?

That depends on how much you trust write buffering. An aggressive
drive may do fairly well on writes, but do you know if the data
actually got there? These things are important to some.

The 9650SE RAID controller equipped with 12 drives achieves similar read
and write throughput in RAID-0 (around 750 MB/s).

Go fer it, but you'd better test it on your workload. Only you
will know if it works in your application.
 
Ryan Godridge

That depends on how much you trust write buffering. An aggressive
drive may do fairly well on writes, but do you know if the data
actually got there? These things are important to some.


Go fer it, but you'd better test it on your workload. Only you
will know if it works in your application.

As Keith says - give it a try, it looks theoretically like it will
work. The 9650SE with 12 ports seems to have 256MB cache so that will
help your writing throughput. It'll want to go in an 8 lane PCIe
slot. Let us know how it works out with your application.
 
Michael Daly

Ryan said:
As Keith says - give it a try, it looks theoretically like it will
work. The 9650SE with 12 ports seems to have 256MB cache so that will
help your writing throughput. It'll want to go in an 8 lane PCIe
slot. Let us know how it works out with your application.

Poke around that site and you'll see other variants with fewer drives that still
make 200 MB/s or more. A 6 or 8 drive array is cheaper than a 12 drive version.

A bigger problem is that 200GB stores only 18 min of video. He'll have to get a
bunch of really big drives, so the fastest drives may be too small. I hope he's
not editing a three hour feature.
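
The capacity arithmetic is easy to check. A minimal Python sketch, assuming the 186 MB/s sustained rate given earlier in the thread and decimal gigabytes; the helper names are invented for this example.

```python
RATE_MB_S = 186                       # sustained rate quoted in the thread
GB_PER_MINUTE = RATE_MB_S * 60 / 1000  # about 11.16 GB per minute

def minutes_of_video(capacity_gb):
    """Minutes of video that fit in `capacity_gb` at the sustained rate."""
    return capacity_gb / GB_PER_MINUTE

def capacity_for(minutes):
    """Capacity in GB needed to hold `minutes` of video."""
    return minutes * GB_PER_MINUTE

print(round(minutes_of_video(200)))  # 18
print(round(capacity_for(180)))      # 2009
```

A three hour feature would therefore need about 2 TB of array space at this rate.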

Mike
 
Spoon

willbill said:
I'd say forget about Raptor (and SATA in general)

The WD1500 is one of the fastest 10,000 RPM HDDs, irrespective of
the interface. The bottleneck is not the interface.

Ultra-320 can handle 320 MB/s.
Modern SATA controllers can handle 300 MB/s.
My requirements are 186 MB/s.

However, AFAIK, there are no 15,000 RPM SATA drives.

Thus you might say that SCSI HDDs are faster than SATA HDDs, but
it is not SCSI that is inherently faster.

also forget about raid 10 and 01,
coz SCSI drives tend to be small,
and you're going to need all the
total space of all 4 drives as well
as every last ounce of speed

One exception to this rule is the Cheetah 15K.5 (300 GB).
135 MB/s sustained on outer tracks!

yes, you can plan on that 4 disk
raid0 array failing, so you better
figure out how to do backups in
some convenient way, say to a large
Seagate 7200.10 SATA drive

Agreed.

e.g. i've got a single 150GB raptor
on my faster PC (running XP Pro SP2
with full recent updates from microsoft)

the fastest sustained throughput i've seen
with it has been ~2.5GB/min

This is not typical. There must be a bottleneck somewhere else in
your system.
 
Spoon

Michael said:
Poke around that site and you'll see other variants with fewer drives
that still make 200 MB/s or more. A 6 or 8 drive array is cheaper than
a 12 drive version.

Could you provide the links to these other benchmarks?

I've come across this disheartening benchmark:
Maximum Performance for Linux Kernel 2.6
http://www.3ware.com/linuxbenchmarks.htm

Goal of tests:
Maximum performance possible with 3ware 9000 Series RAID controller
under Linux 2.6.

System configuration:
Processor: Xeon 2.4 GHz (2)
12 70 GB WDC WD740GD drives in hardware RAID-0
Stripe size: 64k
Kernel: 2.6.5
Controller: 3ware 9500S-12
Driver: 2.26.00.005
Driver cmds_per_lun setting: 254 (default)
Firmware: FE9X 2.02.00.009
OS Runlevel: 3
Bonnie++ version: 1.02c
Iozone version: 3.203
Motherboard: SE7501 CW2
RAM: 512 MB
Amount of I/O performed: 20 GB

(Note: The WD1500 is only 10-20% faster than the WD740GD.)

cf. Test #8: Bonnie++ tuned w/xfs filesystem

They reach 410 MB/s read and (only) 200 MB/s write.

Holy mother of pearls! Only 200 MB/s with 12 drives?

Did I miss something?
 
Spoon

Spoon said:
Could you provide the links to these other benchmarks?

I've come across this disheartening benchmark:
Maximum Performance for Linux Kernel 2.6
http://www.3ware.com/linuxbenchmarks.htm

Goal of tests:
Maximum performance possible with 3ware 9000 Series RAID controller
under Linux 2.6.

System configuration:
Processor: Xeon 2.4 GHz (2)
12 70 GB WDC WD740GD drives in hardware RAID-0
Stripe size: 64k
Kernel: 2.6.5
Controller: 3ware 9500S-12
Driver: 2.26.00.005
Driver cmds_per_lun setting: 254 (default)
Firmware: FE9X 2.02.00.009
OS Runlevel: 3
Bonnie++ version: 1.02c
Iozone version: 3.203
Motherboard: SE7501 CW2
RAM: 512 MB
Amount of I/O performed: 20 GB

(Note: The WD1500 is only 10-20% faster than the WD740GD.)

cf. Test #8: Bonnie++ tuned w/xfs filesystem

They reach 410 MB/s read and (only) 200 MB/s write.

Holy mother of pearls! Only 200 MB/s with 12 drives?

Did I miss something?

Adding to my confusion:
http://spamaps.org/raidtests.php

"I think in this test we've shown that the 3ware's hardware RAID5 is
totally inferior to Linux 2.6.x's Software RAID5. The only advantage to
using it is that when you're dual booting, it will work the same in
Windows/Linux. For a server, it's not even an option, as you see the
dips in the concurrent numbers. It might be more fair if the 3ware could
use larger chunk-sizes than 64KB."

(He benchmarked a 3ware Escalade 8506-8 SATA RAID Card.)
 
willbill

Spoon said:
willbill wrote:

The WD1500 is one of the fastest 10,000 RPM HDD, irrespective of
the interface. The bottleneck is not the interface.

Ultra-320 can handle 320 MB/s.


nice to know. :)

Modern SATA controllers can handle 300 MB/s.


is that SATA-1 or SATA-2?

not that it makes much diff
coz SATA HDDs still haven't
gotten to the SATA-1 limit

My requirements are 186 MB/s.


you seem to be doing your homework. :)

and you've gotten a load of good
comments/ideas in this thread. but...

However, AFAIK, there are no 15,000 RPM SATA drives.


that is very true

also, are 10k 150GB Raptors (SATA-1) really
going to be big enough for your use?

the 15k SCSI drives offer three things
that you likely need/want for your high
end video requirement:
1) faster single drive xfer (15k xfers faster
than 10k or 7200) and
2) especially faster read/write seeks (which
are likely to be important to you even
within some type of raid array) and
3) much much better server type performance
(assuming you'd like to do more than one
function at a time)

Thus you might say that SCSI HDDs are faster than SATA HDDs, but
it is not SCSI that is inherently faster.


have you ever used SCSI?

anyhow, disagreed for server type use

partly agreed for single user type use

One exception to this rule is the Cheetah 15K.5 (300 GB).
135 MB/s sustained on outer tracks!


interesting

what's the current street price
for one of these?

so there's a very good chance that
a "small" machine with a 6 disk raid0
Cheetah 15K.5 and a top end raid
controller would meet your needs

that'd be 1800GB

fwiw, i've looked at some of the
Areca comments about the 2 terabyte
HDD size limit, and it appears it
can be worked around depending
on the OS. for Windows:
<"it change the sector size from default
512 to 4k. the maximum volume capacity
up to 16TB. This option works under Windows
platform only. and it CAN NOT be converted
to Dynamic Disk, because 4k sector size
is not a standard format.">

anyhow, 1800/11/min gives you 163
minutes of high def video

maybe you'll want to put 2 of
these raid0 arrays in the machine

jeez just the thought makes me LOL

(sorry!)

re the 2 terabyte limit, Areca has a couple
of ways around that (depending on the OS)

also, given that you're looking at that
3Ware raid controller, you definitely
want to download the full manual and
check what the raid6 performance is
for writes

e.g. my somewhat high end Areca 1210 raid
card shows general raid6 performance as:
read = similar to raid0; write = *slower*
than a single disk!

fwiw, this Areca 1210 has raid 0, 1, 1E,
10, 3, 5 capability (and 6 for the 1220+)

all of them show writes as being equal
or less than a single drive, with the
sole exception of *raid0*

i have no clue if any of the other raid
classes offers better write performance,
but i'd hardly want to be setting up
an 8 or 10 disk raid0 array


wow a compliment. :)

i hadn't realized that you were
prepared to go as far as a 12
drive raid6 array (with at least
one of those being a hot spare)

add an expensive double wide case to my
previous list of stuff that you'll need,
as well as an expensive power supply

This is not typical. There must be a bottleneck somewhere else in
your system.


my strong hunch is that it is typical

but i'll continue this in a separate post. :)

bill
 
