New hard disk architectures


J. Clarke

Yousuf said:
You would manually choose which components go into the flash disk. Or
you would get a program to analyse the boot sequence and it will choose
which components to send to the flash. You can even pre-determine what
devices are in the system and preload their device drivers.


You've just made the perfect case for why it's needed. 10% of a 100GB
drive is 10GB, 10% of 200GB is 20GB, and so on.

So what? And how much of that will you actually be recovering? Changing
the sector size doesn't recover _all_ of the space used by ECC. Further,
you've totally neglected sparing--to have the same number of spare sectors
available you'd have to devote 8 times as much space to sparing.
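
As a rough back-of-envelope sketch of that sparing point (the spare-sector count below is purely illustrative, not from any drive's datasheet):

# Keeping the same *number* of spare sectors after moving from 512-byte
# to 4096-byte sectors costs 8 times as many bytes of reserved surface.
spare_sectors = 2048                      # hypothetical spare pool per drive
old_bytes = spare_sectors * 512           # ~1.0 MiB reserved for sparing
new_bytes = spare_sectors * 4096          # ~8.0 MiB reserved for sparing
print(new_bytes / old_bytes)              # -> 8.0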
 

Folkert Rienstra

J. Clarke said:
So what? And how much of that will you actually be recovering?
Changing the sector size doesn't recover _all_ of the space used by ECC.
Further, you've totally neglected sparing--
to have the same number of spare sectors available
you'd have to devote 8 times as much space to sparing.

Nonsense. You're still using 512 byte logical sectors.
 

Arno Wagner

You would manually choose which components go into the flash disk. Or
you would get a program to analyse the boot sequence and it will choose
which components to send to the flash. You can even pre-determine what
devices are in the system and preload their device drivers.

O.k., so this is "experts only", i.e. again does not make sense in
a consumer product. And no, you cannot preload device drivers in
any meaningful way, since it is not loading but hardware detection
and initialisation that takes the time.

You've just made the perfect case for why it's needed. 10% of a 100GB
drive is 10GB, 10% of 200GB is 20GB, and so on.

10% is not significant and certainly does not justify such a change.
Seems this is corporate greed and stupidity at work with the
engineers not protesting loudly enough.

Arno
 

Arno Wagner

In comp.sys.ibm.pc.hardware.storage J. Clarke said:
Yousuf Khan wrote:
So what? And how much of that will you actually be recovering? Changing
the sector size doesn't recover _all_ of the space used by ECC. Further,
you've totally neglected sparing--to have the same number of spare sectors
available you'd have to devote 8 times as much space to sparing.

Good point. Since the defective sector rate is low, you will
likely have only one defect per sector, even with larger sectors, and
indeed 8 times as much overhead for spares.

The more I look at this, the more bogus it looks to me.

Arno
 

J. Clarke

Arno said:
O.k., so this is "experts only", i.e. again does not make sense in
a consumer product. And no, you cannot preload device drivers in
any meaningful way, since it is not loading but hardware detection
and initialisation that takes the time.



10% is not significant and certainly does not justify such a change.
Seems this is corporate greed and stupidity at work with the
engineers not protesting loudly enough.

Researching this a bit I'm finding that they've had to increase the
complexity of the ECC code to cope with increased areal density--apparently
the size of typical defects in the disk doesn't change when the areal
density increases, which means that they have to be able to correct more
dead bits in a sector in order to meet their performance specifications. I
found a letter from Fujitsu that says that they've had to increase the ECC
space from 10% to 15% of total capacity, and that anticipated future
increases in areal density may drive the ECC levels as high as 30% with a
512 byte sector size. It's not clear how much they expect to save by going
to a 4096-byte sector though.
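
For a rough sense of scale of those percentages (treating them simply as fractions of user capacity on a hypothetical 200 GB drive; the figures are illustrative, not taken from the Fujitsu letter):

# ECC overhead at the quoted fractions, on an assumed 200 GB of user data.
user_capacity_gb = 200
for ecc_fraction in (0.10, 0.15, 0.30):
    print(f"{ecc_fraction:.0%} ECC overhead ~ {user_capacity_gb * ecc_fraction:.0f} GB of raw surface")
# -> 10% ~ 20 GB, 15% ~ 30 GB, 30% ~ 60 GB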
 

Arno Wagner

In comp.sys.ibm.pc.hardware.storage J. Clarke said:
Arno Wagner wrote:
In comp.sys.ibm.pc.hardware.storage Yousuf Khan said:
Arno Wagner wrote: [...]
I think that is nonsense. ECC is something like 10%. It does not
make sense to rewrite every driver and the whole virtual layer just
to make this a bit smaller, except maybe from the POV of a
salesperson. From an engineering POV there is good reason not
to change complex systems for a minor gain.
You've just made the perfect case for why it's needed. 10% of a 100GB
drive is 10GB, 10% of 200GB is 20GB, and so on.

10% is not significant and certainly does not justify such a change.
Seems this is corporate greed and stupidity at work with the
engineers not protesting loudly enough.
Researching this a bit I'm finding that they've had to increase the
complexity of the ECC code to cope with increased areal density--apparently
the size of typical defects in the disk doesn't change when the areal
density increases, which means that they have to be able to correct more
dead bits in a sector in order to meet their performance specifications. I
found a letter from Fujitsu that says that they've had to increase the ECC
space from 10% to 15% of total capacity, and that anticipated future
increases in areal density may drive the ECC levels as high as 30% with a
512 byte sector size. It's not clear how much they expect to save by going
to a 4096-byte sector though.

O.k., this does make sense. If they have a certain maximum error burst
length to correct, then the longer the sector it sits in, the less relative
overhead they need to correct it. It is not linear, but ECC does work
better for longer sectors. The mathematics is complicated, don't ask.
(Or if you want to know, Reed-Solomon coding is the keyword for
burst-error correcting codes.) And if they expect 30% overhead, that
may be significant enough to make the change.
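
A minimal sketch of why the overhead shrinks, assuming a Reed-Solomon-style code that needs roughly 2*t check symbols to correct t symbol errors per codeword (real drive formats interleave several shorter codewords; the numbers are illustrative):

# Check-symbol fraction for a fixed correction budget t (symbols = bytes),
# as the data portion of the codeword (i.e. the sector) gets longer.
def parity_fraction(data_symbols, t):
    parity = 2 * t                        # Reed-Solomon rule of thumb
    return parity / (data_symbols + parity)

t = 20                                    # hypothetical correctable symbols
print(parity_fraction(512, t))            # ~0.07 -> ~7% overhead at 512 bytes
print(parity_fraction(4096, t))           # ~0.01 -> ~1% overhead at 4096 bytes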

There is some precedent, namely the 2048 byte sectors in CDs. That
is also the reason why most modern OSes can deal with 2k sectors.
Another reason is MODs (magneto-optical disks), which also have 2k sectors.
I am not aware of any current random-access device with 4k sectors.

Arno
 

Rob Stow

Arno said:
Good point. Since the defective sector rate is low, you will
likely have only one defect per sector, even with larger sectors, and
indeed 8 times as much overhead for spares.

Space on the platters is so cheap that an 8-fold increase in an
already small quantity is easily dismissed. Platters are so
cheap that drive manufacturers routinely make drives that only use
60% or 80% of the platters. What matters is the logic involved
in managing the drives - and if the manufacturers think they can
make things work faster, more reliably, or both, with 4096 byte
sectors, then I'm willing to keep an open mind until they are
proven wrong.

Also being overlooked in this thread is how a drive with 4096
byte physical sectors will interact with the operating system.
With NTFS, for example, 512 byte allocation units (AKA
"clusters") are possible, but 4096 bytes is by far the mostly
commonly used cluster size. What kind of performance differences
might we see if there is a one-to-one correspondence between 4 KB
allocation units in the file system and 4 KB phyical sectors,
instead of having to address 8 separate 512 byte sectors for each
cluster?
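
As an illustration of that correspondence (partition alignment ignored; the cluster number is arbitrary):

# Map an NTFS-style 4 KiB cluster to device LBAs for two sector sizes.
CLUSTER_BYTES = 4096

def cluster_to_lbas(cluster_no, sector_bytes):
    per_cluster = CLUSTER_BYTES // sector_bytes
    first = cluster_no * per_cluster
    return list(range(first, first + per_cluster))

print(cluster_to_lbas(10, 512))           # [80, 81, ... 87] -> 8 sectors to address
print(cluster_to_lbas(10, 4096))          # [10]             -> 1 sector to address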

In other words, the effect on how the drive and the OS work
together could be far more important than the effect on the raw
drive performance. Hardware design /should/ take into account
the software that will use it - and vice versa.

As well, clusters larger than 4 KB are possible with most file
systems, but except with FAT16 they are very seldom used. If the
option of super-sizing clusters was dropped, that would allow for
leaner and meaner versions of file systems like NTFS and FAT32 -
they could drop the "allocation unit" concept and deal strictly
in terms of physical sectors. Simpler software with less CPU
overhead to manage the file system can't possibly hurt.
 

Arno Wagner

In comp.sys.ibm.pc.hardware.storage J. Clarke said:
Arno said:
In comp.sys.ibm.pc.hardware.storage Yousuf Khan said:
Arno Wagner wrote: [...]
I think that is nonsense. ECC is something like 10%. It does not
make sense to rewrite every driver and the whole virtual layer just
to make this a bit smaller, except maybe from the POV of a
salesperson. From an engineering POV there is good reason not
to change complex systems for a minor gain.
You've just made the perfect case for why it's needed. 10% of a 100GB
drive is 10GB, 10% of 200GB is 20GB, and so on.

10% is not significant and certainly does not justify such a change.
Seems this is corporate greed and stupidity at work with the
engineers not protesting loudly enough.

[addendum to myself]
Also note that if you have 10% overhead, you will not save all of it
by using longer sectors. More likely you will go down to 8%...5% or so.
Researching this a bit I'm finding that they've had to increase the
complexity of the ECC code to cope with increased areal density--apparently
the size of typical defects in the disk doesn't change when the areal
density increases, which means that they have to be able to correct more
dead bits in a sector in order to meet their performance specifications. I
found a letter from Fujitsu that says that they've had to increase the ECC
space from 10% to 15% of total capacity, and that anticipated future
increases in areal density may drive the ECC levels as high as 30% with a
512 byte sector size. It's not clear how much they expect to save by going
to a 4096-byte sector though.

(Seems I killed my first reply...)

This actually makes sense. If you have a certain maximum burst length
to correct, it requires less overhead per bit of data in a longer data
packet than in a smaller one. It is not linear, but the effect is
noticeable. (See the theory of Reed-Solomon coding for details.) If they
expect 30% overhead, longer sectors could cause significant savings.

Arno
 

hackbox.info

They're just saying they can do a more efficient error correction over
4096 byte sectors rather than 512 byte sectors.

And it's not only about capacity; speed matters too
 

daytripper

And it's not only about capacity; speed matters too
Huh?

I really have no idea what this means. And since I can't install linux on
it, I'm gonna go back to surfing pr0n.

Shouldn't that be "p0rn"?
 

daytripper

What is faster, four calls to the CRC routine or one call? This is an
unnecessary overhead for the drive's firmware

What makes you think CRC generation is handled by firmware?
I really have no idea what this means. And since I can't install linux on
it, I'm gonna go back to surfing pr0n.

Still think that ought to be "p0rn"...
 

The little lost angel

Still think that ought to be "p0rn"...

Both are acceptable in current netspeak AFAIK; I think "pr0n" came about
as an attempt to defeat censor filters that might otherwise censor out
p[o|0]rn
 

hackbox.info

What makes you think crc generation is handled by firmware?

Wild guess. It may be in an ASIC, but I personally would put it in an FPGA or
firmware (primary; the secondary is loaded from the magnetic media on start).
Still think that ought to be "p0rn"...

Google it, second link; it's a quote
 

daytripper

Wild guess. It may be in an ASIC, but I personally would put it in an FPGA or
firmware (primary; the secondary is loaded from the magnetic media on start).

That's the problem with wild guesses: they're usually wrong. There's little
reason to push media ECC and interface CRC out of controller Si and into a uP;
these are well-understood, highly developed algorithms that are best embedded
where they won't be a throttling influence.
Google it, second link; it's a quote

Thanks, someone 'splained it already...
 

hackbox.info

That's the problem with wild guesses: they're usually wrong. There's
little reason to push media ECC and interface CRC out of controller Si
and into a uP; these are well-understood, highly developed algorithms
that are best embedded where they won't be a throttling influence.

OK, forget about CRC: more sectors for the HDD firmware mean more
computation, just as more, smaller packets slow down a network connection
Thanks, someone 'splained it already...

I mean the whole sig is a quote, not only the "pr0n" part
 

Jan Panteltje

Even in the old 8272A FDC, the CRC was done in the chip.
There was a speed advantage: 'read cylinder' versus 'read sector'.
It had to do with the disk head moving past the next sequential sector
on a cylinder during the time the processor was calculating / communicating
the next sector number to be read.
From some POV it would then make sense to buffer whole cylinders, dunno if
this is done.
Actually 'disk interleave read' came from that delay IIRC, the processor was
too slow to do sector 1,2,3,4 in a cylinder, so it did (for example)
1, 3, 5, 7.
(A cylinder is a circular track with sectors).
Have not kept up a lot with how they do it now, but the same problems will likely apply.
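
A small sketch of the interleave idea mentioned above (8 sectors per track and a 2:1 factor are just an example):

# Lay logical sectors into every Nth physical slot so a slow controller
# gets a full sector-time to set up the next transfer.
def interleave_layout(sectors, factor):
    slots = [None] * sectors
    pos = 0
    for logical in range(sectors):
        while slots[pos] is not None:     # skip slots already assigned
            pos = (pos + 1) % sectors
        slots[pos] = logical
        pos = (pos + factor) % sectors
    return slots

print(interleave_layout(8, 2))            # -> [0, 4, 1, 5, 2, 6, 3, 7]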
 

Keith

Even in the old 8272A FDC, the CRC was done in the chip.

CRC calculation is trivial in hardware. Why not?
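
For reference, a software model of the bit-serial CRC register such FDC-class chips implement in hardware (CRC-16/CCITT with 0xFFFF preset, as used on floppy formats; shown only to illustrate how little logic it takes):

def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):                # one shift of the 16-bit register per bit
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc

print(hex(crc16_ccitt(b"123456789")))     # -> 0x29b1, the standard check value
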
There was a speed advantage: 'read cylinder' versus 'read sector'.
It had to do with the disk head moving past the next sequential sector
on a cylinder during the time the processor was calculating / communicating
the next sector number to be read.
From some POV it would then make sense to buffer whole cylinders, dunno if
this is done.

What do you think the "disk buffer" is doing? ;-) Once the head
is in position, might just as well grab whatever one can.
Actually 'disk interleave read' came from that delay IIRC, the processor was
too slow to do sector 1,2,3,4 in a cylinder, so it did (for example)
1, 3, 5, 7.

Yes, but this is ancient history. DMA is wunnerful.
(A cylinder is a circular track with sectors).

A track is the group of sectors found on the same surface/head. A
cylinder is the group of tracks that can be accessed without moving
the head.
Have not kept up a lot with how they do it now, but the same problems will likely apply.

Which problems? Magnetic density? Head/amplifier bandwidth? Surface
flatness? Yep, those problems are still the limiters. Electronics
certainly isn't.
 

Jan Panteltje

What do you think the "disk buffer" is doing? ;-) Once the head
is in position, might just as well grab whatever one can.
Not exactly; the cache RAM in the disk drive would come into play here
from a design POV, and any 'disk buffer' in the OS is the next level.
Also the drive will have to do bad sector management (remapping)...

Yes, but this is ancient history. DMA is wunnerful.
DMA has actually nothing to do with it.
Although DMA will free up the processor when putting the disk data (from the
disk drive cache memory usually) into the computer memory, even without DMA
this could be done, but of course it would take a lot more processor cycles.
So DMA frees up the processor from doing IO, it does not really speed up the
transfer as the (overall) transfer is set by mechanical parameters in the
drive. (seek time comes into play, rotation speed).
This is a funny misconception, I once designed (good old days) a small
embedded system that worked 100% without interrupts and without DMA, so
using polling, and just made the instructions so the timing exactly fitted.
A modern processor would have no problem executing a few IO instructions.
But it would suck resources.
Actually you may be right if we look at the max IO speed that can be done
via the PCI bus from the processor POV, I dunno exactly what that is...
New PCI bus is here, things are getting more and more complicated, faster
and faster, and you need a thousands-of-dollars scope to even be able to measure
something. I really dunno where it will go, perhaps optical.
Which problems? Magnetic density? Head/amplifier bandwidth? Surface
flatness? Yep, those problems are still the limiters. Electronics
certainly isn't.
Oh? head amplifier bandwidth, sensor technology, servo system is not
electronics?
Not to mention the whole uc in the drive....
It is all tightly interconnected.
You are dreaming if you think it is not.
 

Keith

Not exactly; the cache RAM in the disk drive would come into play here
from a design POV, and any 'disk buffer' in the OS is the next level.
Also the drive will have to do bad sector management (remapping)...

You really don't think the drive can buffer more than the OS can -
or faster? A disk "cache" isn't one. That's why it's called a "buffer"
and not a "cache". Read ahead/behind is a win; write-buffering is
a rather questionable strategy. "Caching"? I think not. You'll
never *hit* that cache.

DMA has actually nothing to do with it.

It has a *lot* to do with it. The processor just tells the stupid
DMA controller what to do and it does it. Hardware is faster than
software, dontchaknow.
Although DMA will free up the processor when putting the disk data (from the
disk drive cache memory usually)

Wrong. Data is *rarely*, if ever, in the drive's "cache".
into the computer memory, even without DMA
this could be done, but of course it would take a lot more processor cycles.

*TA-DA*, enter DMA.
So DMA frees up the processor from doing IO, it does not really speed up the
transfer as the (overall) transfer is set by mechanical parameters in the
drive.

Bullshit. This is *exactly* why drives were interleaved; the
processor didn't have the power to rub its belly and shake hands at
the same time. DMA relieved it of the shaking-of-hands.
(seek time comes into play, rotation speed).

You're talking apples and orangutans.
This is a funny misconception, I once designed (good old days) a small
embedded system that worked 100% without interrupts and without DMA, so
using polling, and just made the instructions so the timing exactly fitted.
A modern processor would have no problem executing a few IO instructions.
Whoopie!!

But it would suck resources.

Delete "resources" from the above.
Actually you may be right if we look at the max IO speed that can be done
via the PCI bus from the processor POV, I dunno exactly what that is...
New PCI bus is here, things are getting more and more complicated, faster
and faster, and you need a thousands-of-dollars scope to even be able to measure
something. I really dunno where it will go, perhaps optical.

Disks aren't interleaved anymore. The hardware is fast enough to
read sequential sectors. Sheesh!
Oh? head amplifier bandwidth, sensor technology, servo system is not
electronics?

Servo systems aren't the speed limiters. The head and read/write
amplifiers are the limiting factors.
Not to mention the whole uc in the drive....

So what? The UC on the drive is simply a cost reduction. Hardware
*could* do the entire job.
It is all tightly interconnected.
You are dreaming if you think it is not.

The snoring is coming from your end.
 
