DMA issue while writing data to SATA hard disk

maverick

Hi there,
Before I post my problem, let me present the overall picture.
We have developed a real-time embedded data acquisition system based on
an A/D converter, FPGA, SATA controller, SATA HDD, microcontroller and
Ethernet. The purpose of the system is to record seismic data from the
sensors, digitize it and write it to the SATA HDD. Later, the recorded
data can be retrieved over Ethernet.
We have used a Spartan 3 FPGA, a Silicon Image SATA controller (Sil3512)
and a SATA HDD. There is no OS in our system; we have developed our own
simple FAT. The FPGA acts as the host device and talks to the SATA
controller through a dedicated PCI interface. There is no PCI slot on
our system; we use the OpenCores PCI bridge on the FPGA to drive the
SATA controller's PCI interface. There are two ping-pong buffers in the
FPGA, each 512 Kbytes. The digitized data is stored in these buffers in
a ping-pong fashion at 36 Mbytes/s. The FPGA programs the SATA
controller's DMA. When the SATA controller initiates the read DMA, the
data is read out of the buffers at 66 MHz and transported over the PCI
bus at 33 MHz to the SATA controller. The system works perfectly when
the incoming data rate is 24 Mbytes/s. But when we increase the data
rate to 36 Mbytes/s (which is what we require), we see data loss. Here
is the algorithm for data recording:

1. Continue filling buffer 1 and buffer 2 with digitized data in a
round-robin fashion.
2. As soon as buffer 1 is full, program the SATA controller's DMA so
that it reads out the filled buffer 1 while buffer 2 is being filled.
3. Wait for the DMA done signal from the SATA controller for the first
DMA.
4. On finding DMA done, wait for buffer 2 to be filled.
5. As soon as buffer 2 is full, program the SATA controller's DMA so
that it reads out the filled buffer 2 while buffer 1 is being filled.
6. Wait for the DMA done signal from the SATA controller for the second
DMA.
7. On finding DMA done, wait for buffer 1 to be filled.
8. Repeat steps 2-7 until data acquisition is stopped.
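
The steps above can be modelled in a few lines to see when a buffer gets overwritten. This is only a rough timing sketch, not the FPGA logic: the function name and the per-buffer DMA times passed in are hypothetical, and the ADC is assumed to keep filling at a constant 36 Mbytes/s no matter what the disk does.

```python
# Minimal timing model of the two-buffer (ping-pong) recording loop.
BUFFER_BYTES = 512 * 1024             # size of each ping-pong buffer
FILL_RATE = 36_000_000                # ADC data rate in bytes/s
FILL_TIME = BUFFER_BYTES / FILL_RATE  # about 14.56 ms per buffer


def count_overflows(dma_times):
    """Count buffers that the ADC reuses before their DMA has finished.

    dma_times[i] is the assumed time in seconds that the SATA controller
    takes to drain the i-th filled buffer (hypothetical input data).
    """
    overflows = 0
    dma_free_at = 0.0  # time at which the previous DMA completed
    for i, dma_time in enumerate(dma_times):
        full_at = (i + 1) * FILL_TIME      # moment buffer i becomes full
        start = max(full_at, dma_free_at)  # steps 3 and 6: wait for DMA done
        done = start + dma_time
        # The ADC comes back to this buffer once the other one has filled.
        if done > full_at + FILL_TIME:
            overflows += 1
        dma_free_at = done
    return overflows


# A steady 4 ms drain never overflows; a single 20 ms stall loses one buffer.
print(count_overflows([0.004] * 10))
print(count_overflows([0.004, 0.020, 0.004]))
```

With only two buffers, any single DMA that takes longer than one fill time (about 14.56 ms) is enough to lose data, however fast the average is.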


The data loss occurs due to delay in steps 3 and 6, where the FPGA
waits for the DMA done signal from the SATA controller. At times, the
SATA controller takes longer than anticipated and, because of this, the
data buffers overflow. 36 Mbytes/s should not be an aggressive data rate
for the SATA controller + SATA HDD, especially when the PCI link is
dedicated to the host and the SATA controller. We have tried different
settings in the PCI bridge configuration and the SATA controller's PCI
configuration, but things have not improved. We are using UDMA 6 mode.
Can anyone guide us on how to handle this problem? Let me know if more
information is required. Thanks in advance.

Best Regards
Farhan
 
Ofnuts

If I get it right, at 36MB/s, you have to write a buffer every 1/72nd of
a second, which is around 14ms. That gets close to the various seek
times (even to the track-to-track seek).

What disk are you using? Has it got a cache? Is the cache enabled?
 
maverick

Ofnuts said:
If I get it right, at 36MB/s, you have to write a buffer every 1/72nd of
a second, which is around 14ms. That gets close to the various seek
times (even to the track-to-track seek).

What disk are you using? Has it got a cache? Is the cache enabled?


We are using Western Digital's WD800BD 7200 RPM SATA HD, with 2 MB
cache. BTW, how do I enable the cache? This is something I did not know;
I thought the cache was already enabled by default. Actually I used
a similar HD with 8 MB cache and I did not get any improvement in the
situation, maybe because the cache was not enabled. Kindly tell me
how to do this.
As for the calculations you have done, let me give you some more
information. Our ADC is running at 18 MHz; it is a 16-bit resolution
ADC, so every sample is actually two bytes. The buffer memories are
256Kx16, that is, each location is 16 bits wide. That makes a total of
512 Kbytes for each memory bank. So the memories are filled at the rate
of 18 Mega samples/second. The time to fill one memory bank is around
14.563 ms so, as you said, we have this much time to program the DMA
and initiate the SATA DMA. But here is one more important piece of
information. The memory is read out at a faster clock of 66 MHz; two
samples are read and packed into a 32-bit Dword (effectively 33 Mega
Dwords/s) and transported over the PCI interface towards the SATA
controller at 33 MHz (the PCI bus is 32 bits wide). So we are reading
out the memory buffers about 3.7 times faster than they are filled
(132 Mbytes/s versus 36 Mbytes/s). The PCI link
between the SATA controller and the FPGA is dedicated so there is no
arbitration required here. The 2MB cache on the HD will be filled in
15.15 ms if the incoming data rate from the SATA controller to the HD
is 132 Mbytes/s (33 MHz x 4). The first thing I would like to check is
whether the on-disk cache is enabled or not. If it is enabled, then
what could be the reason? (I hope there are no mistakes in the mathematics.)
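
For anyone who wants to check the numbers, the arithmetic in this post can be reproduced as below. All inputs are the figures quoted above; the 132 Mbytes/s PCI figure is the theoretical peak of a 33 MHz, 32-bit bus, and the 2 MB cache is taken as decimal megabytes, which is the assumption that yields 15.15 ms.

```python
SAMPLE_RATE = 18_000_000       # ADC samples per second
BYTES_PER_SAMPLE = 2           # 16-bit samples
BUFFER_SAMPLES = 256 * 1024    # one 256K x 16 memory bank
PCI_RATE = 33_000_000 * 4      # 33 MHz x 4 bytes, peak bytes/s
CACHE_BYTES = 2_000_000        # 2 MB drive cache (decimal, assumed)

fill_rate = SAMPLE_RATE * BYTES_PER_SAMPLE          # bytes/s into a buffer
buffer_bytes = BUFFER_SAMPLES * BYTES_PER_SAMPLE    # bytes per memory bank
fill_time_ms = BUFFER_SAMPLES / SAMPLE_RATE * 1e3   # time to fill one bank
drain_time_ms = buffer_bytes / PCI_RATE * 1e3       # time to read it at PCI peak
speedup = PCI_RATE / fill_rate                      # read-out vs fill rate
cache_fill_ms = CACHE_BYTES / PCI_RATE * 1e3        # time to fill the cache

print(f"fill rate     : {fill_rate / 1e6:.0f} Mbytes/s")
print(f"buffer size   : {buffer_bytes} bytes")
print(f"fill time     : {fill_time_ms:.3f} ms")
print(f"drain time    : {drain_time_ms:.3f} ms")
print(f"read-out ratio: {speedup:.2f}x")
print(f"cache fill    : {cache_fill_ms:.2f} ms")
```

The read-out ratio comes out at about 3.67, and draining one bank at the PCI peak takes roughly 4 ms of the 14.56 ms fill window.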

Regards
 
Gumby

maverick said:
We are using Western Digital's WD800BD 7200 RPM SATA HD, with 2 MB
cache. BTW, how do I enable the cache? This is something I did not know;
I thought the cache was already enabled by default. Actually I used
a similar HD with 8 MB cache and I did not get any improvement in the
situation, maybe because the cache was not enabled. Kindly tell me
how to do this.

It is enabled by default. I've never heard of any such thing as being able
to enable/disable the cache on a HDD either.
 
Ofnuts

maverick said:
We are using Western Digital's WD800BD 7200 RPM SATA HD, with 2 MB
cache. BTW, how do I enable the cache? This is something I did not know;
I thought the cache was already enabled by default. Actually I used
a similar HD with 8 MB cache and I did not get any improvement in the
situation, maybe because the cache was not enabled. Kindly tell me
how to do this.

Sorry but I'm not a disk hardware specialist. RTFM :) If I had to look
at this, I would grab the source code of the hdparm utility for Linux
(has an option "-W: Disable/enable the IDE drive's write-caching")
and/or the disk drivers for same.
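
Since there is no OS on the board, hdparm itself cannot be run there, but what its -W option boils down to is the ATA SET FEATURES command. The sketch below only shows the register values involved; the `TaskFile` class and the helper names are hypothetical, and how the values actually reach the Sil3512 task-file registers depends on the FPGA design.

```python
from dataclasses import dataclass

ATA_CMD_SET_FEATURES = 0xEF     # ATA SET FEATURES command code
SF_ENABLE_WRITE_CACHE = 0x02    # Features register: enable write cache
SF_DISABLE_WRITE_CACHE = 0x82   # Features register: disable write cache


@dataclass
class TaskFile:
    """Hypothetical holder for the two task-file registers used here."""
    features: int
    command: int


def write_cache_taskfile(enable: bool) -> TaskFile:
    """Build the register values that switch the write cache on or off."""
    sub = SF_ENABLE_WRITE_CACHE if enable else SF_DISABLE_WRITE_CACHE
    return TaskFile(features=sub, command=ATA_CMD_SET_FEATURES)


def write_cache_enabled(identify_words) -> bool:
    """Check IDENTIFY DEVICE data (256 16-bit words): word 85, bit 5."""
    return bool(identify_words[85] & (1 << 5))


print(write_cache_taskfile(True))
```

Reading word 85 of the IDENTIFY DEVICE data first would answer the "is it enabled?" question without changing anything on the drive.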

maverick said:
The PCI link between the SATA controller and the FPGA is dedicated so
there is no arbitration required here. The 2MB cache on the HD will be
filled in 15.15 ms if the incoming data rate from the SATA controller to
the HD is 132 Mbytes/s (33 MHz x 4). The first thing I would like to
check is whether the on-disk cache is enabled or not. If it is enabled
then what could be the reason? ( I hope no mistakes in the mathematics)

OTOH I would not put too much faith in the write cache. It's efficient
when you have short write bursts, but if you are writing continuously
faster than the disk can take (transfer rate, plus various seek
times) the cache doesn't help. It just makes things harder to diagnose.
I would also question the assumption that the bytes go directly from RAM
to the write head. It's not impossible that it's a two-step process:
transfer 512K from RAM to HDD cache, then transfer 512K from cache to
write head, so there is no overlap in the delays and you have to add
them. Maybe running without a cache would be faster.

But this is just my best guess, meant to make you look at it from a
different point of view; you surely know a lot more about this than I do.
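
To put a number on the two-step guess: if the bus transfer and the write to the platter do not overlap, both have to fit inside one buffer fill time. The sketch below works out the media rate the drive would then need. It is only an illustration of that guess, using the 36 Mbytes/s and 132 Mbytes/s figures from this thread; the function name is made up, and whether the drive really serializes the two steps is an assumption.

```python
BUFFER_BYTES = 512 * 1024   # one ping-pong buffer
FILL_RATE = 36_000_000      # bytes/s arriving from the ADC
PCI_RATE = 132_000_000      # bytes/s, theoretical PCI peak


def required_media_rate(overlapped: bool) -> float:
    """Media write rate (bytes/s) needed to finish one buffer in time."""
    fill_time = BUFFER_BYTES / FILL_RATE
    if overlapped:
        # Bus transfer and platter write run concurrently.
        return BUFFER_BYTES / fill_time
    # Serialized: the bus transfer eats into the window first.
    bus_time = BUFFER_BYTES / PCI_RATE
    return BUFFER_BYTES / (fill_time - bus_time)


print(f"overlapped: {required_media_rate(True) / 1e6:.1f} Mbytes/s")
print(f"serialized: {required_media_rate(False) / 1e6:.1f} Mbytes/s")
```

In the serialized case the drive would have to sustain about 49.5 Mbytes/s to the platter rather than 36 Mbytes/s, and that is before any seek or rotational delay.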
 
Squeeze

maverick wrote in news:2fc85a45-e996-4ee3-b056-7cc8f7a94c31@c65g2000hsa.googlegroups.com
We are using Western Digital's WD800BD 7200 RPM SATA HD, with 2 MB
cache.

Which you have short-stroked to ~20-30 GB?
Because that is about how far into the disk it will sustain 36MB/s or more.

maverick said:
BTW, how do I enable the cache? This is something I did not know
and I thought the cache is already enabled by default. Actually I used
a similar HD with 8 MB cache and I did not get any improvement in the
situation, might be because of the cache not enabled. Kindly tell me
how to do this.
As far as the calculations you have done, let me give you some more
information. Our ADC is running at 18 MHz, it is a 16-bit resolution
ADC so every sample is actually two bytes. The buffer memories are
256Kx16, that is each location is 16-bits wide. That makes a total of
512Kbytes of each memory bank. So the memories are filled at the rate
of 18 Mega samples/second. The time to fill one memory bank is around
14.563 ms so as you said, we have this much time to program the DMA
and initiate the SATA DMA. But here comes one more important
information. The memory is read out at a faster clock of 66 MHz, two
samples are read and packed into a 32-bit Dword (effectively 33 Mega
DWORDS/s)and transported over the PCI interface towards SATA at 33 MHz
(PCI bus is 32-bits wide).So we are reading out the memory buffers 3.6
times faster than the rate the memory buffers are filled. The PCI link
between the SATA controller and the FPGA is dedicated so there is no
arbitration required here.
The 2MB cache on the HD will be filled in 15.15 ms

It does? What makes you think that cache is the same as buffer?
The buffer is 128kB, the size of a maximum allowable transfer.

maverick said:
if the incoming data rate from the SATA controller to the HD is 132
Mbytes/s (33 MHz x 4). The first thing I would like to check is whether
the on-disk cache is enabled or not. If it is enabled then what could be
the reason?

STR of the drive?

maverick said:
( I hope no mistakes in the mathematics)

What about mistakes in your assumptions?
 
Squeeze

Ofnuts wrote in news:[email protected]
Sorry but I'm not a disk hardware specialist. RTFM :) If I had to look
at this, I would grab the source code of the hdparm utility for Linux
(has an option "-W: Disable/enable the IDE drive's write-caching")
and/or the disk drivers for same.


Ofnuts said:
OTOH I would not put too much faith in the write cache. It's efficient
when you have short write bursts, but if you are writing continuously
faster than what the disk can take
(transfer rate,

That would be the platter to buffer rate then, as the STR
already includes the various seek and head switch times.

Ofnuts said:
plus various seek times) the cache doesn't help.
It just makes things harder to diagnose.
I would also question the assumption that the bytes go directly from RAM
to the write head. It's not impossible that it's a two-step process:
transfer 512K from RAM to HDD cache,

buffer, which requires at least 4 write commands.

Ofnuts said:
then transfer 512K from cache to write head, so there is no overlap in
the delays and you have to add them.
Maybe running without a cache would be faster.

Usually that makes a drive very very slow.
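
The "at least 4 write commands" figure follows from the 128kB per-command limit mentioned earlier in the thread, which matches 28-bit WRITE DMA (at most 256 sectors of 512 bytes per command). A small sketch of the split is below; the function name is made up, and whether the design in question uses 28-bit or 48-bit commands is not stated anywhere in the thread.

```python
SECTOR_BYTES = 512
MAX_SECTORS_LBA28 = 256     # 28-bit WRITE DMA: up to 256 sectors (128 KiB)
MAX_SECTORS_LBA48 = 65536   # 48-bit WRITE DMA EXT: up to 65536 sectors


def split_transfer(start_lba, total_bytes, max_sectors=MAX_SECTORS_LBA28):
    """Return the (lba, sector_count) pairs needed to write total_bytes."""
    if total_bytes % SECTOR_BYTES:
        raise ValueError("transfer must be a whole number of sectors")
    remaining = total_bytes // SECTOR_BYTES
    lba = start_lba
    commands = []
    while remaining > 0:
        count = min(remaining, max_sectors)
        commands.append((lba, count))
        lba += count
        remaining -= count
    return commands


# One 512 Kbyte ping-pong buffer written from LBA 0.
print(split_transfer(0, 512 * 1024))
print(len(split_transfer(0, 512 * 1024, MAX_SECTORS_LBA48)))
```

With 28-bit commands each buffer costs four command round trips inside the same 14.56 ms window; with 48-bit commands it could go out as one.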
 
Squeeze

Ofnuts wrote in news:[email protected]
If I get it right, at 36MB/s, you have to write a buffer every 1/72nd of
a second, which is around 14ms.
That gets close to the various seek
times (even to the track-to-track seek).

So what?

Ofnuts said:
What disk are you using? Has it got a cache? Is the cache enabled?

Do you know what a cache is?
Do you know what the difference between a cache and a buffer is?
 
Squeeze

Bob Willard wrote in news:[email protected]
I don't think that your simple design approach using ping-pong buffering
has much hope of working with real HDs, due to error handling in the HD.
36 MB/s can be achieved as an STR (at least in the outer zones), but that
is a long-term average; you are expecting *every* 512KB write to be
completed in your ~15 ms window,
which does not leave any time for the HD to detect and recover from an error.

The 'error' would have to be a servo error where the drive is unable
to find the start sector of a transfer or a previously marked candidate
bad sector. While this can certainly happen, it should be very rare.

What may be less rare is that logically consecutive sectors may not
be physically consecutive on the platters, causing a hiccup in the STR.
This would be the case after some 'bad' sectors have been reassigned.
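
Whatever the cause of a hiccup, its cost at 36 Mbytes/s is easy to estimate: the data that arrives while the drive is stalled has to sit somewhere. The sketch below counts how many 512 Kbyte buffers that backlog would occupy. The stall durations are made-up inputs for illustration only, not measurements of any drive, and the function name is hypothetical.

```python
import math

FILL_RATE = 36_000_000      # bytes/s arriving from the ADC
BUFFER_BYTES = 512 * 1024   # size of one buffer in the current design


def buffers_needed(stall_seconds: float) -> int:
    """Number of 512 Kbyte buffers needed to hold the backlog of a stall."""
    backlog = FILL_RATE * stall_seconds
    return max(1, math.ceil(backlog / BUFFER_BYTES))


for stall in (0.015, 0.1, 1.0):  # hypothetical stall lengths in seconds
    print(f"{stall * 1e3:6.0f} ms stall -> {buffers_needed(stall)} buffers")
```

A two-buffer scheme only rides out stalls shorter than one fill time, which is why an occasional long hiccup shows up as lost data even when the average rate is fine.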

Bob Willard said:
You might want to try replacing that WD800BD HD with the much faster
WD1500ADFD (10K RPM 150GB SATA Raptor), just because nothing
else would need to be changed in your design. But frankly, I doubt if any
HD will solve your problem; all HDs do error detection and recovery
behind your back, and they all stop transferring data to take care of errors.

Error correction can be switched off if you use the streaming features.
 
