NTFS block allocation policy (Windows XP)

Arne Ludwig

I am writing large files onto an empty NTFS partition using Windows XP
SP1 and I see results in the Computer Management/Disk Defragmenter
display that seem a bit strange. The file is written using
CreateFile/WriteFile: the file is 2GB on a 40GB disk with one primary
partition, written in 1000 WriteFile calls of 2MB each. This is the
only activity on this disk, except for
the presence of a contiguous swap file right at the beginning of the
disk.
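
In essence the write loop is the following (a minimal sketch; the file
name, sharing flags and the absence of error handling are illustrative
only):

    #include <windows.h>

    int main(void)
    {
        const DWORD chunkSize = 2 * 1024 * 1024;    /* 2 MB per WriteFile call */
        BYTE* buffer = (BYTE*)VirtualAlloc(NULL, chunkSize,
                                           MEM_COMMIT | MEM_RESERVE,
                                           PAGE_READWRITE);

        /* "D:\\test.dat" is a placeholder name */
        HANDLE hFile = CreateFileW(L"D:\\test.dat", GENERIC_WRITE, 0, NULL,
                                   CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

        for (int i = 0; i < 1000; ++i)              /* 1000 x 2 MB = 2 GB */
        {
            DWORD written = 0;
            WriteFile(hFile, buffer, chunkSize, &written, NULL);
        }

        CloseHandle(hFile);
        VirtualFree(buffer, 0, MEM_RELEASE);
        return 0;
    }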

Now what I end up with in the graphic display of dfrgntfs most often is
a file with 11 fragments that are scattered literally all over the 40GB
disk, or 4 visually separable chunks followed by a free space, then 2
more chunks then a big free space and then one big chunk at about 75%
capacity of the disk. (all chunks red, swap file green, one green line
after the first red chunk, all readings from left to right)

Next I defragmented the disk, leaving me with one big blue chunk at
about 25% capacity of the disk. The green line is gone.

I deleted that file and I wrote the file again using the same method
as above. Result: One file with 9 fragments, four on the left as
before, one big chunk where the blue chunk was, thin red line at 75%
capacity of the disk, green line after the first red chunk as before.

Delete and write again, Result: One file with 4 fragments, two big red
chunks in the middle, thin green line on the left.

Again, Result: One file with 10 fragments, four small red chunks as
in the beginning, thin green line after the first chunk as before, two
big red chunks at 40% close together, one thin line at 75%.

What is going on?

I know that logical disk blocks do not necessarily have anything to do
with physical location on the disk (what with cylinders and LBA and
all that), but is XP NTFS that smart? And if so, why would it be so
non-reproducible, yet semi-reproducible to some extent (4 small chunks
on the left)?

Strangely enough, with FILE_FLAG_NO_BUFFERING I get a fairly
consistent write speed even with the arbitrary fragmentation, but will
it stay that way once the disk gets full?

Could somebody explain the block allocation policy for writing files
with NTFS on XPSP1/2? How is the free list maintained, i.e. when I
remove a big file and then reallocate a file of the same size, does it
end up in the same space? Do I have to reformat the disk to get
contiguous files?

Thanks!
 
Slobodan Brcin (eMVP)

Hi Arne,

Try giving the FS a hint by writing at the end of the file (growing the file to the desired max size). After that you can
freely move inside the file and write data.
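
A minimal sketch of that hint, assuming the maximum size is known up front
(hFile and maxSize are placeholders):

    #include <windows.h>

    /* Grow the file to its expected maximum size by writing one byte at the
       end, then seek back; later writes land inside already-allocated space. */
    void GrowToMaxSize(HANDLE hFile, LONGLONG maxSize)
    {
        LARGE_INTEGER pos;
        BYTE zero = 0;
        DWORD written = 0;

        pos.QuadPart = maxSize - 1;
        SetFilePointerEx(hFile, pos, NULL, FILE_BEGIN);
        WriteFile(hFile, &zero, 1, &written, NULL);  /* file length is now maxSize */

        pos.QuadPart = 0;
        SetFilePointerEx(hFile, pos, NULL, FILE_BEGIN);
    }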

Regards,
Slobodan
 
Stephan Wolf [MVP]

See also (freeware)

"Contig v1.52
Wish you could quickly defragment your frequently used files? Use
Contig to optimize individual files, or to create new files that are
contiguous."

-> http://www.sysinternals.com/ntw2k/freeware/contig.shtml

Stephan
---
> I am writing large files onto an empty NTFS partition using Windows XP
> SP1 and I see results in the Computer Management/Disk Defragmenter
> display that seem a bit strange.
> [..]
 
Arne Ludwig

Thanks Stephan and Slobodan. I knew about contig, but unfortunately I
do not know in advance how much data I am going to write. I do know,
however, that the data files are usually large, read sequentially,
and deleted fairly soon afterward, potentially in a different order.

I need something like "soft real time" for FS performance, i.e. I need
to rely on the FS delivering consistent and reproducible WRITE
performance on average, but not guaranteed performance. If NTFS
behaves erratically even on an empty disk, that concerns me a lot,
because it means that even once all files are deleted I do not start
with a clean slate and things are going to get progressively worse.

In the meantime I have used the ProcessIdleTasks incantation and
rebooted the system which made NTFS slightly less erratic, i.e. the
files now have 2 to 4 fragments and sometimes they are even
contiguous, but the pieces are still spread out apparently randomly
all over the disk.

Overall this is a positive sign because it means that "the slate" is
cleaned sometimes to some extent, but I may want to have a say in when
to clean the slate. Oh, and I only need to straighten up the free
list, not defragment some 2GB file that happens to be still there.

Does the indexing service have anything to do with it? The layout.ini
seems to have no information about files on my test disk. However,
there are 14 mysterious invisible files with several hundred
megabytes of undocumented content.

Or is there an "in core" copy of the free list that gets automatically
rebuilt on reboot from the free block bit map on disk?

What I need to get is an understanding of whether I can trust the NTFS
allocator (instead of writing my own filesystem on a raw partition), or
perhaps whether I can give the allocator some hints to make it more
trustworthy for my purposes.

I know it is risky to rely on current behaviour but this thing only
needs to work on XPe.

Anything helps! Thanks.

PS. Hi Dan, if you're listening, can you shed any light on this?
 
Slobodan Brcin (eMVP)

Hi Arne,

I do not know a solution to this problem, if there is any direct solution at all.

But what are you trying to achieve with these files? How long will they last?
Why do you worry about small fragmentation so much?

You could create one large file and then trim it down to the size that you need.
Or you can defrag the file(s) so they become compacted.

If you need it for one-time creation and then many reads, defragmentation might be acceptable.

BTW:
If you are making this for XPe and you have strict file data requirements, then why don't you use some raw partition and write data
directly there?

Regards,
Slobodan
 
Pat [MSFT]

If you know at least the approximate size of the file, the best thing to do
would be to pre-allocate the file, then go back and write to it. NTFS
always zeros blocks on file extension, so you would get better performance
(a single zeroing pass, fewer seeks, more write streaming, etc.) and you
would be far less likely to fragment. Also, you should look to have the
largest block sizes you can get away with. This will further decrease
fragmentation possibilities.

NTFS does cache the free list at boot and allocations will be serviced from
that list.

Pat
 
George M. Garner Jr.

Pat,
> If you know at least the approximate size of the file, the best thing to
> do would be to pre-allocate the file, then go back and write to it.

Could you be more specific as to how to allocate the file? I see the
function SetFileValidData. But that assumes that you have already allocated
the file. Elsewhere I have read that one should not use SetFilePointer(Ex)
because of the added overhead of writing zeros to the file. Is there a way
to allocate the file without incurring the overhead of immediately zeroing
out the clusters? I know the pagefile does this but I do not see an API
that permits it.

Regards,

George.
 
Pat [MSFT]

NTFS zeros the file whenever you extend it. If you just use WriteFile() to
extend the file, the same zeroing occurs. This is for C2 security
requirements, to prevent folks from extending their files over other
people's files and taking control of the data.

If you extend a file a little at a time, then for every extension:
Seek to cluster location
Zero out enough space for the addition
Seek back to the start of the write
Write the file data

(so, 2 seeks for the write)

Also, if something else has written to the adjoining cluster in the interim
since the last write, then the file will fragment. This happens all the
time in logging applications.

So, if you pre-allocate the file, you only pay for the zeroing 1 time. This
will prevent any other files from intruding on your nicely sequential
allocation and optimize write speeds. If you only pay the price once, then
the runtime performance is the same as bypassing the zeroing. The
clusters do not get zeroed until you call SetEndOfFile(). So the order is:

CreateFile
SetFilePointer (size of file you intend to create)
SetEndOfFile (commit the file)
SetFilePointer(beginning)
start writing

No more zeroing will occur unless you overrun the file size allocated.
Also, fragmentation will be minimized. The zeroing happens very, very fast
b/c there is no seeking. A single seek to the start of the file, then
sequential writes so the writes occur at the disk speed.
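
In code, that order looks roughly like the following sketch (the function
name and the missing error handling are illustrative):

    #include <windows.h>

    /* Pre-allocate a file of the intended size, then leave the file pointer
       at the beginning so the caller can start writing. */
    HANDLE CreatePreallocatedFile(const wchar_t* path, LONGLONG targetSize)
    {
        HANDLE hFile = CreateFileW(path, GENERIC_WRITE, 0, NULL,
                                   CREATE_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
        if (hFile == INVALID_HANDLE_VALUE)
            return hFile;

        LARGE_INTEGER pos;
        pos.QuadPart = targetSize;
        SetFilePointerEx(hFile, pos, NULL, FILE_BEGIN); /* size you intend to create */
        SetEndOfFile(hFile);                            /* commit the file */

        pos.QuadPart = 0;
        SetFilePointerEx(hFile, pos, NULL, FILE_BEGIN); /* back to the beginning */
        return hFile;                                   /* start writing */
    }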

You can bypass the zeroing of files by using SetFileValidData(). This
requires an elevated privilege to run (SE_MANAGE_VOLUME_NAME), so depending
on the security context of your application you may have problems; also, it
is only available on WinXP and later, so if you are targeting Win2k systems
it won't work. The SetFilePointer approach, in contrast, can be executed by any user
context that has access to the file. Since (from your description) the
process creates the file as well, this won't be a problem.
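
For illustration, enabling that privilege before calling SetFileValidData()
could look like this sketch (the helper name is made up and error handling
is omitted):

    #include <windows.h>

    /* Enable SeManageVolumePrivilege (SE_MANAGE_VOLUME_NAME) in the process
       token; this only succeeds if the account actually holds the privilege. */
    BOOL EnableManageVolumePrivilege(void)
    {
        HANDLE hToken;
        TOKEN_PRIVILEGES tp;
        BOOL ok;

        if (!OpenProcessToken(GetCurrentProcess(),
                              TOKEN_ADJUST_PRIVILEGES | TOKEN_QUERY, &hToken))
            return FALSE;

        tp.PrivilegeCount = 1;
        tp.Privileges[0].Attributes = SE_PRIVILEGE_ENABLED;
        LookupPrivilegeValueW(NULL, L"SeManageVolumePrivilege",
                              &tp.Privileges[0].Luid);

        ok = AdjustTokenPrivileges(hToken, FALSE, &tp, 0, NULL, NULL)
             && GetLastError() == ERROR_SUCCESS;
        CloseHandle(hToken);
        return ok;
    }

    /* After CreateFile / SetFilePointer / SetEndOfFile, declaring the whole
       file valid skips the zero-fill:
           SetFileValidData(hFile, targetSize);    (targetSize <= file size) */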

Pat
 
George M. Garner Jr.

Pat,

Since I see that this thread is hopelessly crossposted I am trimming it a
little. Hopefully it will be posted in a newsgroup that you are monitoring.

> CreateFile
> SetFilePointer (size of file you intend to create)
> SetEndOfFile (commit the file)
> SetFilePointer(beginning)
> start writing

Thanks for your detailed explanation. I think that the use of
SetFileValidData() is what I didn't (at least until now) understand. If I
understand you correctly, SetFileValidData() would substitute for
SetEndOfFile (commit the file) in the sequence which you describe above. I
do not think that it is suitable for my present purposes, but I will keep
this information for future reference.

Thanks again.

Regards,

George.
 
Alexander Grigoriev

Your real concern should not be such insignificant file fragmentation, but
the overhead of the file expansion. Even with 2 MB chunks you may be losing
quite a bit of throughput. The time to move the heads from one fragment to
another is much less than the full transaction time to expand the file by 2 MB.

Preallocate the file (set file length), then write to it using
FILE_FLAG_NO_BUFFERING. Without FFNB, you will get extreme file cache bloat
(ugly "feature" of NT cache manager), and CPU use will be higher, too.
 
Alexander Grigoriev

I observed that in XP the actual zeroing is delayed until you actually try
to access the part of the file beyond what was written. This is what Valid
Data Size is for.
SetFileValidData can be used to avoid zeroing altogether, even if the file is
not written over.
 
Pat [MSFT]

When you call SetEndOfFile() the blocks get zeroed. Just calling
SetFilePointer() beyond the end of the file won't trigger it, and you will
see the behavior that you saw. You will also see that if the file is
marked as Sparse or (I think) Compressed.

Pat
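
For reference, a file is marked sparse with FSCTL_SET_SPARSE; a minimal
sketch (whether this actually avoids the zero-fill in a given setup would
need to be verified):

    #include <windows.h>
    #include <winioctl.h>

    /* Mark an open file as sparse; hFile is assumed to be open for writing. */
    BOOL MarkFileSparse(HANDLE hFile)
    {
        DWORD bytesReturned = 0;
        return DeviceIoControl(hFile, FSCTL_SET_SPARSE, NULL, 0,
                               NULL, 0, &bytesReturned, NULL);
    }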
 
Arne Ludwig

The question to me remains when the actual zeroing occurs. If it
occurred immediately after SetEndOfFile, then the SetFileValidData call
would be quite useless since the 2nd argument to the call needs to be
between the current valid data length (0 on a new file) and the file
size.

So I would interpret Alexander to mean that the actual zeroing is
delayed until after the next access to "new space" after current valid
data length.

Correct?

PS. For whoever cares: My original problem seems to be improved by a
strategy of exponential file extension, e.g. allocating chunks in
powers of two. This often creates a file with a manageable number of
fragments. The drawback of course is the increased complexity of the
writing code. A hint to NTFS would have been much handier.
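
Roughly, the extension logic is something like the following sketch (the
names and the 2 MB starting size are illustrative):

    #include <windows.h>

    /* Double the file's allocated size whenever the data written so far
       reaches the current reservation; a final SetEndOfFile at the true
       length trims the excess once writing is finished. */
    void ExtendIfNeeded(HANDLE hFile, LONGLONG bytesWritten, LONGLONG* reserved)
    {
        if (bytesWritten < *reserved)
            return;                                 /* still inside the reservation */

        LONGLONG newSize = (*reserved == 0) ? 2LL * 1024 * 1024  /* first chunk */
                                            : *reserved * 2;     /* then double  */

        LARGE_INTEGER pos;
        pos.QuadPart = newSize;
        SetFilePointerEx(hFile, pos, NULL, FILE_BEGIN);
        SetEndOfFile(hFile);                        /* zeroing cost is paid here */

        pos.QuadPart = bytesWritten;                /* resume where writing stopped */
        SetFilePointerEx(hFile, pos, NULL, FILE_BEGIN);
        *reserved = newSize;
    }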
 
Alexander Grigoriev

What you describe is correct for Windows 2000 and earlier systems.
What I saw in XP is that the blocks don't get zeroed immediately. Otherwise it
would not make sense in XP to introduce the concept of valid data size: the
whole file would be zeroed anyway before any access to it was possible.
 
Maxim S. Shatskih

> Preallocate the file (set file length), then write to it using
> FILE_FLAG_NO_BUFFERING. Without FFNB, you will get extreme file cache bloat
> (ugly "feature" of NT cache manager), and CPU use will be higher, too.

Yes. Even "write through" mode is slower than FFNB if we are speaking about
large ( > 100MB ) files.

FFNB does not allow the EOF to be placed at a non-sector-aligned value though.
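
One common workaround (a sketch, not something discussed in this thread):
pad the last unbuffered write up to a sector boundary, then trim the file to
its exact length through a plain buffered handle, where EOF may be any byte
offset:

    #include <windows.h>

    /* Reopen without FILE_FLAG_NO_BUFFERING and set EOF to the exact length. */
    void TrimToExactLength(const wchar_t* path, LONGLONG exactLength)
    {
        HANDLE hFile = CreateFileW(path, GENERIC_WRITE, 0, NULL,
                                   OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
        LARGE_INTEGER pos;
        pos.QuadPart = exactLength;
        SetFilePointerEx(hFile, pos, NULL, FILE_BEGIN);
        SetEndOfFile(hFile);
        CloseHandle(hFile);
    }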
 
