Hardlinks

Ron

Question for a hardlink guru: Does a hardlink actually occupy the same disk
space as the file to which it is linked, as appears to be the case in
Windows Explorer? If not, how does one identify which entries in Explorer
are actually hardlinks? If so, why would one use a hardlink as opposed to
simply copying the original file?

Background: In an attempt to save some disk space, I created hardlinks to
some large (audio sample) library files. I used FSUTIL Hardlink Create in
Win XP Home SPA. But the links appear in Explorer to have been allocated
the same space as the original files. I've read what I can find about
hardlinks, but nowhere found a definitive way to have Explorer identify
which entries are hardlinks, or even an assertion that a hardlink does NOT
require the same amount of disk space as the original file.

Thx for any help. -Ron

Lem

A hardlink is simply a pointer to the data:

"A hard link is a directory entry for a file. Every file can be
considered to have at least one hard link. On NTFS volumes, each file
can have multiple hard links, and thus a single file can appear in many
directories (or even in the same directory with different names).
Because all of the links reference the same file, programs can open any
of the links and modify the file. A file is deleted from the file system
only after all links to it have been deleted. After you create a hard
link, programs can use it like any other file name."
http://www.microsoft.com/resources/.../proddocs/en-us/fsutil_hardlink.mspx?mfr=true

See also http://en.wikipedia.org/wiki/Hard_links#Example
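The behavior in that quote can be checked with a short script. This is a sketch assuming Python 3 and a temporary directory on an NTFS (or POSIX) volume; the file names and the function name `demo_hard_link` are hypothetical. It shows that both names report the same file identity (`st_dev`, `st_ino`) and a link count of 2, that a write through one name is visible through the other, and that deleting one name leaves the data reachable through the remaining one.

```python
import os
import tempfile


def demo_hard_link(workdir: str) -> dict:
    """Give one file a second name with os.link and report what changes."""
    original = os.path.join(workdir, "sample.wav")     # hypothetical name
    second = os.path.join(workdir, "sample_app2.wav")  # hypothetical second name

    with open(original, "wb") as f:
        f.write(b"\x00" * 4096)                        # stand-in for audio data

    os.link(original, second)                          # add a second directory entry

    a, b = os.stat(original), os.stat(second)
    same_file = (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
    links_before_delete = a.st_nlink                   # two names, one file

    # Appending through one name changes the file seen through the other.
    with open(second, "ab") as f:
        f.write(b"\x01" * 10)
    size_via_original = os.stat(original).st_size

    # Removing one name does not remove the data.
    os.remove(original)
    survivor = os.stat(second)

    return {
        "same_file": same_file,
        "links_before_delete": links_before_delete,
        "size_via_original": size_via_original,
        "links_after_delete": survivor.st_nlink,
        "survivor_size": survivor.st_size,
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print(demo_hard_link(d))
```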

Why would you think that giving a file a second name saves any space?

Try this:

1. look at the total space used on the drive
2. create a hard link to some large file on the drive
3. look at the total space used on the drive -- it should be almost the
same as in step 1.

The space used by adding the hard link is the space occupied by the
directory entry, not the space occupied by the data.
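The three steps above can be scripted. This is a sketch assuming Python 3 and a folder on a volume that supports hard links; the 16 MB size, the file names and the function name `link_space_delta` are arbitrary choices for illustration. Other programs writing to the same volume at the same moment would disturb the figure, so treat the result as approximate.

```python
import os
import shutil
import tempfile


def link_space_delta(directory: str, size_bytes: int = 16 * 1024 * 1024) -> int:
    """Return the change in used space caused by adding one hard link."""
    big = os.path.join(directory, "big_sample.bin")  # hypothetical large file
    with open(big, "wb") as f:
        # Random bytes, so a compressing filesystem cannot shrink the file.
        f.write(os.urandom(size_bytes))
        f.flush()
        os.fsync(f.fileno())                         # make sure the data is on disk

    used_before = shutil.disk_usage(directory).used  # step 1
    os.link(big, os.path.join(directory, "big_sample_link.bin"))  # step 2
    used_after = shutil.disk_usage(directory).used   # step 3
    return used_after - used_before


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        print("used space changed by", link_space_delta(d), "bytes")
```

This only measures allocated space on the volume; it does not explain what DIR or Explorer report for the new name.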

Ron

Remarks below

> A hardlink is simply a pointer to the data:

So I understand.

> "A hard link is a directory entry for a file. Every file can be considered
> to have at least one hard link. On NTFS volumes, each file can have
> multiple hard links, and thus a single file can appear in many directories
> (or even in the same directory with different names). Because all of the
> links reference the same file, programs can open any of the links and
> modify the file. A file is deleted from the file system only after all
> links to it have been deleted. After you create a hard link, programs can
> use it like any other file name."
> http://www.microsoft.com/resources/.../proddocs/en-us/fsutil_hardlink.mspx?mfr=true
>
> See also http://en.wikipedia.org/wiki/Hard_links#Example

Yes, thanks, I had read those links.

> Why would you think that giving a file a second name saves any space?

Because it is, just as you say, a pointer. By "saving space," I mean
relative to *copying* the original files. (I have two apps that read the
same libraries, but each app hard codes the path to its library. Hence the
need to either copy the library from the first to the path of the second, or
do effectively the same thing with a hardlink.)
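For two programs that each insist on their own library path, the hard-link route can be applied to a whole folder tree. This is a sketch assuming Python 3; `mirror_with_hardlinks` is a hypothetical helper, and the source and destination folders stand in for the two apps' hard-coded library paths. Hard links cannot cross volumes, so both folders must be on the same NTFS drive.

```python
import os


def mirror_with_hardlinks(src_root: str, dst_root: str) -> int:
    """Recreate src_root's folder tree under dst_root, hard-linking each file.

    Existing files under dst_root are left alone. Returns the number of
    hard links created.
    """
    created = 0
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            dst = os.path.join(target_dir, name)
            if not os.path.exists(dst):  # don't clobber existing files
                os.link(os.path.join(dirpath, name), dst)
                created += 1
    return created
```

One caution for libraries that get updated: if an updater writes a new file and renames it over the old name (rather than rewriting the file in place), only that one name gets the new data, and the other app's name keeps pointing at the old file.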

> Try this:
>
> 1. look at the total space used on the drive
> 2. create a hard link to some large file on the drive
> 3. look at the total space used on the drive -- it should be almost the
> same as in step 1.

OK, guess I'll do that in order to convince myself. But...

> The space used by adding the hard link is the space occupied by the
> directory entry, not the space occupied by the data.

Then why does the new Explorer entry (or even using the DIR command in a
command prompt window) show an allocation of the FULL number of bytes of the
original file? That's really my question. Along with this one: months from
now when I forget which entries are hardlinks, how will I be able to tell?
(I'm told this is easy to do in Unix, BTW.)

Thx, Ron

Lem

Nobody ever said that understanding NTFS was easy! And sorry, but the
experiment I suggested won't help you understand anything.

NTFS stores file and folder information in the Master File Table (MFT).
Each MFT record is 1 KB in size.

NTFS views each file (or folder) as a set of file attributes. A file's
elements such as its name, its security information, and even its data
are file attributes. If the data for a given file is small enough (1 KB
less the space required by certain other file attributes), the data is
stored in the MFT itself. Otherwise the "data attribute" of the MFT
entry for the file points to the disk clusters where the data actually
resides.
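The resident/non-resident rule can be written as a toy function. This is not how NTFS itself is implemented; it is a sketch assuming the 1 KB record size given above, a 4 KB cluster size (a common NTFS default, assumed here), and a caller-supplied figure for the bytes already taken by the record's other attributes. It ignores the data attribute's own header. `data_placement` is a hypothetical name.

```python
import math

MFT_RECORD_SIZE = 1024  # bytes, per the description above
CLUSTER_SIZE = 4096     # assumed cluster size for this illustration


def data_placement(data_len: int, other_attrs_len: int) -> tuple:
    """Toy model: does a file's data fit inside its MFT record?

    other_attrs_len is an assumed figure for the space already used by the
    record's other attributes (name, security information, and so on).
    Returns ("resident", 0) or ("non-resident", clusters_allocated).
    """
    free_in_record = MFT_RECORD_SIZE - other_attrs_len
    if data_len <= free_in_record:
        # Small file: the data lives in the MFT record itself.
        return ("resident", 0)
    # Large file: the data attribute points at clusters elsewhere on disk.
    return ("non-resident", math.ceil(data_len / CLUSTER_SIZE))
```

For example, a 50 MB sample file with 400 bytes of other attributes comes out as non-resident, occupying 12,800 clusters of 4 KB in this model.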

The "file name attribute" in an MFT record is a repeatable attribute for
both long and short file names. The long name of the file can be up to
255 Unicode characters. The short name is the 8.3, case-insensitive name
for the file. Additional names, or hard links, required by POSIX, can be
included as additional file name attributes.

When you create a hardlink, a "file name attribute" entry is added to
the MFT record for the file. This entry will occupy some of the 1 KB
space allocated to the MFT record, but for multi-megabyte files in which
the data is stored externally to the MFT, it (usually) won't actually
increase the amount of disk space used.
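The same toy approach shows what adding a hard link does to a record: it consumes a little of the 1 KB and leaves the data clusters untouched. The per-name size of roughly 90 bytes plus two bytes per UTF-16 character is an approximation assumed for illustration, and `MftRecord` and `file_name_attr_size` are hypothetical names. A hard link created in a different folder also adds an entry to that folder's index, which this sketch ignores.

```python
from dataclasses import dataclass, field

MFT_RECORD_SIZE = 1024  # bytes, per the description above


def file_name_attr_size(name: str) -> int:
    # Assumed approximation: about 90 bytes of fixed overhead plus the name
    # in UTF-16 (2 bytes per character), rounded up to an 8-byte boundary.
    raw = 90 + 2 * len(name)
    return (raw + 7) // 8 * 8


@dataclass
class MftRecord:
    base_used: int      # assumed bytes used by the non-name attributes
    data_clusters: int  # clusters holding the (non-resident) data
    names: list = field(default_factory=list)  # one entry per file name attribute

    def used(self) -> int:
        return self.base_used + sum(file_name_attr_size(n) for n in self.names)

    def add_hard_link(self, name: str) -> None:
        if self.used() + file_name_attr_size(name) > MFT_RECORD_SIZE:
            # Real NTFS would move attributes into an extension record;
            # this toy model simply refuses.
            raise ValueError("record full in this toy model")
        # Only the record grows; data_clusters is never touched.
        self.names.append(name)
```

Adding the name `Piano_app2.wav` to a record for a 50 MB file uses about 120 bytes of the record in this model, and the cluster count stays the same.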

Your confusion comes from how the DIR command (and Windows Explorer)
appear to interpret NTFS file attribute information. Although DIR /X
will correctly show both a long file name and its corresponding 8.3 name
as a single entry, DIR doesn't seem to understand other multiple file
name attributes. As you have seen, DIR lists a hardlink as an
additional entry and, to make matters worse, includes the "file size"
associated with the hardlink when it sums up all of the file entries for
the folder.
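The double counting can be reproduced by summing a folder's file sizes two ways: once per directory entry, as DIR does, and once per underlying file. This is a sketch assuming Python 3; `folder_totals` is a hypothetical name. It keys files by `(st_dev, st_ino)`; on Windows, `os.stat()` fills these in, whereas `DirEntry.stat()` from `os.scandir()` leaves them at zero, which is why `os.stat()` is used.

```python
import os


def folder_totals(folder: str) -> tuple:
    """Return (naive_total, unique_total) for regular files directly in folder.

    naive_total adds every directory entry's size, the way DIR sums a listing.
    unique_total counts each underlying file once.
    """
    naive = 0
    seen = {}
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if not os.path.isfile(path):
            continue
        st = os.stat(path)
        naive += st.st_size
        seen[(st.st_dev, st.st_ino)] = st.st_size  # same file -> same key
    return naive, sum(seen.values())
```

With a 1,000-byte file, one hard link to it, and a separate 500-byte file, the naive total is 2,500 bytes while the unique total is 1,500 bytes (before any cluster rounding).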

Because of this behavior, it's not clear what will happen if, because of
the creation of multiple hardlinks, the total size of files reported by
DIR and Windows Explorer reached (or exceeded) the size of the disk,
even though the actual allocated clusters occupied less than 100% of the
disk. I certainly wouldn't try this on a production system.

As for your second question, it doesn't matter which one you delete.
The data remains until all of the names are deleted.
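On identifying hardlinks later: since every name is an equal file name attribute, there is no "original" to single out. What a tool can report is how many names a file has (`st_nlink`) and which of those names it can find. This is a sketch assuming Python 3; `find_hard_link_groups` is a hypothetical name, and it only finds names inside the folder tree it scans.

```python
import os
from collections import defaultdict


def find_hard_link_groups(root: str) -> list:
    """Group paths under root that are names for the same file.

    Returns a list of (total_link_count, sorted_paths_found). If the count
    is larger than the number of paths, the remaining names live outside root.
    """
    groups = defaultdict(list)
    nlinks = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.stat(path)  # fills st_ino/st_nlink on Windows too
            if st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                groups[key].append(path)
                nlinks[key] = st.st_nlink
    return [(nlinks[k], sorted(paths)) for k, paths in groups.items()]
```

Running it over both apps' library folders would list each shared sample file once, with all of its names.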

For more fun with NTFS and hidden information, Google for Alternate Data
Streams.

Ron

Thanks very much for the detailed comments. They help fill in the fine points
I wasn't clear on. A couple of remarks below.

> Your confusion comes from how the DIR command (and Windows Explorer)
> appear to interpret NTFS file attribute information. Although DIR /X will
> correctly show both a long file name and its corresponding 8.3 name as a
> single entry, DIR doesn't seem to understand other multiple file name
> attributes. As you have seen, DIR lists a hardlink as an additional entry
> and, to make matters worse, includes the "file size" associated with the
> hardlink when it sums up all of the file entries for the folder.
>
> Because of this behavior, it's not clear what will happen if, because of
> the creation of multiple hardlinks, the total size of files reported by
> DIR and Windows Explorer reached (or exceeded) the size of the disk, even
> though the actual allocated clusters occupied less than 100% of the disk.
> I certainly wouldn't try this on a production system.

Ok, so there's a distinction between "reported file size" and "allocated
disk space." I'm not too concerned about filling disk to capacity, but I do
wonder about the effect of hardlinks on the "disk efficiency" issues that
can develop as you approach capacity. One would expect, despite the
"mis-reporting" of hardlinks' space, that hardlinks should have no effect on
disk performance. However, by implication, it seems that even an expert is
unsure of this.

> As for your second question, it doesn't matter which one you delete. The
> data remains until all of the names are deleted.

Understood. But I wasn't so much concerned with deletion per se. I could
make an argument - however convoluted - as to why I'd like Explorer to
*identify* which file references are hardlinks and which the original.
It relates to using (music) software that is routinely updated with replacement
versions, and to knowing which associated (sample library) file, whose update
schedule may not be the same as for the program series, was new with which
program update.

Anyway, thank you for the response. -Ron