Many many files


WindyGeorge

Hello all:

Is it possible to overwhelm the file system and/or operating system if too
many files are created? I have some data to store and, from the data
maintenance point of view, it would be better to store the information
packages as individual files. We are talking about 700,000 files or so,
likely separated into five or six directories.

I am using a Vista Home OS with lots of hard drive space on two disk drives.

Anyone have experience with a large database containing many, many files?
These files will be searched by a separate search engine and not indexed by
the system. They will reside on a separate partition of the hard drive,
not on the system partition. Thanks for any suggestions or ideas.

George
 

John John (MVP)

You should ask about this on the Vista help groups.

Your hard disk is most likely using the NTFS file system. You can store
more than 4.2 billion files on an NTFS disk (space permitting), and you
can arrange the files in folders as you please; you could even put them
all in the same folder if you so pleased. However, there are performance
issues when you have a large number of files in a single folder, so you
may want to limit each folder to a more manageable number of files, or
you may want to disable certain NTFS features such as short-filename
(8.3) creation and last-access time stamps.
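To keep per-folder counts manageable, files can be spread across a fixed set of subfolders by hashing each file name. This is only a sketch of the idea, not anything from the thread; the paths and the 64-shard count are hypothetical (64 shards puts roughly 11,000 of 700,000 files in each folder):

```python
import os
import shutil
import zlib

def shard_path(root, filename, shards=64):
    """Pick a stable subfolder for a file by hashing its name.

    The CRC32 of the name is reduced modulo the shard count, so the
    same file name always maps to the same subfolder.
    """
    bucket = zlib.crc32(filename.encode("utf-8")) % shards
    return os.path.join(root, "%02d" % bucket)

def store(root, source_file, shards=64):
    """Copy source_file into its shard folder under root."""
    target_dir = shard_path(root, os.path.basename(source_file), shards)
    os.makedirs(target_dir, exist_ok=True)  # create the shard folder on demand
    shutil.copy2(source_file, target_dir)
```

Because the mapping is deterministic, any later tool can recompute a file's folder from its name alone, with no index needed.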

John
 

Patrick Keenan

WindyGeorge said:
Hello all:

Is it possible to overwhelm the file system and/or operating system if too
many files are created?

It is, but that mostly applies to older OSes and file systems. With what
you describe, you shouldn't run into this.

I have some data to store and, from the data
maintenance point of view it would be better to store the information
packages as individual files. We are talking about 700,000 files or so
likely separated into five or six directories.

If you're using NTFS, as long as you don't dump all the files in the root,
you'll be OK as far as the OS and filesystem are concerned.

If you get into significant nesting of directories, this can lead to
problems with excessively long path names.
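For illustration (a sketch, not from the thread): the classic Win32 file APIs limit a full path to MAX_PATH, 260 characters, so deep nesting plus long file names can push entries past the limit. A quick check, with hypothetical paths:

```python
import os

MAX_PATH = 260  # classic Win32 full-path limit

def too_long(directory, filename):
    """Return True if the combined path would exceed the Win32 limit."""
    # +1 accounts for the terminating NUL that the Win32 APIs count.
    return len(os.path.join(directory, filename)) + 1 > MAX_PATH

# Hypothetical examples: a shallow folder vs. 42 nested levels.
shallow = r"D:\archive\obits"
deep = r"D:\archive" + r"\level" * 42
```

Running `too_long(shallow, "item.txt")` gives False, while the deeply nested path exceeds the limit.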

I am using a Vista Home OS with lots of hard drive space on two disk
drives.

Anyone have experience with a large database containing many, many files?
These files will be searched by a separate search engine and not indexed
by the system. They will reside on a separate partition of the hard drive,
not on the system partition. Thanks for any suggestions or ideas.

You may run into some issues regarding performance in terms of hardware, but
probably not with the file system or OS.

HTH
-pk
 

Tim Slattery

Patrick Keenan said:
If you're using NTFS, as long as you don't dump all the files in the root,
you'll be OK as far as the OS and filesystem are concerned.

That wouldn't matter. In NTFS all directories, including root, have no
limit on how many files/subdirectories they can contain. The limit is
per volume, and is 4,294,967,295 (see
http://www.microsoft.com/technet/prodtechnol/windows2000serv/reskit/core/fncc_fil_tvjq.mspx?mfr=true
)

NTFS stores directory entries in a B-tree, so it can search huge
directories much faster than FAT32 can. (Searches will still slow down
as the directory grows.) But retrieving a list of the
files/subdirectories in a directory will be slower.
 

WindyGeorge

Tim Slattery said:
That wouldn't matter. In NTFS all directories, including root, have no
limit on how many files/subdirectories they can contain. The limit is
per volume, and is 4,294,967,295 (see
http://www.microsoft.com/technet/prodtechnol/windows2000serv/reskit/core/fncc_fil_tvjq.mspx?mfr=true
)

NTFS stores directory entries in a B-tree, so it can search huge
directories much faster than FAT32 can. (Searches will still slow down
as the directory grows.) But retrieving a list of the
files/subdirectories in a directory will be slower.


I plan to use this on an XP system, but the development will be done on a
Vista system. I just wanted to be sure that a file organization created on
a Vista system would work on an XP system. Both systems use NTFS, so I
expect there will be no difference. The concern here was to avoid
compromising the file system by having too many individual files. Thanks
for your comments and ideas.

george
 

WindyGeorge

Paul E Collins said:
Out of interest: why, "from the data maintenance point of view", is this
more appropriate than using a database?

Eq.
Hello Paul:

Keeping in mind that this is from our point of view: the data in this
database has come from many different sources. It is a collection of
newspaper announcements; obituaries, memorials, milestones, etc. It has
been growing for over ten years. My wife uses this database as a research
tool for her genealogy hobby. Many of these items we have never read; some
were harvested in groups from various daily newspaper announcements, some
are downloads from people who offered us a copy of their source data, and
so on.

The minimal level of organization for any item in the database is that the
item be searchable and recoverable with our search engine. The search
engine we use is DTSearch, an absolutely marvelous software package.

Often the very first time we view an article is as the result of a search.
There are some groups of items that I have gone through one by one to weed
out the very worst problems, but I won't live long enough to get to all of
the items. The common format we use for a document, as a search target, is
text or HTML. Once a document is found, DTSearch will allow us to open it
and correct it. Until now I have had these announcements arranged in 35
files; DTSearch allows us to identify the individual items with a document
separator (one of our choice; I chose >>) and will display portions of
very large files item by item. Presently our announcement files are quite
large, 65 to 70 megabytes each.

The maintenance rationale is that if we have the items in separate files
(one announcement per file), then when an item is displayed as the result
of a search hit and we see a problem, DTSearch will allow us to open it in
WordPad or some other text editor and correct that item. Obviously we need
to sync the live database with a backup copy to keep any changes made, and
to automate re-indexing by DTSearch.

These are text records of variable length, 100 to 100,000 characters. We
have attempted to standardize on a simple tab-delimited format: Source,
Date, Content. All of what we have done so far seems to work reasonably
well, and the data is easily recovered with DTSearch. Putting each item in
its own file offers additional possible future benefits, in that a simple
but informative naming scheme for each announcement might further enhance
the database.
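The split from one large file into one-announcement-per-file can be sketched roughly as follows. This is only an illustration, not George's actual process: it assumes the >> separator sits on a line of its own between items, and the file names and numbered naming scheme are hypothetical placeholders:

```python
import os

def split_announcements(big_file, out_dir, prefix="item"):
    """Split one large announcement file into one file per item.

    Assumes items are separated by a line containing only '>>', and
    that each item follows the tab-delimited Source/Date/Content
    layout. Output names use a simple numbered scheme; a date- or
    source-based scheme could replace it later.
    """
    os.makedirs(out_dir, exist_ok=True)
    with open(big_file, encoding="utf-8") as f:
        text = f.read()
    # Drop empty fragments left by leading/trailing separators.
    items = [part.strip() for part in text.split("\n>>\n") if part.strip()]
    for n, item in enumerate(items, 1):
        name = "%s_%06d.txt" % (prefix, n)
        with open(os.path.join(out_dir, name), "w", encoding="utf-8") as out:
            out.write(item + "\n")
    return len(items)
```

Each resulting file stays independently searchable and editable, which matches the one-announcement-one-file maintenance goal described above.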

I'll bet that database people will have other views on the organization
and use of this data. For us, what we have is relatively simple: a great
deal of data packed into a relatively small space, and with DTSearch we
are able to find the individual data packages we need on demand.

Back in the beginning of all this I tried several database software
approaches, starting with Access. The file bloat was enormous, and we
quickly came up against the limits of Access. Other database approaches
seemed overly complicated, or seemed to require a server. What we have
developed instead is a large, simple database searched by DTSearch. I
build the database on my system and keep my wife's system updated by
copying the updated parts of the database to her system.

George
 
