Paul E Collins said:
Out of interest: why, "from the data maintenance point of view", is this
more appropriate than using a database?
Eq.
Hello Paul:
Keeping in mind that this is from our point of view, the data in this data
base has come from many different sources. It is a collection of newspaper
announcements: obituaries, memorials, milestones, etc. It has been growing for
over ten years. My wife uses this data base as a research tool for her hobby
of genealogy. Many of these items we have never read; some were group
harvested from various daily newspaper announcements, some are downloads
from people who offered us a copy of their source data, and so on.
The minimal level of organization for any item in the data base is that the
item be searchable and recoverable with our search engine. The search
engine we use is DTSearch. DTSearch is an absolutely marvelous software
package.
Often the very first time we view an article is as the result of a search.
There are some groups of items that I have gone through one by one to weed
out the very worst problems but I won't live long enough to get to all of the
items. The common format we use for a document as a search target is text or
HTML. Once a document is found, DTSearch will allow us to open the document
and correct it. Until now I have had these announcements arranged in 35 files.
DTSearch allows us to identify the individual items with a document
separator (one of our choice; I chose >>) and will display portions of very
large files one separated item at a time. Presently our announcement files
are quite large, 65 to 70 megabytes each.
The maintenance rationale is that if we have the items in separate files (one
announcement, one file.txt), when an item is displayed as the result of a
search hit and we see a problem, DTSearch will allow us to open it in WordPad
or some other text editor and correct problems with that item. Obviously we
need to sync the live data base with a backup copy to keep any changes made,
and automate re-indexing by DTSearch.
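
A minimal sketch of how the large files might be split into one file per
announcement, assuming Python and assuming the >> separator never occurs
inside an announcement's own text. The function names, the "item" prefix and
the numbered file names are hypothetical, not something DTSearch requires.

```python
from pathlib import Path

SEPARATOR = ">>"  # the document separator described above


def split_announcements(text: str, separator: str = SEPARATOR) -> list[str]:
    """Return the non-empty items found between separators."""
    items = []
    for chunk in text.split(separator):
        chunk = chunk.strip("\r\n")  # drop blank lines around the item, keep tabs
        if chunk.strip():
            items.append(chunk)
    return items


def write_items(items: list[str], out_dir: Path, prefix: str = "item") -> list[Path]:
    """Write each item to its own numbered .txt file and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for number, item in enumerate(items, start=1):
        path = out_dir / f"{prefix}_{number:06d}.txt"  # hypothetical naming
        path.write_text(item + "\n", encoding="utf-8")
        paths.append(path)
    return paths
```

Reading one 65 to 70 megabyte file into memory at a time should be well
within reach of a desktop machine, so the sketch does not bother to stream.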
These are text records of variable length: 100 to 100,000 characters. We
have attempted to standardize to a simple tab-delimited format: Source, Date,
Content. All of what we have done so far seems to work reasonably well and
the data is easily recovered with DTSearch. Putting each item in its
own file offers a possible future benefit as well: a simple but
informative naming scheme for each announcement might further
enhance the data base.
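
One way such a naming scheme could look, sketched in Python. The three
fields follow the tab-delimited layout described above; the file name
pattern (source, date and a serial number) and both function names are
assumptions made for illustration only.

```python
import re


def parse_record(record: str) -> tuple[str, str, str]:
    """Split a record into (source, date, content) on the first two tabs."""
    parts = record.split("\t", 2)  # any later tabs stay inside the content
    if len(parts) != 3:
        raise ValueError("expected three tab-separated fields")
    source, date, content = parts
    return source.strip(), date.strip(), content


def make_filename(source: str, date: str, serial: int) -> str:
    """Build a hypothetical file name from the source, date and a serial number."""

    def slug(value: str) -> str:
        # Replace each run of characters other than letters, digits and
        # hyphens with "_" so the result is safe to use in a file name.
        return re.sub(r"[^A-Za-z0-9-]+", "_", value).strip("_").lower() or "unknown"

    return f"{slug(source)}_{slug(date)}_{serial:06d}.txt"
```

The date is used as written rather than parsed, because the items come from
many sources and may not share one date format.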
I'll bet that Data Base persons will have other views on the organization
and use of this data. For us, what we have is relatively simple: we have a
great deal of data packed into a relatively small space, and with DTSearch we
are able to find the individual data packages we need on demand.
Back in the beginning of all this I tried several data base software
approaches starting with Access. The file bloat was enormous and we quickly
came to the limits of Access. Other data base approaches seemed to be overly
complicated and seemed to require a server. Instead, we have developed our
collection into a large, simple data base searched by DTSearch. I build it on
my system and keep my wife's system updated by copying updated parts of the
data base to her system.
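
Copying only the updated parts could be automated along these lines. This is
a rough Python sketch, assuming both copies are reachable as ordinary
folders and that a newer modification time means the file was changed; the
function name is hypothetical. It copies in one direction only and never
deletes anything.

```python
import shutil
from pathlib import Path


def sync_updated(src_dir: Path, dst_dir: Path) -> list[Path]:
    """Copy files that are missing from dst_dir or newer in src_dir.

    Returns the destination paths of the files that were copied.
    """
    copied = []
    for src in sorted(src_dir.rglob("*")):
        if not src.is_file():
            continue
        dst = dst_dir / src.relative_to(src_dir)
        if dst.exists() and src.stat().st_mtime <= dst.stat().st_mtime:
            continue  # destination is already up to date
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copy2 also carries over the modification time
        copied.append(dst)
    return copied
```

The returned list could serve as a cue for which files need re-indexing
after a sync.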
George