Which is more efficient?

J

JString

The general situation: the law office that I'm working for is converting
their Lotus DB to access. It already has approx. 200k records that are
several years old, all of which are completely lacking unique identifiers.
Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one
record per account, but it's all fairly disorganized and it's common to see
one account listed as several different records. So, after I clean it all
up, I want to create a feature that guards against this sort of situation.
Here's what I'm thinking about and I need to know what the best option is:

1) Create a find duplicates query based on the fields that I listed above
that is programatically launched by all data entry forms in their before
update events. If it finds another duplicate, alert the user that it exists.

2) Create an ID field (Indexed, duplicates yes) based on the same fields as
above. Data entry forms populate this field on before update. DCount
searches only the ID field for duplicates.

So, which sould be the quickest? Because I need this feature to be run
every time records are changed or new records are entered, I hope you can see
why efficiency is an issue.

Thanks in advance.
 
J

JString

Unfortunately that won't work because the case# is not truly unique. The
case# can sometimes be the same for unrelated cases
 
J

John W. Vinson

The general situation: the law office that I'm working for is converting
their Lotus DB to access. It already has approx. 200k records that are
several years old, all of which are completely lacking unique identifiers.
Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one
record per account, but it's all fairly disorganized and it's common to see
one account listed as several different records. So, after I clean it all
up, I want to create a feature that guards against this sort of situation.
Here's what I'm thinking about and I need to know what the best option is:

1) Create a find duplicates query based on the fields that I listed above
that is programatically launched by all data entry forms in their before
update events. If it finds another duplicate, alert the user that it exists.

2) Create an ID field (Indexed, duplicates yes) based on the same fields as
above. Data entry forms populate this field on before update. DCount
searches only the ID field for duplicates.

So, which sould be the quickest? Because I need this feature to be run
every time records are changed or new records are entered, I hope you can see
why efficiency is an issue.

Thanks in advance.

You've got worse problems than you may realize.

What if you have a case for plaintiff Joe Jones, and another for Joseph Jones,
and a third for Joe Jones (no, not the car dealer in the first case but the
other Joe Jones, the state representative) - who is actually the Joseph Jones
in the second case?

Name are DREADFUL as unique identifiers. They *WILL* be misspelled, or
variantly spelled; they're not unique (when I was in college there was a
professor John Vinson at the same school); they simply don't WORK as unique
identifiers for a person.

And if your CASE# is ambiguous, I can't imagine how ANY system, Lotus, paper,
human brain or Access can use it. "Janet! Get me the file on Case# 3102!"
"Which one, boss? there are three..."

You unfortunately may need to do a whole lot of manual work creating a People
table with a meaningless numeric identifier for People, with first name, last
name, *and whatever other identifying information you have available* to
uniquely identify the people in your system (who may be defendents,
plaintiffs, attorneys, judges, etc.)

You'll also need to come up with some way to unambiguously identify each Case
and each Account. What you have in your Lotus file will *feed* such a system,
but it will require a lot of help to do so I fear! Good luck!
 
J

JString

I think that I may have seen the light.

If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
additional info), then the two last name fields + case# should be enough to
create a unique reference.

Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.
 
J

John W. Vinson

If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
additional info), then the two last name fields + case# should be enough to
create a unique reference.

Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.

Names can be repeated.
You say Case# can be repeated.

*You still have a problem*. With your constraints, it would be legitimate to
have

Case# 123; "Jones"; "Barry"
Case# 123; "Jones", "Barry"

referring to two different cases, Joe Jones being sued by Robert Barry and
Sylvia Jones being sued by Al Barry, and happening to have the same Case#.

Unless you can get the Case# to be unique for a Case you will need other
information to define uniqueness.
 
J

john.fuller

He does have a mess and I know this isn't a good solution but as a quick fix
for a guy that obviously doesn't want to make the needed changes:

You could make a combobox for the user to select the correct person/case # -
show as many fields as you think necessary for them to pick from.  Now you
have put the responsibility on the user to make sure they are checking for
the correct name/Case#/address, etc.

John - you are absolutely correct - just offering a lazy way out!  And you
know how I respect your opinion!

Bonniehttp://www.dataplus-svc.com




If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
[quoted text clipped - 3 lines]
Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.  
Names can be repeated.
You say Case# can be repeated.
*You still have a problem*. With your constraints, it would be legitimate to
have
Case# 123; "Jones"; "Barry"
Case# 123; "Jones", "Barry"
referring to two different cases, Joe Jones being sued by Robert Barry and
Sylvia Jones being sued by Al Barry, and happening to have the same Case#.
Unless you can get the Case# to be unique for a Case you will need other
information to define uniqueness.

OK, so this may not exactly be helpful, but I'm still baffled by the
fact that there isn't a unique identifier for the case, like I don't
know, Case#. I would think it would be fairly easy to create a unique
case# for every case. Either have it be a sequential number or use
the date and a 2 or 3 or 4 digit case number (depending on how many
are opened every day/month/year). IE...2008-07-001 would be the first
case opened in July of 2008. I personally would push for a unique
identifier for the case, not just for your database but to cut down on
confusion everywhere.
 
J

JString

That is true... I suppose if I added in the DateFiled field then the
occurrence of a duplicate would become even more unlikely, but then as I add
more fields to this search the chance of a misspelled duplicate would
increase. I'm no mathematician but I would guess that after a few hundred
thousand entries both of these possibilities would occur at least a handful
of times. It's beginning to sound more and more like a catch-22.

Even so, what about this...

I could build a procedure that counts the records that match these values,
and in the event that a duplicate is found, a popup form could list these
records for user verification/modification. If the user sees that they are
duplicates, he or she could modify them there on the spot, or if they are not
duplicates a button could be pressed that updates some MatchCount field so
that:

Joe Jones vs. Bob Spencer (case 123) might be the first occurrence and have
a value of 1 updated to MatchCount and,

Mary Jones vs. Mark Spencer (case 123), being the second occurrence would
have value 2 updated to MatchCount.

Of course, the original duplicate count procedure would have to include this
new MatchCount field along with all the others.
 
J

JString

Case# comes from the different court systems in different counties. I don't
know exactly how their numbering systems work and it even looks like these
systems might have varying rules depending on where the case was filed.

He does have a mess and I know this isn't a good solution but as a quick fix
for a guy that obviously doesn't want to make the needed changes:

You could make a combobox for the user to select the correct person/case # -
show as many fields as you think necessary for them to pick from. Now you
have put the responsibility on the user to make sure they are checking for
the correct name/Case#/address, etc.

John - you are absolutely correct - just offering a lazy way out! And you
know how I respect your opinion!

Bonniehttp://www.dataplus-svc.com




If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
[quoted text clipped - 3 lines]
Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.
Names can be repeated.
You say Case# can be repeated.
*You still have a problem*. With your constraints, it would be legitimate to
have
Case# 123; "Jones"; "Barry"
Case# 123; "Jones", "Barry"
referring to two different cases, Joe Jones being sued by Robert Barry and
Sylvia Jones being sued by Al Barry, and happening to have the same Case#.
Unless you can get the Case# to be unique for a Case you will need other
information to define uniqueness.

OK, so this may not exactly be helpful, but I'm still baffled by the
fact that there isn't a unique identifier for the case, like I don't
know, Case#. I would think it would be fairly easy to create a unique
case# for every case. Either have it be a sequential number or use
the date and a 2 or 3 or 4 digit case number (depending on how many
are opened every day/month/year). IE...2008-07-001 would be the first
case opened in July of 2008. I personally would push for a unique
identifier for the case, not just for your database but to cut down on
confusion everywhere.
 
J

John W. Vinson/MVP

Case# comes from the different court systems in different counties. I don't
know exactly how their numbering systems work and it even looks like these
systems might have varying rules depending on where the case was filed.

In that case, a composite number consisting of the court identifier
(you should certainly have a table of all the courts involved!!) and
their case# should be unique (if you trust the courts to maintain
their own case numbers uniquely).
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top