Which is more efficient?

JString · Jul 13, 2008

The general situation: the law office that I'm working for is converting
their Lotus DB to access. It already has approx. 200k records that are
several years old, all of which are completely lacking unique identifiers.
Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one
record per account, but it's all fairly disorganized and it's common to see
one account listed as several different records. So, after I clean it all
up, I want to create a feature that guards against this sort of situation.
Here's what I'm thinking about and I need to know what the best option is:

1) Create a find duplicates query based on the fields that I listed above
that is programatically launched by all data entry forms in their before
update events. If it finds another duplicate, alert the user that it exists.

2) Create an ID field (Indexed, duplicates yes) based on the same fields as
above. Data entry forms populate this field on before update. DCount
searches only the ID field for duplicates.

So, which sould be the quickest? Because I need this feature to be run
every time records are changed or new records are entered, I hope you can see
why efficiency is an issue.

Thanks in advance.

JString · Jul 13, 2008

Unfortunately that won't work because the case# is not truly unique. The
case# can sometimes be the same for unrelated cases

John W. Vinson · Jul 14, 2008

The general situation: the law office that I'm working for is converting
their Lotus DB to access. It already has approx. 200k records that are
several years old, all of which are completely lacking unique identifiers.
Accounts are organized by PLAINTIFF, DEFENDANT, CASE#. There should be one
record per account, but it's all fairly disorganized and it's common to see
one account listed as several different records. So, after I clean it all
up, I want to create a feature that guards against this sort of situation.
Here's what I'm thinking about and I need to know what the best option is:

1) Create a find duplicates query based on the fields that I listed above
that is programatically launched by all data entry forms in their before
update events. If it finds another duplicate, alert the user that it exists.

2) Create an ID field (Indexed, duplicates yes) based on the same fields as
above. Data entry forms populate this field on before update. DCount
searches only the ID field for duplicates.

So, which sould be the quickest? Because I need this feature to be run
every time records are changed or new records are entered, I hope you can see
why efficiency is an issue.

Thanks in advance.

You've got worse problems than you may realize.

What if you have a case for plaintiff Joe Jones, and another for Joseph Jones,
and a third for Joe Jones (no, not the car dealer in the first case but the
other Joe Jones, the state representative) - who is actually the Joseph Jones
in the second case?

Name are DREADFUL as unique identifiers. They *WILL* be misspelled, or
variantly spelled; they're not unique (when I was in college there was a
professor John Vinson at the same school); they simply don't WORK as unique
identifiers for a person.

And if your CASE# is ambiguous, I can't imagine how ANY system, Lotus, paper,
human brain or Access can use it. "Janet! Get me the file on Case# 3102!"
"Which one, boss? there are three..."

You unfortunately may need to do a whole lot of manual work creating a People
table with a meaningless numeric identifier for People, with first name, last
name, *and whatever other identifying information you have available* to
uniquely identify the people in your system (who may be defendents,
plaintiffs, attorneys, judges, etc.)

You'll also need to come up with some way to unambiguously identify each Case
and each Account. What you have in your Lotus file will *feed* such a system,
but it will require a lot of help to do so I fear! Good luck!

JString · Jul 14, 2008

I think that I may have seen the light.

If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
additional info), then the two last name fields + case# should be enough to
create a unique reference.

Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.

John W. Vinson · Jul 14, 2008

If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
additional info), then the two last name fields + case# should be enough to
create a unique reference.

Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.

Names can be repeated.
You say Case# can be repeated.

*You still have a problem*. With your constraints, it would be legitimate to
have

Case# 123; "Jones"; "Barry"
Case# 123; "Jones", "Barry"

referring to two different cases, Joe Jones being sued by Robert Barry and
Sylvia Jones being sued by Al Barry, and happening to have the same Case#.

Unless you can get the Case# to be unique for a Case you will need other
information to define uniqueness.

john.fuller · Jul 14, 2008

He does have a mess and I know this isn't a good solution but as a quick fix
for a guy that obviously doesn't want to make the needed changes:

You could make a combobox for the user to select the correct person/case # -
show as many fields as you think necessary for them to pick from. Now you
have put the responsibility on the user to make sure they are checking for
the correct name/Case#/address, etc.

John - you are absolutely correct - just offering a lazy way out! And you
know how I respect your opinion!

Bonniehttp://www.dataplus-svc.com

If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for

Click to expand...

[quoted text clipped - 3 lines]

Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.

Click to expand...

Click to expand...

Names can be repeated.
You say Case# can be repeated.

Click to expand...

*You still have a problem*. With your constraints, it would be legitimate to
have

Click to expand...

Case# 123; "Jones"; "Barry"
Case# 123; "Jones", "Barry"

Click to expand...

referring to two different cases, Joe Jones being sued by Robert Barry and
Sylvia Jones being sued by Al Barry, and happening to have the same Case#.

Click to expand...

Unless you can get the Case# to be unique for a Case you will need other
information to define uniqueness.

Click to expand...

OK, so this may not exactly be helpful, but I'm still baffled by the
fact that there isn't a unique identifier for the case, like I don't
know, Case#. I would think it would be fairly easy to create a unique
case# for every case. Either have it be a sequential number or use
the date and a 2 or 3 or 4 digit case number (depending on how many
are opened every day/month/year). IE...2008-07-001 would be the first
case opened in July of 2008. I personally would push for a unique
identifier for the case, not just for your database but to cut down on
confusion everywhere.

JString · Jul 14, 2008

That is true... I suppose if I added in the DateFiled field then the
occurrence of a duplicate would become even more unlikely, but then as I add
more fields to this search the chance of a misspelled duplicate would
increase. I'm no mathematician but I would guess that after a few hundred
thousand entries both of these possibilities would occur at least a handful
of times. It's beginning to sound more and more like a catch-22.

Even so, what about this...

I could build a procedure that counts the records that match these values,
and in the event that a duplicate is found, a popup form could list these
records for user verification/modification. If the user sees that they are
duplicates, he or she could modify them there on the spot, or if they are not
duplicates a button could be pressed that updates some MatchCount field so
that:

Joe Jones vs. Bob Spencer (case 123) might be the first occurrence and have
a value of 1 updated to MatchCount and,

Mary Jones vs. Mark Spencer (case 123), being the second occurrence would
have value 2 updated to MatchCount.

Of course, the original duplicate count procedure would have to include this
new MatchCount field along with all the others.

JString · Jul 14, 2008

Case# comes from the different court systems in different counties. I don't
know exactly how their numbering systems work and it even looks like these
systems might have varying rules depending on where the case was filed.

[email protected] said:
He does have a mess and I know this isn't a good solution but as a quick fix
for a guy that obviously doesn't want to make the needed changes:

You could make a combobox for the user to select the correct person/case # -
show as many fields as you think necessary for them to pick from. Now you
have put the responsibility on the user to make sure they are checking for
the correct name/Case#/address, etc.

John - you are absolutely correct - just offering a lazy way out! And you
know how I respect your opinion!

Bonniehttp://www.dataplus-svc.com

If I split the Plaintiff and Defendant fields between first and last names,
set the last name fields to required (first names will only exist for
[quoted text clipped - 3 lines]
Of course that doesn't protect against misspelled values but it should be
enough to significantly reduce the number of duplicates.

Click to expand...

Names can be repeated.
You say Case# can be repeated.

Click to expand...

*You still have a problem*. With your constraints, it would be legitimate to
have

Click to expand...

Case# 123; "Jones"; "Barry"
Case# 123; "Jones", "Barry"

Click to expand...

referring to two different cases, Joe Jones being sued by Robert Barry and
Sylvia Jones being sued by Al Barry, and happening to have the same Case#.

Click to expand...

Unless you can get the Case# to be unique for a Case you will need other
information to define uniqueness.

Click to expand...

Click to expand...

OK, so this may not exactly be helpful, but I'm still baffled by the
fact that there isn't a unique identifier for the case, like I don't
know, Case#. I would think it would be fairly easy to create a unique
case# for every case. Either have it be a sequential number or use
the date and a 2 or 3 or 4 digit case number (depending on how many
are opened every day/month/year). IE...2008-07-001 would be the first
case opened in July of 2008. I personally would push for a unique
identifier for the case, not just for your database but to cut down on
confusion everywhere.

John W. Vinson/MVP · Jul 16, 2008

Case# comes from the different court systems in different counties. I don't
know exactly how their numbering systems work and it even looks like these
systems might have varying rules depending on where the case was filed.

In that case, a composite number consisting of the court identifier
(you should certainly have a table of all the courts involved!!) and
their case# should be unique (if you trust the courts to maintain
their own case numbers uniquely).

Unique Index / Need advice	7	Jul 10, 2008
IIF to check values in more than one table	1	Jul 19, 2007
blank fields issue	2	Sep 9, 2009
Not able to enter Chinese 4 byte characters(GB18030)in MSAccess 20	0	Oct 6, 2009
AutoNumber Field (Access 2007)	5	Oct 28, 2008
Access How to check that the value of a field in a new record is not present in 2 fields of the table?	0	Apr 3, 2019
Deleting duplicate parents and grouping children to single parent	3	Jun 11, 2008
2007 Being Weird with AutoNumber	3	Sep 18, 2008

Which is more efficient?

JString

JString

John W. Vinson

JString

John W. Vinson

john.fuller

JString

JString

John W. Vinson/MVP

Ask a Question

Similar Threads