xirx
When dealing with real-life data, you often have to cope
with minor errors in it. For example, I have two lists
(databases) in which names differ slightly.
Examples:
"Clark Kent" vs "Clark Kent"
"John P. Smith" vs "John Paul Smith"
"Miller Limited" vs "Miller Ltd."
"Peter Hammer" vs "Petre Hammer"
I am looking for a way to handle this (semi-)automatically.
My idea is to have a function f that takes two strings
and returns a measure of how alike they are. For example,
f should be 1 if both arguments are identical and 0 if
they are "completely" different.
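One possible starting point, assuming Python (the post does not name a language): the standard library's `difflib.SequenceMatcher.ratio()` already returns a value in [0, 1] that is 1.0 for identical strings and 0.0 when they share no characters. The wrapper name `similarity` and the lower-casing/whitespace normalisation are illustrative choices of mine, not part of difflib.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Hypothetical f: a score in [0, 1] computed by difflib.

    Case is ignored and runs of whitespace are collapsed before
    comparing, so "Clark  Kent" and "clark kent" count as identical.
    """
    a = " ".join(a.lower().split())
    b = " ".join(b.lower().split())
    # ratio() = 2 * matching_chars / (len(a) + len(b))
    return SequenceMatcher(None, a, b).ratio()


pairs = [
    ("John P. Smith", "John Paul Smith"),
    ("Miller Limited", "Miller Ltd."),
    ("Peter Hammer", "Petre Hammer"),
]
for a, b in pairs:
    print(f"{a!r} vs {b!r}: {similarity(a, b):.2f}")
```

A threshold (say, flagging pairs above some cut-off for manual review) would then give the "semi-automatic" workflow; the right cut-off would have to be tuned on the actual data.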
I am pretty sure that a lot of people have thought
about such a thing already and that there is more
than one solution for this.
Any pointers?
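Another common family of solutions is based on the Levenshtein (edit) distance. Below is a minimal sketch, again assuming Python, that normalises a plain character-level edit distance by the length of the longer string; the function names `levenshtein` and `edit_similarity` are hypothetical, not from any library.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn a into b (two-row dynamic programming)."""
    if len(a) < len(b):
        a, b = b, a  # distance is symmetric; keep the DP row short
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Map the edit distance onto [0, 1]: 1.0 for identical strings,
    0.0 when the distance equals the length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are identical
    return 1.0 - levenshtein(a, b) / longest


for a, b in [("Peter Hammer", "Petre Hammer"), ("John P. Smith", "John Paul Smith")]:
    print(f"{a!r} vs {b!r}: {edit_similarity(a, b):.2f}")
```

Note that a plain Levenshtein distance counts the swapped letters in "Petre" as two substitutions; neither this nor the difflib approach knows that "Ltd." abbreviates "Limited", so such cases probably need a separate normalisation step before scoring.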