Identifying ASCII characters in strings

P

Peter Hallett

I have encountered a rather interesting problem. I have a large file of
badly-entered names and addresses – nothing to do with me, I hasten to add.
There are many instances of multiple spaces and other spuriously repeated
characters. Before outputting this data, I therefore put the file through a
VBA filter, which is largely successful in removing this ‘debris’. It simply
takes each field in turn and examines it, character by character, using the
Mid function, in a nested set of Do loops. For specified characters,
particularly the space, if two adjacent characters are found to be the same
then one of them is ignored. Oddly, although this removes by far the
majority of the unwanted padding, a few instances of repeated spaces still
remain. Debugging has shown that, in these cases, although the Mid function
returns “ “ for both adjacent characters, the test:–
If (stTestChar1 = “ “) And (stTestChar2 = “ “) Then …..
fails to identify a match. I conclude from this that two different ASCII
characters are being interpreted as spaces.

I cannot immediately see how such characters can be conveniently identified.
One way would be to test whether the cardinal value of each corresponding
ASCII character lies within one of the acceptable ranges. This would
certainly impose a much smaller processing overhead than would individually
testing each character, but I can’t immediately see how to do this. Can
anyone enlighten me?
 
D

Dirk Goldgar

Peter Hallett said:
I have encountered a rather interesting problem. I have a large file of
badly-entered names and addresses – nothing to do with me, I hasten to
add.
There are many instances of multiple spaces and other spuriously repeated
characters. Before outputting this data, I therefore put the file through
a
VBA filter, which is largely successful in removing this ‘debris’. It
simply
takes each field in turn and examines it, character by character, using
the
Mid function, in a nested set of Do loops. For specified characters,
particularly the space, if two adjacent characters are found to be the
same
then one of them is ignored. Oddly, although this removes by far the
majority of the unwanted padding, a few instances of repeated spaces still
remain. Debugging has shown that, in these cases, although the Mid
function
returns “ “ for both adjacent characters, the test:–
If (stTestChar1 = “ “) And (stTestChar2 = “ “) Then …..
fails to identify a match. I conclude from this that two different ASCII
characters are being interpreted as spaces.

I cannot immediately see how such characters can be conveniently
identified.
One way would be to test whether the cardinal value of each corresponding
ASCII character lies within one of the acceptable ranges. This would
certainly impose a much smaller processing overhead than would
individually
testing each character, but I can’t immediately see how to do this. Can
anyone enlighten me?


One possibility for your spurious space is ASCII 160, which is used
sometimes as a "non-breaking space". In versions of Access up to 2003,
there's an ASCII chart in the Miscellaneous section of the "Visual Basic
Language Reference Manual" help topic.

You can get the ASCII value of a character by passing it to the Asc()
function. Here's an example of a function that removes all characters
except alphabetics characters and numeric digits, which you could adapt to
your needs:

'----- start of code -----
Function StripAllButAlphanumerics(varOldNumber As Variant) As String

'Removes all but alphabetic and numeric characters in a string

Dim I As Integer
Dim intLength As Integer
Dim strThisCharacter As String
Dim strOldNumber As String
Dim strNewNumber As String

strOldNumber = varOldNumber & vbNullString

intLength = Len(strOldNumber)

strNewNumber = vbNullString

For I = 1 To intLength
strThisCharacter = Mid(strOldNumber, I, 1)
Select Case Asc(strThisCharacter)
Case 48 To 57, 65 To 90, 97 To 122
strNewNumber = strNewNumber & strThisCharacter
End Select
Next I

StripAllButAlphanumerics = strNewNumber

End Function
'----- end of code -----
 
P

Peter Hallett

Thank you, Dirk. That will do very nicely. I entered ASCII, in VBA Help,
but did not discover the Asc() function, for some reason. However, it seems
to be just what I looking for. I am grateful. It will probably enable me to
improve my existing filter.
 
P

Peter Hallett

Dirk,

The results of the ASCII character search are as you predicted. Both
characters 32 and 160 are being used as spaces. Although VBA displays both
as “ “, it clearly uses their ASCII values when doing comparisons.

As I mentioned in my earlier reply, had I looked more closely, I should have
been able to find the Asc function myself, before coming into the forum.
Meanwhile, however, with your help I have now extended the multiple-space
filter to eliminate repeated instances of character 32, 160, 32 followed by
160 and 160 followed by 32. The processed raw data now looks a lot tidier.
Next time I’ll study the alphabetical listing of VBA function rather more
closely before asking for assistance, however welcome the responses usually
are.
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top