Does it have unicode?

Mike Labosh

I need to determine whether a string contains double-byte (Unicode)
characters.

In SQL, it was easy: cast it from NVARCHAR to VARCHAR and back again, and
see if anything got lost.

But in VB.NET, all strings are stored as Unicode, so I'm not sure what to
do. I'd like to do something like this: [p-code]

Dim s1, s2 As String

s1 = [my db value]
s2 = CType(CType(s1, AnsiString), String)

If s1 = s2 Then
    ' All characters are ANSI ones
Else
    ' Some characters are double byte
End If

What I certainly don't want to do is loop over the characters to see if
AscW(Mid(s1, i, 1)) > 255, because I'm already looping over hundreds of
thousands of records and I don't want to have to sniff individual
characters in a batch like this.

--
Peace & happy computing,

Mike Labosh, MCSD

"Mr. McKittrick, after very careful consideration, I have
come to the conclusion that this new system SUCKS."
-- General Barringer, "War Games"
 
I don't understand what you're trying to achieve. Strings are Unicode
already, as you wrote. Maybe your actual question is whether the String can
be encoded as ANSI and converted back to Unicode without data loss, right?
You can have a look at System.Text.Encoding.Convert, but I am interested in
what your actual goal is.

Armin
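
For reference, the round trip Armin describes could be sketched roughly as
follows. This is only an illustration, not something posted in the thread:
the module and function names are invented here, and the target encoding is
passed in by the caller. Passing Encoding.Default assumes a .NET Framework
machine, where it is the ANSI code page set in Control Panel.

```vbnet
Imports System
Imports System.Text

Module AnsiRoundTrip
    ' Hypothetical helper (name invented for this sketch): True when s comes
    ' back unchanged after being converted to the target encoding and decoded.
    Public Function SurvivesRoundTrip(ByVal s As String, ByVal target As Encoding) As Boolean
        ' .NET strings are UTF-16, so start from the UTF-16 bytes.
        Dim unicodeBytes() As Byte = Encoding.Unicode.GetBytes(s)

        ' Encoding.Convert(source, destination, bytes) re-encodes the bytes.
        Dim targetBytes() As Byte = Encoding.Convert(Encoding.Unicode, target, unicodeBytes)

        ' Decode again and compare ordinally: any substituted character is a loss.
        Dim roundTripped As String = target.GetString(targetBytes)
        Return String.Equals(s, roundTripped, StringComparison.Ordinal)
    End Function
End Module
```

A call such as SurvivesRoundTrip(value, Encoding.Default) would then stand in
for the NVARCHAR-to-VARCHAR cast from the original question.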
 
Mike,
| What I certainly don't want to do is loop over the characters to see if
| AscW(Mid(s1, i, 1)) > 255 because I'm already looping over 100's of
| thousands of records and I don't want to have to sniff individual
| characters in a batch like this.
AscW doesn't return ANSI char codes, it returns Unicode char codes. If you
want ANSI char codes, you need to use Asc. However, non-ANSI characters
(codes > 255) will be returned as a placeholder ANSI char...
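
To make the distinction concrete, here is a small sketch of the AscW-based
test from the original post, written with an early exit. The function name is
invented for this illustration. Because AscW returns the Unicode (UTF-16)
code of each character, comparing against 255 checks for the Latin-1 range,
which is not quite the same as asking whether the character exists in the
machine's ANSI code page.

```vbnet
Imports Microsoft.VisualBasic

Module UnicodeCodeCheck
    ' Hypothetical helper: True if any character has a Unicode code above 255.
    Public Function HasCodeAbove255(ByVal s As String) As Boolean
        For Each ch As Char In s
            ' AscW gives the UTF-16 code, independent of any ANSI code page.
            If AscW(ch) > 255 Then Return True   ' stop at the first hit
        Next
        Return False
    End Function
End Module
```

One consequence: the euro sign (U+20AC) has a Unicode code above 255, so this
test flags it even though Windows code page 1252 can represent it.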

As Armin stated, you can use System.Text.Encoding.Default to convert a String
to/from an array of bytes in your current ANSI code page, as defined in the
Windows Control Panel.

Something like:

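' Requires Imports System.Text; s1 holds the value read from the database.
Dim s1, s2 As String
Dim bytes() As Byte

bytes = Encoding.Default.GetBytes(s1)   ' Unicode -> ANSI bytes
s2 = Encoding.Default.GetString(bytes)  ' ANSI bytes -> Unicode
' s1 = s2 only if nothing was lost in the round trip.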

I would expect a loop that exits early to perform better than converting
the entire string to ANSI and back again. Something like:

Public Function IsAnsi(ByVal s As String) As Boolean
    ' Return as soon as one character fails the ANSI round trip.
    For Each ch As Char In s
        If Chr(Asc(ch)) <> ch Then Return False
    Next
    Return True
End Function

Hope this helps
Jay
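
Another option, not mentioned in the replies above, is to ask the encoder to
throw instead of substituting a placeholder, which avoids decoding the bytes
back into a string. The sketch below assumes .NET 2.0 or later, where
Encoding.GetEncoding accepts an EncoderFallback and a DecoderFallback. The
module and function names are invented for illustration, and the encoding
name is whatever the caller chooses.

```vbnet
Imports System
Imports System.Text

Module StrictEncodingCheck
    ' Hypothetical helper: build the encoding once, outside the record loop.
    Public Function CreateStrictEncoding(ByVal encodingName As String) As Encoding
        ' The exception fallbacks make the encoding throw on unmappable input.
        Return Encoding.GetEncoding(encodingName, _
            New EncoderExceptionFallback(), New DecoderExceptionFallback())
    End Function

    ' Hypothetical helper: True if every character of s can be encoded.
    Public Function CanEncode(ByVal s As String, ByVal strict As Encoding) As Boolean
        Try
            ' GetByteCount runs the encoder without allocating the byte array.
            strict.GetByteCount(s)
            Return True
        Catch ex As EncoderFallbackException
            Return False
        End Try
    End Function
End Module
```

Exceptions are comparatively expensive, so this is likely to pay off only if
most records are expected to pass; if many fail, a character loop with an
early exit may well be the cheaper choice.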
