Does it have unicode?

Mike Labosh

I need to determine if a string contains double-byte (Unicode) characters.

In SQL, it was easy: cast it from NVARCHAR to VARCHAR and back again, and
see whether anything was lost.

But in VB.NET, all strings are stored as Unicode, so I'm not sure what to
do. I'd like to do something like this: [p-code]

Dim s1, s2 As String

s1 = [my db value]
s2 = CType(CType(s1, AnsiString), String)  ' p-code: .NET has no AnsiString type

If s1 = s2 Then
    ' All characters are ANSI ones
Else
    ' Some characters are double byte
End If

What I certainly don't want to do is loop over the characters to see if
AscW(Mid(s1, i, 1)) > 255, because I'm already looping over hundreds of
thousands of records and I don't want to have to sniff individual characters
in a batch like this.

--
Peace & happy computing,

Mike Labosh, MCSD

"Mr. McKittrick, after very careful consideration, I have
come to the conclusion that this new system SUCKS."
-- General Barringer, "War Games"
 

Armin Zingler

I don't understand what you're trying to achieve. Strings are Unicode
already, as you wrote. Maybe your actual question is whether the String can
be converted to ANSI and back to Unicode without data loss, right? You can
have a look at System.Text.Encoding.Convert, but I am interested in what your
actual goal is.
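The Encoding.Convert route might look roughly like the minimal sketch below. The module and function names (`EncodingCheck`, `SurvivesRoundTrip`) are hypothetical, not from the thread. It converts the string's UTF-16 bytes to a target encoding and back, then reports whether the result still matches the original.

```vbnet
Imports System
Imports System.Text

Module EncodingCheck
    ' Hypothetical helper: True when s survives a round trip from UTF-16
    ' (how .NET stores strings) through the target encoding and back.
    Public Function SurvivesRoundTrip(ByVal s As String, ByVal target As Encoding) As Boolean
        ' UTF-16 bytes of the string as .NET holds it in memory
        Dim unicodeBytes() As Byte = Encoding.Unicode.GetBytes(s)

        ' To the target encoding: characters it cannot represent are replaced
        ' (for example by "?" or a best-fit lookalike), so information is lost here
        Dim targetBytes() As Byte = Encoding.Convert(Encoding.Unicode, target, unicodeBytes)

        ' Back to UTF-16, then compare with the original string
        Dim roundTripped() As Byte = Encoding.Convert(target, Encoding.Unicode, targetBytes)
        Return Encoding.Unicode.GetString(roundTripped) = s
    End Function
End Module
```

For the ANSI check the target would be `Encoding.Default`, which is the system ANSI code page on the .NET Framework. On .NET Core and .NET 5+ `Encoding.Default` is UTF-8, so nearly every string would pass there.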

Armin
 

Jay B. Harlow [MVP - Outlook]

Mike,
| What I certainly don't want to do is loop over the characters to see if
| AscW(Mid(s1, i, 1)) > 255 because I'm already looping over 100's of
| thousands of records and I don't want to have to sniff individual characters
| in a batch like this.
AscW doesn't return ANSI character codes; it returns Unicode character codes.
If you want ANSI character codes, you need to use Asc. However, characters
that have no equivalent in the current ANSI code page (such as most codes
above 255) will be returned as a placeholder ANSI character...
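The difference between AscW and Asc can be seen with a small helper. `AscDemo` and `DescribeChar` are hypothetical names; the helper reports both codes for one character and whether `Chr(Asc(ch))` gives the original character back. Results for non-ASCII characters depend on the machine's ANSI code page, so only a plain ASCII case is shown.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module AscDemo
    ' Hypothetical helper: reports the Unicode code (AscW), the ANSI code (Asc),
    ' and whether Chr(Asc(ch)) returns the original character.
    Public Function DescribeChar(ByVal ch As Char) As String
        Dim unicodeCode As Integer = AscW(ch) ' UTF-16 code unit, never lossy
        Dim ansiCode As Integer = Asc(ch)     ' code in the current ANSI code page
        Dim roundTrips As Boolean = (Chr(ansiCode) = ch)
        Return String.Format("U+{0:X4} Asc={1} round-trips={2}", unicodeCode, ansiCode, roundTrips)
    End Function
End Module
```

For example, `DescribeChar("A"c)` returns `"U+0041 Asc=65 round-trips=True"`.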

As Armin suggested, you can use the System.Text.Encoding class: Encoding.Default
converts a String to and from an array of bytes in your current ANSI code
page, as defined in the Windows Control Panel.

Something like:

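Imports System.Text

Dim s1, s2 As String   ' s1 holds the value read from the database
Dim bytes() As Byte

' Round-trip the string through the current ANSI code page
bytes = Encoding.Default.GetBytes(s1)
s2 = Encoding.Default.GetString(bytes)
' s1 = s2 only if every character survived the round trip
Dim allAnsi As Boolean = (s1 = s2)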

I would expect a loop that exits early to perform better than converting
the entire string to ANSI and back again. Something like:

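' Returns False at the first character that does not survive
' a round trip through the current ANSI code page
Public Function IsAnsi(ByVal s As String) As Boolean
    For Each ch As Char In s
        ' Asc maps to the ANSI code; Chr maps it back to Unicode
        If Chr(Asc(ch)) <> ch Then Return False
    Next
    Return True
End Function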
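A different approach, not one proposed in the thread, is to configure the encoder (on .NET 2.0 and later) to throw on the first character it cannot map, instead of substituting a placeholder. This is a minimal sketch; `StrictEncodingCheck`, `CreateStrictEncoding` and `CanEncode` are hypothetical names. The strict encoding is created once so it can be reused across many records.

```vbnet
Imports System
Imports System.Text

Module StrictEncodingCheck
    ' Hypothetical helper: builds an encoding for the given code page that throws
    ' EncoderFallbackException instead of substituting "?" or a best-fit character.
    Public Function CreateStrictEncoding(ByVal codePage As Integer) As Encoding
        Return Encoding.GetEncoding(codePage, _
            EncoderFallback.ExceptionFallback, DecoderFallback.ExceptionFallback)
    End Function

    ' Hypothetical helper: True when every character in s can be represented
    ' in the given strict encoding. GetByteCount runs the encoder without
    ' allocating the output bytes.
    Public Function CanEncode(ByVal s As String, ByVal target As Encoding) As Boolean
        Try
            target.GetByteCount(s)
            Return True
        Catch ex As EncoderFallbackException
            ' The encoder hit a character with no mapping in this code page
            Return False
        End Try
    End Function
End Module
```

On the .NET Framework, `Encoding.Default.CodePage` gives the current ANSI code page to pass in. On .NET Core and .NET 5+, Windows code pages such as 1252 are only available after registering `CodePagesEncodingProvider.Instance` with `Encoding.RegisterProvider`. Exceptions are comparatively expensive, so this pays off mainly when most records turn out to be pure ANSI.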

Hope this helps
Jay
