.Net Codepage 850 problem with uppercase accented characters

Guest

Hi. I have a problem using ASP.Net or VB.Net (WinForms) to access an Oracle database that uses the WE8PC850 character set (the IBM-PC codepage 850). Our older ASP apps work fine with this database, but the three different ASP.Net and VB.Net WinForms test apps we have developed so far do not display uppercase French accented A and E characters correctly.

I have tried all the potential codepages that seem appropriate: IBM850, IBM863, Windows-1252, ISO-8859-x, utf-8, utf-16 and so forth. The closest seems to be IBM850 response and request encoding with Response.Charset set to Windows-1252 or ISO-8859-1. All of the ASP apps work fine and display the codepage 850 uppercase accented characters correctly on the three different desktops we have tried so far, and the same Oracle 8.1.7 client software is used for both the ASP and .Net applications on the client, so it has to be a problem with the .Net managed provider for Oracle or with the .Net codepage 850 implementation.

Because of this, Oracle refuses to look into the matter, gleefully pointing out that it must be a Microsoft problem!

We have tried bland, vanilla test apps in .Net and the problem is always there. None of the older ASP apps (which just use the defaults for character sets, codepages etc.) have any problems. All of the desktops have the NLS_LANG parameters set correctly, and it DOES seem to be a Microsoft, .Net-specific problem.

I have tried MSDN and web searches, to no avail so far. Having wasted two to three days trying different solutions, I am now desperate for "official" assistance from Microsoft. I am working for AMITA Corp (Ottawa, Canada), which is a Microsoft Solution Provider, so hopefully we can get some support on this, as we are now almost certain that it IS a bug in the IBM850 codepage support in .Net. Any and all help is greatly appreciated. The Oracle DBAs have washed their hands of the matter and are suggesting we forget about .Net and revert to using whatever Oracle offers! Of course we do NOT want to accede to that suggestion!!!

Can anyone from Microsoft (or anywhere else, for that matter) help out here?

Barrie Gray
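Since every desktop is said to have NLS_LANG set correctly, one quick check is to print what a given process actually sees. A minimal VB.NET sketch, assuming the usual `LANGUAGE_TERRITORY.CHARACTERSET` layout of NLS_LANG (for example `AMERICAN_AMERICA.WE8PC850`); `CharsetFromNlsLang` is a hypothetical helper. It reads only the environment variable: on Windows the Oracle client normally takes NLS_LANG from the registry unless the environment variable is set, and the registry is not inspected here.

```vbnet
Imports System

Module NlsLangCheck
    ' Returns the character-set part of an NLS_LANG value such as
    ' "AMERICAN_AMERICA.WE8PC850", or Nothing if there is none.
    Function CharsetFromNlsLang(nlsLang As String) As String
        If String.IsNullOrEmpty(nlsLang) Then Return Nothing
        Dim dot As Integer = nlsLang.LastIndexOf("."c)
        If dot < 0 OrElse dot = nlsLang.Length - 1 Then Return Nothing
        Return nlsLang.Substring(dot + 1)
    End Function

    Sub Main()
        ' Only the process environment is read; an NLS_LANG value stored in
        ' the registry under the Oracle home key is not checked here.
        Dim value As String = Environment.GetEnvironmentVariable("NLS_LANG")
        Console.WriteLine("NLS_LANG: " & If(value, "(not set in the environment)"))
        Console.WriteLine("Character set: " & If(CharsetFromNlsLang(value), "(unknown)"))
    End Sub
End Module
```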
 
Jon Skeet [C# MVP]

Barrie Gray said:
> Hi. I have a problem using ASP.Net or VB.Net (Winforms) accessing an
> Oracle database that uses WE8PC850 (which is the IBM-PC codepage 850)
> character set. Our older ASP apps work fine with this database, but
> the three different ASP.Net and VB.Net Winforms test apps we have
> developed so far do not display uppercase French accented A and E
> characters correctly.
>
> I have tried all the potential codepages that seem appropriate:
> IBM850, IBM863, Windows-1252, ISO-8859-x, utf-8, utf-16 and so forth.
> The closest seems to be IBM850 response and request encoding with the
> Response.Charset set to Windows-1252, or ISO-8859-1.

I suggest that you should actually separate the issues - make sure you
can get the data out of the database as a string correctly (which can
be tested in a console app) and then work on getting it to the client
correctly (which needn't involve a database at all for test purposes).

See http://www.pobox.com/~skeet/csharp/debuggingunicode.html for more
information.
 
Guest

I have just tried a console app. Here is the code (in case I am doing something wrong). The result is backwards question marks and other incorrect characters. However, while I know how to set the encoding in ASP.Net, I am not certain of the correct way to do the same thing in VB.Net, and in this console app specifically, to test it properly. In ASP.Net, you set the Response.ContentEncoding property to the codepage you need and Response.Charset to a corresponding charset. Do I need to do something specific in this console app to get to the IBM850 codepage (which should then display the lowercase characters etc. correctly but not the uppercase A and E characters)? At the moment, the display is completely incorrect for all accented characters, whereas in our ASP.Net apps with the codepage set to IBM850, only the uppercase A and E characters display incorrectly. Here is the code I am using:

Imports System
Imports System.Data
Imports System.Data.OracleClient   ' needs a reference to System.Data.OracleClient.dll

Module Module1

    Sub Main()
        Dim dt As DataTable = New DataTable()
        Dim MyCommand As New OracleCommand()
        Dim OracleConnection1 As OracleConnection
        Dim OracleDataAdapter1 As OracleDataAdapter

        OracleConnection1 = New OracleConnection()
        OracleDataAdapter1 = New OracleDataAdapter()
        OracleConnection1.ConnectionString = "user id=bgray;data source=IMDS;password=REMOVED"

        Try
            System.Threading.Thread.CurrentThread.CurrentCulture = _
                System.Globalization.CultureInfo.CreateSpecificCulture("fr-CA")
            MyCommand = OracleConnection1.CreateCommand()
            MyCommand.CommandText = "SELECT DESCRIPTION_FR FROM COM_AREA WHERE (AREA_ID = 92)"
            MyCommand.CommandType = CommandType.Text
            OracleConnection1.Open()
            OracleDataAdapter1.SelectCommand = MyCommand
            OracleDataAdapter1.Fill(dt)

            'Display the text using the overloaded version that takes an object
            Console.WriteLine(dt.Rows(0)("Description_fr"))
            Console.ReadLine()

        Finally
            'set the variables to nothing
            OracleDataAdapter1 = Nothing
            OracleConnection1.Close()
            MyCommand = Nothing
            OracleConnection1 = Nothing
        End Try

    End Sub

End Module

The CurrentCulture setting shown above seems to make absolutely no difference. So, in answer to your suggestion, it looks like the data is coming from the database incorrectly. If this helps: I can use Oracle's SQL*Plus, or other development utilities, to script the same data and it looks fine, even in Notepad. I can also access the same data through an ASP page and it also displays correctly. Is there some way I can tell the managed provider that the data is codepage 850?
 
Jon Skeet [C# MVP]

Barrie Gray said:
> I have just tried a console app. Here is the code (in case I am doing
> something wrong). The result is backwards question marks and other
> incorrect characters. However, while I know how to set the encoding
> in ASP.Net, I am not certain of the correct way to do the same thing
> in VB.Net, and in this console app specifically, to test it properly.

See the page I pointed you at before:

http://www.pobox.com/~skeet/csharp/debuggingunicode.html

In short, you don't convert the characters in the string into glyphs at
all.

You may find that the connection string is the place to specify the
character set, although I would have thought you shouldn't need to do
it.
 
Guest

I have just read your link. I added the following code:

Dim c As Char
For Each c In CStr(dt.Rows(0)("Description_fr"))
    Console.Write("{0:x4} ", AscW(c))
    Console.WriteLine()
Next

For the string in the database, éÉèÈàÀâÂçÇôÔ (which, BTW, I simply cut and pasted here from the Oracle development tool I am currently using), here is the output in the console window:

U?_?O?O?_A and 3 glyphs I am not familiar with (a backwards P, a capital I and an uppercase E).

The hex dump is:

00d
fff
00d
fff
00d
fff
00d
fff
00f
00c
00b
00c

I will check into the connection string, but our ASP apps do not specify anything specific and they do work as they should, i.e. the above data displays as it is here in an ASP page.

Anything else I can do in the test app to help diagnose what is going on? Thx. --- Barrie

BTW, if you want to switch to email to avoid cluttering the thread with too much detail (we can always post an answer if we get one), please let me know. --- B.
 
Jon Skeet [C# MVP]

Barrie Gray said:
> I have just read your link. I added the following code:
>
> Dim c As Char
> For Each c In CStr(dt.Rows(0)("Description_fr"))
> Console.Write("{0:x4} ", AscW(c))
> Console.WriteLine()
> Next

Why are you calling AscW? That seems to be a bad idea to me. Just cast
it to an int directly. The call may well be screwing things up -
certainly the values you've shown don't look healthy to me.

> I will check into the connection string, but our ASP apps do not
> specify anything specific and they do work as they should, i.e. the
> above data displays as it is here in an ASP page.

Yes, but they'll be using a different driver, I suspect.

> Anything else I can do in the test app to help diagnose what is going
> on? Thx. --- Barrie

No, I think when you've changed to direct casting of the character to
an int, we'll be making progress.

> BTW, if you want to switch to email to avoid cluttering the thread
> with too much detail (we can always post an answer if we get one),
> please let me know. --- B.

Nope - I think it's probably fine to keep here for future reference of
people with a similar problem.
 
Guest

In an effort to ensure that you have as much info as I can get to you, the situation in the ASP.Net app is as follows:

The same accented character string as in the previous post, éÉèÈàÀâÂçÇôÔ, when inserted directly (from the DataReader field) into the InnerText of a <Div> label (runat=server), is close, but displays as é?è?à?â?çÇôÔ. As you can see, lowercase characters seem fine, but uppercase A and E do not work, yet C cedilla and O circumflex work OK in uppercase??? However, exactly the same field data also gets loaded into a combo box through data binding to the DataReader object, and there it displays (in the combo text area and dropdown list) as:

Ú?Þ?Ó?Ô?þöÃ

Note that this is very close to the string as displayed in the console window of the test app.

What is also disconcerting (at least to me) is that the former (almost correct) string is persisted in a cookie in the app, and on the next page (which retrieves the cookie) it displays as it does in the combo box: Ú?Þ?Ó?Ô?þÃ¶È. Nothing I have tried to date has fixed this inconsistency between the pages, which is why I have now turned to expert assistance for advice.

The ASP.Net app uses the following code at the very top of Page_Load on every request (including postbacks):

'Use the best match to WE8PC850
Response.ContentEncoding = Encoding.GetEncoding("IBM850")
Response.Charset = "ISO-8859-1"
Request.ContentEncoding = Encoding.GetEncoding("IBM850")
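The "almost right" page output can be reproduced offline. A minimal VB.NET sketch, assuming the string the provider hands back is the one the cookie round trip shows, `Ú?Þ?Ó?Ô?þÃ¶È`, with U+FFFD (the Unicode replacement character) at each `?`: encoding it as IBM850, as `Response.ContentEncoding` does, and decoding the bytes as ISO-8859-1, as the `Charset` label tells the browser to, gives `é?è?à?â?çÇôÔ`. If that assumption holds, the mismatched page settings are compensating for a string that is already wrong rather than displaying a correct one. `SimulatePage` is a hypothetical helper; the `RegisterProvider` line is needed on .NET Core and later and should be removed on .NET Framework.

```vbnet
Imports System
Imports System.Text
Imports Microsoft.VisualBasic

Module PageEncodingCheck
    ' Server encodes with IBM850; the browser decodes with the ISO-8859-1 charset label.
    Function SimulatePage(serverText As String) As String
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) ' .NET Core / .NET 5+ only
        ' Characters code page 850 cannot represent (such as U+FFFD) become "?".
        Dim ibm850 As Encoding = Encoding.GetEncoding(850, _
            EncoderFallback.ReplacementFallback, DecoderFallback.ReplacementFallback)
        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")
        Return latin1.GetString(ibm850.GetBytes(serverText))
    End Function

    Sub Main()
        ' Assumed provider string, with U+FFFD where the posts show "?".
        Dim r As String = ChrW(&HFFFD)
        Dim fromProvider As String = "Ú" & r & "Þ" & r & "Ó" & r & "Ô" & r & "þÃ¶È"
        Console.WriteLine(SimulatePage(fromProvider)) ' é?è?à?â?çÇôÔ
    End Sub
End Module
```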
 
Guest

OK, Jon. I cannot cast the char to an int. Here is the extract on the Char data type from the VB.Net Char help overview:

Char variables are stored as unsigned 16-bit (2-byte) numbers ranging in value from 0 through 65535. Each number represents a single Unicode character. Direct conversions between the Char data type and the numeric types are not possible, but you can use the AscW and ChrW functions for this purpose.

The example for displaying Unicode characters in the console window uses the AscW function, so I used it too. The only other thing I can do in VB is to treat the char as an int directly:

For Each c In CStr(dt.Rows(0)("Description_fr"))
    ' Char does not support the "x4" format, so this writes the character itself
    Console.Write("{0:x4} ", c)
    Console.WriteLine()
Next

This works OK, but the output is a list of the same characters in a vertical, single-character list: U?_?O?O?_A, the paragraph mark glyph and an uppercase E (which has a grave accent in ASP.Net, but not in the console app). The AscW function seems to be the way to go. Here is the help on it:

AscW returns the Unicode code point for the input character. This can be 0 through 65535. The returned value is independent of the culture and code page settings for the current thread.

What else can I do to help you now? -- B.
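For reference, a small VB.NET sketch of the conversions discussed in this exchange: AscW, Convert.ToInt32 and ChrW agree for any single Char, while the commented-out `CInt` line is the Char-to-Integer conversion VB.NET refuses to compile, which is what the help text above describes. The sample character É (U+00C9) is only an illustration.

```vbnet
Imports System
Imports Microsoft.VisualBasic

Module CharConversions
    Sub Main()
        Dim c As Char = "É"c

        ' AscW returns the UTF-16 code unit of the Char; unlike Asc, it does not
        ' depend on the current thread's culture or code page.
        Dim viaAscW As Integer = AscW(c)

        ' Convert.ToInt32(Char) returns the same numeric value.
        Dim viaConvert As Integer = Convert.ToInt32(c)

        ' ChrW converts the number back to the same Char.
        Dim back As Char = ChrW(viaAscW)

        ' Numbers and a Boolean only, so the console font cannot interfere.
        Console.WriteLine("{0:x4} {1:x4} {2}", viaAscW, viaConvert, back = c) ' 00c9 00c9 True

        ' Dim n As Integer = CInt(c)  ' does not compile: VB rejects Char-to-Integer with CInt
    End Sub
End Module
```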
 
Jon Skeet [C# MVP]

Barrie Gray said:
> In an effort to ensure that you have as much info as I can get to
> you, the situation in the ASP.Net app is as follows:

<snip>

I really wouldn't worry *at all* about the ASP.NET part of things until
you've got the console test app working. Once you've got that correct,
the rest is likely to fall into place.

I would, however, recommend that you then use UTF-8 for both the
charset and content encoding of the response. It's much more portable
than code page 850.
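The point of this suggestion is that the bytes the server writes and the charset the browser is told to use must agree. In ASP.NET that would be `Response.ContentEncoding = Encoding.UTF8` together with `Response.Charset = "utf-8"`. The VB.NET sketch below checks the same idea offline with a hypothetical `RoundTrips` helper on the correct test string. It only shows that a matching pair preserves the text; it cannot repair a string that is already wrong when it comes out of the provider. The `RegisterProvider` line is for .NET Core and later only.

```vbnet
Imports System
Imports System.Text

Module ResponseEncodingCheck
    ' True if text survives being encoded with the server's ContentEncoding and
    ' decoded with the encoding named by the charset the browser sees.
    Function RoundTrips(text As String, contentEncoding As Encoding, browserCharset As Encoding) As Boolean
        Return browserCharset.GetString(contentEncoding.GetBytes(text)) = text
    End Function

    Sub Main()
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) ' .NET Core / .NET 5+ only
        Dim sample As String = "éÉèÈàÀâÂçÇôÔ"
        Dim ibm850 As Encoding = Encoding.GetEncoding(850)
        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")

        Console.WriteLine("UTF-8 / UTF-8:       " & RoundTrips(sample, Encoding.UTF8, Encoding.UTF8))
        Console.WriteLine("IBM850 / ISO-8859-1: " & RoundTrips(sample, ibm850, latin1))
        Console.WriteLine("IBM850 / IBM850:     " & RoundTrips(sample, ibm850, ibm850))
    End Sub
End Module
```

Only the mismatched pair fails; UTF-8 is preferred over a matched IBM850 pair because it can represent any character, not just the 256 in code page 850.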
 
Guest

I changed the encoding and charset as you suggested and the results are:

Ú�Þ�Ó�Ô�þÃ¶È, for what was previously a close attempt with only the uppercase A and E characters incorrect,

and exactly the same thing in the combo box and second ASP page (which I suppose is at least consistent, even if it is much more incorrect in appearance). I have of course tried utf-8 before. What do you want me to try in the console app? Thx. -- B.
 
Jon Skeet [C# MVP]

Barrie Gray said:
> OK, Jon. I cannot cast the char to an int. Here is the extract on the
> Char data type from the VB.Net Char help overview:

That's truly bizarre.

Okay, try using Convert.ToInt32 instead - that should work. I'm shocked
if you really can't cast from char to int though...

> Char variables are stored as unsigned 16-bit (2-byte) numbers ranging
> in value from 0 through 65535. Each number represents a single
> Unicode character. Direct conversions between the Char data type and
> the numeric types are not possible, but you can use the AscW and ChrW
> functions for this purpose.

From the documentation of AscW, that really isn't what you want to do.

> AscW returns the Unicode code point for the input character. This can
> be 0 through 65535. The returned value is independent of the culture
> and code page settings for the current thread.

No it's not - from the docs of AscW:

<quote>
The returned value depends on the code page for the current thread
</quote>
 
Jon Skeet [C# MVP]

Barrie Gray said:
> I changed the encoding and charset as you suggested and the results are:

That still won't work yet if you've got problems reading from the
database in the first place, though.

> ???????????? for what was previously a close attempt with only the
> uppercase A and E characters incorrect,

I'm afraid I'm only seeing question marks for any of your posts - but I
wouldn't worry about this side of things yet at all.

> and exactly the same thing in the combo box and second ASP page (which
> I suppose is at least consistent, even if it is much more incorrect
> in appearance). I have of course tried utf-8 before. What do you want
> me to try in the console app? Thx. -- B.

Try using Convert.ToInt32 in the console app - don't bother trying to
display the string itself at all.
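A VB.NET sketch of that approach, using a hypothetical `DumpCodeUnits` helper: it never displays the string, only the UTF-16 code unit of each character as four hex digits, obtained with Convert.ToInt32. For the test string éÉèÈàÀâÂçÇôÔ, a correctly decoded value produces the sequence in the comment, which can be compared position by position with a dump of whatever the provider returns.

```vbnet
Imports System
Imports System.Text

Module CodeUnitDump
    ' Formats each UTF-16 code unit of s as four hex digits, separated by spaces,
    ' without rendering the characters (so console fonts and code pages cannot interfere).
    Function DumpCodeUnits(s As String) As String
        Dim sb As New StringBuilder()
        For Each c As Char In s
            If sb.Length > 0 Then sb.Append(" "c)
            sb.Append(Convert.ToInt32(c).ToString("x4"))
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' In the real test, pass CStr(dt.Rows(0)("Description_fr")) instead.
        Console.WriteLine(DumpCodeUnits("éÉèÈàÀâÂçÇôÔ"))
        ' 00e9 00c9 00e8 00c8 00e0 00c0 00e2 00c2 00e7 00c7 00f4 00d4
    End Sub
End Module
```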
 
Guest

Convert.ToInt32 worked fine. (You learn something new every day in the .Net world :)

However, the console output was exactly the same as in my third post previously, when the AscW function was used. So the output is definitely incorrect. What next? Thx. --- B.
 
Guest

Jon, I have to stop now - my family is insisting I take them out to eat. :) I am in a meeting all morning tomorrow. Can we resume this around 1:30pm tomorrow (Monday) EST? I really appreciate your time on this. I just hope we can find a solution. Have a good evening! Thanks again. :) --- Barrie
 
Jon Skeet [C# MVP]

Barrie Gray said:
> Convert.ToInt32 worked fine. (You learn something new every day in the
> .Net world :)
>
> However, the console output was exactly the same as in my third post
> previously, when the AscW function was used. So the output is
> definitely incorrect. What next? Thx. --- B.

I guess the next thing is to find out whether the problem is with the
data in the database or the provider. I've seen situations where an ASP
page was putting incorrect data into the database to start with, but
retrieving it in the same way. That would leave you with "bad" data in
the database, but the ASP application would never know it was doing
anything wrong.

Unfortunately, I don't know the best way of checking that - does the
data appear correctly in the database if you examine it in the Oracle
database admin program?
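The hypothesis above can be made concrete. Assuming, purely for illustration, that the legacy ASP path sends Windows-1252 bytes which end up stored unchanged in the WE8PC850 database, a Unicode-aware reader that trusts the 850 label would see `Ú╔Þ╚Ó└Ô┬þÃ¶È` (code units 00da 2554 00de 255a 00d3 2514 00d4 252c 00fe 00c3 00b6 00c8), while the legacy reader's mirror-image mistake gives the original text back, so it would never notice. The lowercase-derived positions and the last four characters are consistent with the `Ú?Þ?Ó?Ô?þÃ¶È` reported in this thread; the uppercase É, È, À and Â land on box-drawing characters instead. This is a sketch of the hypothesis, not a confirmed diagnosis, and it does not explain why those positions come back as replacement characters. Both helper names are hypothetical; the `RegisterProvider` lines are for .NET Core and later only.

```vbnet
Imports System
Imports System.Text

Module LegacyStorageSimulation
    ' HYPOTHETICAL write path: a legacy client sends Windows-1252 bytes, and the
    ' database stores them unchanged in a column it believes holds code page 850.
    ' Returns what a reader that trusts the 850 label would decode.
    Function StoredAs850(text As String) As String
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) ' .NET Core / .NET 5+ only
        Dim bytes As Byte() = Encoding.GetEncoding(1252).GetBytes(text)
        Return Encoding.GetEncoding(850).GetString(bytes)
    End Function

    ' The same mistake in reverse: how the legacy client would read it back.
    Function ReadBackAs1252(stored As String) As String
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) ' .NET Core / .NET 5+ only
        Dim bytes As Byte() = Encoding.GetEncoding(850).GetBytes(stored)
        Return Encoding.GetEncoding(1252).GetString(bytes)
    End Function

    Sub Main()
        Dim original As String = "éÉèÈàÀâÂçÇôÔ"
        Dim decoded As String = StoredAs850(original)

        ' Dump as hex so the console code page does not mangle the output.
        For Each c As Char In decoded
            Console.Write(Convert.ToInt32(c).ToString("x4") & " ")
        Next
        Console.WriteLine()

        ' The legacy round trip restores the original text.
        Console.WriteLine(ReadBackAs1252(decoded) = original)
    End Sub
End Module
```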
 
Jon Skeet [C# MVP]

Barrie Gray said:
> Jon, I have to stop now - my family is insisting I take them out to
> eat. :) I am in a meeting all morning tomorrow. Can we resume this
> around 1:30pm tomorrow (Monday) EST? I really appreciate your time on
> this. I just hope we can find a solution. Have a good evening! Thanks
> again. :) --- Barrie

No problem - as for timing, I can't remember what the difference
between GMT and EST is, but I'll keep an eye out for your posts :)
 
Guest

Yes, the data looks absolutely fine in the various Oracle tools and has done for at least a year or two, probably more. It also looks fine in various ASP applications that we have, in IE on Windows 2000. I assume from this that the data in the database is fine. On the same desktop, using the same Oracle 8.1.7 client connection software, an ASP app displays the data correctly (I will try to find out what driver those apps use), yet the .Net apps, using the MS .Net Managed Provider for Oracle (on my laptop at c:\winnt\microsoft.net\framework\v1.1.4322\system.data.oracleclient.dll, version 1.1.4322.573), display the accented A and E characters incorrectly, and also corrupt all of the accented characters if you persist them through a cookie, or even in the dropdown list / text area of a combo box that binds to the exact same DataReader source. It certainly sounds like a provider bug to me.

BTW, if you do not see the accented characters I am using as a test, they are éÉèÈàÀâÂçÇôÔ, which are, lowercase (lc) followed immediately by uppercase (uc): e acute, e grave, a grave, a circumflex, c cedilla and o circumflex. If you right-click in a blank area of your IE browser and select Encoding, then Western European, which is ISO-8859-1, you should be able to see the characters correctly (I do on all the desktops here in the client office).

Also, we have tried 3 different desktops and my laptop, and the results are totally consistent, which eliminates machine-specific issues, right?

I am in a meeting for the next couple of hours, but will try to get back to you on the driver that the ASP apps use. Thx. --- Barrie
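For comparison, a VB.NET sketch that prints reference values for the test string: the UTF-16 code unit a correct .NET string should contain, and the byte each character has in code page 850 and in Windows-1252. Such a table makes it easier to tell which encoding a given byte or code-point dump actually corresponds to. `Cp850Bytes` is a hypothetical helper name; the `RegisterProvider` lines are for .NET Core and later only.

```vbnet
Imports System
Imports System.Text

Module ReferenceValues
    ' Code page 850 bytes for s (every character of the test string is one byte).
    Function Cp850Bytes(s As String) As Byte()
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) ' .NET Core / .NET 5+ only
        Return Encoding.GetEncoding(850).GetBytes(s)
    End Function

    Sub Main()
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance) ' .NET Core / .NET 5+ only
        Dim sample As String = "éÉèÈàÀâÂçÇôÔ"
        Dim cp850 As Byte() = Cp850Bytes(sample)
        Dim cp1252 As Byte() = Encoding.GetEncoding(1252).GetBytes(sample)

        ' Numbers only: position, UTF-16 code unit, then the byte in each code page.
        Console.WriteLine("pos  UTF-16  CP850  CP1252")
        For i As Integer = 0 To sample.Length - 1
            Console.WriteLine("{0,3}  {1:x4}    {2:x2}     {3:x2}", _
                i, Convert.ToInt32(sample(i)), cp850(i), cp1252(i))
        Next
    End Sub
End Module
```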
 
Jon Skeet [C# MVP]

Barrie Gray said:
> Hi Jon. I have discovered that the ASP apps use the MSDAORA.1 OLEDB
> driver for Oracle (provided by Oracle I think, but it might be
> Microsoft). I will try that driver in my app some time in the next
> day or so and will get back to you with the result. I will have to
> download the driver from Oracle as I do not have it on my laptop at
> the moment.

Right.

> However, from Microsoft's point of view, switching to an older and
> less efficient connection driver is not really a good idea. If there
> is a problem with the .Net Managed Provider for Oracle with codepage
> 850 data, then it needs to be fixed. Is there anything else I can do
> with the console app to help in that regard? --- B.

Well, you could try putting in some data *from* your console app, and
see if you can get it back again. Everything *should* be basically
transparently Unicode.

What is the column type in the database? Is it nvarchar or varchar?
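The round-trip test suggested above can be checked without relying on how anything is displayed. A minimal VB.NET sketch with a hypothetical `Differences` helper: write a known string through the managed provider (for example via an OracleCommand parameter into a scratch table, not shown here), read it back, and list every position where the two disagree, by code point. The `received` value in `Main` is a made-up placeholder, not real output.

```vbnet
Imports System
Imports System.Collections.Generic
Imports Microsoft.VisualBasic

Module RoundTripCompare
    ' Lists every position where the string read back differs from the one written,
    ' as "index: U+xxxx -> U+yyyy"; a missing character is shown as "(none)".
    Function Differences(sent As String, received As String) As List(Of String)
        Dim result As New List(Of String)()
        Dim n As Integer = Math.Max(sent.Length, received.Length)
        For i As Integer = 0 To n - 1
            Dim a As String = If(i < sent.Length, "U+" & AscW(sent(i)).ToString("X4"), "(none)")
            Dim b As String = If(i < received.Length, "U+" & AscW(received(i)).ToString("X4"), "(none)")
            If a <> b Then result.Add(i & ": " & a & " -> " & b)
        Next
        Return result
    End Function

    Sub Main()
        ' Placeholder values: in the real test, "received" is what the SELECT returns.
        Dim sent As String = "éÉ"
        Dim received As String = "é" & ChrW(&HFFFD)
        For Each line As String In Differences(sent, received)
            Console.WriteLine(line) ' 1: U+00C9 -> U+FFFD
        Next
    End Sub
End Module
```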
 
Guest

Hi Jon. The data type is varchar2, not nvarchar2. However, we have now tried a test app using the OLEDB provider that the ASP apps use and it works fine (the same app that did not work using the .Net managed provider - so we are pretty certain the problem is there). I have to get on with the development, but will monitor this thread in case you need me to do any more stuff to help diagnose what is wrong. Unfortunately, as I work for an MSP (AMITA), we do not have the luxury of burning too much time on this now that we have a workable solution. I assume you agree that the problem must be in the .Net Managed Provider, as simply replacing that makes everything work like a charm? --- Barrie

I will send you just one email so that you can contact me that way if you wish, as I probably will not be checking the thread anywhere near as often now that I have a solution. Thank you again for your time, though. It was very helpful. Regards, Barrie
 
Jon Skeet [C# MVP]

Barrie Gray said:
> Hi Jon. The data type is varchar2 not nvarchar2.

Right - that probably won't have been helping matters, although it
obviously still *should* have worked.

> However we have now tried a test app using the OLEDB provider that
> the ASP apps use and it works fine (the same app that did not work
> using the .Net managed provider - so we are pretty certain the
> problem is there).

Right. Which .NET managed provider are you using? I believe Oracle
provide their own, and if that isn't working then it's definitely
*their* fault.

> I have to get on with the development, but will
> monitor this thread in case you need me to do any more stuff to help
> diagnose what is wrong. Unfortunately, as I work for an MSP (AMITA)
> we do not have the luxury to burn too much time on this now that we
> have a workable solution. I assume you agree that the problem must be
> in the .Net Managed Provider, as simply replacing that makes
> everything work like a charm? --- Barrie

Well, it certainly sounds like it's a problem there, yes. I suggest
that if it becomes a problem again, you set up a *very* simple database
(one table, one column) and a *very* simple test app (the kind we've
been talking about) and try to find the appropriate person to send that
to... if you've tried Oracle's own provider, and that hasn't worked,
I'd certainly contact Oracle support - they're obviously likely to be
more keen on fixing it than MS, who would prefer you to use SQL Server
in the first place.
 
