Writing special characters out to Excel in HTML

Matthew Shaw

We have a web-based reporting application written in J2EE
that writes out to excel using response.setContentType
("application/vnd.ms-excel; ")….

The problem is that where we have any special characters
in our report result set E.g umlauts and accents ( ASCII
values 128 to 165 ) this data is corrupted, and does not
appear correctly.

The standard font family used throughout our Web Reports
is Arial. I have seen this handled by using Verdana; however, we are
reluctant to change the fonts in all our reports.

I believe this relates to Excel performing a Unicode
translation. Unfortunately, we require Excel functionality
to enable users to perform operations on the finished
reports.

Is there a font family similar in appearance to Arial that
will handle the unicode character set?
Or is there a mechanism to tell Excel not to perform this
conversion?

thanks.
 
Jon Skeet [C# MVP]

Matthew Shaw said:
We have a web-based reporting application written in J2EE
that writes out to excel using response.setContentType
("application/vnd.ms-excel; ")….

The problem is that where we have any special characters
in our report result set E.g umlauts and accents ( ASCII
values 128 to 165 ) this data is corrupted, and does not
appear correctly.

There *are* no ASCII values 128-165. ASCII is 7-bit.
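
To make the 7-bit point concrete, here is a minimal Java sketch (the class name `AsciiRange` and the helper `hex` are made up for illustration). It shows that ü (U+00FC) cannot be encoded in US-ASCII at all, and that its byte value depends entirely on which encoding is chosen:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the original application.
final class AsciiRange {
    // Returns the bytes of s in the given charset as space-separated hex.
    static String hex(String s, Charset cs) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(cs)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // True if every character of s has a representation in 7-bit US-ASCII.
    static boolean isAscii(String s) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        String u = "\u00FC"; // u with umlaut
        System.out.println(isAscii(u));                         // false
        System.out.println(hex(u, StandardCharsets.ISO_8859_1)); // FC
        System.out.println(hex(u, StandardCharsets.UTF_8));      // C3 BC
    }
}
```

So "value 252" for ü is only true in single-byte encodings such as ISO-8859-1; in UTF-8 the same character is two bytes.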

Now, when you say the data is "corrupted", what exactly happens?
Perhaps it's writing it out in UTF-8 or something similar? How are you
writing out the data in the first place, exactly? (i.e what file
format, etc.)

The standard font family used throughout our Web Reports
is Arial,
I have seen this handled by using Verdana, however we are
reluctant to change the fonts in all our reports.

I believe this relates to Excel performing a unicode
translation, unfortunately we require Excel functionality
to enable users to perform operations on the finished
reports.

Is there a font family similar in appearance to Arial that
will handle the unicode character set?

I'd be very surprised if it were the font which was at fault here -
although I could certainly be wrong.

Or is there a mechanism to tell Excel not to perform this
conversion?

How exactly are you exporting from Excel? Or are you only *importing*
into Excel? If you can specify somewhere which character encoding to
use, and make sure you use the same one everywhere, you should be okay.
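
As an illustration of why the same encoding has to be used on both sides, the following sketch simulates a writer and a reader that disagree. The class and method names are hypothetical, and nothing here is taken from the application under discussion; it only uses the standard `String` and `StandardCharsets` APIs:

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: simulates a producer and a consumer
// that disagree about the character encoding of the same bytes.
final class EncodingMismatch {
    // Written as UTF-8, but read as if it were ISO-8859-1.
    static String utf8ReadAsLatin1(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.ISO_8859_1);
    }

    // Written as ISO-8859-1, but read as if it were UTF-8.
    static String latin1ReadAsUtf8(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    // Written and read with the same encoding.
    static String utf8RoundTrip(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String name = "M\u00FCller"; // "Muller" with u-umlaut

        // The umlaut turns into two characters: A-tilde (U+00C3) and 1/4 sign (U+00BC).
        System.out.println(utf8ReadAsLatin1(name).equals("M\u00C3\u00BCller"));

        // The lone byte 0xFC is not valid UTF-8, so it decodes to U+FFFD,
        // which many programs render as a question mark or a box.
        System.out.println(latin1ReadAsUtf8(name).equals("M\uFFFDller"));

        // Matching encodings preserve the text.
        System.out.println(utf8RoundTrip(name).equals(name));
    }
}
```

All three lines print `true`. The first two cases are typical of what "corrupted" umlauts look like when the declared charset and the bytes actually written do not match.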
 
Matthew Shaw

We are only importing into Excel. You can explicitly provide a character
encoding...

E.G application/vnd.ms-excel;charset=ISO-8859-1

which I believe is default, others include charset=windows-1251.

They do appear to be producing slightly different results, however none
of the ones I have tried can handle umlauts...

thanks.
 
Jon Skeet [C# MVP]

Matthew Shaw said:
We are only importing into Excel. You can explicitly provide a character
encoding...

E.G application/vnd.ms-excel;charset=ISO-8859-1

Right - but if you explicitly provide the charset there, do you also
make sure your J2EE app is actually *using* that character set?

which I believe is default, others include charset=windows-1251.

Ah - if you're using 1251 that may well give different results to
ISO-8859-1. If you can get both sides to use UTF-8 I believe that's the
most likely to work for everything in a simple fashion.

They do appear to be producing slightly different results, however none
of the ones I have tried can handle umlauts...

Hmm... well, I hope the above is helpful...
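
A rough sketch of the "same charset on both sides" idea, assuming the report is an HTML table sent with the Excel content type. The class `ReportBytes` and its methods are invented for illustration. In a servlet, the string from `contentType()` would presumably be passed to `response.setContentType(...)` and the bytes written to the response output stream. Whether Excel honours the header or the meta tag is not confirmed anywhere in this thread:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: one constant drives the declared charset
// AND the bytes actually produced, so the two cannot drift apart.
final class ReportBytes {
    static final Charset CHARSET = StandardCharsets.UTF_8;

    // Value intended for the Content-Type header.
    static String contentType() {
        return "application/vnd.ms-excel; charset=" + CHARSET.name();
    }

    // Builds a one-cell HTML table and encodes it with CHARSET.
    // Note: cellText is not HTML-escaped here; a real report would need that.
    static byte[] render(String cellText) {
        String html =
            "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset="
            + CHARSET.name() + "\"></head>"
            + "<body><table><tr><td>" + cellText + "</td></tr></table></body></html>";
        return html.getBytes(CHARSET);
    }

    public static void main(String[] args) {
        byte[] body = render("M\u00FCller");
        System.out.println(contentType());
        // Decoding with the declared charset gives the original text back.
        System.out.println(new String(body, CHARSET).contains("M\u00FCller"));
    }
}
```

The design point is simply that the charset name is written once; a mismatch between the header and the writer is the failure this is meant to rule out.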
 
Matthew Shaw

I have tried the following
"application/vnd.ms-excel;charset=windows-1251",1250,1252

I believe the default is charset=ISO-8859-1

They do look as though they are altering the imported characters,
although they appear either as . or ? or just those weird square
things that you get when you open a file using an editor that doesn't
support the file format.
 
Jon Skeet [C# MVP]

Matthew Shaw said:
I have tried the following
"application/vnd.ms-excel;charset=windows-1251",1250,1252

I believe the default is charset=ISO-8859-1

They do look as though they are altering the imported characters,
although they appear either as . or ? or just those weird square
things that you get when you open a file using an editor that doesn't
support the file format.

Hmm. Thing is, if it's really writing an Excel spreadsheet then it's a
binary file to start with, which is part of what confuses me - unless
it's actually just writing CSV data and using the content-type to
direct it to Excel...
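
One way to settle the binary-versus-text question is to look at the first bytes of what the application actually sends. Binary .xls workbooks are stored in an OLE2 compound file, which starts with a fixed 8-byte signature; HTML or CSV output will not. A small sketch (the class name `XlsSniffer` is made up, and this only inspects the header, it does not parse the workbook):

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for checking what kind of payload is being produced.
final class XlsSniffer {
    // Signature that opens every OLE2 compound file,
    // the container format used by binary .xls workbooks.
    private static final int[] OLE2_MAGIC =
        {0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

    // True if data begins with the OLE2 signature.
    static boolean looksLikeBinaryXls(byte[] data) {
        if (data.length < OLE2_MAGIC.length) {
            return false;
        }
        for (int i = 0; i < OLE2_MAGIC.length; i++) {
            if ((data[i] & 0xFF) != OLE2_MAGIC[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] htmlReport = "<html><table>...</table></html>"
            .getBytes(StandardCharsets.UTF_8);
        System.out.println(looksLikeBinaryXls(htmlReport)); // false
    }
}
```

If the check comes back false, the output is text that Excel is importing, and the charset discussion above applies to it.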
 
