How to write a degree sign into an XML file created with _wfopen()?

  • Thread starter: Bogdan

Bogdan

Hi,

In my app I get text from a database that contains a degree sign (U+00B0,
Alt+0176). The app is Unicode (i.e. all chars are wchar_t). What would be
the correct way to create an XML file with _wfopen() so that a degree sign
is written properly?

I tried _wfopen_s(&fp, L"test.xml", L"w") and fputws(str, fp), but the
resulting XML file was not well-formed. That is, IE, for example,
complained that "An invalid character was found....".

Thanks,
Bogdan
 
Bogdan said:
In my app I get text from a database that contains a degree sign
(U+00B0, Alt+0176). The app is Unicode (i.e. all chars are wchar_t).
What would be the correct way to create an XML file with _wfopen() so
that a degree sign is written properly?

I tried _wfopen_s(&fp, L"test.xml", L"w")

Try using the "ccs=UNICODE" in mode string:

_wfopen_s( &fp, L"text.xml", L"w,ccs=UNICODE" )

Giovanni
 
Giovanni Dicanio said:
Try using the "ccs=UNICODE" in mode string:

_wfopen_s( &fp, L"text.xml", L"w,ccs=UNICODE" )

Giovanni

Thanks for pointing me in the right direction. I ended up using UTF-8 and
creating a file in two steps as follows:

errno_t nErrno = _wfopen_s(&fp, pszPath, L"w");
if (0 == nErrno) {
    fputws(L"<?xml version=\"1.0\" encoding=\"utf-8\"?>\n", fp);
    fclose(fp);
    nErrno = _wfopen_s(&fp, pszPath, L"a, ccs=UTF-8");
}

Thanks again,
Bogdan
 
Giovanni Dicanio said:
Try using the "ccs=UNICODE" in mode string:

_wfopen_s( &fp, L"text.xml", L"w,ccs=UNICODE" )

I would rather recommend:
_wfopen_s( &fp, L"text.xml", L"w,ccs=UTF-8" )
UTF-8 is the default encoding of XML: a parser assumes it when the file has
neither a BOM nor an encoding declaration.

And in fact I would strongly recommend using an existing XML library.
 
Bogdan said:
Thanks for pointing me in the right direction. I ended up using UTF-8 and
creating a file in two steps as follows:

errno_t nErrno = _wfopen_s(&fp, pszPath, L"w");
if (0 == nErrno) {
    fputws(L"<?xml version=\"1.0\" encoding=\"utf-8\"?>\n", fp);
    fclose(fp);
    nErrno = _wfopen_s(&fp, pszPath, L"a, ccs=UTF-8");
}

I really don't see the need for two steps.
nErrno = _wfopen_s(&fp, pszPath, L"w, ccs=UTF-8");
if (0 == nErrno) {
    fputws(L"<?xml version=\"1.0\" encoding=\"utf-8\"?>\n", fp);
    // here you can do everything you want
    fclose(fp);
}
should do.
 
Mihai N. said:
I really don't see the need for two steps.
nErrno = _wfopen_s(&fp, pszPath, L"w, ccs=UTF-8");
if (0 == nErrno) {
    fputws(L"<?xml version=\"1.0\" encoding=\"utf-8\"?>\n", fp);
    // here you can do everything you want
    fclose(fp);
}
should do.

I actually did try the one-step solution first, after Giovanni's reply.
The file always ended up with a BOM, which, according to the docs, was
correct. Unfortunately, some of the XML parsers that I tested the file with
had problems with a BOM in front of the XML declaration.

Bogdan
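
One way to avoid the BOM altogether, offered only as a sketch: open the file
in binary mode with no ccs flag, so the runtime does no translation and adds
no BOM, and encode the wide text to UTF-8 by hand. The helper names
utf8_encode, write_utf8, and write_xml_no_bom are hypothetical, and plain
fopen() is an assumption made for portability. On Windows, _wfopen_s with
L"wb" for the file and WideCharToMultiByte with CP_UTF8 for the conversion
would presumably do the same job.

```c
#include <stdio.h>
#include <stddef.h>
#include <wchar.h>

/* Hypothetical helper: encode one code point as UTF-8 into out (4 bytes
   of room required). Returns the number of bytes written. */
static size_t utf8_encode(unsigned long cp, unsigned char *out)
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: write a wide string to fp as UTF-8 bytes.
   Returns 0 on success, -1 on a write error. Lone surrogates are not
   rejected in this sketch. */
static int write_utf8(FILE *fp, const wchar_t *s)
{
    while (*s) {
        unsigned long cp = (unsigned long)*s++;
        unsigned char buf[4];
        size_t n;

        /* Combine a UTF-16 surrogate pair (16-bit wchar_t, as on Windows). */
        if (cp >= 0xD800 && cp <= 0xDBFF && *s >= 0xDC00 && *s <= 0xDFFF) {
            cp = 0x10000 + ((cp - 0xD800) << 10)
               + ((unsigned long)*s++ - 0xDC00);
        }
        n = utf8_encode(cp, buf);
        if (fwrite(buf, 1, n, fp) != n)
            return -1;
    }
    return 0;
}

/* Hypothetical helper: create the file at path with an XML declaration
   followed by body, encoded as UTF-8 without a BOM. */
static int write_xml_no_bom(const char *path, const wchar_t *body)
{
    int rc;
    FILE *fp = fopen(path, "wb");   /* binary mode: bytes go out untouched */

    if (fp == NULL)
        return -1;
    rc = write_utf8(fp, L"<?xml version=\"1.0\" encoding=\"utf-8\"?>\n");
    if (rc == 0)
        rc = write_utf8(fp, body);
    if (fclose(fp) != 0)
        rc = -1;
    return rc;
}
```

A call such as write_xml_no_bom("test.xml", L"<t>25\u00B0C</t>\n") should
give a file whose first byte is '<' and in which the degree sign is stored
as the two bytes C2 B0.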
 
Bogdan said:
Unfortunately, some of the XML parsers that I tested the file with had
problems with a BOM in front of the XML declaration.

This would be enough for me to discard them as non-compliant.

"All XML processors MUST be able to read entities in both the UTF-8 and
UTF-16 encodings.
....
Entities encoded in UTF-16 MUST and entities encoded in UTF-8 MAY begin with
the Byte Order Mark described by Annex H of [ISO/IEC 10646:2000], section
16.8 of [Unicode] (the ZERO WIDTH NO-BREAK SPACE character, #xFEFF). This is
an encoding signature, not part of either the markup or the character data of
the XML document. XML processors MUST be able to use this character to
differentiate between UTF-8 and UTF-16 encoded documents.

If the replacement text of an external entity is to begin with the character
U+FEFF, and no text declaration is present, then a Byte Order Mark MUST be
present, whether the entity is encoded in UTF-8 or UTF-16."
http://www.w3.org/TR/2008/REC-xml-20081126/#charencoding

Note the "XML processors MUST be able to use this character" part.
A parser that does not respect the MUST parts in a standard is not compliant.

Also good to read:
http://www.w3.org/TR/2008/REC-xml-20081126/#sec-guessing
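
To make the quoted requirement concrete, here is a rough sketch of the BOM
check a reader could run on the first bytes of a document before parsing.
The type bom_encoding and the function detect_bom are hypothetical names,
and only the UTF-8 and UTF-16 signatures mentioned in the quote are handled;
the other cases covered by the auto-detection appendix are left out.

```c
#include <stddef.h>

typedef enum {
    ENC_NO_BOM,     /* no signature found; fall back to other detection */
    ENC_UTF8,
    ENC_UTF16_BE,
    ENC_UTF16_LE
} bom_encoding;

/* Hypothetical helper: inspect the first n bytes of a document.
   Returns the encoding signalled by a leading BOM and stores the BOM's
   length in *bom_len, so the caller can skip it before parsing. */
static bom_encoding detect_bom(const unsigned char *p, size_t n,
                               size_t *bom_len)
{
    *bom_len = 0;
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) {
        *bom_len = 3;               /* U+FEFF encoded as UTF-8 */
        return ENC_UTF8;
    }
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) {
        *bom_len = 2;               /* U+FEFF, big-endian UTF-16 */
        return ENC_UTF16_BE;
    }
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) {
        *bom_len = 2;               /* U+FEFF, little-endian UTF-16 */
        return ENC_UTF16_LE;
    }
    return ENC_NO_BOM;
}
```

A reader that skips bom_len bytes before looking for the XML declaration
treats the BOM as an encoding signature rather than as document content,
which is what the quoted passage asks for.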
 