How do I copy the content of a string to a char array?


Guest

How do I copy the content of a string in one encoding (in my case big5) to a
char array (unmanaged) of the same encoding?

I try the following

String *line = S"123水泥";
char buffer[200];

for(int i=0; i<line->get_Length(); i++)
{
buffer[i] = (char) line->Chars[i];
}

It works fine for the first 3 ASCII characters, but gets messed up for the
next 2 Chinese characters. What is wrong here?
 

Jochen Kalmbach [MVP]

Hi Kueishiong!
How do I copy the content of a string in one encoding (in my case big5) to a
char array (unmanaged) of the same encoding?
String *line = S"123水泥";

.NET strings have no special encoding! They are always stored as UTF-16.
char buffer[200];

You need to convert the UTF-16 string to a "big5"-encoded byte string!
for(int i=0; i<line->get_Length(); i++)
{
buffer[i] = (char) line->Chars[i];
}

It works fine for the first 3 ASCII characters, but gets messed up for the
next 2 Chinese characters. What is wrong here?
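Why the loop breaks: line->Chars[i] is a 16-bit UTF-16 code unit, and casting it to char throws away the high byte. A minimal sketch in portable C++11 reproduces the effect; std::u16string stands in for the .NET String, and narrow_like_the_loop is a hypothetical helper name, not anything from the thread.

```cpp
#include <cstddef>
#include <string>

// Reproduces the loop from the question: each UTF-16 code unit is cast to
// char, which keeps only its low 8 bits (modular conversion, as on all
// mainstream compilers). ASCII survives; CJK text does not.
std::string narrow_like_the_loop(const std::u16string &line) {
    std::string buffer;
    for (std::size_t i = 0; i < line.size(); ++i) {
        buffer.push_back(static_cast<char>(line[i]));  // high byte is lost
    }
    return buffer;
}
```

For u"123\u6C34\u6CE5" ("123水泥") this yields the five bytes 31 32 33 34 E5: U+6C34 (水) collapses to 0x34, the ASCII digit '4', instead of becoming a two-byte Big5 character.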



You can use the following:

System::Text::Encoding *e = System::Text::Encoding::GetEncoding("big5");
System::Byte big5 __gc[] = e->GetBytes(S"123水泥");
char *szBig5 = new char[big5->Length+1];
System::Byte __pin *b = &big5[0];   // pin the managed array so the GC cannot move it
strncpy(szBig5, (char*) b, big5->Length);
szBig5[big5->Length] = 0;
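Outside .NET, the same "Unicode text to Big5 bytes" step can be sketched with POSIX iconv. This is an analog, not part of the answer above: it starts from UTF-8 rather than a .NET String, it assumes the platform's iconv knows the charset name "BIG5" (glibc does), and utf8_to_big5 is a hypothetical helper name. On Windows, WideCharToMultiByte with code page 950 (Microsoft's Big5 code page) plays the same role for wchar_t input.

```cpp
#include <cstddef>
#include <iconv.h>
#include <stdexcept>
#include <string>

// Converts UTF-8 text to Big5 bytes using POSIX iconv.
// Throws if BIG5 is unsupported or a character has no Big5 mapping.
std::string utf8_to_big5(const std::string &utf8) {
    iconv_t cd = iconv_open("BIG5", "UTF-8");
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open failed: BIG5 not supported?");
    }
    // Big5 never needs more bytes than UTF-8 for the same text:
    // ASCII is 1 byte in both, every other character is 2 bytes in Big5
    // but at least 2 bytes in UTF-8.
    std::string out(utf8.size() + 4, '\0');
    char *in_ptr = const_cast<char *>(utf8.data());  // iconv does not modify input
    std::size_t in_left = utf8.size();
    char *out_ptr = &out[0];
    std::size_t out_left = out.size();
    std::size_t rc = iconv(cd, &in_ptr, &in_left, &out_ptr, &out_left);
    iconv_close(cd);  // Big5 is stateless, so no final flush call is needed
    if (rc == (std::size_t)-1) {
        throw std::runtime_error("conversion to Big5 failed");
    }
    out.resize(out.size() - out_left);  // keep only the bytes actually written
    return out;
}
```

For "123水泥" the result is 7 bytes: three ASCII bytes followed by two 2-byte Big5 characters, the same layout Jochen's szBig5 holds.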

--
Greetings
Jochen

My blog about Win32 and .NET
http://blog.kalmbachnet.de/
 

Carl Daniel [VC++ MVP]

Jochen said:
Hi Kueishiong!
How do I copy the content of a string in one encoding (in my case
big5) to a char array (unmanaged) of the same encoding?
String *line = S"123水泥";

.NET strings have no special encoding!!! They are always stored in
UTF-16.

Actually, I believe it's UCS2. It's not UTF16 since there are no multi-word
characters in the .NET representation and code points above 0xffff are
simply not representable.
char buffer[200];

You need to convert the UTF-16 string to the "big5" string!

Or store it in wchar_t buffer[200] instead of char to preserve the UCS2
format.

-cd
 

Guest

Thank you very much for replying.

System::Text::Encoding *e = System::Text::Encoding::GetEncoding("big5");
System::Byte big5 __gc[] = e->GetBytes(S"123水泥");

However, the source is something I read from a text file, so it is in a String.

FileStream* fs = new FileStream(path, FileMode::Open);
StreamReader* sr = new StreamReader(fs, Encoding::GetEncoding("big5"));
String *line = sr->ReadLine();

As you suggest, I have to convert a String to a Byte array.
How do I do that?
char *szBig5 = new char[big5->Length+1];
System::Byte __pin *b = &big5[0];
strncpy(szBig5, (char*) b, big5->Length);
szBig5[big5->Length] = 0;

Kueishiong Tu
 

Guest

Thank you very much for replying. I changed buffer to wchar_t and the copying
works fine.
However, what I ultimately need is a char array, because the code that
follows requires one. How do I convert a wchar_t array to a char array? From
my experience I know a char array can store both one-byte ASCII characters and
two-byte Chinese characters.

Kueishiong Tu
 

Jochen Kalmbach [MVP]

Hi Carl!
Actually, I believe it's UCS2. It's not UTF16 since there are no multi-word

In fact there are no multi-word characters, but there are high/low surrogates...
And this _is_ UTF-16 (everything in Windows uses UTF-16).

See: http://www.unicode.org/notes/tn12/
<quote>
Most major software with good Unicode support uses UTF-16 (or 16-bit
Unicode strings). Note that much of the software listed below runs on
Unix/Linux systems as well as Windows and others.

- Everything Microsoft — Windows (including Pocket PC) and application
[...]
</quote>
characters in the .NET representation and code points above 0xffff are
simply not representable.

That would be very bad: it would mean .NET does not support Unicode!!!
(And by the way: .NET *is* fully Unicode-enabled.)

At least with .NET 2.0, they added some classes to query all the
necessary information...

See: StringInfo Class
http://msdn2.microsoft.com/en-us/library/c4hkht93(en-us,VS.80).aspx

See: StringInfo.ParseCombiningCharacters
http://msdn2.microsoft.com/en-us/library/2wayc3ak(en-us,vs.80).aspx
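What the surrogates mean in practice: a code point above 0xFFFF is stored as two UTF-16 code units (a high surrogate in 0xD800..0xDBFF followed by a low surrogate in 0xDC00..0xDFFF), so a string's Length counts code units, not characters. The sketch below, in standard C++ with std::u16string standing in for a .NET String and a hypothetical helper count_code_points, counts code points. It is simpler than StringInfo, which additionally groups combining sequences into text elements.

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a UTF-16 string. A well-formed surrogate
// pair (high 0xD800..0xDBFF followed by low 0xDC00..0xDFFF) counts as one;
// every other code unit, including an unpaired surrogate, counts as one.
std::size_t count_code_points(const std::u16string &s) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        char16_t u = s[i];
        bool is_high = (u >= 0xD800 && u <= 0xDBFF);
        if (is_high && i + 1 < s.size() &&
            s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
            ++i;  // skip the low surrogate that completes the pair
        }
        ++count;
    }
    return count;
}
```

For example, "水" followed by U+20000 (a CJK Extension B ideograph) has 3 UTF-16 code units but 2 code points.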


--
Greetings
Jochen

 

Jochen Kalmbach [MVP]

Hi Kueishiong!
However, the source is something I read from a text file, so it is in a String.

FileStream* fs = new FileStream(path, FileMode::Open);
StreamReader* sr = new StreamReader(fs, Encoding::GetEncoding("big5"));
String *line = sr->ReadLine();

This does _not_ matter!!!
If you have a String, then it _is_ Unicode (UTF-16). The encoding was only used
while reading the file (to translate the Big5 bytes into Unicode).

As you suggest, I have to convert a String to a Byte array.
How do I do that?
char *szBig5 = new char[big5->Length+1];

My example works very well. What is your problem?


--
Greetings
Jochen

 

Carl Daniel [VC++ MVP]

Jochen said:
Hi Carl!


In fact there is no multi-word, but there are high/loh-surrogates...
And this _is_ UTF-16 (everything in windows is using UTF-16).

Consider myself educated :) I didn't realize that support for code points
above 0xffff was in fact included in .NET. I'm sure I've missed something,
but I don't recall any very useful character sets in the code points at
0x10000 and above (e.g. Klingon, Elvish), but I'm happy to see that they're
representable.

-cd
 

Jochen Kalmbach [MVP]

Hi Carl!
Consider myself educated :) I didn't realize that support for code points
above 0xffff was in fact included in .NET. I'm sure I've missed something,
but I don't recall any very useful character sets in the code points at
0x10000 and above (e.g. Klingon, Elvish), but I'm happy to see that they're
representable.

Some might be useful (but you are right: most of them will never be used):

10000..1007F; Linear B Syllabary
10080..100FF; Linear B Ideograms
10100..1013F; Aegean Numbers
10140..1018F; Ancient Greek Numbers
10300..1032F; Old Italic
10330..1034F; Gothic
10380..1039F; Ugaritic
103A0..103DF; Old Persian
10400..1044F; Deseret
10450..1047F; Shavian
10480..104AF; Osmanya
10800..1083F; Cypriot Syllabary
10A00..10A5F; Kharoshthi
1D000..1D0FF; Byzantine Musical Symbols
1D100..1D1FF; *Musical Symbols*
1D200..1D24F; Ancient Greek Musical Notation
1D300..1D35F; Tai Xuan Jing Symbols
1D400..1D7FF; *Mathematical Alphanumeric Symbols*
20000..2A6DF; *CJK Unified Ideographs Extension B*
2F800..2FA1F; *CJK Compatibility Ideographs Supplement*
E0000..E007F; Tags
E0100..E01EF; Variation Selectors Supplement
F0000..FFFFF; Supplementary Private Use Area-A
100000..10FFFF; Supplementary Private Use Area-B
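As a worked example of how code points from these blocks are stored in UTF-16: subtract 0x10000, then split the remaining 20 bits into two 10-bit halves added to 0xD800 and 0xDC00. For U+20000 (start of CJK Unified Ideographs Extension B): 0x20000 - 0x10000 = 0x10000; high half 0x10000 >> 10 = 0x40, giving 0xD840; low half 0x10000 & 0x3FF = 0, giving 0xDC00. A small C++ sketch of that rule (to_utf16 is a hypothetical name):

```cpp
#include <cstdint>
#include <stdexcept>
#include <string>

// Encodes one Unicode code point as UTF-16. Code points up to 0xFFFF take a
// single code unit; 0x10000..0x10FFFF become a high/low surrogate pair.
std::u16string to_utf16(std::uint32_t cp) {
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
        throw std::invalid_argument("not a Unicode scalar value");
    }
    if (cp <= 0xFFFF) {
        return std::u16string(1, static_cast<char16_t>(cp));
    }
    std::uint32_t v = cp - 0x10000;  // 20 bits remain
    char16_t high = static_cast<char16_t>(0xD800 + (v >> 10));
    char16_t low = static_cast<char16_t>(0xDC00 + (v & 0x3FF));
    return std::u16string{high, low};
}
```

The last entry in the list, U+10FFFF, comes out as 0xDBFF 0xDFFF, the largest possible pair; that is why UTF-16 stops at 0x10FFFF.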


--
Greetings
Jochen

 

Norman Diamond

Kueishiong Tu said:
Thank you very much for replying.

System::Text::Encoding *e =
System::Text::Encoding::GetEncoding("big5");
System::Byte big5 __gc[] = e->GetBytes(S"123水泥");

However, the source is something I read from a text file, so it is in a
String.

FileStream* fs = new FileStream(path, FileMode::Open);
StreamReader* sr = new StreamReader(fs, Encoding::GetEncoding("big5"));
String *line = sr->ReadLine();

As you suggest, I have to convert a String to a Byte array.
How do I do that?

If the file contains a Byte array (ANSI string) and you need to pass the
same byte array to another routine, then don't read a String (Unicode
string). Read a byte array in the first place.
 

Guest

In your example
System::Text::Encoding *e = System::Text::Encoding::GetEncoding("big5");
System::Byte big5 __gc[] = e->GetBytes(S"123水泥");
char *szBig5 = new char[big5->Length+1];
System::Byte __pin *b = &big5[0];
strncpy(szBig5, (char*) b, big5->Length);
szBig5[big5->Length] = 0;

you copy the content of the Byte array pointed to by b into the char array szBig5.
However, what I need is to copy the content of a String into a char array
(say, String *b = S"123水泥" into szBig5).

 

Guest

Thank you very much for replying.

If the file contains a Byte array (ANSI string) and you need to pass the
same byte array to another routine, then don't read a String (Unicode
string). Read a byte array in the first place.
How do I read the content of a text file in as a Byte array instead of a
String, which StreamReader::ReadLine() returns?
 

Jochen Kalmbach [MVP]

Hi Kueishiong!
System::Text::Encoding *e = System::Text::Encoding::GetEncoding("big5");
System::Byte big5 __gc[] = e->GetBytes(S"123水泥");
char *szBig5 = new char[big5->Length+1];
System::Byte __pin *b = &big5[0];
strncpy(szBig5, (char*) b, big5->Length);
szBig5[big5->Length] = 0;


you copy the content of the Byte array pointed to by b into the char array szBig5.
However, what I need is to copy the content of a String into a char array
(say, String *b = S"123水泥" into szBig5).

Maybe we are talking about different things...

I thought you wanted a char array in big5 encoding? Isn't this what you
wanted???

And that is exactly what my example does...
It converts a String into a char array which is encoded in "big5".


--
Greetings
Jochen

 

Guest

Hi Jochen!

What I want is to copy the content of a String

(
the source is read from a text file with the following StreamReader
sr->ReadLine() call and stored in the String *line:
FileStream* fs = new FileStream(path, FileMode::Open);
StreamReader* sr = new StreamReader(fs, Encoding::GetEncoding("big5"));
String *line = sr->ReadLine();
)

to a char array (say, buffer declared as char buffer[200]), i.e.

move the contents of *line to buffer[].


 

Jochen Kalmbach [MVP]

Hi Kueishiong!
What I want is to copy the content of a String

(
the source is read from a text file with the following StreamReader
sr->ReadLine() call and stored in the String *line:
FileStream* fs = new FileStream(path, FileMode::Open);
StreamReader* sr = new StreamReader(fs, Encoding::GetEncoding("big5"));
String *line = sr->ReadLine();
)

to a char array (say, buffer declared as char buffer[200]), i.e.

What is "char"? 8-bit?
move the contents of *line to buffer[].

For code that takes a char*, there is no difference between buffer[] and *buffer:


System::String *line = S"123水泥";

System::Text::Encoding *e = System::Text::Encoding::GetEncoding("big5");
System::Byte big5 __gc[] = e->GetBytes(line);
char *buffer = new char[big5->Length+1];
System::Byte __pin *b = &big5[0];
strncpy(buffer, (char*) b, big5->Length);
buffer[big5->Length] = 0;

// now the buffer contains the char-array encoded in "big5"
// after you have used the buffer, you need to destroy it...

delete [] buffer;

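The question originally asked for a fixed char buffer[200] rather than a heap allocation. A minimal C++ sketch of copying already-encoded Big5 bytes into a fixed buffer, assuming lead bytes 0x81..0xFE each followed by exactly one trail byte (copy_big5_truncating is a hypothetical helper), truncates only at a character boundary so a two-byte character is never cut in half:

```cpp
#include <cstddef>
#include <cstring>

// Copies Big5 bytes into dst (capacity cap, including the terminating NUL).
// Walks the input character by character from the start, because a byte in
// 0xA1..0xFE may be a lead or a trail byte depending on what precedes it.
// Returns the number of bytes copied (excluding the NUL).
std::size_t copy_big5_truncating(const char *src, std::size_t len,
                                 char *dst, std::size_t cap) {
    if (cap == 0) {
        return 0;
    }
    std::size_t i = 0;
    while (i < len) {
        unsigned char c = static_cast<unsigned char>(src[i]);
        std::size_t width = (c >= 0x81 && c <= 0xFE) ? 2 : 1;
        if (i + width > len || i + width > cap - 1) {
            break;  // would overrun the input or leave no room for the NUL
        }
        i += width;
    }
    std::memcpy(dst, src, i);
    dst[i] = '\0';
    return i;
}
```

With a 200-byte buffer the 7-byte Big5 form of "123水泥" fits whole; with a 5-byte buffer only "123" is kept, rather than "123" plus a dangling lead byte.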

(and this was exactly my 1st reply...)

--
Greetings
Jochen

 

Guest

Dear Jochen:
System::Byte big5 __gc[] = e->GetBytes(line);

It is the above line that converts from a String to a Byte array that I want.
I put that in, and the whole program works fine. Thank you very much for help.

Kueishiong Tu
 
