How to align strings with string.Format() in CJK?

Dancefire

Hi,

string.Format("{0,-20} : {1,-10}", a,b);

The above code left-aligns strings a and b in fields of 20 and 10
characters. That works for English, but not for any CJK language.

If a or b is a CJK string, the columns never line up. The reason is that
each Chinese (or Japanese, or Korean) character occupies the space of 2
alphabetic characters on screen (not just 2 bytes for MBCS in memory).
For example:

"中文" has the same visual width as "abcd"; however,
string.Format() treats "中文" as 2 chars and "abcd" as 4 chars, so with
an alignment of 20 it pads "中文" with 18 spaces and "abcd" with 16
spaces. In the result, the "中文" column ends up wider than the "abcd"
column because of the over-padding.
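
The padding arithmetic described above can be checked with a small console program. This is only an illustrative sketch: the class name PaddingDemo and the trailing '|' marker are made up for the example, and the source file is assumed to be saved in a Unicode encoding so the CJK literal survives compilation.

```csharp
using System;

class PaddingDemo
{
    static void Main()
    {
        // The '|' marks where each 20-character field ends.
        string cjk = string.Format("{0,-20}|", "中文");
        string latin = string.Format("{0,-20}|", "abcd");

        // Both results hold the same number of chars: 20 plus the marker.
        Console.WriteLine(cjk.Length);    // 21
        Console.WriteLine(latin.Length);  // 21

        // The padding is derived from the char count of each argument.
        Console.WriteLine(cjk.Length - 1 - "中文".Length);    // 18
        Console.WriteLine(latin.Length - 1 - "abcd".Length);  // 16
    }
}
```

Both fields are equal when counted in chars; the difference only appears once a font draws each CJK character two columns wide.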

This is the problem I am facing. Is it a bug in the .NET Framework? Or is
there anything wrong with me, and could I overcome it by setting some
culture info?

Thanks.
 
Peter Duniho

[...]
> This is the problem I am facing. Is it a bug in the .NET Framework? Or is
> there anything wrong with me, and could I overcome it by setting some
> culture info?

I'm not sure I see the problem. That is, why would you expect the
String class to deal with this? Even using English-only, or even some
arbitrary alphabetic language, there's no guarantee that padding
strings will align text, except when you are using a mono-spaced font.

IMHO, the String class should be completely ignorant of fonts and
visual representations of text. It's all about the characters, and
while there are indeed simple formatting methods such as those you
present and the "Pad" methods, those are strictly character-based.

If you need more elaborate formatting than that, you need to go
outside the String class, using fonts and graphical methods to actually
measure the size of a string and lay it out according to your needs.

So, no...I don't think this is a bug in .NET, and I don't think it's
something you're likely to be able to change with a culture setting.

I don't have any reason to believe there's anything wrong with you
either, for what it's worth. :)

Pete
 
Guest

Dancefire said:
> Hi,
>
> string.Format("{0,-20} : {1,-10}", a,b);
>
> The above code left-aligns strings a and b in fields of 20 and 10
> characters. That works for English, but not for any CJK language.
>
> If a or b is a CJK string, the columns never line up. The reason is that
> each Chinese (or Japanese, or Korean) character occupies the space of 2
> alphabetic characters on screen (not just 2 bytes for MBCS in memory).
> For example:
>
> "中文" has the same visual width as "abcd",

That is only true with the specific font that you are using. When I see
them here, they have the same visual length as "abc".

> however,
> string.Format() treats "中文" as 2 chars and "abcd" as 4 chars, so with
> an alignment of 20 it pads "中文" with 18 spaces and "abcd" with 16
> spaces. In the result, the "中文" column ends up wider than the "abcd"
> column because of the over-padding.
>
> This is the problem I am facing. Is it a bug in the .NET Framework?

No, the string class is treating your characters completely correctly. Two
characters are two characters, regardless of the size of the glyphs in
the font that is used to display the characters. The string class cannot
know how and where the string might be displayed, so it cannot (and
should not) take into account the display size of characters.

A string cannot be formatted so that it will be aligned, unless you are
using a monospaced font so that all characters truly are the same width.
You have to align the strings when you are displaying them instead.

> or is
> there anything wrong with me, and could I overcome it by setting some
> culture info?

If there really is something wrong with you (which I doubt), changing
the culture info will not fix that... ;)
 
Dancefire

Thanks Peter and Göran.

Göran Andersson wrote:
Dancefire wrote:

> No, the string class is treating your characters completely correctly. Two
> characters are two characters, regardless of the size of the glyphs in
> the font that is used to display the characters. The string class cannot
> know how and where the string might be displayed, so it cannot (and
> should not) take into account the display size of characters.

So, does the alignment in string.Format() mean alignment by the memory
length of the 2 strings, rather than by their width? However, the
example and the description in the MSDN Library for string.Format() tell a
different story.


In [http://msdn2.microsoft.com/en-us/library/txafckwd.aspx], it says:

Alignment Component

The optional alignment component is a signed integer indicating the
preferred formatted *field width*. If the value of alignment is less
than the length of the formatted string, alignment is ignored and the
length of the formatted string is used as the field width. The
formatted data in the field is right-aligned if alignment is positive
and left-aligned if alignment is negative. If padding is necessary,
white space is used. The comma is required if alignment is specified.

> A string cannot be formatted so that it will be aligned, unless you are
> using a monospaced font so that all characters truly are the same width.
> You have to align the strings when you are displaying them instead.

I do use a monospace font in this case. However, in CJK fonts,
monospace gives alphabetic characters and CJK characters different
widths: each CJK character occupies the space of 2 alphabetic
characters. "Monospace" in any CJK font means the same width for
all alphabetic characters and the same width for all CJK
characters, but not the same width for both.

My program writes its results to the Console. I just hoped the strings
could be simply aligned by string.Format({0,-xx}) to make the output more
readable. However, it doesn't seem to work for CJK strings, even though
every CJK character has the same width (just not the same width as an
alphabetic character), especially when alphabetic and CJK characters
are mixed in the same string. It's a general problem for any CJK
string, not one of a particular font or locale. I can't use the
alignment of string.Format() if I want to output CJK strings to the
Console, but alphabetic languages are able to do that. It does look like
a culture-related problem.

Thanks
 
Peter Duniho

> Thanks Peter and Göran.
>
> Göran Andersson wrote:
>
> So, does the alignment in string.Format() mean alignment by the memory
> length of the 2 strings, rather than by their width?

No. First, String uses _character_ length to format, not memory.
Second, saying "rather than the width" doesn't make much sense, as
"width" has a variety of meanings. In this context, "width" does not
mean a literal pixel size, but rather the width of the string measured
in characters.
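
The distinction between character length and memory size can be seen directly. This is a small sketch, not part of the original reply; the class name LengthDemo is invented for the example. Note that String.Length counts UTF-16 code units (Char values), which coincides with the character count for the two characters used here.

```csharp
using System;
using System.Text;

class LengthDemo
{
    static void Main()
    {
        string s = "中文";

        // Length in chars: this is what the alignment component works with.
        Console.WriteLine(s.Length);                          // 2

        // Size in bytes depends on the encoding and plays no part in padding.
        Console.WriteLine(Encoding.UTF8.GetByteCount(s));     // 6
        Console.WriteLine(Encoding.Unicode.GetByteCount(s));  // 4

        // PadRight fills the string up to 20 chars, just like "{0,-20}".
        Console.WriteLine(s.PadRight(20).Length);             // 20
    }
}
```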

> However, the
> example and the description in the MSDN Library for string.Format() tell a
> different story.

No, you're just misreading it. "Field width" is referring to the width
of the field in characters. This is a fairly common and old
interpretation of "width", dating well back to when all computer output
was only based on characters. There's nothing in that documentation
that should be interpreted as suggesting that string formatting will
take into account the graphical representation of the string.

> I do use the monospace font in such case, however, for CJK fonts,
> monospace have different space for alphabetic language character and
> CJK character, each CJK character will occupy 2 alphabetic language
> character space.

Göran is telling you one necessary requirement for alignment. That
doesn't mean it's a sufficient requirement. A monospace font is
required in order to use only characters for aligning strings, but
simply using a monospace font does not guarantee you can use only
characters.

> The "monospace" of any CJK fonts means same space for
> any alphabetic language characters and same space for any CJK
> characters, but not same for both.

Which means you will have to deal with the difference yourself. If you
want to mix alphabetic and non-alphabetic characters and have them
aligned, you will have to take into account the difference in on-screen
representation yourself.

If what you write is correct, and the non-alphabetic characters are
indeed always exactly twice as wide as an alphabetic character, then
this should be trivial. Just use half the character width when
formatting non-alphabetic characters as you use when formatting
alphabetic characters.
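
One way to take the difference into account yourself is to pad by display columns instead of by chars. The following is a minimal sketch under stated assumptions, not a definitive implementation: CjkPad, IsWide, DisplayWidth and PadRightColumns are hypothetical names, and the code point ranges are only a rough approximation of the characters a CJK monospace font draws two columns wide (.NET has no built-in width lookup that this sketch relies on).

```csharp
using System;

static class CjkPad
{
    // Assumption: these ranges approximate the double-width characters.
    // This is not a complete East Asian Width table.
    static bool IsWide(int cp)
    {
        return (cp >= 0x1100 && cp <= 0x115F)    // Hangul Jamo
            || (cp >= 0x2E80 && cp <= 0xA4CF)    // CJK radicals through Yi
            || (cp >= 0xAC00 && cp <= 0xD7A3)    // Hangul syllables
            || (cp >= 0xF900 && cp <= 0xFAFF)    // CJK compatibility ideographs
            || (cp >= 0xFE30 && cp <= 0xFE4F)    // CJK compatibility forms
            || (cp >= 0xFF00 && cp <= 0xFF60)    // fullwidth forms
            || (cp >= 0xFFE0 && cp <= 0xFFE6)    // fullwidth signs
            || (cp >= 0x20000 && cp <= 0x3FFFD); // supplementary ideographs
    }

    // Counts console columns: 2 per wide character, 1 for everything else.
    public static int DisplayWidth(string s)
    {
        int width = 0;
        for (int i = 0; i < s.Length; i++)
        {
            int cp = s[i];
            if (char.IsSurrogatePair(s, i))
            {
                cp = char.ConvertToUtf32(s, i);
                i++; // skip the low surrogate of the pair
            }
            width += IsWide(cp) ? 2 : 1;
        }
        return width;
    }

    // Left-aligns s in a field of the given number of columns.
    public static string PadRightColumns(string s, int columns)
    {
        int pad = columns - DisplayWidth(s);
        return pad > 0 ? s + new string(' ', pad) : s;
    }

    static void Main()
    {
        // "中文" receives 16 spaces here instead of 18, so both rows
        // occupy 20 columns before the separator.
        Console.WriteLine(PadRightColumns("中文", 20) + " : " + PadRightColumns("x", 10));
        Console.WriteLine(PadRightColumns("abcd", 20) + " : " + PadRightColumns("y", 10));
    }
}
```

The padded strings are concatenated rather than passed through "{0,-20}", because the alignment component would count chars again. Whether the columns really line up still depends on the console font drawing CJK characters exactly twice as wide.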

Pete
 
