string.Trim() and the list of whitespace characters?


adi

Hi

I'm working on the documentation for my application.
I need to explain to the reader that whitespace will be removed from
a text.
I use the string.Trim() method. Note: no arguments are passed to the method.
It is not enough to tell this to an untrained person; I need to give
them the complete list of whitespace characters, like:
1. space: ' '
2. tab: '\t'

My knowledge of what "whitespace" means stops here: the space character and
the tab character. What else?
Can I query the framework at runtime for the complete list of whitespace characters?
I'm only able to test whether a particular character is whitespace or
not (using char.IsWhiteSpace(...)).

Thanks.
 

adi

Thanks Morten

The list is very useful.
Now, for the second part of my question: is it possible to get
this list at runtime?
Note: I'm (still) using version 1.1 of the framework, but solutions
for later versions are welcome.

Thanks.
Adi.


 

Morten Wennevik

The list is the same for .NET 1.0, 1.1 and 2.0, and possibly later versions too.

As for getting this list at runtime, I don't see how you can do that other
than testing Char.IsWhiteSpace for a whole range of values, which may
take some time to compute. I did a few tests, and with Char.IsWhiteSpace I
ended up with far more characters than are listed under String.Trim.
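A minimal sketch of the brute-force scan described above. It assumes C# 2.0 or later for List<char> (on .NET 1.1 an ArrayList would do the same job), and the class and method names are made up for illustration. Since char has only 65,536 possible values, the loop is short.

```csharp
using System.Collections.Generic;

// Hypothetical helper class for illustration; not part of the framework.
public static class WhiteSpaceScan
{
    // Returns every UTF-16 code unit for which char.IsWhiteSpace is true.
    public static List<char> GetWhiteSpaceChars()
    {
        List<char> result = new List<char>();
        // Use int as the loop variable so that i <= char.MaxValue cannot overflow.
        for (int i = char.MinValue; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (char.IsWhiteSpace(c))
                result.Add(c);
        }
        return result;
    }
}
```

Keep in mind that this is the char.IsWhiteSpace set, which is larger than the set documented for String.Trim, so it should not be presented to readers as "the characters Trim removes".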

Why do you need this list programmatically anyway?

 

Morten Wennevik

Actually, you can't use IsWhiteSpace to determine which characters are
trimmed, as there are whitespace characters that are not trimmed.
Furthermore, there are characters that are trimmed but are not listed in
the documentation.

In the end, to get the proper list you may need to trim every single
character and check whether String.Trim() removes it.

The code below will display which characters are considered whitespace and
which will be trimmed.

using System.Text;                 // StringBuilder
using System.Windows.Forms;        // MessageBox

StringBuilder sb = new StringBuilder();
// <= char.MaxValue so that U+FFFF is checked too (i < 65535 skipped it)
for (int i = 0; i <= char.MaxValue; i++)
{
    char c = (char)i;
    string s = c.ToString();

    if (char.IsWhiteSpace(c) || s.Trim().Length == 0)
    {
        // Code point as four hex digits, e.g. 0020
        sb.Append(i.ToString("X").PadLeft(4, '0'));
        if (char.IsWhiteSpace(c))
            sb.Append("\tWhiteSpace");
        else
            sb.Append("\t\t");
        if (s.Trim().Length == 0)
            sb.Append("\tTrimmed");
        sb.AppendLine(); // use sb.Append("\r\n"); for .NET 1.1
    }
}
MessageBox.Show(sb.ToString());

Compared to the documented list, this indicates that U+0085, U+1680,
U+2028 and U+2029 will also be trimmed despite not being listed, while
the whitespace characters U+180E, U+202F and U+205F will not be trimmed.
Characters U+200B and U+FEFF are not considered whitespace characters but
will be trimmed anyway.

Upon further research: in .NET 1.1 the documented list is correct and only
the documented characters are trimmed, but the documentation has not
been updated for .NET 2.0.
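To get the differences described above without reading through the whole table, the sets can be computed directly. This is a sketch under stated assumptions: C# 2.0 or later for List<char>, and the class and method names are hypothetical. Like the loop above, it tests each character on its own inside a one-character string. Its output depends on the framework version it runs on, as the 1.1 vs 2.0 difference shows, so it should be run on the version being documented.

```csharp
using System.Collections.Generic;

// Hypothetical helper class (not part of the framework) that compares
// what char.IsWhiteSpace reports with what String.Trim() actually removes.
public static class TrimComparison
{
    // True if Trim() removes the character when it stands alone in a string.
    public static bool IsTrimmed(char c)
    {
        return c.ToString().Trim().Length == 0;
    }

    // Every character that Trim() removes on the running framework.
    public static List<char> TrimmedChars()
    {
        List<char> result = new List<char>();
        for (int i = char.MinValue; i <= char.MaxValue; i++)
        {
            if (IsTrimmed((char)i))
                result.Add((char)i);
        }
        return result;
    }

    // Whitespace characters (per char.IsWhiteSpace) that Trim() leaves alone.
    public static List<char> WhiteSpaceButNotTrimmed()
    {
        List<char> result = new List<char>();
        for (int i = char.MinValue; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (char.IsWhiteSpace(c) && !IsTrimmed(c))
                result.Add(c);
        }
        return result;
    }

    // Characters that Trim() removes although char.IsWhiteSpace rejects them.
    public static List<char> TrimmedButNotWhiteSpace()
    {
        List<char> result = new List<char>();
        for (int i = char.MinValue; i <= char.MaxValue; i++)
        {
            char c = (char)i;
            if (!char.IsWhiteSpace(c) && IsTrimmed(c))
                result.Add(c);
        }
        return result;
    }

    // Formats characters as "U+XXXX" codes separated by ", ".
    public static string Format(List<char> chars)
    {
        List<string> codes = new List<string>();
        foreach (char c in chars)
            codes.Add("U+" + ((int)c).ToString("X4"));
        return string.Join(", ", codes.ToArray());
    }
}
```

For the documentation itself, TrimComparison.Format(TrimComparison.TrimmedChars()) gives a comma-separated list of U+XXXX codes for exactly the characters Trim() removes on the running framework.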

 

adi

Many thanks


 
