Detect non-standard characters in string


Guest

Hi

I have a project to take an MS Word doc and reformat the text into text files that are built into my App.

The only issue I have is that sometimes there are characters in MS Word that are not printable when viewed in Notepad. I usually catch them by looking at the text in my App. Usually the problem is:
an extra-long hyphen (an em dash, U+2014)
a dagger (U+2020)

When I debug the string, I usually see a square block in its place.

Is there some way to trap the characters that will not be printable/viewable in, say, Notepad?

Thanks
 
I would just check each character's numeric value to see whether it is outside the range of ASCII characters. Most likely, what is happening is that the text is being placed on the clipboard as Unicode, but when you paste it into Notepad (which is using ASCII), Notepad does its best by using the square character to indicate that it couldn't perform a conversion.
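A minimal C# sketch of that range check, assuming "outside ASCII" means any UTF-16 code unit above 0x7F. The class and method names (AsciiScanner, FindNonAscii) are hypothetical. It reports every offending position and code value, so each character can be located in the Word source rather than stopping at the first one.

```csharp
using System;
using System.Collections.Generic;

public static class AsciiScanner
{
    // Hypothetical helper: returns the index and value of every char above 0x7F.
    // A character outside the Basic Multilingual Plane is stored as two surrogate
    // chars, so it is reported twice (once per surrogate).
    public static List<(int Index, char Character)> FindNonAscii(string text)
    {
        var hits = new List<(int Index, char Character)>();
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] > '\u007F')
            {
                hits.Add((i, text[i]));
            }
        }
        return hits;
    }
}

public static class Program
{
    public static void Main()
    {
        // Em dash (U+2014) and dagger (U+2020), as in the original question.
        string line = "Item 1 \u2014 see footnote\u2020";
        foreach (var (index, ch) in AsciiScanner.FindNonAscii(line))
        {
            Console.WriteLine($"Index {index}: U+{(int)ch:X4}");
        }
    }
}
```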
 
I don't know how the OP has configured Notepad or Word, but Notepad supports Unicode.

The "square character" could be the glyph that is displayed for a Unicode character not supported by the current font. Char.IsSymbol should still catch it, at least in the case of the dagger and em dash. I don't know how well most fonts support "printable" characters, but what counts as "printable/viewable" does depend on the font.
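A quick way to test the Char.IsSymbol suggestion before relying on it. In the Unicode character database the em dash (U+2014) is classified as dash punctuation and the dagger (U+2020) as other punctuation, so Char.IsSymbol returns false for both, while Char.IsPunctuation returns true. The sketch below simply prints each character's UnicodeCategory next to both predicates; the program layout is illustrative only. It cannot tell whether the current font has a glyph for a character, which is the separate, font-dependent question raised above.

```csharp
using System;
using System.Globalization;

public static class Program
{
    public static void Main()
    {
        // Em dash and dagger, the two characters from the original question.
        char[] samples = { '\u2014', '\u2020' };

        foreach (char c in samples)
        {
            // GetUnicodeCategory exposes the classification that IsSymbol and
            // IsPunctuation are based on.
            UnicodeCategory category = char.GetUnicodeCategory(c);
            Console.WriteLine(
                $"U+{(int)c:X4}: {category}, IsSymbol={char.IsSymbol(c)}, IsPunctuation={char.IsPunctuation(c)}");
        }
    }
}
```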

--
Browse http://connect.microsoft.com/VisualStudio/feedback/ and vote.
http://www.peterRitchie.com/blog/
Microsoft MVP, Visual Developer - Visual C#


Nicholas Paldino said:

--
- Nicholas Paldino [.NET/C# MVP]
- (e-mail address removed)

Peter Ritchie said:
You could probably use Char.IsSymbol() in this case
 
sippyuconn said:
Is there some way to trap the characters that will not be printable/viewable in, say, Notepad?

You need to use an Encoding object obtained via the static Encoding.GetEncoding method. This method allows you to specify the EncoderFallback to use (this defaults to EncoderReplacementFallback, which simply replaces unencodable characters with ?).
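To see the silent substitution described above, here is a small sketch that requests an encoding with an explicit EncoderReplacementFallback. ASCII ("us-ascii") is used as the target only because it is built into every .NET runtime; that choice, and the helper name RoundTripThroughAscii, are assumptions for illustration. The em dash cannot be represented, so it comes back as "?" and no error is raised.

```csharp
using System;
using System.Text;

public static class ReplacementDemo
{
    // Hypothetical helper: encode to ASCII and decode again.
    public static string RoundTripThroughAscii(string text)
    {
        // Replacement fallback: any character the target encoding cannot
        // represent is quietly replaced with "?".
        Encoding ascii = Encoding.GetEncoding(
            "us-ascii",
            new EncoderReplacementFallback("?"),
            new DecoderReplacementFallback("?"));

        byte[] bytes = ascii.GetBytes(text);
        return ascii.GetString(bytes);
    }
}

public static class Program
{
    public static void Main()
    {
        // The em dash is lost without any indication of where it was.
        Console.WriteLine(ReplacementDemo.RoundTripThroughAscii("A \u2014 B"));
    }
}
```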

If you supply an EncoderExceptionFallback object instead, then any out-of-band characters will cause an EncoderFallbackException to be thrown when you use the Encoding to convert your content.

The EncoderFallbackException has properties that you can use to discover
what character caused the problem and where it is.
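A minimal sketch of that approach, again assuming ASCII as the target encoding and using a hypothetical helper name (FirstUnencodable). It asks Encoding.GetEncoding for an exception fallback, then reads Index and CharUnknown from the EncoderFallbackException to report what failed and where.

```csharp
using System;
using System.Text;

public static class StrictEncodingCheck
{
    // Hypothetical helper: returns the first character the named encoding cannot
    // represent, with its position, or null if the whole string encodes cleanly.
    public static (int Index, char Character)? FirstUnencodable(string text, string encodingName)
    {
        // Exception fallback: the encoder throws instead of substituting "?".
        Encoding strict = Encoding.GetEncoding(
            encodingName,
            EncoderFallback.ExceptionFallback,
            DecoderFallback.ExceptionFallback);

        try
        {
            strict.GetBytes(text);
            return null;
        }
        catch (EncoderFallbackException ex)
        {
            // Index: position of the offending character in the input.
            // CharUnknown: the character itself (for a surrogate pair, use
            // CharUnknownHigh/CharUnknownLow instead).
            return (ex.Index, ex.CharUnknown);
        }
    }
}

public static class Program
{
    public static void Main()
    {
        string line = "Item 1 \u2014 see footnote\u2020";
        var hit = StrictEncodingCheck.FirstUnencodable(line, "us-ascii");
        if (hit.HasValue)
        {
            Console.WriteLine($"Cannot encode U+{(int)hit.Value.Character:X4} at index {hit.Value.Index}");
        }
        else
        {
            Console.WriteLine("All characters encode cleanly.");
        }
    }
}
```

Because the exception ends the conversion, only the first offending character is reported per call. The target encoding also matters: in windows-1252 (the usual ANSI code page on Western-European Windows) the em dash (0x97) and dagger (0x86) are representable, and on .NET Core and later that code page is only available after registering CodePagesEncodingProvider.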
 
I agree with Anthony here.

Some more references:

#I'm not a Klingon : Best Fit in WideCharToMultiByte and
System.Text.Encoding Should be Avoided
http://blogs.msdn.com/shawnste/archive/2006/01/19/515047.aspx

#Fallback Encoding Application Sample
http://msdn2.microsoft.com/en-us/library/tt6z1500(VS.80).aspx

Hope this helps.


Regards,
Walter Wang (e-mail address removed)
Microsoft Online Community Support

 