Clipboard Problems with HTML

James Hancock

I'm trying to intercept the clipboard so that I can fix the disastrous HTML
that Word produces before it gets pasted into a text box. I just pull the
fragment information from the result that comes back when you ask for the
clipboard text in HTML format.

Everything works just great, except when there are accents:
- è - è - ê - ë - à - â - ô - ö - î - ' - ? - ! - ;

If you paste that into Word, copy it from there, and then paste it into a
text box while intercepting the clipboard, I get all kinds of weird
characters, none of which are the ones above. If I leave the clipboard alone,
of course, it works just fine. It looks like the .NET Clipboard isn't handling
Unicode correctly, or something along those lines. I'm at a loss. Does anyone
have any ideas on how I can get the HTML off the clipboard correctly?

Thanks!
James Hancock
 
ThunderMusic

Hi,
it's probably an encoding problem. As I see it, Word must encode its
Unicode characters in an ANSI encoding (I could be wrong). You get this
content and edit it with your .NET app, which by default (I could be wrong
again) uses UTF-8. Special characters do not have the same byte values in
ANSI and in UTF-8, so they end up all messed up in your result. Try copying
from one .NET app to the clipboard and straight back into a .NET app; most
likely the problem won't occur. So try to determine which encoding Word is
using and encode your text with that same encoding, using the
System.Text.Encoding class.

I hope it helps

ThunderMusic
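To see what an encoding mismatch like the one described above does to these particular characters, here is a small Python simulation (not .NET). It assumes the "ANSI" code page is Windows-1252, the usual one on Western European systems, and it goes in the direction the clipboard format suggests: the text is stored as UTF-8 bytes and those bytes are then read as Windows-1252. The helper names `misdecode` and `repair` are hypothetical.

```python
def misdecode(text: str, wrong_codec: str = "cp1252") -> str:
    """Simulate UTF-8 bytes being read with an ANSI code page (default: Windows-1252)."""
    return text.encode("utf-8").decode(wrong_codec)


def repair(garbled: str, wrong_codec: str = "cp1252") -> str:
    """Undo misdecode(): recover the original bytes, then decode them as UTF-8."""
    return garbled.encode(wrong_codec).decode("utf-8")


sample = "è ê ë à â ô ö î"
garbled = misdecode(sample)
# Every accented letter becomes two characters, e.g. "è" -> "Ã¨".
assert misdecode("è") == "\u00c3\u00a8"
assert len(garbled) == len(sample) + 8
assert repair(garbled) == sample
```

The `repair` round trip only works when the wrong decoding lost nothing. Windows-1252 leaves a few byte values undefined, so for some other characters the garbled string cannot be turned back into the original bytes, and decoding the raw bytes as UTF-8 in the first place is the more reliable route.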
 
James Hancock

That was my thought too, but I can't find any documentation at all about
this. Oddly, most of those characters actually come out as just one
character in the clipboard when read from .NET...
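One possible explanation for the characters coming out as single characters: if the UTF-8 data is decoded correctly, the text itself looks fine, but the StartFragment/EndFragment numbers still count bytes. Each accented letter before a given position then shifts the cut by one when those numbers are used as string indices. The Python sketch below (hypothetical helper `byte_to_char_offset`) shows the gap and one way to convert a byte offset into a character index. Python counts code points while .NET strings count UTF-16 units, but for accented Latin letters like these both count one unit per character.

```python
def byte_to_char_offset(data: bytes, byte_offset: int) -> int:
    """Convert a byte offset into UTF-8 data into a character index."""
    # Raises UnicodeDecodeError if the offset falls inside a multi-byte character.
    return len(data[:byte_offset].decode("utf-8"))


html = "<p>è à</p>"
data = html.encode("utf-8")

# The two accented letters take two bytes each in UTF-8.
assert len(html) == 10 and len(data) == 12

byte_pos = data.find(b"</p>")                   # 8: what a byte offset would say
char_pos = byte_to_char_offset(data, byte_pos)  # 6: the matching string index
assert html[byte_pos:] == "p>"    # using the byte offset directly cuts in the wrong place
assert html[char_pos:] == "</p>"
```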
 
