decoding =E5, =F8

Peter K

Hi

While processing some text files, I have found strings like:

f=E5t
pr=F8ve

where "=E5" and "=F8" stand for the Danish characters "å" and "ø". I can
work this out myself, but how can my program know? Or at least, what
"decoding" do I need to do to get the correct characters in my string?

Thanks,
Peter
 
It's not entirely clear what you mean. Does the text file contain
"=E5" but it *should* contain a Danish character, or were you doing
some replacement for us?

What's the source of the text file?
 
The text files are actually xml (if that makes a difference).

And in actual fact the xml nodes I am interested in contain a "base64
encoded" string. So I extract this string, which is a long piece of text
starting:
"DQo8YnI+PGZvbnQgc2l6ZT0xIGNvbG9yPXdoaXRlIGZhY2U9IlZlcmRhbmEL".

I decode this base64 string and get another text string, which is HTML
(with all sorts of HTML tags in it).

Some of this html contains the 3 symbols
= E 5
for example. (No spaces between the symbols).
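
The base64 step described above can be sketched roughly as follows. This is only an illustration: the class and method names are made up, and the ISO-8859-1 charset is an assumption, since the thread never states which charset the decoded bytes use.

```csharp
using System;
using System.Text;

public static class Base64Step
{
    // Hypothetical helper: turns a base64 payload into text.
    // Assumption: the decoded bytes are ISO-8859-1 (Latin-1).
    public static string DecodeBase64Text(string base64)
    {
        byte[] raw = Convert.FromBase64String(base64);
        return Encoding.GetEncoding("iso-8859-1").GetString(raw);
    }
}
```

For example, Base64Step.DecodeBase64Text("Zj1FNWV0") gives the six characters f = E 5 e t, so the "=E5" sequence is still there after the base64 layer has been removed.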

They obviously represent the Danish letter 'å' (obvious because a
Danish speaker can tell from the word it appears in).

So for example, the text file may contain strings like:

"f=E5et"

(I don't know how you see this string in your news reader - but I see it
as 6 characters: f = E 5 e t).


As a part of processing, my program needs to convert these strings (=E5)
to real text (å).

But I don't know what this encoding is, or how to decode it.

Thanks,
Peter
 
* Allan Ebdrup wrote, On 15-6-2007 16:01:
Looks like encoding for SMTP, is it a string from an email?
You can see how this encoding is done here:
http://www.lesnikowski.com/mail/Rfc/rfc2047.txt
I don't know if there is a freeware component that can decode this kind of
thing.

Kind Regards,
Allan Ebdrup

Indeed, it looks like the quoted-printable encoding used for text messages
sent through mail or otherwise. Maybe the System.Net.Mail namespace can help
you out here. But I'd have to dig just as far as you would.

Jesse
 
Ah, this could be right, because some of the other texts I get are not
encoded at all and come with all sorts of "multipart" metadata like
content-type and content-transfer-encoding. And some of them say
"quoted-printable".

The base64-encoded html I am currently having trouble with has no such
encoding information however.

The xml files I am processing do not contain emails as far as I know,
but all sorts of data which has been extracted from a "Domino-Notes"
database and transformed to xml. Horrible xml if you ask me.

Thanks for the help/pointers.
Peter
 
A hack and a real quoted-printable decode (note that real
quoted-printable also uses soft line breaks, an "=" at the end of a line,
which neither version handles - you will need to add that if needed):

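using System;
using System.Globalization;
using System.Text;

// Hack: replaces only the six Danish letters (ISO-8859-1 byte values).
public static string FromQP1(string s)
{
    return s.Replace("=E6", "æ").Replace("=F8", "ø").Replace("=E5", "å")
            .Replace("=C6", "Æ").Replace("=D8", "Ø").Replace("=C5", "Å");
}

// General decode: turns each "=XX" into the character with code point 0xXX,
// which is correct for ISO-8859-1 text.
public static string FromQP2(string s)
{
    StringBuilder sb = new StringBuilder();
    int ix = 0;
    while (ix < s.Length)
    {
        // Decode only when '=' is followed by two hex digits; anything else
        // (e.g. size=1 in an HTML tag, or a trailing '=') is copied as-is.
        if (s[ix] == '=' && ix + 2 < s.Length
            && Uri.IsHexDigit(s[ix + 1]) && Uri.IsHexDigit(s[ix + 2]))
        {
            sb.Append((char)int.Parse(s.Substring(ix + 1, 2), NumberStyles.HexNumber));
            ix += 3;
        }
        else
        {
            sb.Append(s[ix]);
            ix++;
        }
    }
    return sb.ToString();
}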
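
If the soft line breaks mentioned above do turn up, a variant along these lines might cover them. It is a sketch, not a full RFC 2045 decoder: the QuotedPrintable class, the Decode method and the charset parameter are names invented for this example, and the ISO-8859-1 default is an assumption about the data. It collects the decoded bytes and converts them through an Encoding, so multi-byte charsets such as UTF-8 also work, and it passes through any "=" that is not followed by two hex digits or a line break.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class QuotedPrintable
{
    // Hypothetical helper. Assumption: charset defaults to ISO-8859-1.
    public static string Decode(string s, string charset = "iso-8859-1")
    {
        Encoding enc = Encoding.GetEncoding(charset);
        StringBuilder sb = new StringBuilder(s.Length);
        List<byte> pending = new List<byte>(); // decoded "=XX" bytes not yet converted
        int ix = 0;
        while (ix < s.Length)
        {
            if (s[ix] == '=')
            {
                // Soft line break: "=" followed by LF or CRLF is dropped.
                if (ix + 1 < s.Length && s[ix + 1] == '\n') { ix += 2; continue; }
                if (ix + 2 < s.Length && s[ix + 1] == '\r' && s[ix + 2] == '\n') { ix += 3; continue; }
                // Encoded byte: "=" followed by two hex digits.
                if (ix + 2 < s.Length && Uri.IsHexDigit(s[ix + 1]) && Uri.IsHexDigit(s[ix + 2]))
                {
                    pending.Add(Convert.ToByte(s.Substring(ix + 1, 2), 16));
                    ix += 3;
                    continue;
                }
            }
            // Literal character (including a stray '='): flush bytes, then copy it.
            Flush(pending, enc, sb);
            sb.Append(s[ix]);
            ix++;
        }
        Flush(pending, enc, sb);
        return sb.ToString();
    }

    // Converts the collected bytes to text using the chosen charset.
    private static void Flush(List<byte> pending, Encoding enc, StringBuilder sb)
    {
        if (pending.Count == 0) return;
        sb.Append(enc.GetString(pending.ToArray()));
        pending.Clear();
    }
}
```

With this, QuotedPrintable.Decode("pr=F8ve") gives "prøve", and QuotedPrintable.Decode("f=C3=A5t", "utf-8") gives "fåt". One caveat: unescaped HTML such as color=ff0000 would still be misread as an encoded byte, so it only makes sense to run this on text that really is quoted-printable.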


Arne
 
