File Name invalid Characters

shapper

Hello,

On a web project I am returning a file for download as follows:

return File(document.File, "application/pdf", "Bons Pão.pdf");

I get the following error:
An invalid character was found in the mail header.

If I change the file name to "Bons.pdf" it works fine.

So I think some characters are not allowed in the file name ...

However, the file name will be taken from the document title, so I
will always get such characters.

Is there a way to maybe convert ã, à, Á, etc to a or A; õ, ò, Ó, etc
to o and O; ... ?

I think there might be a way to get a valid string for the filename
somehow.

Thanks,
Miguel
 
Jeff Johnson

> On a web project I am returning a file for download as follows:
> return File(document.File, "application/pdf", "Bons Pão.pdf");

Are you missing a "new" in that statement? Or is File() a method?

> I get the following error:
> An invalid character was found in the mail header.
> If I change the file name to "Bons.pdf" it works fine.
> So I think some characters are not allowed for the file name ...

That's correct, or at least I assume so since your error mentions "mail
header." Standard RFC 822 mail headers (and MIME headers, which conform to
RFC 822) can only use characters from the 7-bit ASCII range (0 to 127).

> However the file name it will be taken from the Document Title and I
> will always get such characters.
> Is there a way to maybe convert ã, à, Á, etc to a or A; õ, ò, Ó, etc
> to o and O; ... ?
> I think there might be a way to get a valid string for the filename
> somehow.

This is probably a total hack and I haven't tested it thoroughly, but try
this:

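// Requires: using System.Text;
private string ConvertToAscii(string source)
{
    StringBuilder result = new StringBuilder();
    // Compatibility decomposition (FormKD) splits "ã" into "a" plus a combining tilde.
    string intermediate = source.Normalize(NormalizationForm.FormKD);

    foreach (char c in intermediate)
    {
        // Keep only 7-bit ASCII; combining marks and any other non-ASCII character are dropped.
        if ((int)c < 128)
        {
            result.Append(c);
        }
    }

    return result.ToString();
}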
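
Since the method above is untested, here is a minimal self-contained sketch of the same logic that can be pasted into a scratch project to see what it does with real names. The class name AsciiDemo is made up for illustration; the logic is the same normalize-and-filter loop.

```csharp
using System;
using System.Text;

// Hypothetical wrapper class, used only so the sketch compiles on its own.
static class AsciiDemo
{
    // Decompose the string (FormKD), then keep only characters below 128.
    public static string ConvertToAscii(string source)
    {
        var result = new StringBuilder();
        foreach (char c in source.Normalize(NormalizationForm.FormKD))
        {
            if (c < 128)
            {
                result.Append(c);
            }
        }
        return result.ToString();
    }
}
```

With this, "Bons Pão.pdf" comes out as "Bons Pao.pdf". Note that letters with no decomposition are dropped rather than replaced: "Søren.pdf" comes out as "Sren.pdf", because "ø" does not decompose into "o" plus a mark.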
 
shapper

I also found this interesting article:
http://blogs.msdn.com/michkap/archive/2007/05/14/2629747.aspx

Which contains the following code:

static string RemoveDiacritics(string stIn) {
    string stFormD = stIn.Normalize(NormalizationForm.FormD);
    StringBuilder sb = new StringBuilder();

    for(int ich = 0; ich < stFormD.Length; ich++) {
        UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
        if(uc != UnicodeCategory.NonSpacingMark) {
            sb.Append(stFormD[ich]);
        }
    }

    return(sb.ToString().Normalize(NormalizationForm.FormC));
}

What do you think?

I tested it and it works just as expected and solves my problem.

Thank You,
Miguel
 
Mihai N.

> Is there a way to maybe convert ã, à, Á, etc to a or A; õ, ò, Ó, etc
> to o and O; ... ?

Removing the accented characters is definitely the wrong thing to do.
Even if you find an article from Michael Kaplan on *how* to do it,
you still have to know *when* to do it.

In this case, you should not do this.
Just convert the string to bytes using UTF-8, then percent-escape the bytes.
(OK, that's the short story; the long one is in RFC 2047, for instance.)
 
shapper

> In this case, you should not do this.

Why?

> Just convert the string to bytes using UTF-8,

OK.

> then percent-escape the bytes.

How do I do it?
Won't I end up with a lot of % in the file name?

> (ok, that's the short story, the long one is in RFC 2047, for instance)

Yes, it is true. I have this problem because there is a problem with
ASP.NET MVC 1.0.
I found this in Preview 4, I think, but they confirmed the problem still
exists and they are planning to solve it in a future version.

I tried a few approaches but I wasn't able to solve the problem.
So I opted to do this character replacement for now.

Thanks,
Miguel
 
Mihai N.

> In this case, you should not do this. Why?

Because you are losing information.
In some languages the accents make for a different letter;
they are not just "decorations."
How would you feel about someone implementing something to
remove "decorations" from R to make it a P, and from Q to make it an O?

And because there is a standard (an RFC) that tells you
how to do things right.

> How to do it?
> Will I not end with a lot of % on the file name?

Yes, you will. So what? Any email client knows how to decode that and
show it in a user-friendly manner (basically restoring the original name).
Use Google to search for something with accents, and check the URL
(after you click search). That is exactly what you see.

It is not that hard. Just use HttpUtility.UrlEncode.
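
To make the "UTF-8 bytes, then percent-escape" idea concrete, here is a minimal hand-rolled sketch. It is not HttpUtility.UrlEncode itself (which, for one thing, turns spaces into +); the class and method names are made up for illustration, and the set of characters left unescaped is an assumption (letters, digits, and - . _ ~).

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any framework API.
static class FileNameEscaping
{
    // Encode the name as UTF-8, then write every byte that is not an
    // ASCII letter, digit, '-', '.', '_' or '~' as %XX (uppercase hex).
    public static string PercentEscapeUtf8(string name)
    {
        var sb = new StringBuilder();
        foreach (byte b in Encoding.UTF8.GetBytes(name))
        {
            char c = (char)b;
            bool unreserved =
                (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                (c >= '0' && c <= '9') ||
                c == '-' || c == '.' || c == '_' || c == '~';

            if (unreserved)
            {
                sb.Append(c);
            }
            else
            {
                sb.Append('%').Append(b.ToString("X2"));
            }
        }
        return sb.ToString();
    }
}
```

For "Bons Pão.pdf" this gives "Bons%20P%C3%A3o.pdf": the space becomes %20 and "ã" becomes its two UTF-8 bytes, %C3%A3. Whether the receiving side turns that back into the original name depends on the client.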
 
shapper

> Yes, you will. So what? Any email client knows how to decode that
> show it in a user-friendly manner (basically restoring the original name).

I did just that:

Byte[] filename = System.Text.Encoding.UTF8.GetBytes(
    String.Format("{0} (Web Site Name).pdf", document.Title));
String name = HttpUtility.UrlEncode(filename);
return File(document.File, "application/pdf", name);

When downloading with Firefox or IE I end up with a file whose name
contains a lot of % and +.

I think that, for a user, a name without the ~ and ´ will be
easier to understand than one full of % and +.

Or maybe I am missing something?
 
Jeff Johnson

> Yes, you will. So what? Any email client knows how to decode that
> show it in a user-friendly manner (basically restoring the original name).

I was pretty sure the % thing was NOT going to work. What you really need to
do is use the RFC 2047 method of escaping values like Mihai originally
mentioned.
 
Mihai N.

> Byte[] filename = System.Text.Encoding.UTF8.GetBytes
> (String.Format("{0} (Web Site Name).pdf",document.Title));
> String name = HttpUtility.UrlEncode(filename);
> return File(document.File, "application/pdf", name);

You can pass the string and an encoding to HttpUtility.UrlEncode directly.

> When downloading with Firefox or IE I end up with a file which name
> contains a lot of % and +.

But sorry, forget the % encoding; you should use base64 or quoted-printable
encoding (RFC 2047).

But you have to declare what you use, with the proper MIME declarations:
MIME sections with Content-Type, Content-Transfer-Encoding declarations,
etc.
Content-Disposition: attachment; filename="=?utf-8?B?..............==?="
(base64 in this case)

See RFC 2183 for Content-Disposition.
See RFC 2047 for declaring encoding.
Read this http://www.mihai-nita.net/article.php?artID=20060806a to understand
what you are doing.
Use System.Convert.ToBase64String to get the encoded string.
Send something with Outlook, then look at the raw message to see what it
sends (I just did that with a file name containing a mixture of Japanese,
Arabic, Hindi, Russian, and Romanian, and all went well).
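
As a rough illustration of the base64 form, the sketch below builds an RFC 2047 style encoded-word from a file name and wraps it in a Content-Disposition value shaped like the one shown earlier. The class and method names are made up for illustration. It also ignores the RFC 2047 length limit of 75 characters per encoded-word, so long names would need to be split, and whether a given browser decodes the result is something to test rather than assume.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any framework API.
static class EncodedWord
{
    // "=?utf-8?B?" + base64(UTF-8 bytes of the text) + "?="
    public static string ToBase64EncodedWord(string text)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(text);
        return "=?utf-8?B?" + Convert.ToBase64String(utf8) + "?=";
    }

    // Builds the header value only; sending it is left to the caller.
    public static string BuildContentDisposition(string fileName)
    {
        return "attachment; filename=\"" + ToBase64EncodedWord(fileName) + "\"";
    }
}
```

The resulting string would be set as the Content-Disposition response header by hand instead of being passed as the file name argument of File().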

Or look for a third-party library (you might even find some free stuff
on codeproject.com or sourceforge.net).

Or take the easy way out and cripple your application to send just ASCII,
like 20 years ago.
 
