Querystring encoding utf-8

Guest

Hi,

Our sites need to support UTF-8, so web.config has both request and response
encoding set to utf-8. The problem is that Request.Querystring doesn't seem to
support this, and we've searched the newsgroups for a solution without finding
one. The suggestions involve HttpUtility functions, but by then
Request.Querystring has already tried to decode the querystring. UTF-8 is the
default encoding in ASP.NET, so it's strange that a URL such as

www.mysite.com/default.aspx?q=sök

doesn't work. Request.Querystring("q") returns "sk" in this case, while
Request.RawUrl returns the correct URL. Is there any way we can use
Request.Querystring for URL parameters with UTF-8 encoding enabled? We're
passing the querystring using JavaScript and have tried to escape it before
passing it in, but Request.Querystring simply cuts away the international
characters.

Thanks,
Manso
 
Joerg Jooss

Brock said:
The spec for URLs mandates 8 bit character encoding, IIRC, so no
unicode. Sorry.

Not really. It just mandates certain characters, including all non
US-ASCII characters, to be URL-encoded, but it doesn't say anything
about the underlying character encoding. Anything is possible...

Cheers,
 
Joerg Jooss

Manso said:
Hi,

Our sites need to support UTF-8, so web.config has both request and
response encoding set to utf-8. The problem is that
Request.Querystring doesn't seem to support this, and we've searched
the newsgroups for a solution without finding one. The suggestions
involve HttpUtility functions, but by then Request.Querystring has
already tried to decode the querystring. UTF-8 is the default encoding
in ASP.NET, so it's strange that a URL such as

www.mysite.com/default.aspx?q=sök

doesn't work.

That's not a valid URL. That should be
http://www.mysite/default.aspx?q=s%C3%B6k

Request.Querystring("q") returns "sk" in this case, while
Request.RawUrl returns the correct URL. Is there any way we can use
Request.Querystring for URL parameters with UTF-8 encoding enabled?
We're passing the querystring using JavaScript and have tried to
escape it before passing it in, but Request.Querystring simply cuts
away the international characters.

You have to make sure that it is properly URL-encoded, although doing
this in JavaScript seems a bit scary to me.
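
For what it's worth, here is a minimal sketch of building such a URL on the client. It assumes the server decodes the query string as UTF-8, and `buildSearchUrl` is a hypothetical helper name, not something from the original post. `encodeURIComponent` is a standard JavaScript function that always percent-encodes the UTF-8 bytes of its argument.

```javascript
// Hypothetical helper (name is illustrative): appends a search term to a
// base URL, percent-encoding the term's UTF-8 bytes.
function buildSearchUrl(baseUrl, term) {
  return baseUrl + "?q=" + encodeURIComponent(term);
}

const searchUrl = buildSearchUrl("http://www.mysite.com/default.aspx", "sök");
console.log(searchUrl);
// prints "http://www.mysite.com/default.aspx?q=s%C3%B6k"

// In a browser one would then navigate with: location.href = searchUrl;
```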

Cheers,
 
Lars-Erik Aabech

Hi!

I haven't read the first post of this thread, but it seems to me like the
JavaScript escape(someString)/unescape(someString) functions and the ASP.NET
Server.UrlEncode(someString)/Server.UrlDecode(someString) methods could be
of use here. ;)

// Should produce a safe url, no?
location.href="www.mysite.com/default.aspx?q=" + escape("sök");

// Should be the entire value..
string q = Server.UrlDecode(Request.QueryString["q"]);

Lars-Erik
 
Joerg Jooss

Lars-Erik Aabech said:
Hi!

I haven't read the first post of this thread, but it seems to me like
the JavaScript escape(someString)/unescape(someString) functions and
the ASP.NET Server.UrlEncode(someString)/Server.UrlDecode(someString)
methods could be of use here. ;)

// Should produce a safe url, no?
location.href="www.mysite.com/default.aspx?q=" + escape("sök");

// Should be the entire value..
string q = Server.UrlDecode(Request.QueryString["q"]);

Well, it sure protects any special characters. But what character
encoding is to be used here? This may or may not be UTF-8.

From some quick tests I did, the results are hardly encouraging: IE6
seems to confuse UTF-8 and ISO-8859-1, and escape() and encodeURI()
return different results...
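
To make the difference concrete, a small sketch comparing the three standard JavaScript encoding functions on the same input (the variable names are only illustrative). `escape()` emits the Latin-1 code of "ö" as a single `%F6`, whereas `encodeURI()` and `encodeURIComponent()` emit the two UTF-8 bytes.

```javascript
const term = "sök";

// Legacy function: code units below 256 become %XX, so this is
// effectively ISO-8859-1 for "ö" (U+00F6).
const viaEscape = escape(term);

// Both of these percent-encode the UTF-8 byte sequence instead.
const viaEncodeURI = encodeURI(term);
const viaEncodeURIComponent = encodeURIComponent(term);

console.log(viaEscape);             // prints "s%F6k"
console.log(viaEncodeURI);          // prints "s%C3%B6k"
console.log(viaEncodeURIComponent); // prints "s%C3%B6k"

// The two modern functions differ on reserved characters: encodeURI
// leaves "&" and "=" alone, encodeURIComponent encodes them.
console.log(encodeURI("a&b=c"));          // prints "a&b=c"
console.log(encodeURIComponent("a&b=c")); // prints "a%26b%3Dc"
```

For a single query string value, `encodeURIComponent` is usually the safer pick of the three, since it also protects `&` and `=`.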

Cheers,
 
Lars-Erik Aabech

Again, I lost the original first post (weird behavior from OE here, might
have to fetch a whole lot of headers)

I don't think URIs use any encoding other than ANSI. Someone might arrest
me, but I'm referring to the characters between code 32 and 128 in the ASCII
set. The result from %xx might depend on the localization of the
client/server processing the values.

Lars-Erik


 
Mike Sharp

RFC2396 allows about 60 characters. The last four paragraphs of section 2.1
(URI and non-ASCII characters) mention the problem, finishing with: "It is
expected that a systematic treatment of character encoding within URI will
be developed as a future modification of this specification."

Back in 1996, François Yergeau proposed the following process:

1. The first step is to convert the character string into UTF-8.

2. Then each octet in that sequence that is outside the approx. 60
allowable characters is encoded using the ISO Latin 1 equivalent hex codes.

A consequence of this approach is that a Korean character, for example, might
be encoded into 3 octets in UTF-8, and then each of those octets is further
encoded in the three-character %HH form, making a total of 9 octets
required to represent the single original character.
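
The two steps above can be sketched roughly as follows. This is only an illustration: `percentEncodeUtf8` is a made-up name, the set of characters left unescaped is my reading of the RFC 2396 "unreserved" set (alphanumerics plus `- _ . ! ~ * ' ( )`), and `TextEncoder` is assumed to be available (modern browsers and Node.js).

```javascript
// Characters passed through unescaped (assumed: RFC 2396 "unreserved").
const UNRESERVED = /^[A-Za-z0-9\-_.!~*'()]$/;

// Hypothetical helper illustrating the two-step process.
function percentEncodeUtf8(text) {
  // Step 1: convert the character string into UTF-8 octets.
  const octets = new TextEncoder().encode(text);

  // Step 2: write every octet outside the allowed set as %HH.
  let out = "";
  for (const octet of octets) {
    const ch = String.fromCharCode(octet);
    if (octet < 0x80 && UNRESERVED.test(ch)) {
      out += ch;
    } else {
      out += "%" + octet.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

console.log(percentEncodeUtf8("sök"));       // prints "s%C3%B6k"
console.log(percentEncodeUtf8("한"));        // prints "%ED%95%9C"
console.log(percentEncodeUtf8("한").length); // prints 9
```

The Korean syllable "한" (U+D55C) is three octets in UTF-8, so it ends up as nine characters in the URL, which matches the count above.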

I think the goal here is to prevent the %HH codes from being impacted by
localization issues...but in any case, I think a number of groups have
"adopted" this general process.

Regards,
Mike Sharp

 
Joerg Jooss

Lars-Erik Aabech said:
Again, I lost the original first post (weird behavior from OE here,
might have to fetch a whole lot of headers)

I don't think URIs use any encoding other than ANSI.

ANSI? Actually, it's a subset of ASCII.

But again, you need to define what character encoding (i.e. what byte
sequence) represents special characters. This byte sequence will end up
in the URL as %xx sequence.
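
A small sketch of why that matters on the receiving side: the same %xx bytes give different text depending on which encoding the decoder assumes. The helper names here are made up for illustration, the input is assumed to be ASCII with well-formed %HH escapes, and `TextDecoder` is assumed to be available.

```javascript
// Hypothetical helper: turns "s%C3%B6k" into its raw byte values.
// Assumes ASCII input with well-formed %HH escapes (no validation).
function percentDecodeToBytes(encoded) {
  const bytes = [];
  for (let i = 0; i < encoded.length; i++) {
    if (encoded[i] === "%") {
      bytes.push(parseInt(encoded.slice(i + 1, i + 3), 16));
      i += 2; // skip the two hex digits just consumed
    } else {
      bytes.push(encoded.charCodeAt(i));
    }
  }
  return Uint8Array.from(bytes);
}

// ISO-8859-1: every byte maps directly to the code point of the same value.
function decodeAsLatin1(bytes) {
  return String.fromCharCode(...bytes);
}

// UTF-8: invalid bytes are replaced with U+FFFD by this decoder.
function decodeAsUtf8(bytes) {
  return new TextDecoder("utf-8").decode(bytes);
}

const utf8Form = percentDecodeToBytes("s%C3%B6k");
const latin1Form = percentDecodeToBytes("s%F6k");

console.log(decodeAsUtf8(utf8Form));     // prints "sök"
console.log(decodeAsLatin1(utf8Form));   // prints "sÃ¶k"
console.log(decodeAsLatin1(latin1Form)); // prints "sök"
console.log(decodeAsUtf8(latin1Form));   // prints "s\uFFFDk" (replacement character)
```

The last case is a lone 0xF6 byte, which is not valid UTF-8. A decoder that drops invalid bytes instead of substituting a replacement character would produce "sk", which may be (I haven't confirmed it) what the original poster is seeing.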

Cheers,
 
Lars-Erik Aabech

I'll keep my mouth shut about encoding ;)

L-E

 
