Base64 question

Jim Brandley

I need to append a short ciphertext string as a query variable encoded so
it's valid for a URL. After encryption, I convert the bytes to Base64.
However, the result includes characters that are invalid for a URL, notably
'+' symbols. So, I have to cycle the output string through
HttpUtility.UrlEncode(). That takes time. I wrote my own URL-safe Base64
converter in C#, that's about as lean as I can make it. It is much slower
(about 6 times) than the one provided. However, it runs in about 70% of
the time required to use the standard Base64 converter followed by a trip
through UrlEncode().

I am using .NET 2.0, and I have not found a way to coerce the built-in
Base64 converter to use a character set that could avoid the trip through
UrlEncode. Am I missing anything? If not, is there any way to add this
capability to a future release?

Thanks,

Jim Brandley
 
Arne Vajhøj

I find it difficult to believe that URL encoding could have a
noticeable impact on total performance.

Arne
 
Jim Brandley

According to the Stopwatch class it did.

 
Jim Brandley

Thanks for the response. I already have code in place to convert byte arrays
to hex char arrays. It's fast too. The problem is that hex output is 50%
longer than Base64 for the same ciphertext, which increases the risk of
exceeding the legal length for URLs.
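
The length trade-off can be checked with a little arithmetic. The sketch below is only an illustration: the class and method names are hypothetical, and it assumes standard padded Base64 (four characters per three bytes) against plain hex (two characters per byte).

```csharp
using System;

// Hypothetical helper (not from the thread): encoded length for a given byte count.
public static class EncodedLength
{
    // Hex uses two characters per byte.
    public static int Hex(int byteCount)
    {
        return byteCount * 2;
    }

    // Padded Base64 uses four characters per group of three bytes, rounded up.
    public static int Base64(int byteCount)
    {
        return (byteCount + 2) / 3 * 4;
    }
}
```

For a 48-byte ciphertext this gives 96 hex characters against 64 Base64 characters, so hex is 1.5 times as long.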
 
Jim Brandley

Peter - I did not notice your name when I responded to your previous post. I
have read many of your articles. I like the way you write, and I appreciate
your contribution to the knowledgebase available on the web.

Jim
 
Arne Vajhøj

Jim said:
I need to append a short ciphertext string as a query variable encoded so
it's valid for a URL. After encryption, I convert the bytes to Base64.
However, the result includes characters that are invalid for a URL, notably
'+' symbols. So, I have to cycle the output string through
HttpUtility.UrlEncode(). That takes time. I wrote my own URL-safe Base64
converter in C#, that's about as lean as I can make it. It is much slower
(about 6 times) than the one provided. However, it runs in about 70% of
the time required to use the standard Base64 converter followed by a trip
through UrlEncode().

I believe that + is the only character in Base64 output that is not valid
in a URL.

Why not a simple String.Replace()?

Jim said:
I am using .NET 2.0, and I have not found a way to coerce the built-in
Base64 converter to use a character set that could avoid the trip through
UrlEncode. Am I missing anything? If not, is there any way to add this
capability to a future release?

Base64 is a standard. It is not common to allow mucking with a standard.

Arne
 
Jim Brandley

I'll try that and see what it costs. I was hoping to avoid another iteration
through the characters in the string.
 
Jim Brandley

Stopwatch is a new (to 2.0) class in System.Diagnostics. It is easy to use and is
useful when comparing the performance of different approaches to solving a
problem.
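
A comparison of this kind might be set up roughly as follows. This is a sketch under assumptions, not the code Jim measured: the class, delegate, and method names are hypothetical, and Uri.EscapeDataString stands in for HttpUtility.UrlEncode (which lives in System.Web) so the example has no web dependency. The two escapers are not identical, so the timings will not match the UrlEncode figures quoted in the thread.

```csharp
using System;
using System.Diagnostics;

// Hypothetical timing harness (names are illustrative, not from the thread).
public static class EncodeTiming
{
    public delegate string Encoder(byte[] data);

    // Runs the encoder once to warm up, then times `iterations` calls.
    // Returns elapsed Stopwatch ticks.
    public static long Time(Encoder encode, byte[] data, int iterations)
    {
        encode(data); // warm-up so JIT compilation is not included
        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            encode(data);
        }
        sw.Stop();
        return sw.ElapsedTicks;
    }

    // Standard Base64 followed by percent-escaping (stand-in for UrlEncode).
    public static string Base64ThenEscape(byte[] data)
    {
        return Uri.EscapeDataString(Convert.ToBase64String(data));
    }

    // Standard Base64 followed by two character replacements.
    public static string Base64ThenReplace(byte[] data)
    {
        return Convert.ToBase64String(data).Replace('+', '-').Replace('/', '_');
    }
}
```

Calling EncodeTiming.Time(EncodeTiming.Base64ThenEscape, data, 100000) and the same with Base64ThenReplace gives two tick counts to compare. Results depend on the machine, the runtime, and the input size.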
 
Jim Brandley

Arne - That was faster - Thanks for the idea. However, Base64 is also
sending out the slash '/' character - that means a second pass with
string.Replace().

BTW - I agree that altering something that complies with a standard is a bad
thing to do. I was on an ANSI committee years ago, and I know why they are
built the way they are. However, supplementing that method with an optimized
conversion is not a bad thing to do. Maybe call it UrlSafeBase64. The name
would convey the reason for the existence of the method, along with a pretty
good idea of what the output might be. Just a thought.
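
The two-pass Replace approach described above might look like the sketch below. The class name borrows the UrlSafeBase64 suggestion, but the code is an assumption rather than anything posted in the thread. It maps '+' to '-' and '/' to '_', the substitution used by the common URL-safe Base64 alphabet, and leaves any '=' padding in place.

```csharp
using System;

// Hypothetical wrapper around the framework converter (illustrative only).
public static class UrlSafeBase64
{
    // Encode with the built-in converter, then swap the two unsafe characters.
    public static string Encode(byte[] data)
    {
        return Convert.ToBase64String(data).Replace('+', '-').Replace('/', '_');
    }

    // Reverse the substitution before handing the text to the built-in decoder.
    public static byte[] Decode(string text)
    {
        return Convert.FromBase64String(text.Replace('-', '+').Replace('_', '/'));
    }
}
```

Because '-' and '_' never occur in standard Base64 output, the substitution is reversible without ambiguity.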

Jim
 
Arne Vajhøj

Jim said:
It is a new (to 2.0) class in System.Diagnostics. It is easy to use and is
useful when comparing the performance of different approaches to solving a
problem.

Is it?

Try thinking about it this way.

You can do about 1 million conversions to Base64 of
a small string in 1 second.

=>

If your web server is CPU bound at about 1000 requests/second,
then the base64 conversion is using 0.1% of your CPU and something
else is chewing the other 99.9%.

Arne
 
Jim Brandley

I did not mean to imply this was a bottleneck. I strive to prevent the
creation of bottlenecks - easier to do that than track them down later. I'm
working on a very large (to me anyway - approx 2M lines of C#, not counting
aspx and ascx pages) web app for intranets. Pages are generated with maybe
2% static text and 98% dynamic, and can have 1500 to 1700 users at any given
time. It is primarily presenting and recording real-time information in
large manufacturing environments.

Responsiveness is a big deal for our customers. I spend all my time in the
business objects, data layer and writing SQL. I very seldom do anything with
screens, except present the information they need for binding. Any time I
write a bit of code that gets executed with any frequency, I try to find the
time to analyze it carefully and shave whatever I can.
 
Jon Skeet [C# MVP]

Arne Vajhøj said:
I believe that + is the only non URL valid character in base64 output.

Depending on the exact context, it can be handy to get rid of / and =
too. In some cases it's just + that needs to be replaced though, yes.

Arne Vajhøj said:
Why not a simple String Replace ?

Indeed... possibly with a check to see whether a replacement is needed
to start with.

Arne Vajhøj said:
Base64 is a standard. It is not common to allow mucking with a standard.

I think it's pretty common to adapt base64 to only include URL-safe
characters. Put it this way - it's common enough to have made it into
Wikipedia:

http://en.wikipedia.org/wiki/Base64#URL_Applications
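
One way to combine the suggestions in this post is sketched below: check whether any replacement is needed before doing it, and drop the '=' padding as well. The class and field names are hypothetical, and removing the padding is an assumption about what the receiving code can tolerate, since the decoder has to put it back.

```csharp
using System;

// Hypothetical variant (illustrative only): skips Replace when nothing needs
// replacing and strips the '=' padding.
public static class UrlSafeBase64NoPad
{
    private static readonly char[] UnsafeChars = new char[] { '+', '/' };

    public static string Encode(byte[] data)
    {
        string s = Convert.ToBase64String(data).TrimEnd('=');
        // Only pay for the replacement passes if an unsafe character is present.
        if (s.IndexOfAny(UnsafeChars) < 0)
        {
            return s;
        }
        return s.Replace('+', '-').Replace('/', '_');
    }

    public static byte[] Decode(string text)
    {
        string s = text.Replace('-', '+').Replace('_', '/');
        // Restore padding so the length is a multiple of four again.
        int pad = (4 - s.Length % 4) % 4;
        return Convert.FromBase64String(s + new string('=', pad));
    }
}
```

The length check in Decode relies on the fact that unpadded Base64 text always has a length of 0, 2, or 3 modulo 4. Anything else is malformed, and FromBase64String will reject it.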
 
Jim Brandley

Thanks. That's similar to what I have written. I'll see if I can get mine to
perform better. I was using a StringBuilder to accept the encoded
characters. I'll see if it performs better using a character array, and save
the string construction until it's complete.
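
A character-array version of that idea could be sketched as follows. This is an assumption about the approach, not Jim's code: it lets Convert.ToBase64CharArray fill a preallocated buffer, patches the two unsafe characters in place, and builds the string once at the end. The class name is hypothetical.

```csharp
using System;

// Hypothetical char-array variant (illustrative only).
public static class UrlSafeBase64Chars
{
    public static string Encode(byte[] data)
    {
        // Padded Base64 needs four characters per group of three input bytes.
        char[] chars = new char[(data.Length + 2) / 3 * 4];
        int written = Convert.ToBase64CharArray(data, 0, data.Length, chars, 0);

        // Patch the unsafe characters in the buffer, avoiding intermediate strings.
        for (int i = 0; i < written; i++)
        {
            if (chars[i] == '+') chars[i] = '-';
            else if (chars[i] == '/') chars[i] = '_';
        }

        // Single string allocation once the buffer is final.
        return new string(chars, 0, written);
    }
}
```

This still walks the characters a second time after encoding, but it allocates only one string. Whether that beats StringBuilder or chained Replace calls would need to be measured.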

 
Jon Skeet [C# MVP]

Arne Vajhøj said:
Hmm.

People seem already to have forgotten the nightmare of
incompatible uuencode versions.

This isn't usually for communicating between two applications though -
it's to allow a stateless application to communicate effectively with
itself. In other words, you're in complete control of both "ends" of
the conversation, so can be compatible with yourself appropriately.
Base64 happens to be a pretty simple format for representing arbitrary
binary data, and it just needs a little tweak for the sake of URL
encoding.
 
Guest

Jim,
As Jon Skeet pointed out, modifying the Framework System.Convert classes may
be the way to go here. A quick decompilation of the System.Convert Base64
methods reveals that:
1) They use unsafe code, which probably accounts for the speed factor.
2) A char[] Base64Table is used.

So, you could decompile this, create your own (say,
Convert.ToBase64StringUrlSafe) method, and all you would need to do is change
the values in the Base64Table char[] array.
Peter
 
Arne Vajhøj

Jim said:
BTW - I agree that altering something that complies with a standard is a bad
thing to do. I was on an ANSI committee years ago, and I know why they are
built the way they are. However, supplementing that method with an optimized
conversion is not a bad thing to do. Maybe call it UrlSafeBase64. The name
would convey the reason for the existence of the method, along with a pretty
good idea of what the output might be. Just a thought.

If you insist on pursuing the idea, there is some code attached
below, which is the fastest I can write without unsafe code.

Arne

===================================================

using System;

public class Base64
{
    // Standard alphabet. Swapping '+' and '/' for '-' and '_' here gives
    // the URL-safe variant discussed in this thread.
    private static char[] EncVals =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/".ToCharArray();
    private static int[] DecVals;

    static Base64()
    {
        // Reverse lookup table: character code -> 6-bit value.
        DecVals = new int[128];
        for (int i = 0; i < 64; i++)
        {
            DecVals[EncVals[i]] = i;
        }
    }

    public string Encode(byte[] b)
    {
        int len = (b.Length * 8 + 5) / 6;   // data characters needed
        int extra = 3 - (len + 3) % 4;      // '=' padding characters needed
        char[] res = new char[len + extra];
        int p = b.Length - b.Length % 3;    // end of the last complete 3-byte group
        int ix = 0;
        int tmp;
        for (int i = 0; i < p; i += 3)
        {
            tmp = (b[i] << 16) | (b[i + 1] << 8) | b[i + 2];
            res[ix + 3] = EncVals[tmp & 0x3F];
            res[ix + 2] = EncVals[(tmp >> 6) & 0x3F];
            res[ix + 1] = EncVals[(tmp >> 12) & 0x3F];
            res[ix] = EncVals[tmp >> 18];
            ix += 4;
        }
        if (extra == 1)
        {
            // Two bytes left over: three data characters and one '='.
            tmp = (b[p] << 16) | (b[p + 1] << 8);
            res[ix + 3] = '=';
            res[ix + 2] = EncVals[(tmp >> 6) & 0x3F];
            res[ix + 1] = EncVals[(tmp >> 12) & 0x3F];
            res[ix] = EncVals[tmp >> 18];
        }
        else if (extra == 2)
        {
            // One byte left over: two data characters and two '='.
            tmp = b[p] << 16;
            res[ix + 3] = '=';
            res[ix + 2] = '=';
            res[ix + 1] = EncVals[(tmp >> 12) & 0x3F];
            res[ix] = EncVals[tmp >> 18];
        }
        return new String(res);
    }

    // Expects valid, padded input whose length is a multiple of 4.
    public byte[] Decode(string s)
    {
        int len = s.Length;
        while (len > 0 && s[len - 1] == '=') len--;
        // Each data character carries 6 bits; size the result exactly.
        byte[] res = new byte[len * 6 / 8];
        int ix = 0;
        int tmp;
        for (int i = 0; i < s.Length; i += 4)
        {
            // '=' is not in the alphabet, so it decodes to 0 via the table default.
            tmp = (DecVals[s[i]] << 18) | (DecVals[s[i + 1]] << 12) |
                  (DecVals[s[i + 2]] << 6) | DecVals[s[i + 3]];
            // The guards skip the bytes that correspond to padding.
            if (ix < res.Length) res[ix++] = (byte)(tmp >> 16);
            if (ix < res.Length) res[ix++] = (byte)((tmp >> 8) & 0xFF);
            if (ix < res.Length) res[ix++] = (byte)(tmp & 0xFF);
        }
        return res;
    }
}
 
Arne Vajhøj

Jim said:
I did not mean to imply this was a bottleneck. I strive to prevent the
creation of bottlenecks - easier to do that than track them down later. I'm
working on a very large (to me anyway - approx 2M lines of C#, not counting
aspx and ascx pages) web app for intranets. Pages are generated with maybe
2% static text and 98% dynamic, and can have 1500 to 1700 users at any given
time. It is primarily presenting and recording real-time information in
large manufacturing environments.

Responsiveness is a big deal for our customers. I spend all my time in the
business objects, data layer and writing SQL. I very seldom do anything with
screens, except present the information they need for binding. Any time I
write a bit of code that gets executed with any frequency, I try to find the
time to analyze it carefully and shave whatever I can.

I still don't think it is worth it.

You should write 95%-98% of your code with easy maintenance as the
priority, and then optimize for speed the 2%-5% of your code that has
been proven to impact performance.

Writing clever code that optimizes stuff that does not need to be
optimized does not reduce hardware costs but will increase maintenance
costs dramatically.

Simple code is usually better than clever code when we talk business.

I used to do a lot of that type of micro-optimization in the 1980s. But
not any more.

I think you should use the framework methods and just consider the
optimized code an interesting academic exercise.

Arne
 
