Using ICSharpCode.SharpZipLib.GZip.GZipInputStream with an HTTP Request

mwieder

I am using the .NET Socket class to program a web client. When I come
upon gzip-encoded content, I am having trouble getting the decoded text.
I have found ICSharpCode.SharpZipLib to help, but when using its
sample code, I keep getting an error message about the first byte
not matching. Clearly it expects a particular byte value as the
first byte, which leads me to think that I need to split the encoded
body away from the unencoded header and pass only the encoded part to
the GZipInputStream. While I know that the division is marked by a
blank line, when I split on that line (whether or not I include the
carriage return in the encoded part), I still get the same error.
Has anyone been able to use that library to decode a web page that
has a gzipped body?
thanks!
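A minimal sketch of the split described above, written in Python with the standard `gzip` module rather than SharpZipLib, so it shows the technique and not that library's API. It assumes the whole response is already in memory as raw bytes and that the server used CRLF line endings. The key detail is that the split happens on the byte array, before anything is turned into a string. A gzip stream always begins with the bytes 0x1f 0x8b (RFC 1952), so checking those two bytes tells you whether the body really starts where you think it does. The names `split_http_response` and `looks_gzipped` are illustrative.

```python
from __future__ import annotations

import gzip

GZIP_MAGIC = b"\x1f\x8b"  # every gzip member starts with these two bytes (RFC 1952)


def split_http_response(raw: bytes) -> tuple[dict[str, str], bytes]:
    """Split a raw HTTP response into (headers, body) without decoding the body.

    The header block ends at the first empty line, i.e. the first CRLF CRLF;
    none of those four bytes belong to the body.
    """
    end = raw.find(b"\r\n\r\n")
    if end == -1:
        raise ValueError("end of headers (CRLF CRLF) not found")
    # Header bytes can always be decoded as ISO-8859-1; the body is left as bytes.
    head_lines = raw[:end].decode("iso-8859-1").split("\r\n")
    headers = {}
    for line in head_lines[1:]:  # head_lines[0] is the status line
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, raw[end + 4:]


def looks_gzipped(body: bytes) -> bool:
    return body[:2] == GZIP_MAGIC


# Example with a hand-built response (illustrative data, not from a real server).
raw = (b"HTTP/1.1 200 OK\r\nContent-Encoding: gzip\r\n\r\n"
       + gzip.compress(b"<html>hello</html>"))
headers, body = split_http_response(raw)
if headers.get("content-encoding") == "gzip" and looks_gzipped(body):
    print(gzip.decompress(body))
```

If `looks_gzipped` returns False even though the headers say `Content-Encoding: gzip`, the bytes handed to the decompressor are not the start of the gzip stream, which is exactly the kind of situation that produces a "first byte doesn't match" header error.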
 

Jon Skeet [C# MVP]

I suggest you split the problem into two halves. You need to check
that:

1) You can zip something up and then unzip it
2) You can transfer binary data without loss in a web request

I suspect your problem is in the second part, but you should be able to
work on that entirely separately from the first part.
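Check 1 in that list can be done with no network code at all. A minimal sketch in Python (standard `gzip` and `os` modules; the function name is illustrative) that compresses and then decompresses data containing every possible byte value:

```python
import gzip
import os


def gzip_round_trip_ok(data: bytes) -> bool:
    """Check 1 in isolation: compress, decompress, compare. No network involved."""
    return gzip.decompress(gzip.compress(data)) == data


# Test data containing every byte value, not just printable text, so that any
# step that treats the data as text would make the check fail.
sample = bytes(range(256)) * 64 + os.urandom(4096)
print(gzip_round_trip_ok(sample))  # True
```

If this passes but decoding a downloaded body fails, the problem is in how the bytes are received or split (check 2), not in the decompression itself.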
 

mwieder

I'm not sure how your post addresses my issue - are you suggesting that
I don't need to break the web page into header/body in order to use the
gzip library and that my problem is elsewhere?
 

Jon Skeet [C# MVP]

I'm suggesting that you should try to make sure you can successfully
transmit/receive arbitrary binary data to work out whether that's the
problem or whether it's the gzipping that's the problem.
 

mwieder

I'm getting the data back - it's just gobbledygook when I try to use
Encoding.UTF8 (which is what the header specifies) to turn it into a
string. Non-gzipped requests come back fine. I have narrowed the
issue down to what I originally asked.
thanks!
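One possible explanation for the gobbledygook is that the bytes are being turned into text before they are decompressed. Compressed data is not valid UTF-8, and a UTF-8 decoder substitutes U+FFFD for invalid sequences (Python does this with `errors="replace"`, and .NET's `Encoding.UTF8` uses a replacement fallback by default), so the original bytes cannot be recovered afterwards. A small Python sketch showing the wrong order and the right order; the sample text is arbitrary:

```python
import gzip

original = "Grüße <b>hello</b>".encode("utf-8")
compressed = gzip.compress(original)

# Wrong order: treat the compressed bytes as UTF-8 text, then convert back.
# 0x8b (the second gzip magic byte) is not valid UTF-8 on its own, so the
# decoder replaces it with U+FFFD and the byte sequence is changed for good.
as_text = compressed.decode("utf-8", errors="replace")
mangled = as_text.encode("utf-8")

try:
    gzip.decompress(mangled)
    mangled_ok = True
except OSError:  # gzip.BadGzipFile is a subclass of OSError (Python 3.8+)
    mangled_ok = False

# Right order: decompress the raw bytes first, then decode with the charset
# named in the Content-Type header.
text = gzip.decompress(compressed).decode("utf-8")
print(mangled_ok, text)  # False Grüße <b>hello</b>
```

The charset in the header describes the decompressed content, so it should only be applied after gunzipping.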
 

Jon Skeet [C# MVP]

> I'm getting the data back - it's just gobbledygook when I try to use
> Encoding.UTF8 (which is what the header specifies) to turn it into a
> string. Non-gzipped requests come back fine.

Of arbitrary binary data? Sorry, it's not entirely clear what's going
on here - probably my fault. Could you give more information about the
server, exactly what's doing the gzipping, etc.?

> I have narrowed the issue down to what I originally asked.

So you're getting back exactly the same data as you sent? In that case,
you should be able to ignore the web part and concentrate on just the
gzip part; you should be able to unzip some data directly after
zipping it.
 

mwieder

I'm making an HTTP request to a web page - not on my own web server. I don't
know what's being sent, but what I'm getting back has been encoded as
gzip - I need to decode it.
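Once the body has been separated from the headers, the `Content-Encoding` response header says how to undo the compression. A minimal Python sketch (standard `gzip` and `zlib`; the function name `decode_content` is illustrative) that handles `gzip`, its `x-gzip` alias, `deflate`, and uncompressed bodies. It does not handle a list of several codings. The result is still bytes; decoding to text with the charset from `Content-Type` comes afterwards.

```python
from __future__ import annotations

import gzip
import zlib


def decode_content(body: bytes, content_encoding: str | None) -> bytes:
    """Undo a single Content-Encoding; returns the uncompressed body as bytes."""
    coding = (content_encoding or "identity").strip().lower()
    if coding in ("gzip", "x-gzip"):
        return gzip.decompress(body)
    if coding == "deflate":
        # "deflate" is defined as zlib-wrapped data (RFC 1950), but some servers
        # send a raw deflate stream, so retry in raw mode (negative wbits).
        try:
            return zlib.decompress(body)
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)
    if coding == "identity":
        return body
    raise ValueError(f"unsupported Content-Encoding: {content_encoding!r}")
```

Dispatching on the header, rather than assuming gzip, also avoids trying to gunzip a body the server chose not to compress.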
 

Jon Skeet [C# MVP]

Okay - save the content you get back (as a binary file) and then try to
make the same request using the .NET HTTP classes (even if you won't be
able to do that in your production code). Save that as well, then
compare the two files.

I suspect that you're somehow misinterpreting the data which is coming
back from the web server.

Another line of approach: can you persuade the web server to give you
back a binary file (uncompressed) whose contents you know exactly? An
image would be a good start. Again, download it with your class and
save the contents, then compare with the correct file.

Jon
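Comparing two saved binary files by eye is hard. A small Python sketch (the function names and any file paths are illustrative) that reports the offset of the first byte where two downloads differ. The offset can hint at where things go wrong, for example at the header/body boundary or at a receive-buffer boundary.

```python
from __future__ import annotations


def first_difference(a: bytes, b: bytes) -> int | None:
    """Return the offset of the first byte that differs, or None if identical."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    if len(a) != len(b):
        return min(len(a), len(b))  # one input is a prefix of the other
    return None


def compare_files(path_a: str, path_b: str) -> int | None:
    # Open both files in binary mode so no newline or encoding translation happens.
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return first_difference(fa.read(), fb.read())
```

For example, `compare_files("socket_download.bin", "reference_download.bin")` with two hypothetical file names, one saved by the socket client and one by the HTTP classes or taken from the known image.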
 

mwieder

I get gobbledygook back from the Socket class and the correct decoded
data from the HTTP class. HttpWebRequest handles this already; my
question has been how to implement this with the Socket class. I
don't want to get into a discussion of why to use or not use the
Socket class; I just need help implementing this with the Socket
class.
thanks.
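Two common ways hand-written socket code corrupts binary downloads are keeping the whole receive buffer instead of only the number of bytes the receive call reported (.NET's `Socket.Receive(byte[])` returns that count), and converting each chunk to a string as it arrives. A minimal Python sketch of a raw-socket GET that avoids both by accumulating bytes until the server closes the connection. It sends an HTTP/1.0 request with `Connection: close`; a server must not use chunked transfer coding in a response to an HTTP/1.0 request, so the body is simply everything after the headers. The function names are illustrative, and the sketch does no TLS, redirects or error handling.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    # HTTP/1.0 with Connection: close: the server marks the end of the body by
    # closing the connection and will not use chunked transfer coding.
    lines = [
        f"GET {path} HTTP/1.0",
        f"Host: {host}",
        "Accept-Encoding: gzip",
        "Connection: close",
        "",  # the two trailing empty items end the last header line
        "",  # and add the blank line (CRLF CRLF) that ends the request
    ]
    return "\r\n".join(lines).encode("ascii")


def read_all(sock: socket.socket, bufsize: int = 8192) -> bytes:
    """Read until the peer closes the connection, keeping everything as bytes."""
    chunks = []
    while True:
        data = sock.recv(bufsize)  # data holds exactly the bytes received
        if not data:  # b"" means the other side closed the connection
            break
        chunks.append(data)
    return b"".join(chunks)


def fetch(host: str, path: str = "/", port: int = 80) -> bytes:
    """Plain-HTTP GET over a raw socket; returns headers and body as raw bytes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(build_request(host, path))
        return read_all(sock)
```

The bytes returned by `fetch("example.com")` (a placeholder host) still contain the headers; they need to be split off on the byte array, and the body gunzipped, before any text decoding.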
 

Jon Skeet [C# MVP]

Have you tried my other suggestion? I'm sure the problem is that you're
not getting the binary data back correctly - try downloading an image
(not gzipped) and see whether you can correctly save it in your code.
 

Jon Skeet [C# MVP]

> yes I can - I can also get non-gzipped pages back correctly.

So if you save the data back without gunzipping it yourself, then try
to run a "standard" gunzip tool, does that give you back the data
correctly?

Could you post the code you're using to gunzip?
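One thing worth ruling out when a standard gunzip tool also rejects the saved body: HTTP/1.1 responses may use `Transfer-Encoding: chunked` (RFC 7230, section 4.1). In that case the body starts with a hexadecimal chunk size and a CRLF rather than the gzip bytes 0x1f 0x8b, and the compressed data is interrupted by further size lines. HttpWebRequest removes this framing itself; raw socket code has to do it before gunzipping. A minimal Python sketch of that step, assuming the complete body is in memory and ignoring any trailer fields (the function name is illustrative):

```python
def decode_chunked(body: bytes) -> bytes:
    """Undo HTTP/1.1 chunked transfer coding (RFC 7230, section 4.1).

    Each chunk is: hex size [;extensions] CRLF, <size> bytes of data, CRLF.
    A chunk of size 0 ends the body; any trailer fields after it are ignored.
    Raises ValueError if the body is truncated or a size line is malformed.
    """
    out = bytearray()
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        size_field = body[pos:line_end].split(b";", 1)[0]  # drop chunk extensions
        size = int(size_field.strip(), 16)
        if size == 0:
            break
        start = line_end + 2
        out += body[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF
    return bytes(out)
```

Only apply this when the response headers contain `Transfer-Encoding: chunked`; the de-chunked bytes are what should then be passed to the gzip decoder.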
 
