Parsing multipart/form-data


Cuong.Tong

Greetings,
I am writing my own web server and am having some problems parsing the
multipart/form-data stream that browsers send.



I have a form that looks something like this:

<form action="process.dll" method="post" enctype="multipart/form-data">
<input type="file" name="fileupload">
</form>

So when I choose a local file in the browser and click submit, it
takes me to process.dll.

The browser sends a POST request to the server with headers and a body
that look something like this:

-------------Start REQUEST Headers--------------
Content-Length : 28624
Content-Type : multipart/form-data; boundary=---------------------------3765104465873
Connection : keep-alive
Cookie : SESSION=cPnKc7PmT8wdsy+:ccPnKlJF1Af1d
Host : localhost:9000
Referer : http://localhost:80/ajaxupload.html
User-Agent : Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1) Gecko/20061010 Firefox/2.0
url : /backend/fileupload/test
Accept-Language : en-us,en;q=0.5
Accept-Charset : ISO-8859-1,utf-8;q=0.7,*;q=0.7
Accept : text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Encoding : gzip,deflate
Keep-Alive : 300
method : POST

-----------------------------3765104465873
Content-Disposition: form-data; name="filename"; filename="review form.doc"
Content-Type: application/msword

Some binary content blah blah

-----------------------------3765104465873--
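The boundary a parser needs comes from the Content-Type header above, and the delimiter lines in the body are that boundary with two extra hyphens in front. A minimal Python sketch of pulling it out, assuming a well-formed header (`get_boundary` is a hypothetical name; RFC 2046 does not allow ";" inside a boundary, so splitting the parameters on ";" is safe):

```python
def get_boundary(content_type: str) -> str:
    """Return the boundary parameter of a multipart/form-data Content-Type value."""
    media_type, _, params = content_type.partition(";")
    if media_type.strip().lower() != "multipart/form-data":
        raise ValueError("not multipart/form-data: " + media_type.strip())
    for param in params.split(";"):
        name, _, value = param.strip().partition("=")
        if name.strip().lower() == "boundary":
            # The value may be quoted, e.g. boundary="simple boundary".
            return value.strip().strip('"')
    raise ValueError("Content-Type has no boundary parameter")


ct = "multipart/form-data; boundary=---------------------------3765104465873"
boundary = get_boundary(ct)
print(boundary)         # the boundary parameter itself
print("--" + boundary)  # what each delimiter line in the body starts with
```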

I can get my stream reader to read up to the application/msword line,
in other words to the beginning of the binary stream. However, I have
no way to know how many bytes to read, i.e. the length of the binary
content of the current part.
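There is no length field for a part; the length falls out of where the next delimiter starts. The data runs from just after the empty line that ends the part header up to the \r\n that precedes the next "--" + boundary line. A small Python sketch of that search over the raw request bytes (`part_end` and the sample body are made up for illustration):

```python
def part_end(body: bytes, data_start: int, boundary: str) -> int:
    """Return the index just past the last data byte of the current part.

    The data ends at the CRLF that precedes the next delimiter line
    ("--" + boundary); that CRLF belongs to the delimiter, not the data.
    """
    delimiter = b"\r\n--" + boundary.encode("ascii")
    end = body.find(delimiter, data_start)
    if end == -1:
        raise ValueError("next delimiter not found; body truncated?")
    return end


body = (b"--XyZ\r\n"
        b"Content-Type: application/octet-stream\r\n"
        b"\r\n"
        b"\x00\x01\r\n\xff"          # five bytes of "binary" data
        b"\r\n--XyZ--\r\n")
data_start = body.index(b"\r\n\r\n") + 4   # first byte after the empty line
end = part_end(body, data_start, "XyZ")
print(end - data_start)                    # 5
```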

Please note that I have no access to the ASP.NET libraries, as I am
using my own web server.

Any hints and/or comments are appreciated.






Regards,
 

ssamuel

The "multipart" part is MIME: multipart/form-data is one of the MIME
multipart types. If you Google the various MIME details (RFC 2046
describes multipart bodies), you can find lots more information.

Basically, it works like this:

1. Read the request headers and look for the "boundary" parameter of
the Content-Type header.

2. Read the boundary string out of that parameter.

3. Keep reading until you find the boundary in the body. The string in
the boundary parameter here is "---------------------------3765104465873"
(normally a whole pile of hyphen characters followed by a number, just
like that), and each boundary line in the body is that string with two
more hyphens in front of it.

4. Start reading the part header and save it for future reference; it
holds the Content-Disposition (field name and filename) and the
Content-Type.

5. Keep reading the part header until you reach an empty line. MIME
specifies CRLF (\r\n) line breaks, so the header ends with \r\n\r\n; a
lenient parser may also accept a bare \n (newline).

6. Start reading data into a string buffer.

7. Stop reading into the string buffer when you see the boundary again.

8. If the part header includes "Content-Transfer-Encoding: base64",
un-Base64-encode the contents of your buffer to get an array of bytes;
there is a framework Base64 codec (Convert.FromBase64String). Otherwise
the buffer already holds the binary data byte for byte.

Browsers send file uploads the second way: raw binary with no
Content-Transfer-Encoding header, which is also what your sample shows.
So rip the bytes out between the part header and the closing boundary,
then you can write them to disk or whatever.
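The eight steps can be sketched in Python, working on bytes throughout so nothing ever passes through a text decoder. This is an illustration, not production code: `parse_multipart` is a hypothetical name, and it assumes the whole body is already in memory, CRLF line breaks, and at least one header line per part (browsers always send Content-Disposition). Besides raw data it only handles the base64 transfer encoding.

```python
import base64


def parse_multipart(body: bytes, boundary: str) -> list:
    """Split a multipart/form-data body into (headers, data) pairs."""
    delimiter = b"--" + boundary.encode("ascii")
    parts = []
    # Step 3: skip any preamble up to the first delimiter line.
    pos = body.find(delimiter)
    if pos == -1:
        raise ValueError("first delimiter not found")
    pos += len(delimiter)
    while True:
        # A delimiter followed by "--" is the closing delimiter.
        if body[pos:pos + 2] == b"--":
            return parts
        # Skip the rest of the delimiter line, including its CRLF.
        pos = body.index(b"\r\n", pos) + 2
        # Steps 4-5: the part header ends at the first empty line.
        header_end = body.index(b"\r\n\r\n", pos)
        headers = {}
        for line in body[pos:header_end].split(b"\r\n"):
            name, _, value = line.decode("latin-1").partition(":")
            headers[name.strip().lower()] = value.strip()
        data_start = header_end + 4
        # Steps 6-7: data runs up to the CRLF before the next delimiter.
        data_end = body.index(b"\r\n" + delimiter, data_start)
        data = body[data_start:data_end]
        # Step 8: decode only if the part says it is Base64.
        cte = headers.get("content-transfer-encoding", "").lower()
        if cte == "base64":
            data = base64.b64decode(data)
        parts.append((headers, data))
        pos = data_end + 2 + len(delimiter)


boundary = "---------------------------3765104465873"
body = (b"--" + boundary.encode() + b"\r\n"
        b'Content-Disposition: form-data; name="filename"; filename="review form.doc"\r\n'
        b"Content-Type: application/msword\r\n"
        b"\r\n"
        b"\xd0\xcf\x11\xe0\r\nbinary"
        b"\r\n--" + boundary.encode() + b"--\r\n")
for headers, data in parse_multipart(body, boundary):
    print(headers["content-type"], len(data))   # application/msword 12
```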

HTH. Please ask questions if any of those steps don't make sense to
you. I have done this many times, likely as have many others who read
this board. There are lots of little nuances that can make or break
your application.


Stephan
 

Cuong.Tong

Hello Stephan,

I can get my code to do up to step 5 of your algorithm, which is
reading in the headers, e.g. filename and Content-Type. After this
there is a \r and \n, which constitute 2 bytes.

Now, after reading the \r\n that separates the part header and the
part data, my stream position is at the BEGINNING of the binary stream,
or the Base64-encoded stream as you mentioned.
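One thing to watch at this point: a text reader usually reads ahead. In .NET, StreamReader fills its own buffer from the underlying stream in blocks, so after a ReadLine() the stream's Position is generally past the line you just read, and switching to raw reads from there skips bytes. Reading the header lines as raw bytes avoids that. A Python sketch, assuming a binary stream whose readline() keeps the line terminator (io.BytesIO stands in for the socket stream; `read_part_headers` is a hypothetical name):

```python
import io


def read_part_headers(stream) -> dict:
    """Read part header lines up to the empty line; return them as a dict.

    Leaves the stream positioned at the first byte of the part's data.
    """
    headers = {}
    while True:
        line = stream.readline()        # raw bytes, terminator included
        if not line:
            raise EOFError("stream ended inside the part header")
        line = line.rstrip(b"\r\n")
        if not line:                    # the empty line ends the header
            return headers
        name, _, value = line.decode("latin-1").partition(":")
        headers[name.strip().lower()] = value.strip()


stream = io.BytesIO(
    b'Content-Disposition: form-data; name="filename"; filename="review form.doc"\r\n'
    b"Content-Type: application/msword\r\n"
    b"\r\n"
    b"\xd0\xcf\x11\xe0...")
headers = read_part_headers(stream)
print(headers["content-type"])   # application/msword
print(stream.read(4))            # b'\xd0\xcf\x11\xe0'
```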

And here is where I want to clarify something:
> 6. Start reading data into a string buffer.

So can I just use a string object, call ReadLine() on the reader, and
check each line to see whether it contains the boundary?

Will my data integrity be broken (e.g. a corrupted file) if I read the
stream in as a string, Base64-decode it, and then convert it to a byte
array?

If reading the stream in as a string doesn't break the data integrity,
I think that's the way to go, because I can just read the whole thing
in with stream.ReadToEnd() and then use a regular expression or
String.Split to split the body into its parts.
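Splitting the whole body works, provided the split is done on the raw bytes (or on text decoded with an encoding that maps every byte to exactly one character). A Python sketch of the read-everything-and-split idea done on bytes; `split_parts` is a hypothetical helper and it assumes the entire body fits in memory:

```python
def split_parts(body: bytes, boundary: str) -> list:
    """Split a whole multipart body into (raw header, data) pairs."""
    delimiter = b"\r\n--" + boundary.encode("ascii")
    # Prepending CRLF lets the first delimiter line match the same pattern.
    chunks = (b"\r\n" + body).split(delimiter)
    parts = []
    # chunks[0] is the preamble; the last chunk begins with "--" (closing
    # delimiter) followed by the epilogue, so both are skipped.
    for chunk in chunks[1:-1]:
        chunk = chunk[chunk.index(b"\r\n") + 2:]   # drop rest of delimiter line
        header, _, data = chunk.partition(b"\r\n\r\n")
        parts.append((header, data))
    return parts


body = (b"--abc123\r\n"
        b'Content-Disposition: form-data; name="a"\r\n\r\n'
        b"hello\r\n"
        b"--abc123\r\n"
        b'Content-Disposition: form-data; name="f"; filename="x.bin"\r\n'
        b"Content-Type: application/octet-stream\r\n\r\n"
        b"\x00\r\n\xff\r\n"
        b"--abc123--\r\n")
for header, data in split_parts(body, "abc123"):
    print(header.split(b"\r\n")[0], data)
```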

Kind regards,
 

ssamuel

> I can get my code to do up to step 5 of your algorithm, which is
> reading in the headers, e.g. filename and Content-Type. After this
> there is a \r and \n, which constitute 2 bytes.

In general, be a little wary about line breaks. The MIME spec calls for
\r\n. If you're designing a system that will read only from one source,
you're fine. If you're reading from more than one source, you may get a
bare \n instead of \r\n, or even the two in the wrong order.
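If you do want to be lenient, one way is to accept either \r\n or a bare \n when looking for the end of a part header. A Python sketch using a regex on bytes (`split_header` is a hypothetical name, and this sketch does not try to handle \n\r in reversed order):

```python
import re

# An empty line: CRLF CRLF as MIME requires, or LF LF from a sloppy client.
HEADER_END = re.compile(rb"\r?\n\r?\n")
LINE_BREAK = re.compile(rb"\r?\n")


def split_header(part: bytes):
    """Split one part into (list of header lines, data)."""
    m = HEADER_END.search(part)
    if m is None:
        raise ValueError("no empty line after the part header")
    return LINE_BREAK.split(part[:m.start()]), part[m.end():]


print(split_header(b"A: 1\r\nB: 2\r\n\r\n\x00\n"))
print(split_header(b"A: 1\nB: 2\n\nxyz"))
```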

That's a small concern, though.
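> So can I just use a string object, call ReadLine() on the reader, and
> check each line to see whether it contains the boundary?

Yes, but you probably want to use a StringBuilder. Be aware, though,
that ReadLine() strips the line terminator (\r\n, \n or \r), so for
binary data you can no longer tell which bytes ended each line, and
putting them back wrong corrupts the file.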
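If you go line by line, keep the line terminators. A Python sketch of the same loop over a binary stream, with a bytearray standing in for the StringBuilder (`read_part_data` is hypothetical; it assumes the stream is positioned at the start of the part data, and it consumes the delimiter line):

```python
import io


def read_part_data(stream, boundary: str):
    """Read one part's data line by line, up to the next delimiter line.

    Returns (data, is_last), where is_last is True for the closing
    delimiter ("--" + boundary + "--").
    """
    delimiter = b"--" + boundary.encode("ascii")
    buf = bytearray()                   # accumulates data, like a StringBuilder
    while True:
        line = stream.readline()        # keeps its b"\n" or b"\r\n" ending
        if not line:
            raise EOFError("delimiter not found")
        # A delimiter only counts at the start of a line preceded by CRLF.
        if buf.endswith(b"\r\n") and line.startswith(delimiter):
            del buf[-2:]                # that CRLF belongs to the delimiter
            is_last = line[len(delimiter):len(delimiter) + 2] == b"--"
            return bytes(buf), is_last
        buf += line


stream = io.BytesIO(b"\x00\r\n\xff\r\n\r\nend\r\n--XyZ--\r\n")
data, is_last = read_part_data(stream, "XyZ")
print(data, is_last)   # b'\x00\r\n\xff\r\n\r\nend' True
```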
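> Will my data integrity be broken (e.g. a corrupted file) if I read the
> stream in as a string, Base64-decode it, and then convert it to a byte
> array?

If the part really is Base64, that'll work fine. Just make sure you're
feeding only the Base64-encoded data to the decoder or it'll choke. If
it's raw binary, though, reading it as a string can corrupt it:
StreamReader decodes as UTF-8 by default, and byte sequences that
aren't valid UTF-8 get replaced. Read raw bytes instead, or use an
encoding such as ISO-8859-1 that maps each byte to exactly one
character.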
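The encoding issue is easy to demonstrate. .NET's StreamReader decodes as UTF-8 unless you pass another Encoding, and invalid bytes are replaced with U+FFFD; ISO-8859-1 (Latin-1) maps each byte 0-255 to the character with the same code, so it round-trips. The same effect in Python:

```python
data = bytes(range(256))  # every possible byte value

# UTF-8: in this input none of the bytes 0x80-0xFF forms a valid
# sequence, so they become U+FFFD and the original values are lost.
via_utf8 = data.decode("utf-8", errors="replace").encode("utf-8")

# Latin-1: byte n <-> code point n, so decode/encode is lossless.
via_latin1 = data.decode("latin-1").encode("latin-1")

print(via_utf8 == data)    # False
print(via_latin1 == data)  # True
```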
> If reading the stream in as a string doesn't break the data integrity,
> I think that's the way to go, because I can just read the whole thing
> in with stream.ReadToEnd() and then use a regular expression or
> String.Split to split the body into its parts.

You can do that.

In fact, you can ReadToEnd() from the beginning if you want and regex
or split it into pieces, as long as the reader's encoding maps every
byte to one character (ISO-8859-1 does; the default UTF-8 does not).
Since System.Convert doesn't have stream-processing methods, you're
going to end up putting everything into a big string anyway.

You may be able to specify a single regex that'll take the whole
message and split out just the Base64 stuff, or find such a regex on
the Internet somewhere. If I wasn't behind on my project deadline, I'd
write you one myself as it seems like an interesting problem.
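For what it's worth, here is one shape such a regex could take, as a Python sketch on bytes (`extract_parts` is a hypothetical name). It captures each part's header and data and stops each data group just before the CRLF of the next delimiter. It returns raw data, so a part marked Content-Transfer-Encoding: base64 would still need decoding afterwards, and lazy matching over a large upload is slower than a plain search for the delimiter, so treat it as an illustration.

```python
import re


def extract_parts(body: bytes, boundary: str) -> list:
    """Return (header, data) pairs for every part, using one regex."""
    delim = re.escape(b"--" + boundary.encode("ascii"))
    pattern = re.compile(
        rb"(?:^|\r\n)" + delim + rb"[ \t]*\r\n"   # delimiter line (not the closing one)
        + rb"(.*?)\r\n\r\n"                         # part header
        + rb"(.*?)"                                 # part data, as short as possible
        + rb"(?=\r\n" + delim + rb")",              # ...up to the next delimiter
        re.DOTALL,
    )
    return [(m.group(1), m.group(2)) for m in pattern.finditer(body)]


body = (b"--abc123\r\n"
        b'Content-Disposition: form-data; name="a"\r\n\r\n'
        b"hello\r\n"
        b"--abc123\r\n"
        b'Content-Disposition: form-data; name="f"; filename="x.bin"\r\n'
        b"Content-Type: application/octet-stream\r\n\r\n"
        b"\x00\r\n\xff\r\n"
        b"--abc123--\r\n")
for header, data in extract_parts(body, "abc123"):
    print(data)
```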


Stephan
 
