How can I read a file on disk into a memory stream?

chance

Hello,
I have a file on disk called TEMP.ZIP and I would like to somehow get
this into a memory stream so I can eventually do this:

row["DOCUMENT"] = dataStream.ToArray();

However, I am not sure of the correct way to get it into a memory
stream.

Any help appreciated.

tia,
chance.
 
Nicholas Paldino [.NET/C# MVP]

chance,

You don't need a memory stream to load the file into a byte array. All
you have to do is open the filestream, and then read the contents into an
array. Since this is a file stream, you know the length in advance. You
can do this:

// The bytes.
byte[] bytes = null;

// The filestream.
using (FileStream fs = new FileStream(...))
{
    // Allocate the bytes.
    bytes = new byte[fs.Length];

    // The number of bytes remaining, and the location to read from.
    int bytesToRead = bytes.Length;
    int readFrom = 0;

    // Read while there are bytes to be read.
    while (bytesToRead > 0)
    {
        // Read the data. Read may return fewer bytes than requested.
        int bytesRead = fs.Read(bytes, readFrom, bytesToRead);

        // Zero means end of file (the file shrank after Length was read).
        // Stop here rather than loop forever.
        if (bytesRead == 0)
        {
            throw new EndOfStreamException();
        }

        // Decrement the remaining bytes by the number of bytes read.
        bytesToRead -= bytesRead;

        // Advance the offset for the next read operation.
        readFrom += bytesRead;
    }
}

// Can use your array here.

I think you will find this much more efficient.

Hope this helps.
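
For readers who want the shortest route: assuming .NET 2.0 or later, File.ReadAllBytes does the open/loop/close in one call. The sketch below is only an illustration; the class and method names (FileBytes, Load, LoadAsMemoryStream) are made up for this example and are not from the thread.

```csharp
using System.IO;

// Hypothetical helper; the names here are illustrative only.
public static class FileBytes
{
    // Returns the whole file as a byte array.
    public static byte[] Load(string path)
    {
        return File.ReadAllBytes(path);
    }

    // If a MemoryStream really is needed, wrap the same bytes in one.
    public static MemoryStream LoadAsMemoryStream(string path)
    {
        return new MemoryStream(File.ReadAllBytes(path));
    }
}
```

With that, the original goal would look something like row["DOCUMENT"] = FileBytes.Load("TEMP.ZIP"); with no intermediate stream.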
 
colin

Nicholas Paldino said:
chance,

You don't need a memory stream to load the file into a byte array. All
you have to do is open the filestream, and then read the contents into an
array. Since this is a file stream, you know the length in advance. You
can do this:

I also have a similar problem, but I have an array of structures (each with 2
Int16 and 3 Int32 fields).
Is there a way to read this directly from the file into the struct array rather
than into a byte[]?

At the moment I'm using a separate byte array to read from the file,
then using the Marshal class to do an unsafe copy from the byte array to the
struct array.
This might be useful to the OP as well.

It works fine, but is there a direct way? The amount of memory needed is quite
large.
I can get an unsafe pointer to the struct array OK, but the file read needs a
byte[]. Is there a lower-level unsafe read?

I was looking at serialization, but I've not managed to find out much about it
yet.
I've spent quite some time looking (I really need to fix my help so it's
online).

It's just to cache the CSV file, which takes much longer to load.

Colin =^.^=
 
Jon Skeet [C# MVP]

colin said:
I also have a similar problem, but I have an array of structures (each with 2
Int16 and 3 Int32 fields).
Is there a way to read this directly from the file into the struct array rather
than into a byte[]?

Well, I'd make a static method (or constructor) in the struct which
reads it in from a stream. Then it's just a case of calling that
method/constructor the right number of times.

colin said:
At the moment I'm using a separate byte array to read from the file,
then using the Marshal class to do an unsafe copy from the byte array to the
struct array.

I'm personally wary of that kind of thing. I know this may be regarded
as heresy by some given how much less code is involved when using
automatic serialization or marshalling, but I prefer explicit methods to
load or save objects. That makes the file format very clear, and it
makes the whole thing less fragile in the face of implementation
changes. On the other hand, if you have circular references etc, that
can be a significant issue which serialization takes care of but which
would need a lot of careful handling when serializing "manually".
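
A minimal sketch of that "explicit methods" style, assuming a record with 2 Int16 and 3 Int32 fields as colin describes. The struct name Sample and the field names A to E are hypothetical; only the shape comes from the thread. BinaryReader and BinaryWriter use little-endian byte order.

```csharp
using System.IO;

// Hypothetical record: names are made up, only the field types are from the thread.
public struct Sample
{
    public short A;
    public short B;
    public int C;
    public int D;
    public int E;

    // Reads one record, field by field, in a fixed order.
    public static Sample Read(BinaryReader reader)
    {
        Sample s;
        s.A = reader.ReadInt16();
        s.B = reader.ReadInt16();
        s.C = reader.ReadInt32();
        s.D = reader.ReadInt32();
        s.E = reader.ReadInt32();
        return s;
    }

    // Writes one record in the same order that Read expects.
    public void Write(BinaryWriter writer)
    {
        writer.Write(A);
        writer.Write(B);
        writer.Write(C);
        writer.Write(D);
        writer.Write(E);
    }

    // Calls Read the right number of times to fill an array.
    public static Sample[] ReadAll(Stream stream, int count)
    {
        BinaryReader reader = new BinaryReader(stream);
        Sample[] result = new Sample[count];
        for (int i = 0; i < count; i++)
        {
            result[i] = Read(reader);
        }
        return result;
    }
}
```

The file format is then exactly what Read and Write say it is (16 bytes per record here), independent of how the runtime lays the struct out in memory.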
 
Nicholas Paldino [.NET/C# MVP]

colin,

In this case, it might be easier to use unsafe code (if your solution
allows for that). Since your structures are simple (the structure itself
doesn't contain any pointers or references to anything else), you can read
the contents directly. Basically, you would open the file, and then make a
call to ReadFileEx (through the P/Invoke layer), passing a pointer to your
structure as the buffer.


 
colin

Jon Skeet said:
Well, I'd make a static method (or constructor) in the struct which
reads it in from a stream. Then it's just a case of calling that
method/constructor the right number of times.

So using one call per member with IO.ReadInt16 etc.?

Jon Skeet said:
I'm personally wary of that kind of thing. I know this may be regarded
as heresy by some given how much less code is involved when using
automatic serialization or marshalling, but I prefer explicit methods to
load or save objects. That makes the file format very clear, and it
makes the whole thing less fragile in the face of implementation
changes. On the other hand, if you have circular references etc, that
can be a significant issue which serialization takes care of but which
would need a lot of careful handling when serializing "manually".

Well, yes, it does detract from the point of using C#.
However, as I said, this is just to cache the data read from the CSV
(comma-separated values, i.e. text) file,
which is very slow to load. Rather than try to speed that up, I decided to
store it temporarily in a binary format,
so any change in structure shouldn't be a problem.

There's also a LOT of data; any significant overhead per entry from function
calls could make it quite slow,
in particular any memory management overhead.

Colin =^.^=
 
colin

Nicholas Paldino said:
colin,

In this case, it might be easier to use unsafe code (if your solution
allows for that). Since your structures are simple (the structure itself
doesn't contain any pointers or references to anything else), you can read
the contents directly. Basically, you would open the file, and then make
a call to ReadFileEx (through the P/Invoke layer), passing a pointer to
your structure as the buffer.

Ah, many thanks. Is this what you're referring to?
http://msdn2.microsoft.com/en-us/library/2d9wy99d.aspx
I don't need it to be asynchronous, as it's in a separate thread anyway.

I couldn't find ReadFile/ReadFileEx in any of the namespaces.

It seems quite involved by comparison with reading into a byte[] and doing an
unsafe copy to the struct[].

Colin =^.^=
 
Jon Skeet [C# MVP]

colin said:
So using one call per member with IO.ReadInt16 etc.?

Yup.

colin said:
Well, yes, it does detract from the point of using C#.
However, as I said, this is just to cache the data read from the CSV
(comma-separated values, i.e. text) file,
which is very slow to load. Rather than try to speed that up, I decided to
store it temporarily in a binary format,
so any change in structure shouldn't be a problem.

Fair enough. Just thought I'd give my 2p-worth.

colin said:
There's also a LOT of data; any significant overhead per entry from function
calls could make it quite slow, in particular any memory management
overhead.

It's very unlikely that the processor cost would be particularly
significant compared with IO cost. You'd certainly need to profile it
before you could really say that having a lot of calls would make it
slow.
 
colin

Jon Skeet said:
It's very unlikely that the processor cost would be particularly
significant compared with IO cost. You'd certainly need to profile it
before you could really say that having a lot of calls would make it
slow.

I mean a lot of data, like > 4 GB, so I can't keep it in memory all the time,
and re-reading the CSV file is very slow indeed,
even though the files aren't much larger, and it probably gets read many
times.

Colin =^.^=
 
Jon Skeet [C# MVP]

colin said:
I mean a lot of data, like > 4 GB, so I can't keep it in memory all the time,
and re-reading the CSV file is very slow indeed,
even though the files aren't much larger, and it probably gets read many
times.

Sure - and that's a perfectly good reason to create a binary
representation, but it's not a good reason to avoid the "manual"
serialization style.
 
colin

Nicholas Paldino said:
Colin,

Attached you will find a sample file. It shows how to use unsafe code,
along with the ReadFile API function to read bytes directly from the
stream into your structure. This will not run, as there is no file stream
initialized, and there is no checking to see that the number of bytes read
from the file equals the number of bytes that the structure takes up in
memory (you really should put the call to ReadFile in a loop).

You can use ReadFileEx if you want to perform overlapped I/O, but I
doubt you are looking for this.

Ah, that's cool, thank you very much :)
That's a fair bit simpler; it saves going through the low-level open and close
compared to the example I found.
I'll probably try to read the whole array at once, as there are ~100,000
entries.

Colin =^.^=
 
Nicholas Paldino [.NET/C# MVP]

Colin,

You could try and do that, but remember, since it is a stream, you need
to make sure you loop as you read from it.

For example, if you know your file is 100 bytes long, and you ask to
read 100 bytes, you might get back anywhere UP TO 100 bytes. You have to
issue a read again for the remaining bytes.
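
The attached sample mentioned earlier is not part of this thread, so the following is not that P/Invoke code. It is a sketch of the same "read straight into the struct array, and loop" idea using a swapped-in technique: MemoryMarshal.AsBytes and Stream.Read(Span<byte>), which assumes .NET Core 2.1 or later (newer than this thread). The Entry struct, its field names, and the EntryFile.ReadInto helper are hypothetical.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical record: 2 Int16 + 3 Int32 = 16 bytes, packed with no padding.
[StructLayout(LayoutKind.Sequential, Pack = 1)]
public struct Entry
{
    public short A;
    public short B;
    public int C;
    public int D;
    public int E;
}

public static class EntryFile
{
    // Fills the whole array from the stream, reissuing reads until done.
    public static void ReadInto(Stream stream, Entry[] entries)
    {
        // View the struct array's memory as raw bytes; no intermediate byte[].
        Span<byte> remaining = MemoryMarshal.AsBytes(entries.AsSpan());

        while (remaining.Length > 0)
        {
            // Read may return anywhere up to remaining.Length bytes.
            int read = stream.Read(remaining);
            if (read == 0)
            {
                throw new EndOfStreamException("File ended before the array was filled.");
            }

            // Skip past what was read and ask again for the rest.
            remaining = remaining.Slice(read);
        }
    }
}
```

This copies bytes in the machine's own layout and byte order, so it suits a same-machine cache file such as colin's rather than a portable file format.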
 
