Peter Duniho
[...]Here's a dumb question: is there any particular reason you're NOT mapping
the entire file at once? I've mentioned the possibility in previous
messages, making assumptions that you have your reasons for not doing so.
But if you could, all of these issues just go away. Are you genuinely
concerned that you won't have enough contiguous virtual address space to
map the whole file?
Well, there are two issues involved, and I do not know which one you are
referring to. Let me explain. Mapping is actually a two-step process:
first you reserve VM for the mapping and then you commit, which
results in bringing the contents of the file into memory.
That's not the process of memory-mapped file i/o I'm familiar with. That
is, while I know you can use MapViewOfFileEx() to provide a specific virtual
address at which to map the file, this isn't necessary, nor does it to my
knowledge require an explicit commit of the entire file.
The usual method of memory-mapping that I use is this:
* open the file (CreateFile)
* create the file mapping (CreateFileMapping)
* assign virtual address space to file mapping (MapViewOfFile)
When MapViewOfFile returns, the code now has a virtual address that
represents the beginning of the data of the file. Physical RAM is committed
only as the data is actually accessed, and can be reclaimed through the
usual page aging process (older pages get tossed as needed if something else
needs physical RAM that's not available).
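
As a rough illustration of those three steps, here is a minimal sketch in Python rather than raw Win32 calls. On Windows, Python's mmap module is built on CreateFileMapping and MapViewOfFile, so the shape is the same. The helper names map_whole_file and demo, and the sample data, are made up for the example.

```python
import mmap
import os
import tempfile

def map_whole_file(path):
    """Map an entire file read-only and return (file_object, mapping)."""
    f = open(path, "rb")
    # A length of 0 asks for the whole file. No data is read here;
    # pages are brought in only when the mapping is actually indexed.
    view = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    return f, view

def demo():
    """Write a ~2 MB sample file, map it, and touch only a few bytes."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as out:
        out.write(b"A" * 1_000_000 + b"needle" + b"B" * 1_000_000)
    f, view = map_whole_file(path)
    try:
        # Slicing and indexing work on absolute file offsets.
        return len(view), view[1_000_000:1_000_006], view[0]
    finally:
        view.close()
        f.close()
        os.remove(path)
```

Only the pages backing the bytes that demo() actually reads need to be resident; the rest of the file just occupies address space.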
So, when it comes to the
reservation step, I map the entire file at once; the code I pasted does not
show this step. What is shown is the commitment step, and I commit only
a small portion of the reserved memory at once.
The code you posted calls only MapViewOfFile. This doesn't reserve any
physical RAM for the data. It just reserves room in the virtual address
space for it.
This app is not going to be a
server app, running on high-end machines with many gigs of RAM. It is
rather intended to be a desktop app. So, I do not want to reserve like
500 MB of memory for just one file because it could easily cause
constant swapping and overall performance degradation on the user's machine.
Negative. That's one of the nice benefits of memory mapping: you can map an
entire file, even a large one, and use only the physical RAM required to
process the parts you're looking at. In addition, because the physical RAM
being used is backed by the mapped file, it doesn't get swapped out to the
swap file...the file itself can be used for the backing store (this doesn't
necessarily help the physical RAM side of things, but it does ease the
pressure on the swap file itself).
There is no reason that I can think of that would cause mapping a large file
into virtual address space to cause any more swapping than processing that
file would cause in any case. The OS certainly does not read all 500MB of a
mapped 500MB file into physical RAM just because you've mapped the file.
[...]Specifically: what if you modified your code that maps the file, so that
it maps a range *around* the starting point, the way I suggested with the
buffers? At certain points (perhaps only when you got right to the very
edge and attempted to read a byte outside your mapped range), you would
remap the file, shifting the window so that the bytes you want to deal
with are within the mapped range.
It does. Well, I am sorry, because I stripped this code of its mapping
logic, but when you see something like firstBufferIndex it is, in almost all
cases, a carefully computed index of a portion which contains the requested
data but also its neighbourhood, so that nearby "jumps" should not cause
remapping. Actually, the user may as well use enumerated access; in that case
I know in advance that the data is going to be read forward, so I can, if
I must, map from where the previous mapping ends.
That's not what I mean. If you were doing what I was suggesting already,
then the only issue remaining for you would be figuring out when you need to
back up in the data. The actual backing up would be trivial...you'd just
decrement your pointer and read the byte you want to read. You would have
moments when the mapped section of the file would have to change, but that
would be a momentary diversion and you'd get right back to just reading the
bytes from the mapped address space.
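
A sketch of that kind of sliding window, again in Python for brevity. The class name WindowedFile, the default window size and the remaps counter are all invented for the illustration; the only real constraint it relies on is that a view's starting offset must be a multiple of the allocation granularity.

```python
import mmap
import os

class WindowedFile:
    """Read bytes by absolute file offset through a movable mapped window."""

    def __init__(self, path, window_size=4 * mmap.ALLOCATIONGRANULARITY):
        # window_size is assumed to be at least twice the granularity,
        # so that the requested offset always lands inside the new window.
        self._file = open(path, "rb")
        self._size = os.fstat(self._file.fileno()).st_size
        self._window_size = window_size
        self._view = None
        self._start = 0    # file offset of the first mapped byte
        self.remaps = 0    # how many times the window has moved

    def _remap_around(self, offset):
        # Centre the window on the offset, then round the start down to
        # the allocation granularity required for a view's offset.
        start = max(0, offset - self._window_size // 2)
        start -= start % mmap.ALLOCATIONGRANULARITY
        length = min(self._window_size, self._size - start)
        if self._view is not None:
            self._view.close()
        self._view = mmap.mmap(self._file.fileno(), length,
                               access=mmap.ACCESS_READ, offset=start)
        self._start = start
        self.remaps += 1

    def byte_at(self, offset):
        """Return the byte at an absolute file offset, remapping if needed."""
        if not 0 <= offset < self._size:
            raise IndexError(offset)
        inside = (self._view is not None and
                  self._start <= offset < self._start + len(self._view))
        if not inside:
            self._remap_around(offset)
        return self._view[offset - self._start]

    def close(self):
        if self._view is not None:
            self._view.close()
        self._file.close()
```

Backing up is then just byte_at(offset - 1); the window only moves when that offset falls outside the currently mapped range.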
[...] Then you translate
that to the actual offset within the mapped range as necessary. That
way, you can be changing the mapped range on the fly without affecting how
the higher-level code that actually processes the data works.
Well, that is not going to work for me, unfortunately. The interfaces I have
to implement imply that data access uses "string coordinates", so
client code specifies "I want the 5th char", not the 5th byte, and given
that encoding hell I would not be able to compute that easily, so I
decided to use only "string coordinates".
I don't think you got my meaning. I don't mean that the highest level of
your code has to use a byte offset within the file. Just that the decoder
part need not concern itself with anything other than the byte offset. As
it read bytes, it would ask the file mapping layer of your code for a byte
offset within the file, and the file mapping layer would then translate that
into an offset within the mapped view you're using.
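
Roughly, the layering could look like the following Python sketch. Everything here is hypothetical: ViewLayer stands in for the file mapping layer (a bytes object plays the part of the mapped view), and UTF-8 is assumed purely for illustration, since the actual encodings involved aren't stated.

```python
class ViewLayer:
    """Stand-in for the mapping layer: translates file offsets to view offsets."""

    def __init__(self, view_start, view):
        self.view_start = view_start  # file offset where the view begins
        self.view = view              # bytes-like object standing in for the view

    def byte_at(self, file_offset):
        # The decoder never sees this subtraction.
        return self.view[file_offset - self.view_start]

def decode_char_at(byte_at, offset):
    """Decode one UTF-8 character starting at a file offset.

    Returns (character, offset_of_next_character). The decoder deals only
    in file offsets and asks byte_at for each byte it needs.
    """
    first = byte_at(offset)
    if first < 0x80:
        length = 1
    elif first >> 5 == 0b110:
        length = 2
    elif first >> 4 == 0b1110:
        length = 3
    elif first >> 3 == 0b11110:
        length = 4
    else:
        raise ValueError("offset %d is not the start of a character" % offset)
    raw = bytes(byte_at(offset + i) for i in range(length))
    return raw.decode("utf-8"), offset + length
```

The code above the decoder can still work in "string coordinates"; only the decoder and the mapping layer ever talk in bytes.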
That said, so far I haven't seen an indication that you actually need to be
mapping sections of the file. You seem to be concerned about committing too
much physical RAM at once to the mapping, but unless you're doing something
really odd that you haven't posted in code, your concern is unfounded.
There are reasons that you might not be able to map an entire file into your
virtual address space, but 500MB ought to be within the usual limitations.
It seems to me that you should look at just mapping the entire file all at
once, and if you run into problems with that, then start worrying about
windowing the file.
The reason you might not be able to map the whole file at once is that you
don't have a contiguous range of virtual address space large enough for the
file. That can happen for two reasons: insufficient virtual address space
left or fragmented virtual address space. How much virtual address space
you might have will vary, but even the theoretical 2GB maximum for a 32-bit
process (and of course, this never comes close to being available) is smaller
than some files. Fragmentation is harder to predict, and could limit your available
virtual address space to something significantly smaller than the actual
virtual address space left. But IMHO, if 500MB is a typical file size for
you, you ought to be able to map that without problems.
Pete