Scrolling Large Text File Without Hogging Memory

Tom

I don't want to re-invent the wheel and am looking for a simple
implementation of a text viewer or RichTextBox in read only mode that
allows rapid file positioning within large data files without the time
and memory cost of loading the entire file.

The idea is to use the thumb to get close, then page and scroll to the
exact place in the file, perhaps with only a few pages of data loaded
into memory.

I can't seem to snap out of my unmanaged way of thinking and how I'd
tackle this problem with pointers in C++. After too many years of
structured programming and with much effort I am beginning to gain
some working knowledge of these amazingly flexible objects. Can
someone please point me in a smart and C# style direction for solving
this not so uncommon and rather entry level task?

Thanks.

-- Tom
 
Nicholas Paldino [.NET/C# MVP]

Tom,

Well, how would you do it in C++? There isn't a facility out-of-the-box
to do this in .NET, but if you provide some insight as to how you would do
it in C++, I'm sure a number of people would be able to chime in to show how
you would do the equivalent in .NET.
 
Tom

I'm still in the early coding phase of this task and it's forcing me to
dig into the C# details and thus *learn* some things, which is always a
good thing. The principal areas of my confusion >>

Area                          Status
1) The Unicode usage.         Somewhat figured out.
2) The need for a pointer.    Somewhat figured out.
3) Memory management.         Please help. Beyond my skills.
4) Scroll/Thumb logic.        Any advice appreciated.

Comments on the above >>

#1) One important observation is that StreamReader correctly imports
ASCII text data files into a RichTextBox because it defaults to UTF-8.
Why is this confusing? My texts state that .Net uses Unicode(UTF-16)?
Some digging in the doco explains the usage of the "preamble" Byte
Order Mark(BOM) tag. Interesting how the default behavior differs from
some very popular textbook teachings.

#2) StreamReader calls File.OpenText that has attribute CanSeek =
false. Thus the only other C# option I have found so far is to use
FileStream. Unfortunately FileStream supports the reading of bytes and
not lines. There was some happy dancing here when I figured out the
byte_array -> string -> RichTextBox conversion and input. Key
statements follow:

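using System.IO;
using System.Text;
using System.Windows.Forms;

public class ModifiedRichTextBox : RichTextBox
{
    // Method name is only a placeholder for wherever these statements live.
    public void LoadBlockFromMiddle(string path)
    {
        FileInfo myFile = new FileInfo(path);
        using (FileStream fs = File.OpenRead(myFile.FullName))
        {
            fs.Position = myFile.Length / 2; // jump to the middle of the file
            int byteArraySize = 500;
            byte[] byteArray = new byte[byteArraySize];
            int bytesRead = fs.Read(byteArray, 0, byteArraySize);
            if (bytesRead > 0)
            {
                // Decode only the bytes actually read.  <- tricky 4 me
                string charBlock = Encoding.UTF8.GetString(byteArray, 0, bytesRead);
                this.Text += charBlock; // "this" is the RichTextBox
            }
        }
    }
}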
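
One wrinkle with decoding a block read from an arbitrary byte position: in UTF-8 the block can begin in the middle of a multi-byte character. A minimal sketch of one way to handle that, assuming the file really is UTF-8 (the class and method names here are hypothetical, not part of the original post):

```csharp
using System.Text;

public static class Utf8Align
{
    // Index of the first byte in buffer[0..count) that is not a UTF-8
    // continuation byte (continuation bytes have the bit pattern 10xxxxxx).
    public static int FirstCharStart(byte[] buffer, int count)
    {
        int i = 0;
        while (i < count && (buffer[i] & 0xC0) == 0x80)
            i++;
        return i;
    }

    // Decodes the block, skipping any partial character at its start.
    public static string DecodeFrom(byte[] buffer, int count)
    {
        int start = FirstCharStart(buffer, count);
        return Encoding.UTF8.GetString(buffer, start, count - start);
    }
}
```

A character cut off at the end of the block is not handled here; the decoder will still substitute a replacement character for it.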

#3) Using the thumb to "Position" the pointer within the file,
clearing the RichTextBox and performing another RichTextBox input is
level one performance. Being able to smoothly scroll in both
directions using a buffer would kick it up several notches.

Inputting the small byte array into the RichTextBox allows some local
scrolling. Controlling the buffer expertly would be VERY nice ... but
way beyond my current skill level. Perhaps this is what is known as
"virtualization" or "memory mapping"? I hate to throw words around out
of ignorance ... but I also have no problems recognizing my lack of
skill. A quote from Wikipedia >>

"The Microsoft .Net runtime environment does not natively include
managed access to memory mapped files, but there are third-party
libraries which do so."

Wiki references the following >>

http://www.winterdom.com/dev/dotnet/index.html

The above has a FileMap and Patch that looks promising ... I hope to
explore them soon and hopefully understand them. Yikes! Most likely
over my head.

#4) A crude scrolling/thumb implementation simply scales the percent
thumb travel to that of the pointer being Positioned to that same
percentage of the file.Length. Some clever scroll and cursor event
trapping within the RichTextBox might effectively reload the
RichTextBox ... but I suspect less than smooth operation. Even so,
viewing portions of a huge file quickly and without hogging resources
is a good thing. :)

I do have a need for such a file viewer and am also using it as a
learning task. Any direction I receive here will hasten that learning
experience and keep me out of the all too many mined fields. LOL ..
perhaps I should say mind fields?

All guidance appreciated!!

Thanks.

-- Tom
 
Peter Duniho

[...]
> #1) One important observation is that StreamReader correctly imports
> ASCII text data files into a RichTextBox because it defaults to UTF-8.
> Why is this confusing? My texts state that .Net uses Unicode(UTF-16)?

.NET uses UTF-16 internally, but it supports a wide range of text formats
for reading and writing. Even in situations where the default encoding
used to read a file doesn't work, you can specify an encoding yourself and
.NET will convert.
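
As a small illustration of specifying the encoding yourself, here is a sketch that passes an explicit Encoding to StreamReader. The class and method names are made up for the example; the third constructor argument asks StreamReader to honor a byte order mark if the file has one.

```csharp
using System.IO;
using System.Text;

public static class EncodingDemo
{
    // Reads the whole file as text using the encoding the caller names.
    // With BOM detection on, a byte order mark at the start of the file
    // takes precedence over the supplied encoding.
    public static string ReadAllWith(string path, Encoding encoding)
    {
        using (var reader = new StreamReader(path, encoding, true))
        {
            return reader.ReadToEnd();
        }
    }
}
```

For example, EncodingDemo.ReadAllWith(path, Encoding.GetEncoding("iso-8859-1")) would read a Latin-1 file, and the resulting string is held as UTF-16 in memory like any other .NET string.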
[...]
> #2) StreamReader calls File.OpenText that has attribute CanSeek =
> false. Thus the only other C# option I have found so far is to use
> FileStream. Unfortunately FileStream supports the reading of bytes and
> not lines. There was some happy dancing here when I figured out the
> byte_array -> string -> RichTextBox conversion and input.

I don't think your effort with FileStream was lost, since you learned more
about using encodings to convert bytes to or from strings.

But you can in fact seek with a StreamReader. You just need to use the
BaseStream for seeking, calling DiscardBufferedData() on the StreamReader
after you seek to synchronize the StreamReader with the underlying stream.
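
A minimal sketch of that seek-then-discard pattern, assuming a UTF-8 file with no byte order mark. The class and method names are invented for the example; only BaseStream, Seek, DiscardBufferedData and ReadLine are the real API.

```csharp
using System.IO;
using System.Text;

public static class SeekDemo
{
    // Seeks to byteOffset and returns the next line that begins after it.
    // For a non-zero offset the line containing the offset is discarded,
    // because the seek most likely landed in the middle of it (a line that
    // happens to start exactly at the offset is discarded as well).
    public static string ReadLineNear(string path, long byteOffset)
    {
        using (var reader = new StreamReader(path, Encoding.UTF8))
        {
            reader.BaseStream.Seek(byteOffset, SeekOrigin.Begin);
            reader.DiscardBufferedData(); // resync reader with the stream
            if (byteOffset > 0)
                reader.ReadLine();        // throw away the partial line
            return reader.ReadLine();     // null at end of file
        }
    }
}
```

Discarding the first line also hides the case where the seek lands inside a multi-byte character, since any garbled character ends up in the line that is thrown away.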
[...]
> #3) Using the thumb to "Position" the pointer within the file,
> clearing the RichTextBox and performing another RichTextBox input is
> level one performance. Being able to smoothly scroll in both
> directions using a buffer would kick it up several notches.
>
> [...]
> Wiki references the following >>
>
> http://www.winterdom.com/dev/dotnet/index.html
>
> The above has a FileMap and Patch that looks promising ... I hope to
> explore them soon and hopefully understand them. Yikes! Most likely
> over my head.

Well, as Nicholas asked, how would you do it in C++? Assuming you're
familiar with the native Win32 memory mapping API, then assuming the
memory mapping libraries you found work in a similar fashion, I'd guess
they would be able to handle the buffering useful for better performance.
It kind of depends how elaborate the implementations are, especially with
respect to going backwards (would they pre-load data from the file that
precedes data you're reading?) I didn't actually look at the libraries,
so I obviously don't know.

Of course, you don't have to use memory mapping techniques. You could
buffer the file yourself, keeping decent-size blocks in a linked list or
even just referenced in a small array (one small enough that shifting the
array elements is a trivial, fast operation).
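
One possible shape for that do-it-yourself buffering, sketched with a linked list that keeps the most recently used blocks. Everything here (class name, block size, eviction policy, the DiskReads counter) is an assumption made for illustration, not something from the thread.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public sealed class BlockCache : IDisposable
{
    private readonly FileStream _stream;
    private readonly int _blockSize;
    private readonly int _maxBlocks;
    // Most recently used block sits at the front of the list.
    private readonly LinkedList<KeyValuePair<long, byte[]>> _blocks =
        new LinkedList<KeyValuePair<long, byte[]>>();

    // Counts how many blocks were actually read from disk.
    public int DiskReads { get; private set; }

    public BlockCache(string path, int blockSize, int maxBlocks)
    {
        _stream = File.OpenRead(path);
        _blockSize = blockSize;
        _maxBlocks = maxBlocks;
    }

    // Returns block number `index`; the last block may be shorter.
    public byte[] GetBlock(long index)
    {
        for (var node = _blocks.First; node != null; node = node.Next)
        {
            if (node.Value.Key == index)
            {
                _blocks.Remove(node);   // cache hit: move to the front
                _blocks.AddFirst(node);
                return node.Value.Value;
            }
        }

        var buffer = new byte[_blockSize];
        _stream.Position = index * _blockSize;
        int total = 0, n;
        while (total < _blockSize &&
               (n = _stream.Read(buffer, total, _blockSize - total)) > 0)
            total += n;
        Array.Resize(ref buffer, total);
        DiskReads++;

        _blocks.AddFirst(new KeyValuePair<long, byte[]>(index, buffer));
        if (_blocks.Count > _maxBlocks)
            _blocks.RemoveLast();       // evict the least recently used block
        return buffer;
    }

    public void Dispose() { _stream.Dispose(); }
}
```

With only a handful of blocks the linear search is negligible, and scrolling back into a block that was just visited costs no disk read.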

> #4) A crude scrolling/thumb implementation simply scales the percent
> thumb travel to that of the pointer being Positioned to that same
> percentage of the file.Length. Some clever scroll and cursor event
> trapping within the RichTextBox might effectively reload the
> RichTextBox ... but I suspect less than smooth operation. Even so,
> viewing portions of a huge file quickly and without hogging resources
> is a good thing. :)

I'm not really sure what you're asking here. The default minimum and
maximum range for a ScrollBar is suited to percentages, but you can set
the range to whatever you want, up to the range of the Int32 type. Is
there something you're looking to do other than just setting the Maximum
property to the file length?
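
A sketch of the arithmetic involved, with one added assumption: since Maximum is an Int32, a file longer than int.MaxValue bytes needs each scrollbar unit to stand for more than one byte. The helper names are hypothetical.

```csharp
using System;

public static class ScrollMath
{
    // Number of file bytes represented by one scrollbar unit: 1 for files
    // that fit in an Int32, otherwise ceil(fileLength / int.MaxValue).
    public static long BytesPerUnit(long fileLength)
    {
        return Math.Max(1, (fileLength + int.MaxValue - 1) / int.MaxValue);
    }

    // Value to assign to the scrollbar's Maximum property.
    public static int MaximumFor(long fileLength)
    {
        return (int)(fileLength / BytesPerUnit(fileLength));
    }

    // File offset to seek to for a given scrollbar Value.
    public static long OffsetFor(int scrollValue, long fileLength)
    {
        long offset = scrollValue * BytesPerUnit(fileLength);
        return Math.Min(offset, Math.Max(0, fileLength - 1));
    }
}
```

Note that with the Windows Forms ScrollBar the user cannot drag all the way to Maximum; the largest value reachable through the UI is Maximum - LargeChange + 1, so the final stretch of the file has to be reached by reading forward from that point.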

Finally, I'll suggest that whatever you do, trying to do this with an
actual RichTextBox is always going to be on the slow side (because you
need to reset the text of the RichTextBox whenever you scroll), and
possibly even prone to UI oddities depending on how you do it (having your
own scrollbar rather than using the textbox's solves some issues, but may
introduce new ones). You may find that in the long run it makes more
sense to implement your own textbox control that supports the
virtualization you're looking for.

Or, for that matter, maybe someone already has. Have you looked to see if
a class like this already exists?

Pete
 
Tom

Thanks Nicholas & Pete !!

Nicholas >>

You got me thinking and digging much harder to be able to respond
after my first posting. You'd probably be amazed at how many times I
rewrote that reply as I continuously found more useful tips in the
doco. At some point I needed to reply out of courtesy and let you know
progress was being made and your little push helped!!

Pete >>

Calling the BaseStream is so logical! I just don't think at that level
yet. Obviously there is an internal pointer that allows StreamReader
to function. I dug a little using Reflector (a new tool for me!) ...
but I never connected the dots.

Your comments on using a simplified class and handling the buffering
myself has me thinking outside my little box. I've searched a few
hours for enhancements on this project and I focused more on the
enhanced DataGridView offerings, tutorials and examples on binding,
and the WPF overviews than buffered text viewers. I need to spend some
additional time searching more thoroughly in that narrowed topic. Wow
there is a lot to absorb!

A one line comment I received the other day about CodeProject.com sure
has opened my eyes to the vastness of public domain class
enhancements. Once I began trying to get fancy with OwnerDraw in
ListView several realizations hit me. I'm still trying to comprehend a
bigger picture and the guidance I receive in here is absolutely the
MOST helpful.

Your comment on the ScrollBar range being Int32 leads to much cleaner
handling than working with percentages. The smallest thumb increment
on the largest files might span several pages and I'll have to work
out those issues with scroll and cursor events.

----------------------------------------------------------------

The good news is the help I receive in here also carries over to
improved usage of the online doco. The bad news is I still have a ton
of stupid to overcome. Hopefully a little less every day. ;)

Both you guys are batting 1,000 in my ledger. Thanks again.

-- Tom
 
