Read text file to Temp file and apply formating and color changes

  • Thread starter Thread starter Guest
  • Start date Start date
G

Guest

I want to open a text file and format it into a specific line and then apply
color to a specific location of the text and then display it in a RichTextBox
after all of this is done.

I can do all of the above after the file is loaded into the RichTextBox, and
I am trying to speed the process up by doing it in a temp file.
 
Brian,

You might as well do it in the RichTextBox. Depending on how much
formatting you need, you would basically have to write the code the RTF
format. If you can find another component that will write RTF format files
for you, you might be better off doing that.

Or, if you don't need to edit your files, use HTML (you can just create
the markup on the fly).
 
Bummer, I was hoping that I could do it outside of the RTB in order to speed
the formatting process up.

Only other item I need to finish this tool is that when I apply the
formatting and color it scrolls down to the bottom of the file as it applies,
and does not put the RichTextBox up to the first line after it is done.

What would you recommend?

Thanks!

Nicholas Paldino said:
Brian,

You might as well do it in the RichTextBox. Depending on how much
formatting you need, you would basically have to write the code the RTF
format. If you can find another component that will write RTF format files
for you, you might be better off doing that.

Or, if you don't need to edit your files, use HTML (you can just create
the markup on the fly).


--
- Nicholas Paldino [.NET/C# MVP]
- (e-mail address removed)

Brian Cook said:
I want to open a text file and format it into a specific line and then
apply
color to a specific location of the text and then display it in a
RichTextBox
after all of this is done.

I can do all of the above after the file is loaded into the RichTextBox,
and
I am trying to speed the process up by doing it in a temp file.
 
Brian said:
Bummer, I was hoping that I could do it outside of the RTB in order to speed
the formatting process up.

I'm not sure why you think file i/o would be faster than doing things in
memory. If you can explain what it is about using the RichTextBox you
think you can actually speed up, that might help in providing you an
answer more closely suited to your needs.
Only other item I need to finish this tool is that when I apply the
formatting and color it scrolls down to the bottom of the file as it applies,
and does not put the RichTextBox up to the first line after it is done.

What would you recommend?

Well, it seems to me that for initializing the control from an existing
file, you should just create an RTF string, and use the RichTextBox.Rtf
property to initialize the textbox. I cannot imagine that there would
be a faster way to initialize the RichTextBox. Certainly I don't
believe that doing file i/o is going to speed things up.

If for some reason you feel you cannot just create an RTF string and
must instead initialize the text and then format it in place, and you
are concerned that formatting the text after it's added to the
RichTextBox is causing a performance problem due to on-screen updating,
one thing you might try doing is hiding the control while you do the
update. This would work best if the control can simply start out hidden
and not be shown until it's initialized.

An alternative (read, "hack" :) ) would be to use DrawToBitmap() to
create a bitmap image of the control, place that in the UI instead of
the RichTextBox until it's finally initialized, and then go ahead and
show the RichTextBox.

As far as positioning the control at the first line of text, you should
be able to get that to happen using these lines of code:

richTextBox1.Select(0, 0);
richTextBox1.ScrollToCaret();

Hope that helps.

Pete
 
Hi Peter, the ScrollToCaret worked great!

Here is a sample of the text as it appears in the file;

Each line starts with a Date/Time:

2007/08/13 14:00:03 [144007]ARES_EINDICATION 010.050.016.010 420.1.01
(6901) RX 68 bytes
2007/08/13 14:00:03 [144007] 69 01 26 02 28 A4 AA 20 76 96 51 44 50 76 08
45
2007/08/13 14:00:03 [144007] 46 00 34 02 02 C7 88 01 C7 88 AA 50 76 08 45
46
2007/08/13 14:00:03 [144007] 20 76 96 51 44 D7 07 08 0D 0D 3B 3B 00 00 10
06
2007/08/13 14:00:03 [144007] 0A 06 06 06 06 06 06 0A 0A 06 06 06 06 0A 0A
9B
2007/08/13 14:00:03 [144007] 67 5F 91 F8

It is then formatted as one line as it is loaded into the RichTextBox, and
looks like this; (I have placed [] around the section that I change the color
on. The [] is not normally there)

2007/08/13 14:00:03 [144007]ARES_EINDICATION 010.050.016.010 420.1.01
(6901) RX 68 bytes 69 01 26 02 28 A4 AA 20 76 96 51 44 50 76 08 45
46 00 34 02 02 C7 88 [01] C7 88 AA 50 76 08 45 46 20 76 96 51 44 D7 07 08 0D
0D 3B 3B 00 00 10 06 0A 06 06 06 06 06 06 0A 0A 06 06 06 06 0A 0A 9B 67 5F 91
F8

I have tried it with the control invisible, and thier is no difference in
the time it takes apply the color to the with it set that way or if it is
visible.

Thanks,

Brian
 
Brian said:
[...]
I have tried it with the control invisible, and thier is no difference in
the time it takes apply the color to the with it set that way or if it is
visible.

So what's the problem? Did you try using the Rtf property to initialize
the control with pre-formatted text? Just how much time are we talking
about here? My experience has been that on a fast computer, control
initializations/interactions are practically instant. What kind of
delay are you running into and why is it such a big deal?

It would really help if you could be a little more specific and
objective about the exact behavior you're seeing and what you'd like
instead.

Pete
 
It takes 3Min30Sec for a 7MB file.

I am runing a P4 3.4Mhz PC with 2GB of RAM.

The big deal, is the next step of this project is to be able to search,
filter and highhlight all entries that match a specific match, or filter it
as it is loaded for one or multiple specific strings.

For example

filter on:
[144007]
[144008]
[144008]

Thanks, and I hope I explained myself better.

Brian

Peter Duniho said:
Brian said:
[...]
I have tried it with the control invisible, and thier is no difference in
the time it takes apply the color to the with it set that way or if it is
visible.

So what's the problem? Did you try using the Rtf property to initialize
the control with pre-formatted text? Just how much time are we talking
about here? My experience has been that on a fast computer, control
initializations/interactions are practically instant. What kind of
delay are you running into and why is it such a big deal?

It would really help if you could be a little more specific and
objective about the exact behavior you're seeing and what you'd like
instead.

Pete
 
Brian said:
It takes 3Min30Sec for a 7MB file.

I am runing a P4 3.4Mhz PC with 2GB of RAM.

I assume that's "3.4Ghz", you mean. :)

Maybe you could post a concise-but-complete example of code that
reliably demonstrates your scenario. I am not seeing nearly so awful a
performance result.

I created a short application that simply puts 7MB worth of dummy text
into a RichTextBox. Actually, it's technically 14MB...I took the
conservative view that your "7MB file" is regular ASCII, and so I
actually generate 7 * 1024 * 1024 characters. Since .NET uses Unicode,
this is actually 14MB, not 7MB.

I generate the dummy text randomly one character at a time, one fake
sentence at a time, appending each new sentence to a StringBuilder that
has been preinitialized to the 7 million (or so :) ) characters. Then I
use the ToString() method and assign the result to the Text property
of the RichTextBox.

I ran this on my Core 2 Duo 2.33Ghz computer with 2GB of RAM.

The initialization of the data takes roughly 1 second, and roughly
another 3 seconds to copy it all to the RichTextBox.

So yes, there's a fair amount of overhead in the control, relatively
speaking (copying the text takes three times as long as generating it in
the first place). But three and a half minutes is WAY too slow. I
don't know what you're code is doing, but it seems likely to me that you
have some basic algorithmic issues going on, and your problem will be
better solved addressing those rather than looking for ways to make the
control itself faster. My experiment suggests that the control itself
has acceptable performance, when used efficiently.

IMHO, the best sample code would include something that autogenerates
the data equivalent to your 7MB file, and which also includes whatever
initialization code you are using, along with a RichTextBox contained by
the form for the program. In other words, the program itself should be
as small as possible (no downloading 7MB files :) ), but should still
reproduce the same performance results you're seeing, by doing the exact
same initialization you're doing.

Pete
 
Sure thing Peter, Here is the code. Do you need me to email you a sample file?

Yes I meant Ghz.

I call this routine from the OpenFileDialog;

This reads and formats the text into the single line format;
----- Begin Code -----
using System;
using System.Drawing;
using System.IO;
using System.Collections.Generic;
using System.Text;
using System.Windows.Forms;

namespace TCEditor
{
class Streamer
{

#region File Stream and Format Routine
public static string LayoutInput(string input)
{
StreamReader sr = File.OpenText(input);
StringBuilder sb = new StringBuilder(input.Length);
bool firstLine = true;
string line;
while ((line = sr.ReadLine()) != null)
{
if (line.Trim() == "")
continue;
if (line.Length < 29) { throw new InvalidOperationException("invalid
input"); }
if (line[29] != ' ')
{
int txPos;
int rxPos = -1;
int len = 0;
if (firstLine)
firstLine = false;
else
sb.Append("\r\n");
if (((txPos = line.IndexOf("TX")) > -1) || ((rxPos = line.IndexOf("RX")) > 0))
{
int charactersTillPoint;
if (txPos > -1)
{
charactersTillPoint = txPos;
len = line.Substring(txPos).Length;
}
else
{
charactersTillPoint = rxPos;
len = line.Substring(rxPos).Length;
}
string part0 = line.Substring(0, charactersTillPoint);
string part1 = line.Substring(charactersTillPoint);
sb.Append(part0.PadRight(86));
sb.Append(part1);
}
else
sb.Append(line);
sb.Append(' ');
if (len == 12)
sb.Append(' ');
}
else
{
sb.Append(line.Substring(31));
}
}
return sb.ToString();
}
#endregion
}
}
-----End Code-----
This is the routine that adds the color to the specified locations;
I call this as the next routine in the OpenFileDialog
-----Begin Code-----
private void findSequenceNumbers()
{
toolStripStatusLabel.Visible = toolStripProgressBar.Visible = true;
toolStripProgressBar.Value = 0;
int lineNum = 0;
bool startingNewLine = true;
FontStyle style = FontStyle.Bold;
string[] lines = rtbDoc.Lines;
string text = rtbDoc.Text;
toolStripProgressBar.Maximum = text.Length;
for (int i = 0; i < text.Length; i++)
{
if (startingNewLine)
{
if ((lines[lineNum].Contains("ARES_EINDICATION")) ||
(lines[lineNum].Contains("ARES_INDICATION")))
{
i += 169;

rtbDoc.Select(i, 2);
rtbDoc.SelectionFont = new Font(rtbDoc.SelectionFont,
rtbDoc.SelectionFont.Style ^ style);
rtbDoc.SelectionColor = Color.DarkBlue;
}
else if (lines[lineNum].Contains("]CODELINE_INDICATION_MSG"))
{
i += 160;
rtbDoc.Select(i, 2);
rtbDoc.SelectionFont = new Font(rtbDoc.SelectionFont,
rtbDoc.SelectionFont.Style ^ style);
rtbDoc.SelectionColor = Color.DarkBlue;
}
else
{
i += lines[lineNum].Length - 1;
}
startingNewLine = false;
Application.DoEvents();
}
if (text == '\n')
{
startingNewLine = true;
lineNum++;
}
toolStripProgressBar.Value = i;
}
toolStripStatusLabel.Visible = toolStripProgressBar.Visible = false;
rtbDoc.Select(0, 0);
rtbDoc.ScrollToCaret();
}
-----End Code-----
 
Brian said:
Sure thing Peter, Here is the code. Do you need me to email you a sample file?

No. As I said in my post, your sample code should auto-generate some
relevant data. That way, no one has to have an exact copy of the sample
file you are using.

Also, I have asked twice already whether you have tried pre-formatting
the text and then assigning the RTF string to the RichTextBox.Rtf
property. You haven't answered that question yet. IMHO, that technique
is likely to give you the best performance. If nothing else, it will
make clear where the performance bottleneck is (assuming you bother to
time the individual sections of your initialization, of course).

Please at least answer that question, even if you have no intention of
trying the suggestion.

Pete
 
Peter Duniho said:
No. As I said in my post, your sample code should auto-generate some
relevant data. That way, no one has to have an exact copy of the sample
file you are using.

Sorry, I thought I had answered that. I have not tried to auto-generate some
relevant data, as I am unsure how to do that being new to C#.
Also, I have asked twice already whether you have tried pre-formatting
the text and then assigning the RTF string to the RichTextBox.Rtf
property. You haven't answered that question yet. IMHO, that technique
is likely to give you the best performance. If nothing else, it will
make clear where the performance bottleneck is (assuming you bother to
time the individual sections of your initialization, of course).
As above, no I have not tried doing that as I was not aware of that
technique, and do not know how to do it.

I am doing what I can to learn this as I go along, and have made good
progress on some of what I consider the harder things. This is still very
new.
Please at least answer that question, even if you have no intention of
trying the suggestion.

Pete I will gladly try your suggestion. I just don't know how to do it..
That is why I posted here so that I can get suggestions (e.g. examples) of
how I can do it, or assistance in figuring it out.

We all have to start someplace, and I am at the beginning, without any
classes, years of C++ experiance or other practical knowledge.

Brian
 
Brian said:
Sorry, I thought I had answered that. I have not tried to auto-generate some
relevant data, as I am unsure how to do that being new to C#.

It should not be much different from how you'd do it in any other
language. Your data appears to have a fairly regular format. So it
should not be hard. In fact, if anything it's likely easier in C#/.NET
because the most complicated thing in your data -- the time stamp -- is
easily supported using the DateTime class and associated string
formatting options.
As above, no I have not tried doing that as I was not aware of that
technique, and do not know how to do it.

Well, I did mention it twice before. :)
I am doing what I can to learn this as I go along, and have made good
progress on some of what I consider the harder things. This is still very
new.

No doubt the new environment can seem daunting. However, please do keep
in mind that we are not here to write your code for you. You will
necessarily have to do the bulk of the work, and that does mean taking
the time to learn the new concepts that are introduced as they come up
in the discussion.

If that takes some time, that's just how it has to be. It may mean that
you have to spend a day or two familiarizing yourself with the
additional concepts before you reach a point where you can post another
question. But that's not wasted time; it will serve you well as you
move forward in your use of .NET.

What's _not_ a good idea, IMHO, is to simply ignore the suggestions
offered and not bother to respond to them.
Pete I will gladly try your suggestion. I just don't know how to do it..
That is why I posted here so that I can get suggestions (e.g. examples) of
how I can do it, or assistance in figuring it out.

My best suggestion is that when a specific property or method in a class
is mentioned to you, your first step is to review the MSDN documentation
for that property or method. I wrote about the RichTextBox.Rtf property
specifically, and the documentation for it can be found here:

http://msdn2.microsoft.com/en-us/library/system.windows.forms.richtextbox.rtf.aspx

On that page, there is this comment: "For the RTF codes, see "rich text
format (RTF) Specification, version 1.6" in the MSDN library". Entering
the words "rich text format specification" in the MSDN search field and
executing the search produces this page as the very first result:

http://msdn2.microsoft.com/en-us/library/aa140277(office.10).aspx

In that document, you should be able to find everything you need to know
about inserting the desired formatting codes into a text string to make
it suitable for assignment to the RichTextBox.Rtf property.

In addition to all that, there's always good old reverse-engineering and
trial and error. You could easily format the text the way you are doing
it now, and then get the Rtf property to see what the results look like.
Then you just reproduce that going the other way.
We all have to start someplace, and I am at the beginning, without any
classes, years of C++ experiance or other practical knowledge.

Understood. And I hope you have not been given the impression that we
(and I in particular) expect you to know everything without needing any
guidance.

However, as I hope I've reasonably illustrated above, it would not have
been all that hard for you to go from reading my suggestion that you use
the RichTextBox.Rtf property, to finding detailed documentation
explaining everything you should need to know about using it, or
otherwise educating yourself on its use. I could write a post spelling
it all out for you, but IMHO given that MSDN has already put it all
together for you already, there should be no need for me to duplicate
that effort.

If after reviewing the documentation, you _still_ have questions, that's
an entirely different matter. The documentation is not always perfectly
clear, and in some cases it may make more sense to someone with at least
a little experience with writing Windows software. It's absolutely
reasonable to post a question asking (among other things) for an
explanation of something in the documentation. It's just that it's
important that you do that initial leg-work yourself.

Now, all that said...it's possible that simply reviewing the code you
posted, something useful will jump out. With three and a half minutes
to process just 7MB of data, it seems likely to me that there's at least
one obvious algorithmic error, and maybe more than one. :)

So lacking the inclusion of some code to auto-generate some test data,
it's possible I or someone else might be able to provide some good
feedback regarding the algorithm you're using. I'll get a chance to
look at the code soon, and if I notice anything I'll post another reply
to the post with the code.

Pete
 
Thanks,

Peter Duniho said:
It should not be much different from how you'd do it in any other
language. Your data appears to have a fairly regular format. So it
should not be hard. In fact, if anything it's likely easier in C#/.NET
because the most complicated thing in your data -- the time stamp -- is
easily supported using the DateTime class and associated string
formatting options.


Well, I did mention it twice before. :)


No doubt the new environment can seem daunting. However, please do keep
in mind that we are not here to write your code for you. You will
necessarily have to do the bulk of the work, and that does mean taking
the time to learn the new concepts that are introduced as they come up
in the discussion.

If that takes some time, that's just how it has to be. It may mean that
you have to spend a day or two familiarizing yourself with the
additional concepts before you reach a point where you can post another
question. But that's not wasted time; it will serve you well as you
move forward in your use of .NET.

What's _not_ a good idea, IMHO, is to simply ignore the suggestions
offered and not bother to respond to them.


My best suggestion is that when a specific property or method in a class
is mentioned to you, your first step is to review the MSDN documentation
for that property or method. I wrote about the RichTextBox.Rtf property
specifically, and the documentation for it can be found here:

http://msdn2.microsoft.com/en-us/library/system.windows.forms.richtextbox.rtf.aspx

On that page, there is this comment: "For the RTF codes, see "rich text
format (RTF) Specification, version 1.6" in the MSDN library". Entering
the words "rich text format specification" in the MSDN search field and
executing the search produces this page as the very first result:

http://msdn2.microsoft.com/en-us/library/aa140277(office.10).aspx

In that document, you should be able to find everything you need to know
about inserting the desired formatting codes into a text string to make
it suitable for assignment to the RichTextBox.Rtf property.

In addition to all that, there's always good old reverse-engineering and
trial and error. You could easily format the text the way you are doing
it now, and then get the Rtf property to see what the results look like.
Then you just reproduce that going the other way.


Understood. And I hope you have not been given the impression that we
(and I in particular) expect you to know everything without needing any
guidance.

However, as I hope I've reasonably illustrated above, it would not have
been all that hard for you to go from reading my suggestion that you use
the RichTextBox.Rtf property, to finding detailed documentation
explaining everything you should need to know about using it, or
otherwise educating yourself on its use. I could write a post spelling
it all out for you, but IMHO given that MSDN has already put it all
together for you already, there should be no need for me to duplicate
that effort.

If after reviewing the documentation, you _still_ have questions, that's
an entirely different matter. The documentation is not always perfectly
clear, and in some cases it may make more sense to someone with at least
a little experience with writing Windows software. It's absolutely
reasonable to post a question asking (among other things) for an
explanation of something in the documentation. It's just that it's
important that you do that initial leg-work yourself.

Now, all that said...it's possible that simply reviewing the code you
posted, something useful will jump out. With three and a half minutes
to process just 7MB of data, it seems likely to me that there's at least
one obvious algorithmic error, and maybe more than one. :)

So lacking the inclusion of some code to auto-generate some test data,
it's possible I or someone else might be able to provide some good
feedback regarding the algorithm you're using. I'll get a chance to
look at the code soon, and if I notice anything I'll post another reply
to the post with the code.

Pete
 
Okay...had a chance to look through your code. I don't know if I can
cut three minutes or more from the cost while sticking with the same
basic algorithm, but there is some low-hanging fruit. You haven't
provided any information regarding which parts of your initialization is
slow, so it may or may not be that some or all of these changes are
significant. But they are still potentially issues that can be fixed.

See what happens if you clean some of these things up (and by the way,
please when posting code make sure that your indentation is
preserved...it's a lot harder to read the code without it):

Brian said:
This reads and formats the text into the single line format;
----- Begin Code -----
using System;
using System.Drawing;
using System.IO;
using System.Collections.Generic;
using System.Text;
using System.Windows.Forms;

namespace TCEditor
{
class Streamer
{

#region File Stream and Format Routine
public static string LayoutInput(string input)
{
StreamReader sr = File.OpenText(input);
StringBuilder sb = new StringBuilder(input.Length);
bool firstLine = true;
string line;
while ((line = sr.ReadLine()) != null)
{
if (line.Trim() == "")
continue;
if (line.Length < 29) { throw new InvalidOperationException("invalid
input"); }
if (line[29] != ' ')
{
int txPos;
int rxPos = -1;
int len = 0;
if (firstLine)
firstLine = false;
else
sb.Append("\r\n");
if (((txPos = line.IndexOf("TX")) > -1) || ((rxPos = line.IndexOf("RX")) > 0))

Minor issue above: IndexOf() has to scan through the string each time
you call it. Worst-case is, of course, when the text you're looking for
doesn't exist.

You might do better using a regular expression (see Regex class) to
search for both "TX" and "RX" at the same time. Then you can check the
actual match (if any) to determined which matched. Though, that said, I
don't see anything that actually depends on which one matched; you seem
to treat both the same. So even more reason to just use Regex.
{
int charactersTillPoint;
if (txPos > -1)
{
charactersTillPoint = txPos;
len = line.Substring(txPos).Length;

Major issue above: this is the worst way to calculate "len". Getting
the length of the original string is fast. Doing a subtraction is fast.
Creating a whole new string just so you can see how long it is? Not fast.

Change this line to:

len = line.Length - txPos;
}
else
{
charactersTillPoint = rxPos;
len = line.Substring(rxPos).Length;
Likewise.

}
string part0 = line.Substring(0, charactersTillPoint);
string part1 = line.Substring(charactersTillPoint);
sb.Append(part0.PadRight(86));
sb.Append(part1);

Medium issue: creating new strings just to append to the StringBuilder
incurs the overhead of the instance creation. But the StringBuilder has
overloads for the Append() method to avoid that.

I did a quick-and-dirty test, and it appears to me that creating the
string almost doubles the total time it takes to append text to a
StringBuilder.

At the very least, I would change the "part1" appending so that it looks
more like this:

sb.Append(line, charactersTillPoint, len);

I suspect you would also gain a win by not using PadRight, and instead
doing that work yourself:

sb.Append(line, 0, charactersTillPoint);
sb.Append(new string(' ', 86 - charactersTillPoint));

That way, you avoid the creation of both substrings (but not the new
padding string, of course). This cuts your string instantiation in this
area of the code from three strings down to just one.
}
else
sb.Append(line);
sb.Append(' ');
if (len == 12)
sb.Append(' ');
}
else
{
sb.Append(line.Substring(31));

Likewise here, use the Append() overload that extracts the substring for
you:

sb.Append(line, 31, line.Length - 31);

One last note on the substring thing: I also tested the overload that
takes a char[] instead of an array, along with the substring index and
length. It's actually even a little faster than passing a string, but
not by a lot. The big win here will be to just stop instantiating new
strings to append.
}
}
return sb.ToString();
}
#endregion
}
}
-----End Code-----
This is the routine that adds the color to the specified locations;
I call this as the next routine in the OpenFileDialog

I'm not going to provide specific comments for this method. Probably
the most costly part of it is all of the selecting and formatting that's
going on, and the best way to fix that would be to simply move the
formatting logic into the same method where you are reading the file,
and insert the necessary RTF format codes, rather than interacting with
the control directly.

That said, on a style perspective, I'd suggest that one significant
thing wrong with this method is that you wind up with two copies of the
text from the control. IMHO, since you want to process the text on a
line-by-line basis, you should just get the string[] from the Lines
property and operate on that. The RichTextBox control has methods such
as GetFirstCharIndexFromLine to allow you to determine actual character
indices for the purpose of formatting, and I would be surprised if using
that method significantly reduces the overall performance of this
method. As a result, your memory footprint will be halved, and the code
will be much closer to your intended algorithm.

A couple of other changes I'd make are to not call DoEvents(), and to
not update your progress control so often.

If you can't get the performance of this stuff down to something
acceptable for being in-line with the UI code, the correct solution is
to move the processing to a background thread. The BackgroundWorker
class is designed especially for this sort of thing and would work
nicely for you.

As far as the updating of the progress control goes, the main issue
there is that it potentially generates UI updates. I haven't checked
its exact implementation, but at the very least you are calling the
control many more times than one would actually be able to perceive.
IMHO, it'd be better to set the max for the control to 100 and update it
any time you progress 1%. A possible middle-ground would be to base the
maximum on the number of lines, and only update it when you hit the code
that checks for the start of a new line. Of course, if you fix the code
to be line-based in the first place, this becomes even easier.

All of this is moot if you change the design to generate RTF text
instead. That's actually the solution that I think would provide the
best results.

Pete
 
Peter thank you for the suggestions. Those helped me understand how to
accomplish what I need. Samples/snipits are easier for me to understand,
learn from and impliment.

For the File Open method, I moved to a StringCollection which has reduced
the time it takes to open and format the file significantly.

Still working around the color portion. I have placed it into the file open
method, yet I am getting an index out of bounds error that I am tracking down.

Thanks again.

Brian

Peter Duniho said:
Okay...had a chance to look through your code. I don't know if I can
cut three minutes or more from the cost while sticking with the same
basic algorithm, but there is some low-hanging fruit. You haven't
provided any information regarding which parts of your initialization is
slow, so it may or may not be that some or all of these changes are
significant. But they are still potentially issues that can be fixed.

See what happens if you clean some of these things up (and by the way,
please when posting code make sure that your indentation is
preserved...it's a lot harder to read the code without it):

Brian said:
This reads and formats the text into the single line format;
----- Begin Code -----
using System;
using System.Drawing;
using System.IO;
using System.Collections.Generic;
using System.Text;
using System.Windows.Forms;

namespace TCEditor
{
class Streamer
{

#region File Stream and Format Routine
public static string LayoutInput(string input)
{
StreamReader sr = File.OpenText(input);
StringBuilder sb = new StringBuilder(input.Length);
bool firstLine = true;
string line;
while ((line = sr.ReadLine()) != null)
{
if (line.Trim() == "")
continue;
if (line.Length < 29) { throw new InvalidOperationException("invalid
input"); }
if (line[29] != ' ')
{
int txPos;
int rxPos = -1;
int len = 0;
if (firstLine)
firstLine = false;
else
sb.Append("\r\n");
if (((txPos = line.IndexOf("TX")) > -1) || ((rxPos = line.IndexOf("RX")) > 0))

Minor issue above: IndexOf() has to scan through the string each time
you call it. Worst-case is, of course, when the text you're looking for
doesn't exist.

You might do better using a regular expression (see Regex class) to
search for both "TX" and "RX" at the same time. Then you can check the
actual match (if any) to determined which matched. Though, that said, I
don't see anything that actually depends on which one matched; you seem
to treat both the same. So even more reason to just use Regex.
{
int charactersTillPoint;
if (txPos > -1)
{
charactersTillPoint = txPos;
len = line.Substring(txPos).Length;

Major issue above: this is the worst way to calculate "len". Getting
the length of the original string is fast. Doing a subtraction is fast.
Creating a whole new string just so you can see how long it is? Not fast.

Change this line to:

len = line.Length - txPos;
}
else
{
charactersTillPoint = rxPos;
len = line.Substring(rxPos).Length;
Likewise.

}
string part0 = line.Substring(0, charactersTillPoint);
string part1 = line.Substring(charactersTillPoint);
sb.Append(part0.PadRight(86));
sb.Append(part1);

Medium issue: creating new strings just to append to the StringBuilder
incurs the overhead of the instance creation. But the StringBuilder has
overloads for the Append() method to avoid that.

I did a quick-and-dirty test, and it appears to me that creating the
string almost doubles the total time it takes to append text to a
StringBuilder.

At the very least, I would change the "part1" appending so that it looks
more like this:

sb.Append(line, charactersTillPoint, len);

I suspect you would also gain a win by not using PadRight, and instead
doing that work yourself:

sb.Append(line, 0, charactersTillPoint);
sb.Append(new string(' ', 86 - charactersTillPoint));

That way, you avoid the creation of both substrings (but not the new
padding string, of course). This cuts your string instantiation in this
area of the code from three strings down to just one.
}
else
sb.Append(line);
sb.Append(' ');
if (len == 12)
sb.Append(' ');
}
else
{
sb.Append(line.Substring(31));

Likewise here, use the Append() overload that extracts the substring for
you:

sb.Append(line, 31, line.Length - 31);

One last note on the substring thing: I also tested the overload that
takes a char[] instead of an array, along with the substring index and
length. It's actually even a little faster than passing a string, but
not by a lot. The big win here will be to just stop instantiating new
strings to append.
}
}
return sb.ToString();
}
#endregion
}
}
-----End Code-----
This is the routine that adds the color to the specified locations;
I call this as the next routine in the OpenFileDialog

I'm not going to provide specific comments for this method. Probably
the most costly part of it is all of the selecting and formatting that's
going on, and the best way to fix that would be to simply move the
formatting logic into the same method where you are reading the file,
and insert the necessary RTF format codes, rather than interacting with
the control directly.

That said, on a style perspective, I'd suggest that one significant
thing wrong with this method is that you wind up with two copies of the
text from the control. IMHO, since you want to process the text on a
line-by-line basis, you should just get the string[] from the Lines
property and operate on that. The RichTextBox control has methods such
as GetFirstCharIndexFromLine to allow you to determine actual character
indices for the purpose of formatting, and I would be surprised if using
that method significantly reduces the overall performance of this
method. As a result, your memory footprint will be halved, and the code
will be much closer to your intended algorithm.

A couple of other changes I'd make are to not call DoEvents(), and to
not update your progress control so often.

If you can't get the performance of this stuff down to something
acceptable for being in-line with the UI code, the correct solution is
to move the processing to a background thread. The BackgroundWorker
class is designed especially for this sort of thing and would work
nicely for you.

As far as the updating of the progress control goes, the main issue
there is that it potentially generates UI updates. I haven't checked
its exact implementation, but at the very least you are calling the
control many more times than one would actually be able to perceive.
IMHO, it'd be better to set the max for the control to 100 and update it
any time you progress 1%. A possible middle-ground would be to base the
maximum on the number of lines, and only update it when you hit the code
that checks for the start of a new line. Of course, if you fix the code
to be line-based in the first place, this becomes even easier.

All of this is moot if you change the design to generate RTF text
instead. That's actually the solution that I think would provide the
best results.

Pete
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Back
Top