how long should this take?

M

mp

I'm using streamreader to open text files
I then look at each character in the file to detect certain patterns
it took 253 seconds
to parse 748 files
containing
563302 characters

then i commented out a few debug.print statements (which were called for
each file of course)
the time went down to 421 milliseconds

is that a normal impact from debug.print?
does that (421 milliseconds) sound like a reasonable amount of time to read
those characters?
or does it sound like further time optimizations should be sought out?

thanks
mark

in wpf form
private void button3_Click(object sender, RoutedEventArgs e)
{
int timein = Environment.TickCount;
int numFiles=0;
int sizeFiles = 0;

SortedDictionary<string, List<string>> defunsThisFolder = new
SortedDictionary<string, List<string>>();
string path = "C:\\0\\0code\\lisp\\newformat";
DirectoryInfo root = new DirectoryInfo(path);

var fileQuery = from fileInfo in
root.EnumerateFiles("*.lsp")
select fileInfo;

string FullName;
foreach (var fileInfo in fileQuery)
{
numFiles++;
FullName = fileInfo.FullName;

this.Title = "Reading file " + FullName + " , Please
wait...";
cLispFile cf = new cLispFile(FullName);
cf.ReadFile();
sizeFiles = sizeFiles + cf.FileLength;

cf.ListDefuns();
defunsThisFolder.Add(FullName, cf.DefunList);
}

foreach (KeyValuePair<string, List<string>> kvp in
defunsThisFolder)
{
this.lbOrig.Items.Add (kvp.Key);
foreach (string s in kvp.Value)
this.lbOrig.Items.Add('\t' + s);
}
int timeout = Environment.TickCount;
int elapsed = timeout - timein;
this.Title = "checked " + numFiles.ToString() + " files in "
+ (elapsed).ToString() + " milliseconds (" + sizeFiles.ToString() + ")
characters";
}

in cLispFile:
public void ReadFile()
{
if (_FileNotFound == false)
{
try
{
using (StreamReader sr = File.OpenText(_fileName))
{
GetCommentsFromStreamReader(sr);
}
}
catch (Exception ex)
{
Debug.Print(ex.Message);
throw;
}
}
}

private void GetCommentsFromStreamReader(StreamReader sr)
{
String input;
while ((input = sr.ReadLine()) != null)
{
GetCommentFromLine(input);
}
}

private void GetCommentFromLine(string inputString)
{
char preceedingCharacter = '\0';
char beforepreceedingCharacter = '\0';

char currentCharacter;
StringBuilder currentquote = new StringBuilder();

StringBuilder sbCommentsThisLine = new StringBuilder();
StringBuilder sbCommentsStrippedThisLine = new StringBuilder();
//----------------------------------------------------------------
//unless it's a multi line comment,(which im not yet
implementing)
//then each line clears InComment flag
_amInComment = false;
//----------------------------------------------------------------
_fileLength=_fileLength+inputString.Length;

for (int CharPos = 0; CharPos < inputString.Length; CharPos++)
{
currentCharacter=inputString[CharPos];

if (_amInQuote != true)
{
//semi colon is only comment if not inside a quoted
string""
if (currentCharacter == comment)
{
_amInComment = true;
}

//only process quote that are not commented out
if (_amInComment != true)
{
SetQuoteState(currentCharacter, CharPos,
preceedingCharacter, beforepreceedingCharacter, ref _amInQuote);
}
}
else//amInQuote is true
{
currentquote.Append(currentCharacter);
SetQuoteState(currentCharacter, CharPos,
preceedingCharacter, beforepreceedingCharacter, ref _amInQuote);
}

if (_amInComment == true)
{
sbCommentsThisLine.Append(currentCharacter);
}
else
{
sbCommentsStrippedThisLine.Append(currentCharacter);
}
beforepreceedingCharacter = preceedingCharacter;
preceedingCharacter = currentCharacter;
}//end for

_sbCommentsStrippedThisFile.Append(sbCommentsStrippedThisLine.ToString());
_sbCommentsStrippedThisFile.Append(Environment.NewLine);
_sbCommentsThisFile.Append(sbCommentsThisLine.ToString());
_sbCommentsThisFile.Append(Environment.NewLine);

RaiseCommentDetectedEvent(new
CommentDetectedEventArgs(inputString, sbCommentsThisLine.ToString(),
sbCommentsStrippedThisLine.ToString()));
}

private void SetQuoteState(char currentCharacter, int charPos, char
preceedingCharacter, char beforePreceedingCharacter, ref Boolean _amInQuote)
{
if (currentCharacter == quote)
{
if (_amInQuote == true)
{
if (preceedingCharacter != backslash)
{
_amInQuote = false;
}
else if (preceedingCharacter == backslash)
{
//if char before preceedingchar is backslash
then backslash is backslashed
if (beforePreceedingCharacter == backslash )
{
_amInQuote = false;
}
}
}
else
{
_amInQuote = true;
}
}//if current char is quote
}
 
M

mp

Peter Duniho said:
Since the code you posted doesn't actually include any non-exception calls
to Debug.Print(), there's no way to know for sure. But the debug console
in VS is a major bottleneck; text that goes through there can really slow
a program down. One interesting test would be to just run your debug
build with the Debug.Print() statements from the command line instead of
the debugger.


with debug.print lines
(not putting into listbox)(commented that out)

run from ide (debug menu|debug)
256700 milliseconds

run from command prompt
4227 milliseconds

without debug.print lines(commented that out)
(not putting into listbox)(commented that out)

run from ide or command prompt
1500 +/- milliseconds if i change the form title for each file read:
//this.Title = "Reading file " + FullName + " , Please wait...";

300 milliseconds +/- with that line commented out

In general, I would not expect (for example) a few thousand debug
console write-lines to take 253 seconds. That's only maybe 15 calls to
Debug.Print() per second, which is really slow. But if your system is
slow and/or you're running under the IDE, that might explain the delay.


Keep in mind that time includes not just the reading, but the processing
as well, which includes the use of StringBuilder instances to dissect the
text in a line you've read.

You don't say what encoding your text files are in, but since the code
apparently works I'll assume they are compatible with UTF8. And since
they are source code, then even if they are UTF8 instead of ASCII, I'd
guess most if not all of the characters are a single byte. So, let's say
the total bytes read is the same as total characters.

i'm not up on encodings. I think they're ascii...that's what i think of as
"text" files (what notepad creates)
That means you're processing about 1.3 megabytes per second. That is in
fact considerably slower than nominal disk i/o speed, by a factor of about
100. But again, without specific details, including the exact hardware
you're running on



and a complete code example (I mean, we don't
even know what the ListDefuns() method does

oops forgot that one...i commented the call to that for these timings

public void ListDefuns()
{
GetDefunsFromString(_sbCommentsStrippedThisFile.ToString());
RaiseDefunDetectedEvent(new
DefunDetectedEventArgs(_sbDefuns.ToString()));
}

private void GetDefunsFromString(string inputString)
{
string defunString = "defun";
string argStart = "(";
string balanceOfstring = inputString ;

_DefunListThisFile= new List<string>();
int defunPos = balanceOfstring.IndexOf(defunString,
StringComparison.OrdinalIgnoreCase);
if (defunPos > -1)
{
int argPos = balanceOfstring.IndexOf(argStart, defunPos +
defunString.Length);
int nextStartPos;
string thisDefun = balanceOfstring.Substring(defunPos +
defunString.Length, argPos - (defunPos + defunString.Length)).Trim();

while (defunPos > -1)
{
_DefunListThisFile.Add(thisDefun);
_sbDefuns.Append(thisDefun);
_sbDefuns.Append(Environment.NewLine);
nextStartPos = argPos;

balanceOfstring =
balanceOfstring.Substring(nextStartPos);
defunPos = balanceOfstring.IndexOf(defunString,
StringComparison.OrdinalIgnoreCase);

if (defunPos > -1)
{
argPos = balanceOfstring.IndexOf(argStart, defunPos
+ defunString.Length);
thisDefun = balanceOfstring.Substring(defunPos +
defunString.Length, argPos - (defunPos + defunString.Length)).Trim();
}
}
}
and there's no sample data
with which to test the code), it's hard to know whether that's reasonable
or not.

dell laptop m60 precision
Processor Intel(R) Pentium(R) M processor 1700MHz

Processor Speed 599 MHz

Memory (RAM) 1024 MB

Operating System Microsoft Windows XP Professional

Operating System Version 5.1.2600

I do note that you're maintaining a sorted dictionary in the middle of
processing, and then copying the whole thing to what appears to be a GUI
control. The update of the GUI control alone could easily account for
half the time spent or more, depending on just how many items are really
in there. (And if that is indeed a big part of the cost, and the "lbOrig"
represents a ListBox control, then it's /possible/ you could improve that
aspect by calling the BeginInit() method before starting the update
process, and EndInit() when the loop is done).

yes, "lbOrig" represents a ListBox control,
i'll look into those methods
for the timings above i commented that part out


The fact is, with the total execution time only being 421 milliseconds,
there may be a lot of fixed-cost embedded in there that makes the whole
thing seem slow compared to disk i/o speeds, but which would practically
disappear when processing a larger data set. Or it could just be that the
GUI updating is taking a lot of time. Either way, it seems to me that if
the data set is typical, then being able to complete the whole thing in
under half a second suggests that there's not really much need for a
performance improvement, even if the code is in fact less efficient than
it could be.

Bottom line for performance issues: you need to first start with specific
criteria that defines what constitutes acceptable performance. Then you
need to measure your program against those specific criteria. And finally,
if the program doesn't meet the criteria, you need to time specific
sections of code to figure out what in the program is actually occupying
the most time and is in need of the most attention (a profiler is the best
way to accomplish that, but you can also just use the Stopwatch class
judiciously to time specific sections of code and figure out where the
time is being spent).

Pete

thanks for the ideas
mark

a typical file looks something like:(as far as locations of comments (;) and
quotes (") that i'm trying to deal with)

;COMMENTED line with "quote in it"
(defun C:test ()
(setq string "this is a test line that starts
a quote with a ; semicolon in it
this is a test that ends the quote");and a comment after
(setq string2 "this is a test that has a quote \" and an escaped quote
inside it");and a comment after
)
(defun test2 ()
(setq string "this is a test line that starts
a quote with a ; semicolon in it
this is a test that ends the quote");and a comment after
(setq string2 "this is a test that has a quote \" and an escaped quote
inside it");and a comment after
)
(defun test3
()
(setq string "this is a test line that starts
a quote with a ; semicolon in it
this is a test that ends the quote");and a comment after
(setq string2 "this is a test that has a quote \" and an escaped quote
inside it");and a comment after
)
(defun test4()
(setq string "this is a string ending with single backslash \\");and comment
after
)
 
K

kndg

I'm using streamreader to open text files
I then look at each character in the file to detect certain patterns
it took 253 seconds
to parse 748 files
containing
563302 characters

then i commented out a few debug.print statements (which were called for
each file of course)
the time went down to 421 milliseconds

is that a normal impact from debug.print?
does that (421 milliseconds) sound like a reasonable amount of time to read
those characters?
or does it sound like further time optimizations should be sought out?

thanks
mark

Hi Mark,

Here, I would like to add some general advices...

When doing a timing, I would do it on a "release" build. That way, the
compiler will ignore any method that marked with Conditional["DEBUG"]
attribute, which include all the methods in the Debug class. So, you
don't have to manually commented out your Debug.Print statement.

If I have a choice between speed and code readability, I would choose
the later. If I didn't take good care of the readability of my code, I
may be understand it now but it might gaves me a puzzling mind when I
read it a few months later. After all, processor speed getting better
year after year, and creating a super performant code will take much
more time (which will impact the schedule) than creating a readable
code. Usually, base on my experience, when creating a readable code, the
optimization will naturally come up by itself, but not vice-versa.

For example,
- I would try to refactor the method to make it as slim as possible
- I would reduce the number of parameter the method takes
- I would reduce the deep of my if statement
- I would stick to naming guideline and make it consistent.

Just my two cents.
 
M

mp

Peter Duniho said:
[...]
Processor Intel(R) Pentium(R) M processor 1700MHz
Processor Speed 599 MHz

599 MHz? Is that right? I know many laptops, including Dells, will drop
the CPU speed when not plugged in. But I'm not used to seeing a nearly
2/3rds reduction in speed.

That said, if your computer really is running a 600Mhz, I'd say you're
pretty lucky to get the stuff to finish as fast as it does. :)

that came from xp's menu = Start | Help | Computer Information...
I just did that again and now it says
Processor Intel(R) Pentium(R) M processor 1700MHz

Processor Speed 1.66 GHz


I have no clue why it comes up different at different times...it was a
direct copy and paste from that dialog in xp
i did notice that with the longer times in my testing that the cpu fan would
come on once or twice...so i knew this poor antique notebook was really
having to work...:)

[...]
a typical file looks something like:(as far as locations of comments (;)
and
quotes (") that i'm trying to deal with) [...]

I'm familiar with Lisp. The question of sample data is more about having
a data set with which to run the program that is the same as what you're
using for your tests. You could only provide that by uploading it to a
web or FTP server somewhere and providing the download link.

Pete

well i don't have a website or ftp site and it's not important enough that
anyone would want to waste their time on testing against the actual data.
the general input you have provided is really what I was looking for and
appreciate all your comments and insights.
thanks again
mark
 
M

mp

kndg said:
I'm using streamreader to open text files snip

Hi Mark,

Here, I would like to add some general advices...

When doing a timing, I would do it on a "release" build. That way, the
compiler will ignore any method that marked with Conditional["DEBUG"]
attribute, which include all the methods in the Debug class. So, you don't
have to manually commented out your Debug.Print statement.

cool, thanks
If I have a choice between speed and code readability, I would choose the
later. If I didn't take good care of the readability of my code, I may be
understand it now but it might gaves me a puzzling mind when I read it a
few months later. After all, processor speed getting better year after
year, and creating a super performant code will take much more time (which
will impact the schedule) than creating a readable code. Usually, base on
my experience, when creating a readable code, the optimization will
naturally come up by itself, but not vice-versa.

i too am always *Attempting* to write readable code for exactly those
reasons.
doesn't mean i *succeed* at it!

do you have any specific areas you're referring to in what I posted that
violated the principles you mention below?
any input is appreciated.
For example,
- I would try to refactor the method to make it as slim as possible
- I would reduce the number of parameter the method takes
- I would reduce the deep of my if statement
- I would stick to naming guideline and make it consistent.

Just my two cents.

thanks
mark
 
M

mp

mp said:
Peter Duniho said:
[...]
Processor Intel(R) Pentium(R) M processor 1700MHz
Processor Speed 599 MHz

599 MHz? Is that right? I know many laptops, including Dells, will drop
the CPU speed when not plugged in. But I'm not used to seeing a nearly
2/3rds reduction in speed.

That said, if your computer really is running a 600Mhz, I'd say you're
pretty lucky to get the stuff to finish as fast as it does. :)

that came from xp's menu = Start | Help | Computer Information...
I just did that again and now it says
Processor Intel(R) Pentium(R) M processor 1700MHz
Processor Speed 1.66 GHz

I have no clue why it comes up different at different times...it was a
direct copy and paste from that dialog in xp

bizarre
i just tried it again and it's back to 599mhz
oh, and i almost always run it plugged in, so it wasn't unplugged at the
time either.
mark
 
M

mp

mp said:

ok I changed one of the functions from the previous post to this...does it
seem more readable to you? (its definitely shorter :) ... and while
studying it to see how i could refactor i saw i had an old unused argument
left over from previous iterations...)
(declarations at class level)
const char quote = '"';
const char backslash = '\\';

private void SetQuoteState(char currentCharacter, char preceedingCharacter,
char beforePreceedingCharacter, ref Boolean _amInQuote)
{
if (((currentCharacter == quote) && (preceedingCharacter ==
backslash) && (beforePreceedingCharacter == backslash))||((currentCharacter
== quote) && (preceedingCharacter != backslash)))
{ _amInQuote = !_amInQuote; }
}
-previous version -
private void SetQuoteState(char currentCharacter, int charPos, char
preceedingCharacter, char beforePreceedingCharacter, ref Boolean _amInQuote)
{
if (currentCharacter == quote)
{
if (_amInQuote == true)
{
if (preceedingCharacter != backslash)
{
_amInQuote = false;
//Debug.Print("End quote");
}
else if (preceedingCharacter == backslash)
{
//if char before preceedingchar is backslash
then backslash is backslashed
if (beforePreceedingCharacter == backslash )
{
_amInQuote = false;
//Debug.Print("End quote");
}
}
}
else//Debug.Print("not Inside quote");
{
_amInQuote = true;
//Debug.Print("Start quote");
}
}//if current char is quote
}//end SetQuoteState
 
M

mp

another refactoring question...
interested in your opinion

kndg said:
On 11/10/2010 1:03 PM, mp wrote: snip
If I have a choice between speed and code readability, I would choose the
later. snip
For example,
- I would try to refactor the method to make it as slim as possible
- I would reduce the number of parameter the method takes
- I would reduce the deep of my if statement
- I would stick to naming guideline and make it consistent.

Just my two cents.

am I going in the right direction???

i refactored this:

for (int CharPos = 0; CharPos < inputString.Length; CharPos++)
{
currentCharacter=inputString[CharPos];

if (_amInQuote != true)
{
//semi colon is only comment if not inside a quoted
string""
if (currentCharacter == comment)
{ _amInComment = true;
//Debug.Print("Start comment");
}


//only process quote that are not commented out
if (_amInComment != true)
{
SetQuoteState3(currentCharacter,
preceedingCharacter, beforepreceedingCharacter, ref _amInQuote);
}
}
else//amInQuote is true
{
SetQuoteState3(currentCharacter,
preceedingCharacter, beforepreceedingCharacter, ref _amInQuote);
}

if (_amInComment == true)
{
sbCommentsThisLine.Append(currentCharacter);
//Debug.Print("comment " + currentCharacter);
}
else
{
sbCommentsStrippedThisLine.Append(currentCharacter);
}

beforepreceedingCharacter = preceedingCharacter;
preceedingCharacter = currentCharacter;
}//end for

into this:

for (int CharPos = 0; CharPos < inputString.Length; CharPos++)
{
currentCharacter = inputString[CharPos];

//semi colon is only comment if not inside a quoted string""
if ((_amInQuote != true) && (currentCharacter == comment))
{
_amInComment = true;
//Debug.Print("Start comment");
}

if (_amInComment == true)
{
sbCommentsThisLine.Append(currentCharacter);
//Debug.Print("comment " + currentCharacter);
}
else
// only process quote that are not commented out
{
SetQuoteState3(currentCharacter,
preceedingCharacter, beforepreceedingCharacter, ref _amInQuote);
sbCommentsStrippedThisLine.Append(currentCharacter);
}//if not in comment

beforepreceedingCharacter = preceedingCharacter;
preceedingCharacter = currentCharacter;
}//end for
 
M

mp

"lbOrig" represents a ListBox control, then it's /possible/ you could
improve that aspect by calling the BeginInit() method before starting the
update process, and EndInit() when the loop is done).

that took loading listbox time from 20 ms to 10 ms, nice.
not that it's noticable of course but theory is fun :)
what's interesting is the code that fills the listbox and times the filling
and reporting the results in the form.title all runs at least a second
before the listbox is visually refreshed. so it tells me how long it took
to fill it before i see it get filled <g>
mark
 
K

kndg

[...]
do you have any specific areas you're referring to in what I posted that
violated the principles you mention below?
any input is appreciated.

Okay, if you don't mind...

cLispFile class
---------------
1. GetCommentFromLine() method can be shorther. You can move the code
inside your for-loop into another method.
2. GetDefunsFromString() also can be shorther. You can move the code
inside your while-loop into another method.
3. SetQuoteState() have too many parameter (five!). In another post, you
had reduced it to four, but I feel that "_amInQuote" parameter is
unnecessary (I think "_amInQuote" is your class private member, so there
is no need to pass around). So, that would reduce it to three.
4. SetQuoteState() have a very deep if-statement and I see you have made
it shorther in another post. Another way to reduce the deep is invert
it. For example,

original : if (currentCharacter == quote) { .. }
inverted : if (currentCharacter != quote) return;

5. There are a lot of inconsistent naming in your code. I think you
should read the Microsoft naming guideline
(http://msdn.microsoft.com/en-us/library/ms229002.aspx)

UI class (WPF)
--------------

In general, I would separate code that are not doing the UI work into
another class. So, in your button3_Click() method, I would segregate
which part doing the UI work and which part doing none-UI work and put
it into separate class.

Side note:
----------
1. Refactoring is very *dangerous*. It may break otherwise workable
code. It would be advised to have your unit test ready before doing the
refactoring work.
2. I see a lot of people doing this: there is no point to compare the
booleans. For example, in your ReadFile() method,

this is redundant => if (_FileNotFound == false)
use this instead => if (!_FileNotFound)
or better, change the name and the meaning => if (isFileFound)

3. If using other than the express version of Visual Studio, I would
highly recommend the Resharper tool from JetBrain
(http://www.jetbrains.com/resharper/). It make your code reads better.

Okay, that would be my advices at the moment. Maybe someone could offer
some better advices.

Happy coding!
 
K

kndg

[...]
private void SetQuoteState(char currentCharacter, char preceedingCharacter,
char beforePreceedingCharacter, ref Boolean _amInQuote)
{
if (((currentCharacter == quote)&& (preceedingCharacter ==
backslash)&& (beforePreceedingCharacter == backslash))||((currentCharacter
== quote)&& (preceedingCharacter != backslash)))
{ _amInQuote = !_amInQuote; }
}

Okay, that would read better. How about below,

if (currentCharacter != quote) return;

if (preceedingCharacter != backslash || beforePreceedingCharacter ==
backslash)
{
_amInQuote = !_amInQuote;
}

Also, please read my previous comment on "_amInQuote" variable and the
side note on refactoring.
 
K

kndg

[...]
into this:

for (int CharPos = 0; CharPos< inputString.Length; CharPos++)
{
currentCharacter = inputString[CharPos];

//semi colon is only comment if not inside a quoted string""
if ((_amInQuote != true)&& (currentCharacter == comment))
{
_amInComment = true;
//Debug.Print("Start comment");
}

if (_amInComment == true)
{
sbCommentsThisLine.Append(currentCharacter);
//Debug.Print("comment " + currentCharacter);
}
else
// only process quote that are not commented out
{
SetQuoteState3(currentCharacter,
preceedingCharacter, beforepreceedingCharacter, ref _amInQuote);
sbCommentsStrippedThisLine.Append(currentCharacter);
}//if not in comment

beforepreceedingCharacter = preceedingCharacter;
preceedingCharacter = currentCharacter;
}//end for

Okay, I see that you had made it shorther, but I can't comment on the
code logic. The only way to tell it (whether the modification is right
or wrong) is by unit test. And, anything inside the loop is a chance for
refactoring it into another method.

Regards.
 
M

mp

kndg said:
[...]
do you have any specific areas you're referring to in what I posted that
violated the principles you mention below?
any input is appreciated.

Okay, if you don't mind...
snip
Okay, that would be my advices at the moment. Maybe someone could offer
some better advices.

Happy coding

Many thanks, appreciate the pointers
mark
 
M

mp

kndg said:
[...]
private void SetQuoteState(char currentCharacter, char
preceedingCharacter,
char beforePreceedingCharacter, ref Boolean _amInQuote)
{
if (((currentCharacter == quote)&& (preceedingCharacter ==
backslash)&& (beforePreceedingCharacter ==
backslash))||((currentCharacter
== quote)&& (preceedingCharacter != backslash)))
{ _amInQuote = !_amInQuote; }
}

Okay, that would read better. How about below,

if (currentCharacter != quote) return;

if (preceedingCharacter != backslash || beforePreceedingCharacter ==
backslash)
{
_amInQuote = !_amInQuote;
}

very cool, thanks
Also, please read my previous comment on "_amInQuote" variable and the
side note on refactoring.

yes, you're right, don't know why i was passing that around as an
arg...duh!!
:)
thanks
mark
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Similar Threads


Top