Count Instances Of String Within String

  • Thread starter: Derek Hart

Derek Hart

Is there an efficient line of code to count the number of instances of one
string within another?

If I have the sentence:
"I want to go to the park, and then go home."

It would give me a count of 2 for the word "go".

Derek
 
Use the String.Compare method:

Dim s As String = "I want to go to the park, and then go home."
MsgBox((String.Compare(s, "go") + 1).ToString & " Occurrences of go")

regards

Michel Posseth [MCP]
 
I don't believe there is a way to put it in one line (at least not in
VB.NET, other languages might have a built-in method that does this or
possibly give you the ability to put multiple commands on one line), but
here is a simple loop that does what you want in VB.NET:


Dim teststring As String = "I want to go to the park, and then go home."
Dim i As Integer = -1
Dim stringcount As Integer = 0
While teststring.IndexOf("go", i + 1) <> -1
    stringcount += 1
    i = teststring.IndexOf("go", i + 1)
End While


You can also write it as a function, which I would strongly suggest if you
plan on doing this more than once:


Public Function CountInstances(ByVal lookfor As String, ByVal lookin As String) As Integer
    Dim stringcount As Integer = 0
    Dim i As Integer = -1
    While lookin.IndexOf(lookfor, i + 1) <> -1
        stringcount += 1
        i = lookin.IndexOf(lookfor, i + 1)
    End While
    Return stringcount
End Function


Writing it as a function takes a few extra lines of code up front, but
after that you can call it from just one line, making your code simpler
to write and debug:


stringcount = CountInstances("go", teststring)


If you have any questions, feel free to ask. Good Luck!
 
Derek,

You could also use regular expressions to match the instances of a
string inside another string. I *believe* the pattern I use returns
all instances of 'go' surrounded by spaces, but then again I'm not very
good with RegEx yet.


Imports System.Text.RegularExpressions

Module main
    Sub main()

        Dim strtoSearch As String = "I want to go to the park, and then go home."
        Console.WriteLine(GetStringOccurences(strtoSearch, "go").ToString)
        Console.ReadLine()

    End Sub

    Private Function GetStringOccurences(ByVal searchString As String, ByVal searchWord As String) As Integer

        ' Match the word only when it has whitespace on both sides.
        Dim r As New Regex(String.Format("\s{0}\s", searchWord))

        Return r.Matches(searchString).Count

    End Function

End Module
 
Ahum :-(

Embarrassed mode:

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click
    Dim s As String = "I want to go to the park, and then go home."
    MsgBox(countsubstrings(s, "go"))
End Sub

Function countsubstrings(ByVal source As String, ByVal search As String) As Integer
    ' count starts at -1 because the loop body always runs once more
    ' than the number of matches found.
    Dim count As Integer = -1
    Dim index As Integer = -1
    Do
        count += 1
        index = source.IndexOf(search, index + 1)
    Loop Until index < 0
    Return count
End Function

This will work.




 
Dim str$ = "I want to go to the park, and then go home."
Dim findstr$ = "go"
Dim wordcount% = (Len(str) - Len(Replace(str, findstr, ""))) \ Len(findstr)
MsgBox(wordcount)
 
Derek,

As was once tested in this newsgroup, this is the fastest method for that (I
have changed the field names now, so watch out for that).
\\\
Public Function CountString(ByVal SearchItem As String, _
        ByVal ToCountString As String) As Integer
    Dim Start As Integer = 1
    Dim Count As Integer = 0
    Dim Result As Integer
    Do
        Result = InStr(Start, SearchItem, ToCountString)
        If Result = 0 Then Exit Do
        Count += 1
        Start = Result + 1
    Loop
    Return Count
End Function
///

If the ToCountString becomes a ToCountChar, then there are better methods.

I hope this helps,

Cor
 
Cor ...

Surely the following will be much faster:
string tofind = "test";
string foo = "testtest footestfoo";
string changed = foo.Replace(tofind, "");
return (foo.Length - changed.Length) / tofind.Length;

Cheers,

Greg Young
MVP - C#
 
Greg,

I am sure it will not.
surely the following will be much faster.
string tofind = "test";
string foo = "testtest footestfoo";
string changed = foo.Replace(tofind, "");
return (foo.Length - changed.Length) / tofind.Length;
Although with a few small changes it probably will be.

:-)

I already said yesterday that I thought it was a nice idea.

I forgot that with some changes you can use it for this as well.

Cor
 
It probably does; I misread something. I am testing what it really means,
because I have doubts about some side effects.

Cor
 
I am a strong believer in the pragmatic programmer idea of "if in doubt,
test it".

I added a loop to run 100 tests, and CountString2 beat CountString 100/100
times, even with varying data counts (including no data). Perhaps you have
some code showing the opposite?

Cheers,

Greg Young
MVP - C#

Module Module1

    Public Function CountString(ByVal SearchItem As String, ByVal ToCountString As String) As Integer
        Dim Start As Integer = 1
        Dim Count As Integer = 0
        Dim Result As Integer
        Do
            Result = InStr(Start, SearchItem, ToCountString)
            If Result = 0 Then Exit Do
            Count += 1
            Start = Result + 1
        Loop
        Return Count
    End Function

    Public Function CountString2(ByVal SearchItem As String, ByVal ToCountString As String) As Integer
        ' Remove every match, then divide the length difference by the match length.
        Dim tmp As String = SearchItem.Replace(ToCountString, "")
        Return (SearchItem.Length - tmp.Length) \ ToCountString.Length
    End Function

    Sub Main()
        Dim ToCountString As String = "test"
        Dim SearchItem As String = "testtesttestfootesttesttest"
        Dim i As Integer
        Dim starttime As DateTime = DateTime.Now
        Dim endtime As DateTime
        For i = 0 To 1000000
            CountString(SearchItem, ToCountString)
        Next
        endtime = DateTime.Now
        Console.WriteLine("CountString - " & (endtime - starttime).ToString())
        starttime = DateTime.Now
        For i = 0 To 1000000
            CountString2(SearchItem, ToCountString)
        Next
        endtime = DateTime.Now
        Console.WriteLine("CountString2 - " & (endtime - starttime).ToString())

    End Sub

End Module
 
The same problem exists in both methods in use.

If I want to look for the word "go" and I also have words like "gong" or
"pogo", they will be detected as instances of the word. An easy way to work
around this is to pass in spaces, i.e. " go ", but then I will not detect
patterns such as "let's go!", because there is no trailing space, or "go to
the beach", because there is no leading space.

It is these items that make the implementation of an algorithm like this
tricky.

Cheers,

Greg Young
MVP - C#
 
Greg,

The problem you described was my concern too; however, that side effect did
not show up in my simple test. I took the string "Cor Greg GregCor CorGreg "
repeated 100 times, ran both methods against it 100,000 times each, and
measured the time.

The Replace method beats the method with the moving InStr by a ratio of
about 5:7.

Both methods are quicker than any others I have seen so far, and the Replace
method is now the fastest for me.

Cor
 
The replace method might be more efficient for short strings, but the
method using InStr (or IndexOf) scales better, as it doesn't create
another string that is almost as big as the string being searched.
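
To make the allocation point concrete, here is a minimal sketch of an IndexOf loop that counts matches without building a second string. The names CountOccurrences, lookIn and lookFor are made up for illustration, and StringComparison.Ordinal is an assumption (case-sensitive, culture-independent matching). It skips past each whole match, so, like the Replace method, it counts non-overlapping occurrences; the loops earlier in the thread advance by one character instead.

```vb
Imports System

Module CountDemo
    ' Hypothetical helper: counts non-overlapping occurrences of lookFor
    ' in lookIn without allocating a copy of the searched string.
    Public Function CountOccurrences(ByVal lookIn As String, ByVal lookFor As String) As Integer
        If String.IsNullOrEmpty(lookIn) OrElse String.IsNullOrEmpty(lookFor) Then Return 0

        Dim count As Integer = 0
        ' Ordinal comparison is an assumption, not something the thread specifies.
        Dim index As Integer = lookIn.IndexOf(lookFor, StringComparison.Ordinal)
        While index >= 0
            count += 1
            ' Continue after the whole match so overlapping hits are not counted.
            index = lookIn.IndexOf(lookFor, index + lookFor.Length, StringComparison.Ordinal)
        End While
        Return count
    End Function

    Sub Main()
        ' Prints 2 for the sentence from the original question.
        Console.WriteLine(CountOccurrences("I want to go to the park, and then go home.", "go"))
    End Sub
End Module
```

The only extra memory used here is a couple of integers, whatever the length of the searched string.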
 
Goran,

IndexOf with strings is twice as slow as InStr.

With a Char it beats InStr by so much that it is not worth mentioning, but
that is of course comparing apples with pears, because InStr(char) does not
exist.

Cor
 
You are correct Goran.

Cheers,

Greg
Göran Andersson said:
The replace method might be more efficient for short strings, but the
method using InStr (or IndexOf) scales better, as it doesn't create
another string that is almost as big as the string being searched.
 
That is neatly handled by a regular expression. The \b code matches a
word boundary.

Regex re = new Regex(@"\b" + Regex.Escape(tofind) + @"\b");
int found = re.Matches(foo).Count;
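
For readers following the VB.NET examples earlier in the thread, the word-boundary idea might look roughly like this. The function name CountWholeWords and the sample sentence are made up for illustration, and RegexOptions.IgnoreCase is an assumption about what a word count should do. Note that the word-boundary code is written with a backslash, \b, and that VB string literals do not treat the backslash as an escape character.

```vb
Imports System
Imports System.Text.RegularExpressions

Module WholeWordDemo
    ' Hypothetical helper: counts occurrences of word in lookIn that
    ' stand alone as whole words.
    Public Function CountWholeWords(ByVal lookIn As String, ByVal word As String) As Integer
        ' \b matches a word boundary; Regex.Escape keeps any regex
        ' metacharacters in the word from being interpreted as operators.
        Dim pattern As String = "\b" & Regex.Escape(word) & "\b"
        Return Regex.Matches(lookIn, pattern, RegexOptions.IgnoreCase).Count
    End Function

    Sub Main()
        ' "Go" and "go!" are counted; "gong" and "pogo" are not. Prints 2.
        Console.WriteLine(CountWholeWords("Go to the beach; the gong in pogo land says go!", "go"))
    End Sub
End Module
```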
 
If you have come to that conclusion, I think you have not used them in
the same way, as both internally call
CurrentCulture.CompareInfo.IndexOf. InStr only has more overhead.
 
Goran,

Did you test it? Before I tested it, I had the same idea as you. InStr is a
very simple instruction, while IndexOf has more complex possibilities. Those
should be tested, of course.

And, as I said, not with Char.

Cor
 
Thanks for this, Goran. I had posted something similar earlier, but I
had used "\s", which did not correctly catch instances of the string
at the start or end of a sentence.

Is the Regex.Escape() call there to account for any characters in your
tofind variable that are regex operators?
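
A small experiment suggests that this is indeed what Regex.Escape is for. The sketch below is not from the thread, and the sample strings are made up; it counts matches for a search term containing ".", first unescaped and then escaped.

```vb
Imports System
Imports System.Text.RegularExpressions

Module EscapeDemo
    Sub Main()
        ' Made-up sample data for illustration only.
        Dim source As String = "a.b axb a.b"
        Dim toFind As String = "a.b"

        ' Unescaped: "." matches any character, so "axb" is counted too. Prints 3.
        Console.WriteLine(Regex.Matches(source, toFind).Count)

        ' Escaped: Regex.Escape turns "a.b" into "a\.b", so only the
        ' literal text is matched. Prints 2.
        Console.WriteLine(Regex.Matches(source, Regex.Escape(toFind)).Count)
    End Sub
End Module
```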
 