Find & Replace in Multi-Megabyte Strings

R

R Avery

This is a re-post of "Batch Replace function for Large strings"

String processing in VBA is very slow when strings are large (1-500
MB). I have a
function that I've been using for doing batch replace ops (in the above
referenced previous post), but
it chokes on large strings with many replacements to do (like 50),
because it has to do 50 passes of the string to perform the
replacements.

Has anyone written a fast function designed to do the same thing for
large strings but only makes one pass through the data?
 
K

keepITcool

Rex Avery... REGEX Avery should be your name!


Have a look at regex. (regular expressions)

Following test builds a 28million character teststring, it will still
loop thru all the replacements, but regex replace is a "trifle" faster
than vba's.

Following example takes 20 seconds for the 6 replacements.
(but note that each replacement occurs 1 million times!

The example uses late bound code, it's a little bit faster when you
reference "Microsoft VBScript Regular Expression 5.5" and dim the rgx
variable as RegExp.

You'll have to learn some regex patterns though!
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/script5
6/html/vspropattern.asp

or try and buy RegExBuddy. (be aware that regexbuddy supports more
advanced flavors than vbscripts regex)

Sub TestRegExReplace()

Dim str$, arrPat, arrRep

Dim rgx As Object
Dim t!, n&
'This is slow but nevermind..
'it builds a string of 28 000 000 chars.

str = "This is the string I have." & vbLf
For n = 1 To 20
str = str & str
Next
Debug.Print Len(str)

arrPat = Array("This", "is", "the", "string", "I", "have")
arrRep = Array("That", "was", "that", "text", "you", "had")

t = Timer
Set rgx = CreateObject("vbscript.regexp")
With rgx
.Global = True
.IgnoreCase = True
For n = LBound(arrPat) To UBound(arrPat)
.Pattern = "\b" & arrPat(n) & "\b"
str = .Replace(str, arrRep(n))
Next
End With

t = Timer - t
MsgBox Left(str, InStr(50, str, vbLf)) & "in " & t & "seconds"

End Sub


HTH... (but i'm fairly sure it does)




--
keepITcool
| www.XLsupport.com | keepITcool chello nl | amsterdam


R Avery wrote :
 
R

R Avery

Cool! The RegEx (early bound) in my tests is 4-5x faster than VBA's
replace. However, I think that this might even be able to be improved
more... I think it should be possible to do all of the find/replaces
in a single pass of the string, further improving the speed by a factor
of however many replaces there are to do.

Since doing 6 search and replaces requires calling the Regex replace
function 6 times, it is iterating over the string 6 times, but it
should really only need to do it once.

Perhaps some sort of clever regex could combine all of the
search/replace ops into a single call of .Replace.

Or perhaps a custom procedure would be better... i'll make a post
if/when i find the answer i'm looking for.

Thanks for the idea! And for now, i'll take the 5x improvement ;D
 

Ask a Question

Want to reply to this thread or ask your own question?

You'll need to choose a username for the site, which only take a couple of moments. After that, you can post your question and our members will help you out.

Ask a Question

Top