Regex to get VB6 function definitions

nassegris

Hello everyone!

I'm trying to write a regular expression to capture VB6 function
definitions and I'm a bit stuck. The rules are:

Function header:
* Must contain the words SUB or FUNCTION
* May contain the words PUBLIC or PRIVATE
* May NOT contain the word DECLARE
* Ends at the first newline that is not preceded by an underscore (_)
followed by a space.

Function footer:
* Must contain the words END FUNCTION or END SUB only

Function exit:
* Must contain the words EXIT FUNCTION or EXIT SUB only

The idea is that I want to insert some logging code after each
function header and some logging code before each exit or end. So what
I need is a regex that would capture whole functions with groups for:
* Whole function header
* Function name
* Function parameter list

So far I've come up with
(?:^\s*(?:public|private){0,1}\s(?:sub|function)\s) (?<FunctionName>[\w
\d]+) (?<FunctionArgs>(?:\s*.*_\s$)*(?:\s*.*$) | (?:.*$)) # FUNCTION
HEADER
and
(^\s*(?:end|exit)\s(?:sub|function)).*$ # FUNCTION FOOTER

but I don't really know how to combine them to get the complete
function back with the exits and ends tagged properly.

Could someone please help me out with some tips?

Regards,
 
Nicholas Paldino [.NET/C# MVP]

To be honest, I don't know how you would use a regex for this, and I
hope someone else can help you on that front.

Given that you want the task of processing a number of vb files, have
you thought about writing an add-in for the VB6 environment? You can do so
in .NET, and access the specific components of the code files (functions,
properties, etc, etc) through the object model. It might make it a lot
easier than parsing it all apart using regular expressions.
 
Jesse Houwing

I'll try to help you out here, but using a tool like DxCore, integrated
into Visual Studio, might be the better solution for this.

To start:
^(?<declaration>\s*((?:public|private)\s+)?(?<type>sub|function)\b(?:_\r?\n|.)*)$

From there you want everything up to the end sub/function, but you want to
treat an exit differently. This makes up the body:

(?<body>(?s:(?!(?:end|exit)\s+\k<type>\b).|(?<exit>exit\s+\k<type>))*)

Followed by the actual end:

(?<end>end\s+\k<type>\b)

Putting it all together:

^(?<declaration>\s*((?:public|private)\s+)?(?<type>sub|function)\b(?:_\r?\n|.)*)$(?<body>(?s:(?!(?:end|exit)\s+\k<type>\b).|(?<exit>exit\s+\k<type>))*)(?<end>end\s+\k<type>\b)
Multiline ON
IgnoreCase ON

Watch the wrapping.

I haven't tested it. But I'll explain what I've done in each part.

^(?<declaration>\s*((?:public|private)\s+)?(?<type>sub|function)\b(?:_\r?\n|.)*)$

^ -- beginning of a line (Multiline is on)
(?<declaration> -- capture all in a group named declaration
\s*((?:public|private)\s+)? -- optional whitespace followed by an optional
public or private.
(?<type>sub|function) -- capture the type in a named group. We'll need it
to find the end function.
\b -- make sure we didn't find functionally or substitute or any other word
that starts with either sub or function
(?:_\r?\n|.)* -- with singleline off, . doesn't match a newline. So only
match newlines if they're preceded by a _ to capture the rest of the function
declaration. The _\r?\n branch is tried first; otherwise . would consume the
underscore and $ would match at the end of the first line.
) -- end of declaration capture
$ -- end of the line.
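
A quick way to check the declaration part on its own is a small console sketch like the one below. The VB6 snippet is made up for illustration. The pattern tries the _\r?\n branch before . on the assumption that a continuation line should be consumed before $ gets a chance to match. The sample uses bare \n line endings, because in .NET . also matches \r. It is written as a C# top-level program (C# 9 or later), so treat it as a sketch rather than tested production code.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical VB6 input: one Declare line (must be skipped) and one
// function whose header continues onto a second line.
string source = string.Join("\n",
    "Private Declare Function GetTickCount Lib \"kernel32\" () As Long",
    "Public Function Add(ByVal a As Long, _",
    "    ByVal b As Long) As Long",
    "    Add = a + b",
    "End Function");

// Declaration pattern only; the continuation branch comes before '.'.
var headerRx = new Regex(
    @"^(?<declaration>\s*((?:public|private)\s+)?(?<type>sub|function)\b(?:_\r?\n|.)*)$",
    RegexOptions.Multiline | RegexOptions.IgnoreCase);

MatchCollection headers = headerRx.Matches(source);
foreach (Match m in headers)
{
    Console.WriteLine("type: " + m.Groups["type"].Value);
    Console.WriteLine("declaration: " + m.Groups["declaration"].Value);
}
```

The Declare line is skipped because the pattern is anchored at the start of the line and only allows public or private before sub or function.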

Now the body

(?<body>(?s:(?!(?:end|exit)\s+\k<type>\b).|(?<exit>exit\s+\k<type>))*)

(?<body> -- capture all in a group named body
(?s: -- set singleline, we need . to match newline to get the whole body
(?!(?:end|exit)\s+\k<type>\b). -- find any character that isn't the start
of end sub, end function, exit sub or exit function.
| -- or
(?<exit>exit\s+\k<type>) -- find an exit point. This in combination with
the previous part of this group captures all options in the body.
)* -- repeat to find all parts of the body containing either normal code
or exit points
) -- end of body capture

Because the body regex will stop when it finds end sub or end function, we
still need to capture that:

(?<end> -- capture the end in a group named end
end\s+ -- find the word end followed by at least one whitespace character
\k<type> -- find the type of function we're capturing (sub or function)
\b -- make sure it's just that word we found
) -- end capture

Because the 'exit' group can match several times, it will hold multiple
values. You can access these through its Captures collection:

string declaration, type, body, end, exit;
Regex rx = new Regex(...);
Match m = rx.Match(input);
if (m.Success)
{
    declaration = m.Groups["declaration"].Value;
    type = m.Groups["type"].Value;
    body = m.Groups["body"].Value;
    end = m.Groups["end"].Value;
    foreach (Capture c in m.Groups["exit"].Captures)
    {
        // one Capture per exit sub / exit function found in the body
        exit = c.Value;
    }
}
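
To try the combined pattern end to end, something like the following sketch may help. The VB6 function is invented for the example. The pattern is the declaration, body and end parts glued together, using \k<type> (the .NET syntax for a named backreference) and with the continuation branch placed before the dot. It is a C# top-level program (C# 9 or later) and assumes bare \n line endings in the input.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical VB6 function with one early exit; bare \n line endings.
string source = string.Join("\n",
    "Public Function Half(ByVal n As Long) As Long",
    "    If n = 0 Then",
    "        Exit Function",
    "    End If",
    "    Half = n / 2",
    "End Function");

// Declaration, body and end in one pattern.
var functionRx = new Regex(
    @"^(?<declaration>\s*((?:public|private)\s+)?(?<type>sub|function)\b(?:_\r?\n|.)*)$" +
    @"(?<body>(?s:(?!(?:end|exit)\s+\k<type>\b).|(?<exit>exit\s+\k<type>))*)" +
    @"(?<end>end\s+\k<type>\b)",
    RegexOptions.Multiline | RegexOptions.IgnoreCase);

Match fn = functionRx.Match(source);
int exitCount = fn.Groups["exit"].Captures.Count;

if (fn.Success)
{
    Console.WriteLine("declaration: " + fn.Groups["declaration"].Value);
    Console.WriteLine("end: " + fn.Groups["end"].Value);
    // Each Exit Function in the body shows up as its own Capture.
    foreach (Capture c in fn.Groups["exit"].Captures)
    {
        Console.WriteLine("exit at index " + c.Index + ": " + c.Value);
    }
}
```

Note that "End If" stays inside the body, because the lookahead only stops on end or exit followed by the captured type.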

Though if you want to do string replacements, it might be much easier to just
look for the separate parts and use a MatchEvaluator and Regex.Replace to
do the work for you. Much easier than piecing the whole thing back together
from the Match object.

^(?<declaration>\s*((?:public|private)\s+)?(?<type>sub|function)\b(?:_\r?\n|.)*)$|(?<end>end\s+(?:function|sub))|(?<exit>exit\s+(?:function|sub))
Multiline ON
IgnoreCase ON

and then do the following:

Regex rx = new Regex(...);
// Replace returns the new string; it does not modify input.
string output = rx.Replace(input, new MatchEvaluator(this.Writefunctions));

private string Writefunctions(Match m)
{
    if (m.Groups["declaration"].Success)
    {
        // logging goes right after the header
        return m.Groups["declaration"].Value + "\r\n" + START_FUNCTION_TEXT;
    }
    else if (m.Groups["exit"].Success)
    {
        // logging goes right before the exit
        return EXIT_FUNCTION_TEXT + "\r\n" + m.Groups["exit"].Value;
    }
    else if (m.Groups["end"].Success)
    {
        // logging goes right before the end
        return END_FUNCTION_TEXT + "\r\n" + m.Groups["end"].Value;
    }
    else
    {
        return m.Value;
    }
}

This should do the whole trick.
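
For anyone who wants to see the replace approach run, here is a minimal sketch as a C# top-level program (C# 9 or later). The sample Sub and the two Debug.Print lines are placeholders I made up, the evaluator is written as a lambda instead of a named method, and the input is assumed to use bare \n line endings. The pattern puts the continuation branch before the dot.

```csharp
using System;
using System.Text.RegularExpressions;

// Placeholder logging lines; swap in whatever you want to inject.
const string EnterLog = "Debug.Print \"enter\"";
const string LeaveLog = "Debug.Print \"leave\"";

// Hypothetical VB6 input with one early exit.
string source = string.Join("\n",
    "Public Sub Ping()",
    "    If Busy Then",
    "        Exit Sub",
    "    End If",
    "    DoWork",
    "End Sub");

// Header, end and exit as three alternatives of one pattern.
var markerRx = new Regex(
    @"^(?<declaration>\s*((?:public|private)\s+)?(?<type>sub|function)\b(?:_\r?\n|.)*)$" +
    @"|(?<end>end\s+(?:function|sub))|(?<exit>exit\s+(?:function|sub))",
    RegexOptions.Multiline | RegexOptions.IgnoreCase);

string instrumented = markerRx.Replace(source, m =>
{
    if (m.Groups["declaration"].Success)
    {
        return m.Value + "\n" + EnterLog;   // log right after the header
    }
    return LeaveLog + "\n" + m.Value;       // log right before Exit/End
});

Console.WriteLine(instrumented);
```

One thing to watch: a single-line "If x Then Exit Sub" would be split by the inserted line break, so lines like that need separate handling.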


Kind Regards,


Jesse Houwing


