Need help with a regular expression


Guest

I need to replace the ASCII strings in VC++ source code with Unicode-compatible
strings. That is, I want to replace "abc" with _T("abc"), excluding strings on
#include lines and strings already inside _T("...").

I have these regular expressions:

1. {[^_T\(]"([^"\\]*(\\.[^"\\]*)*)"} // get strings without '_T(' prefix
2. {[^include ]"([^"\\]*(\\.[^"\\]*)*)"} // get strings without 'include ' prefix

How can I get the intersection of these two sets, so that the matched strings can be
replaced by '_T(\1)'?
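One likely reason the two patterns above don't combine the way you'd hope: a negated character class such as [^_T\(] excludes one single character (here '_', 'T' or '('), not the prefix "_T(". The sketch below uses Python's re module purely as a convenient, testable engine (the thread's own regexes are Visual Studio and .NET dialects), with the Visual Studio {} tag braces dropped. It suggests that pattern 1 also skips an ordinary call like printf("hi"), because the character right before the quote is '('. The helper name pattern1_finds_string is made up for illustration.

```python
import re

# Pattern 1 from the question, minus the Visual Studio {} tag braces.
# [^_T\(] is a character class: it matches ONE character that is not
# '_', 'T' or '('. It does not mean "not preceded by _T(".
PATTERN_1 = re.compile(r'[^_T\(]"([^"\\]*(\\.[^"\\]*)*)"')


def pattern1_finds_string(line: str) -> bool:
    """Return True if pattern 1 finds a string literal in the line."""
    return PATTERN_1.search(line) is not None


print(pattern1_finds_string('x = "hi";'))      # True: a space precedes the quote
print(pattern1_finds_string('printf("hi");'))  # False: '(' precedes the quote
print(pattern1_finds_string('s = _T("hi");'))  # False, as intended
```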
 

Jesse Houwing

* Paul Wu wrote, On 26-7-2007 20:48:
[snip]

To make sure #include isn't on the line, use a negative lookbehind:

(?<!#include.*)
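Note that a variable-width lookbehind such as (?<!#include.*) is something the .NET engine accepts, but many engines don't. Python's standard re module, for instance, raises re.error for it. A minimal sketch of a workaround, assuming the source is processed line by line: test the whole line for an #include directive and skip it. The helper name is_include_line is hypothetical. Unlike the .NET lookbehind, this only skips lines that start with the directive, so a "#include" inside an earlier string on the same line does not suppress later matches.

```python
import re

# Python's re module only allows fixed-width lookbehind, so the .NET
# pattern (?<!#include.*) cannot be compiled here.
try:
    re.compile(r'(?<!#include.*)"')
    VARIABLE_LOOKBEHIND_SUPPORTED = True
except re.error:
    VARIABLE_LOOKBEHIND_SUPPORTED = False

# Workaround: check whether the line is a preprocessor #include directive
# (allowing whitespace around the '#') and skip such lines entirely.
INCLUDE_LINE = re.compile(r'^\s*#\s*include\b')


def is_include_line(line: str) -> bool:
    """Return True if the line is an #include directive."""
    return INCLUDE_LINE.match(line) is not None


print(VARIABLE_LOOKBEHIND_SUPPORTED)            # False with the standard re module
print(is_include_line('#include "stdafx.h"'))   # True
print(is_include_line('  # include <tchar.h>')) # True
print(is_include_line('puts("#include");'))     # False
```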

To make sure the string isn't already inside _T("..."), use another negative
lookbehind (the character right before the opening quote is the parenthesis):

(?<!_T\()

All other strings can be replaced. Don't match the newline either, because
as far as I know a string literal can't span lines in C++:

"[^"\n]*"

Combine:

(?<!#include.*)(?<!_T\()"[^\n"]*"
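One pitfall worth checking with any lookbehind-only version (here with the _T check written as (?<!_T\()): after the engine rejects the opening quote of a _T("...") string, it also tries to start a match at that string's closing quote. When a wrapped string and a plain string share a line, the text between them can then be picked up as a bogus "string". The sketch below reproduces this in Python, whose re module supports this fixed-width lookbehind; the sample line is made up for illustration.

```python
import re

# Fixed-width part of the combined pattern: a quoted run on one line
# that is not directly preceded by "_T(".
NOT_WRAPPED = re.compile(r'(?<!_T\()"[^"\n]*"')

line = 'f(_T("a"), "b");'
matches = NOT_WRAPPED.findall(line)

# The opening quote of _T("a") is correctly rejected, but a match then
# starts at its closing quote and runs to the opening quote of "b".
print(matches)  # ['"), "']
```

Matching existing _T("...") literals as a separate alternative, and leaving them unchanged, is one way around this, because the engine then consumes the whole wrapped literal instead of restarting inside it.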

One thing you haven't looked at is an escaped quote (\") inside a string;
you'll probably need to handle those as well. That's a bit harder, because
\\\" is an escaped quote but \\\\" isn't: a quote is escaped only when an
odd number of backslashes precedes it. There isn't really a simple regex way
for that as far as I know. You could try:

(^|[^\\])(\\\\)*\\"

Which would lead to:

(?<!#include.*)(?<!_T\()"((^|[^\\])(\\\\)*\\"|[^"\n])*"

Which does the trick.
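For comparison, the string part of the patterns in the original question, "([^"\\]*(\\.[^"\\]*)*)" (the so-called unrolled-loop form), already copes with escapes: it treats a backslash plus any one following character as a unit, so \" and \\ are consumed as pairs and the odd/even backslash count never has to be checked. A minimal Python sketch of that pattern, with the groups made non-capturing and newlines excluded (both changes are assumptions of this sketch, not part of the question), run on a made-up line:

```python
import re

# Opening quote, then runs of ordinary characters separated by
# escape pairs (a backslash plus any one character), then the closing quote.
STRING_LITERAL = re.compile(r'"[^"\\\n]*(?:\\.[^"\\\n]*)*"')

# C++ source text: "say \"hi\"" contains escaped quotes,
# "a\\" ends with an escaped backslash, followed by a plain "b".
line = r'x = "say \"hi\""; y = "a\\"; z = "b";'

for literal in STRING_LITERAL.findall(line):
    print(literal)
# "say \"hi\""
# "a\\"
# "b"
```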

This regex uses the .NET System.Text.RegularExpressions syntax and won't work
in the Visual Studio Find and Replace window. You could probably write a
simple command-line tool to do the replacement.
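A rough sketch of such a command-line tool, written in Python rather than .NET and using a different technique than the lookbehinds above: skip #include lines, match existing _T("...") literals and plain string literals as two alternatives of one pattern, and wrap only the plain ones. It is a sketch under stated assumptions, not a complete converter: it does not handle comments, character literals such as '"', raw strings, or L"..." prefixes. The script name wrap_t.py and the .new output files are hypothetical choices.

```python
import re
import sys

# A string literal with backslash escapes handled as pairs.
LITERAL = r'"[^"\\\n]*(?:\\.[^"\\\n]*)*"'
# Either an existing _T("...") literal (group "wrapped") or a plain literal.
TOKEN = re.compile(r'(?P<wrapped>\b_T\(\s*' + LITERAL + r')|(?P<plain>' + LITERAL + r')')
INCLUDE_LINE = re.compile(r'^\s*#\s*include\b')


def wrap_literal(match: re.Match) -> str:
    # Leave _T("...") untouched; wrap every other string literal.
    if match.group('wrapped'):
        return match.group('wrapped')
    return '_T(' + match.group('plain') + ')'


def convert_line(line: str) -> str:
    # #include lines are copied through unchanged.
    if INCLUDE_LINE.match(line):
        return line
    return TOKEN.sub(wrap_literal, line)


def convert_source(text: str) -> str:
    return ''.join(convert_line(line) for line in text.splitlines(keepends=True))


if __name__ == '__main__':
    # Hypothetical usage: python wrap_t.py file1.cpp file2.cpp
    for path in sys.argv[1:]:
        # latin-1 round-trips every byte, and newline='' keeps CRLF endings,
        # so only the string literals change.
        with open(path, encoding='latin-1', newline='') as src:
            text = src.read()
        with open(path + '.new', 'w', encoding='latin-1', newline='') as dst:
            dst.write(convert_source(text))
```

Writing to a separate .new file keeps the original untouched, so the result can be diffed and reviewed before replacing anything.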

Jesse
 

Guest

Thanks a lot. This works well. There is a VS 2003 add-in
(http://www.codeproject.com/csharp/SearcherAddIn.asp) that can be used,
albeit a somewhat buggy one.
Developer



Jesse Houwing

* Paul Wu wrote, On 27-7-2007 17:34:
[snip]

I'll start nagging DevExpress to add this feature to CodeRush, my
favorite add-in for Visual Studio.

Jesse