Regular expressions

Zach

In order to remove squiggles from a string I loop through the characters
and throw out those outside a permitted range. Can this be done better
with regular expressions? My current knowledge of regular expressions
only covers finding out whether a string argument is present in another
string. I have searched the Internet but couldn't find an example or a
reference. Can it be done? Might you have a reference?
 
Gene Wirchenko

The Regex class can be used to remove things from strings, but regular
expressions are most applicable when the pattern being removed is
complex enough to justify their use.

I don't know what a "squiggle" is, but you can easily remove any single
string literal from another string with the String.Replace() method.
Just specify the string to be removed as the first argument, and the
empty string as the second.

Or inversely, keep the characters you want with something like:
String DeleteThese=OldString.Replace("_","");
String NewString=OldString.Replace(DeleteThese,"");
replace the underscore with the set of characters that you wish to
keep (say, alphanumerics).

The first line deletes the characters that you want to keep to
give you a hitlist for the second line.
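
One caveat worth checking before relying on the two lines above: as far as I know, String.Replace(string, string) treats its first argument as one literal substring, not as a set of characters, so a "hitlist" is only removed where it occurs as a whole run. A minimal sketch of the difference, assuming the goal is to keep lowercase a-z (the helper name KeepOnly and the sample string are made up for illustration):

```csharp
using System;
using System.Linq;

public static class ReplaceDemo
{
    // Hypothetical helper: keeps only the characters listed in 'allowed'.
    public static string KeepOnly(string text, string allowed)
    {
        return new string(text.Where(c => allowed.IndexOf(c) >= 0).ToArray());
    }

    public static void Main()
    {
        string oldString = "a_b~c";

        // String.Replace removes whole occurrences of the substring "_b".
        Console.WriteLine(oldString.Replace("_b", ""));   // a~c

        // "b_" never occurs as a substring, so nothing is removed.
        Console.WriteLine(oldString.Replace("b_", ""));   // a_b~c

        // Character-by-character filter against a set of allowed characters.
        Console.WriteLine(KeepOnly(oldString, "abcdefghijklmnopqrstuvwxyz")); // abc
    }
}
```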

Sincerely,

Gene Wirchenko
 
Registered User

What you describe as a squiggle is likely a tilde i.e. ~. Depending
upon the specifics of the task I would consider using either String
class methods or regular expressions.

I have found this site to be a good reference
http://www.regular-expressions.info/

regards
A.G.
 
Zach

For the sake of the problem statement: I want to
remove everything from a string except the characters a - z.
Like the pseudocode: new_string = old_string.Remove(!(a-z),"");
This can be done in a loop, throwing out the unwanted
characters, but is there a "one-liner" that could do
the trick more efficiently, like the pseudocode?
 
Zach

Sorry, read Replace instead of Remove.
 
Gene Wirchenko

Two lines is not really that long. Since I know C# slightly, I
tried to keep it clear. Would this do it?
theString=theString.Replace(theString.Replace("_",""),"");
Again, replace the underscore with the characters that you want to
keep.

Sincerely,

Gene Wirchenko
 
Arne Vajhøj

Try:

new_string = Regex.Replace(old_string, "[^a-z]","");

Note that the operation will always be O(n), and
regex is probably slower than a for loop. But
unless this is something you do billions
of times, I would go for the simplest
source code.
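
To put the two options side by side, here is a minimal sketch of the regex one-liner next to the character loop it replaces. The class and method names and the sample string are made up for illustration; no timing is claimed here, so measure both if speed matters.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class KeepLowercase
{
    // Regex version: delete every character that is not a-z.
    public static string WithRegex(string s)
    {
        return Regex.Replace(s, "[^a-z]", "");
    }

    // Loop version: copy only a-z into a StringBuilder.
    public static string WithLoop(string s)
    {
        var sb = new StringBuilder(s.Length);
        foreach (char c in s)
        {
            if (c >= 'a' && c <= 'z')
            {
                sb.Append(c);
            }
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string oldString = "He~llo, wor#ld 42!";
        Console.WriteLine(WithRegex(oldString)); // elloworld
        Console.WriteLine(WithLoop(oldString));  // elloworld
    }
}
```

Both make a single pass over the input, so the choice is mostly about readability.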

Arne
 
Zach

Thank you, Arne.
Just out of curiosity, how would one get rid of the two spaces (one on each
side of the 2)?

using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "@%^&(^&678Cat jumped over 2 walls, and said bingo! But where am I?";
            // Remove everything except letters, spaces and basic punctuation.
            Console.WriteLine(Regex.Replace(input, "[^a-zA-Z ,:;!?.]", ""));
            Console.WriteLine("123456789012345678901234567890");
            Console.ReadLine();
        }
    }
}
 
Zach

I mean: how do I show only one space instead of two, resulting in "over
walls" rather than "over  walls"?
 
Zach

Thank you Gene, however, I believe Arne provided the key to what I was
looking for.
Regards,
Zach.
 
Zach

This will work (code below), but I would like to solve the issue with
regular expressions alone:

using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "@%^&(^&678Cat jumped over 2 walls, and said bingo! But where am I?";
            // Strip the unwanted characters, then turn each double space into a single one.
            Console.WriteLine(Regex.Replace(input, "[^a-zA-Z ,:;!?.]", "").Replace("  ", " "));
            Console.ReadLine();
        }
    }
}
 
Zach

Yes, as you point out, there are good resources to be found on the
Internet. Thank you,
Zach.
 
Arne Vajhøj

Try:

Regex.Replace(input," [^a-zA-Z ,:;!?.]+((?= ))|([^a-zA-Z ,:;!?.]+)","")

Arne
 
Arne Vajhøj

That solution may actually be fine.

Note that this solution and the regex I just posted have slightly
different functionality (they treat two consecutive spaces in the
original differently!).
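
To make that difference concrete, here is a small sketch that runs both approaches on an input that already contains a double space. The class and method names and the sample string are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoSpacesDemo
{
    // Strip unwanted characters, then turn each double space into one.
    public static string StripThenCollapse(string s)
    {
        return Regex.Replace(s, "[^a-zA-Z ,:;!?.]", "").Replace("  ", " ");
    }

    // The single-regex approach from earlier in the thread.
    public static string SingleRegex(string s)
    {
        return Regex.Replace(s, " [^a-zA-Z ,:;!?.]+((?= ))|([^a-zA-Z ,:;!?.]+)", "");
    }

    public static void Main()
    {
        // Note the two spaces after the comma in the original text.
        string input = "over 2 walls,  and";
        Console.WriteLine(StripThenCollapse(input)); // over walls, and
        Console.WriteLine(SingleRegex(input));       // over walls,  and
    }
}
```

The first approach also collapses double spaces that were in the original; the single regex leaves them alone.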

Arne
 
Zach

Trying out your solution, this works as well:
Regex.Replace(input," [^a-zA-Z ,:;!?.]|([^a-zA-Z ,:;!?.]+)","")

What do you intend with (?= ) in your solution?
 
Arne Vajhøj

(?= ) is a lookahead: it requires that a space comes right after the
preceding part of the match, without making that space part of the match.

"x 2 x" should become "x x", but "x 2x" should become "x x", not "xx".

Arne
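
The two cases above can be checked with a short sketch that runs the pattern with the lookahead and the shorter pattern without it on both inputs. The class and method names are made up for illustration; the patterns are the ones posted in the thread.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // With the lookahead: " 2" is removed only when a space follows it.
    public static string WithLookahead(string s)
    {
        return Regex.Replace(s, " [^a-zA-Z ,:;!?.]+((?= ))|([^a-zA-Z ,:;!?.]+)", "");
    }

    // Without the lookahead: the leading space always goes with the character.
    public static string WithoutLookahead(string s)
    {
        return Regex.Replace(s, " [^a-zA-Z ,:;!?.]|([^a-zA-Z ,:;!?.]+)", "");
    }

    public static void Main()
    {
        Console.WriteLine(WithLookahead("x 2 x"));    // x x
        Console.WriteLine(WithLookahead("x 2x"));     // x x
        Console.WriteLine(WithoutLookahead("x 2 x")); // x x
        Console.WriteLine(WithoutLookahead("x 2x"));  // xx
    }
}
```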
 
Arne Vajhøj

Just looked at the regex. There is a typo. It should be:

Regex.Replace(input,"( [^a-zA-Z ,:;!?.]+(?= ))|([^a-zA-Z ,:;!?.]+)","")

I am not sure whether that makes a difference.
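
For what it is worth, the two spellings look equivalent to me: | has the lowest precedence in a regex, so in both patterns the first alternative is the space, the unwanted run and the lookahead, and the parentheses only change the grouping. A small sketch to check this on a few inputs (the class name and the sample strings are my own, not from the thread):

```csharp
using System;
using System.Text.RegularExpressions;

public static class TypoCheck
{
    // Pattern as first posted in the thread.
    public const string Posted = " [^a-zA-Z ,:;!?.]+((?= ))|([^a-zA-Z ,:;!?.]+)";

    // Pattern with the parentheses moved, as in the correction.
    public const string Corrected = "( [^a-zA-Z ,:;!?.]+(?= ))|([^a-zA-Z ,:;!?.]+)";

    // True when both patterns produce the same cleaned-up string.
    public static bool SameResult(string s)
    {
        return Regex.Replace(s, Posted, "") == Regex.Replace(s, Corrected, "");
    }

    public static void Main()
    {
        // Sample inputs are assumptions chosen for illustration.
        foreach (string s in new[] { "x 2 x", "x 2x", "x2 x", "@%678Cat jumped over 2 walls" })
        {
            Console.WriteLine("{0} -> {1}", s, SameResult(s));
        }
    }
}
```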

Arne
 
Zach

--------------------------------------------------------------------
Regex.Replace( +((?= ))
I changed your string to +(?= ) without additional () and that worked ok.
--------------------------------------------------------------------
Without +(?= ) the code works fine as well.
All the conditions seem to be fulfilled by the remaining code.
(1.) " [^a
(2.) zA-Z ,
(3.) plus the rest of your code

Then you get (code below), which works OK.

using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "@%^&(^&678Cat jumped over 2 walls, not 1, and said bingo! But where am I?";
            Console.WriteLine(Regex.Replace(input, " [^a-zA-Z ,:;!?.]|([^a-zA-Z ,:;!?.]+)", ""));
            Console.ReadLine();
        }
    }
}
 
Arne Vajhøj

If you are happy with "x 2x" becoming "xx", then just stay with that.

Arne
 
