Regular expressions


Zach

In order to remove squiggles from a string I loop through the characters
and throw out those outside a permitted value. Can this be done better
with regular expressions? My current knowledge of regular expressions
only covers finding out whether a string argument is present in another
string. I have tried the Internet but couldn't find an example or a
reference. Can it be done? Might you have a reference?
 
The Regex class can be used to remove things from strings, but regular
expressions are most applicable when the pattern being removed is complex
enough to justify their use.

I don't know what a "squiggle" is, but you can easily remove any single
string literal from another string with the String.Replace() method.
Just specify the string to be removed as the first argument, and the
empty string as the second.
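
A minimal sketch of that suggestion, assuming the unwanted text is a single tilde (a guess, since the question does not say) and using a made-up helper name, RemoveLiteral:

```csharp
using System;

public static class LiteralRemoval
{
    // Hypothetical helper: removes every occurrence of one literal substring.
    // String.Replace matches the text exactly; it does not interpret patterns.
    public static string RemoveLiteral(string source, string unwanted)
    {
        return source.Replace(unwanted, "");
    }
}
```

For example, LiteralRemoval.RemoveLiteral("a~b~c", "~") gives "abc". The string to remove must not be empty, since String.Replace throws an ArgumentException in that case.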

Or inversely, keep the characters you want with something like:
String DeleteThese=OldString.Replace("_","");
String NewString=OldString.Replace(DeleteThese,"");
replace the underscore with the set of characters that you wish to
keep (say, alphanumerics).

The first line deletes the characters that you want to keep to
give you a hitlist for the second line.

Sincerely,

Gene Wirchenko
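
One caveat with the two-line idea above: in C#, String.Replace treats its first argument as one literal substring, not as a set of characters, so passing a list of characters to keep only matches that exact sequence. A hedged sketch of the same "keep only these characters" idea using LINQ instead (the class and method names are made up for illustration):

```csharp
using System;
using System.Linq;

public static class KeepSet
{
    // Hypothetical helper: keeps only the characters that appear in 'allowed',
    // preserving their original order.
    public static string KeepOnly(string source, string allowed)
    {
        return new string(source.Where(c => allowed.IndexOf(c) >= 0).ToArray());
    }
}
```

For example, KeepSet.KeepOnly("a_1-b", "ab") gives "ab".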
 
What you describe as a squiggle is likely a tilde, i.e. ~. Depending
upon the specifics of the task, I would consider using either String
class methods or regular expressions.

I have found this site to be a good reference
http://www.regular-expressions.info/

regards
A.G.
 
For the sake of the problem statement: I want to
remove everything from a string except the characters a - z.
Like the quasi code: new_string = old_string.Remove(!(a-z),"");
This can be done in a loop, throwing out the unwanted
characters, but is there a "one liner" that could do
the trick more efficiently, like the quasi code?
 
Sorry, read Replace instead of Remove.
 
Two lines is not really that long. Since I know C# slightly, I
tried to keep it clear. Would this do it?
theString=theString.Replace(theString.Replace("_",""),"");
Again, replace the underscore with the characters that you want to
keep.

Sincerely,

Gene Wirchenko
 
Try:

new_string = Regex.Replace(old_string, "[^a-z]","");

Note that the operation will always be O(n), and a
regex is probably slower than a for loop. But
unless this is something you do billions
of times, I would go for the simplest
source code.

Arne
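
To make the comparison concrete, here is a small sketch with both variants side by side: the regex one-liner and a plain loop that does the same job. The class and method names are illustrative only, and no timing claim is made here; measure on your own data if speed matters.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class LowercaseFilter
{
    // Regex variant: [^a-z] matches any single character outside a-z.
    public static string WithRegex(string source)
    {
        return Regex.Replace(source, "[^a-z]", "");
    }

    // Loop variant: one pass, appending only the characters in a-z.
    public static string WithLoop(string source)
    {
        var kept = new StringBuilder(source.Length);
        foreach (char c in source)
        {
            if (c >= 'a' && c <= 'z')
            {
                kept.Append(c);
            }
        }
        return kept.ToString();
    }
}
```

Both methods turn "Hello, World 42!" into "elloorld".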
 
Thank you Arne.
Just for my curiosity, how would one get rid of the two spaces (one on each
side of the 2)?

using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "@%^&(^&678Cat jumped over 2 walls, and said bingo! But where am I?";
            // Remove everything except letters, spaces and basic punctuation.
            Console.WriteLine(Regex.Replace(input, "[^a-zA-Z ,:;!?.]", ""));
            // Column ruler for checking the positions in the output.
            Console.WriteLine("123456789012345678901234567890");
            Console.ReadLine();
        }
    }
}
 
I mean how not to show two spaces, but only one, resulting in: "over
walls" with a single space, rather than "over  walls" with two.
 
Thank you Gene, however, I believe Arne provided the key to what I was
looking for.
Regards,
Zach.
 
This will work (code below), but I would like to solve the issue with
regular expressions.

using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "@%^&(^&678Cat jumped over 2 walls, and said bingo! But where am I?";
            // Filter with the regex, then replace each pair of spaces with one space.
            Console.WriteLine(Regex.Replace(input, "[^a-zA-Z ,:;!?.]", "").Replace("  ", " "));
            Console.ReadLine();
        }
    }
}
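
If the goal is to stay with regular expressions throughout, one option (a sketch, not necessarily what anyone in the thread had in mind) is a second Regex.Replace that collapses each run of two or more spaces into one. Unlike a single String.Replace of two spaces, this also handles runs of three or more. The class and method names are made up.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TwoPassCleanup
{
    public static string Clean(string input)
    {
        // Pass 1: drop every character outside the permitted set.
        string filtered = Regex.Replace(input, "[^a-zA-Z ,:;!?.]", "");
        // Pass 2: collapse each run of two or more spaces into one space.
        return Regex.Replace(filtered, " {2,}", " ");
    }
}
```

For example, TwoPassCleanup.Clean("over 2 walls") gives "over walls" with a single space.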
 
Yes, as you point out, there are good resources to be found on the
Internet. Thank you,
Zach.
 
Try:

Regex.Replace(input," [^a-zA-Z ,:;!?.]+((?= ))|([^a-zA-Z ,:;!?.]+)","")

Arne
 
Your solution (the regex followed by String.Replace) may actually be fine.

Note that this solution and the regex I just posted have slightly
different functionality (they treat two consecutive spaces in the
original differently!).

Arne
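
The difference shows up on input that already contains two spaces in a row. A rough sketch of both variants, with made-up method names; the single-regex pattern is the one posted earlier in the thread:

```csharp
using System;
using System.Text.RegularExpressions;

public static class SpaceHandling
{
    // Filter first, then replace each pair of spaces with one space.
    public static string FilterThenReplace(string input)
    {
        return Regex.Replace(input, "[^a-zA-Z ,:;!?.]", "").Replace("  ", " ");
    }

    // Single regex: removes a space plus the unwanted run only when
    // another space follows; otherwise removes just the unwanted run.
    public static string SingleRegex(string input)
    {
        return Regex.Replace(input,
            " [^a-zA-Z ,:;!?.]+((?= ))|([^a-zA-Z ,:;!?.]+)", "");
    }
}
```

Both methods turn "over 2 walls" into "over walls". For an "a" and a "b" separated by two spaces, FilterThenReplace returns them separated by one space, while SingleRegex leaves both spaces in place.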
 
Regex.Replace(input," [^a-zA-Z ,:;!?.]+((?= ))|([^a-zA-Z ,:;!?.]+)","")

Trying out your solution, this works as well:
Regex.Replace(input," [^a-zA-Z ,:;!?.]|([^a-zA-Z ,:;!?.]+)","")

What do you intend with (?= ) in your solution?
 
(?= ) is a lookahead: it requires that a space comes right after the
preceding match, without consuming that space.

"x 2 x" should become "x x" but "x 2x" should become "x x" not "xx".

Arne
 
Just looked at the regex. There is a typo. It should be:

Regex.Replace(input,"( [^a-zA-Z ,:;!?.]+(?= ))|([^a-zA-Z ,:;!?.]+)","")

I am not sure whether that makes a difference.

Arne
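
For what it's worth, the extra parentheses should not change what gets matched: alternation binds more loosely than everything else in the pattern, so both spellings split into the same two alternatives. The sketch below (illustrative names only) holds both spellings plus the shorter pattern without the lookahead, so the effect of (?= ) can be checked on "x 2 x" and "x 2x".

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Pattern as first posted, with the misplaced parentheses.
    public const string AsPosted =
        " [^a-zA-Z ,:;!?.]+((?= ))|([^a-zA-Z ,:;!?.]+)";

    // Pattern with the corrected grouping.
    public const string Corrected =
        "( [^a-zA-Z ,:;!?.]+(?= ))|([^a-zA-Z ,:;!?.]+)";

    // Shorter variant without the lookahead.
    public const string NoLookahead =
        " [^a-zA-Z ,:;!?.]|([^a-zA-Z ,:;!?.]+)";

    // Removes every match of 'pattern' from 'input'.
    public static string Strip(string input, string pattern)
    {
        return Regex.Replace(input, pattern, "");
    }
}
```

With AsPosted or Corrected, both "x 2 x" and "x 2x" come out as "x x". With NoLookahead, "x 2 x" still comes out as "x x", but "x 2x" comes out as "xx".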
 
--------------------------------------------------------------------
Regex.Replace( +((?= ))
I changed your string to +(?= ) without additional () and that worked ok.
--------------------------------------------------------------------
Without +(?= ) the code works fine as well.
All the conditions seem to be fulfilled by the remaining code.
(1.) " [^a
(2.) zA-Z ,
(3.) plus the rest of your code

Then you get (code below), which works OK.

using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            string input = "@%^&(^&678Cat jumped over 2 walls, not 1, and said bingo! But where am I?";
            Console.WriteLine(Regex.Replace(input, " [^a-zA-Z ,:;!?.]|([^a-zA-Z ,:;!?.]+)", ""));
            Console.ReadLine();
        }
    }
}
 
If you are happy with "x 2x" becoming "xx", then just stay with that.

Arne
 
