How to ...


Jacek Jurkowski

How can I remove language-specific characters from a string?
For example, to make "aaa" from the string "aśaćaó"...
 
Jacek Jurkowski said:
How can I remove language-specific characters from a string?
For example, to make "aaa" from the string "aśaćaó"...

Here is my five-minute code:
--CUT HERE--
using System;
using System.Text;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string someText = "aśaćaó";
        Encoding ascii = Encoding.GetEncoding("ascii");
        Encoding unicode = Encoding.Unicode;

        // Converting to ASCII turns every non-ASCII character into '?'.
        byte[] unicodeBytes = unicode.GetBytes(someText);
        byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);

        System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();

        // Keep only word characters, '.', '@' and '-'; this drops the '?' placeholders.
        string str = Regex.Replace(enc.GetString(asciiBytes), @"[^\w\.@-]", "");
        Console.WriteLine(str);
        Console.ReadLine();
    }
}
--CUT HERE--

Regards,
Inez Korczyński
 
Inez Korczynski said:
Jacek Jurkowski said:
How can I remove language-specific characters from a string?
For example, to make "aaa" from the string "aśaćaó"...

Here is my five-minute code:
--CUT HERE--
static void Main(string[] args)
{
    string someText = "aśaćaó";
    Encoding ascii = Encoding.GetEncoding("ascii");
    Encoding unicode = Encoding.Unicode;

    byte[] unicodeBytes = unicode.GetBytes(someText);
    byte[] asciiBytes = Encoding.Convert(unicode, ascii, unicodeBytes);

    System.Text.ASCIIEncoding enc = new System.Text.ASCIIEncoding();

Why are you creating two different ASCII encodings when you can just
use Encoding.ASCII both times?

    string str = Regex.Replace(enc.GetString(asciiBytes), @"[^\w\.@-]", "");
    Console.WriteLine(str);
    Console.ReadLine();
}
--CUT HERE--
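
For illustration, the suggestion to use Encoding.ASCII for both steps might look roughly like the sketch below. It is written as a C# 9+ top-level program, and the helper name RemoveNonAsciiViaEncoding is made up for this example. It relies on Encoding.ASCII replacing every character it cannot represent with '?', which the regular expression from the quoted code then strips.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: same idea as the quoted code, but with a single ASCII encoding.
static string RemoveNonAsciiViaEncoding(string text)
{
    // Characters with no ASCII equivalent are encoded as '?'.
    byte[] asciiBytes = Encoding.ASCII.GetBytes(text);
    string roundTripped = Encoding.ASCII.GetString(asciiBytes);

    // Same pattern as the quoted code: keep word characters, '.', '@' and '-'.
    // This removes the '?' placeholders, but also spaces and other punctuation.
    return Regex.Replace(roundTripped, @"[^\w\.@-]", "");
}

// "a\u015Ba\u0107a\u00F3" is the string "aśaćaó" written with escapes.
Console.WriteLine(RemoveNonAsciiViaEncoding("a\u015Ba\u0107a\u00F3")); // aaa
```

Note that the regular expression does more than remove non-ASCII characters: any space or punctuation mark other than '.', '@' and '-' in the input is dropped as well.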

Alternatively, and somewhat simpler IMO (longer, but simpler):

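static string RemoveNonAscii(string original)
{
    // First pass: count the characters outside the ASCII range.
    int charsToRemove = 0;
    foreach (char c in original)
    {
        if (c > 127)
        {
            charsToRemove++;
        }
    }
    // Nothing to remove, so no new string is needed.
    if (charsToRemove == 0)
    {
        return original;
    }

    // Second pass: copy the ASCII characters into a builder of exactly the right size.
    StringBuilder builder = new StringBuilder(original.Length - charsToRemove);
    foreach (char c in original)
    {
        if (c < 128)
        {
            builder.Append(c);
        }
    }
    return builder.ToString();
}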
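
As a quick check of the two-pass version, the sketch below repeats the method in condensed form so that it runs on its own (as a C# 9+ top-level program) and calls it on two sample inputs. The sample strings and the variable name plain are chosen for this example only.

```csharp
using System;
using System.Text;

// Same two-pass logic as the method above, condensed so the snippet is self-contained.
static string RemoveNonAscii(string original)
{
    int charsToRemove = 0;
    foreach (char c in original)
    {
        if (c > 127) charsToRemove++;
    }
    if (charsToRemove == 0) return original;

    var builder = new StringBuilder(original.Length - charsToRemove);
    foreach (char c in original)
    {
        if (c < 128) builder.Append(c);
    }
    return builder.ToString();
}

string plain = "hello world";

// "a\u015Ba\u0107a\u00F3" is the string "aśaćaó" written with escapes.
Console.WriteLine(RemoveNonAscii("a\u015Ba\u0107a\u00F3"));       // aaa
// An all-ASCII input comes back as the very same string instance.
Console.WriteLine(ReferenceEquals(plain, RemoveNonAscii(plain))); // True
```

The second call shows what the counting pass buys: when there is nothing to remove, no builder and no new string are allocated.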

If you want a simpler but less efficient version:

static string RemoveNonAscii(string original)
{
    StringBuilder builder = new StringBuilder();
    foreach (char c in original)
    {
        if (c < 128)
        {
            builder.Append(c);
        }
    }
    return builder.ToString();
}
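
A possible middle ground between the two versions, which is not from the posts above, is to keep the single loop but give the StringBuilder an initial capacity of original.Length. That is an upper bound on the result length, so the builder should never need to grow, at the cost of over-allocating when many characters are removed. The name RemoveNonAsciiPresized is made up for this sketch, which is written as a C# 9+ top-level program.

```csharp
using System;
using System.Text;

// Hypothetical variant: single pass, builder presized to the input length.
static string RemoveNonAsciiPresized(string original)
{
    // The result can never be longer than the input.
    var builder = new StringBuilder(original.Length);
    foreach (char c in original)
    {
        if (c < 128)
        {
            builder.Append(c);
        }
    }
    return builder.ToString();
}

// "a\u015Ba\u0107a\u00F3" is the string "aśaćaó" written with escapes.
Console.WriteLine(RemoveNonAsciiPresized("a\u015Ba\u0107a\u00F3")); // aaa
```

Unlike the two-pass version, this always builds a new string, even when the input is already pure ASCII.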
 