Simple Encoding Question

  • Thread starter: Guest

Hello,

I am playing a little with Encoding, and I have what is possibly (forgive
me) a newbie-type question.

I have a function that takes a string and a codepage (based upon the basic
MSDN help - look for "Using Unicode Encoding").

Anyway, I want to pass the contents of a textbox "as is", but always get a
literal string. Using the example from MSDN, say I want to pass in this
string:
"\u307b,\u308b,\u305a,\u3042,\u306d"

If I pass in TextBox.Text, I get
@"\u307b,\u308b,\u305a,\u3042,\u306d", which is not what I want.

In other words, instead of "Unicode character 307b, Unicode character 308b", etc., I
get "slash u three..."

Thanks,
pagates
 
pagates said:
I am playing a little with Encoding, and I have what is possibly (forgive
me) a newbie-type question.

I have a function that takes a string and a codepage (based upon the basic
MSDN help - look for "Using Unicode Encoding").

Anyway, I want to pass the contents of a textbox "as is", but always get a
literal string. Using the example from MSDN, say I want to pass in this
string:
"\u307b,\u308b,\u305a,\u3042,\u306d"
If I pass in Textbox.Text, I get
@"\u307b,\u308b,\u305a,\u3042,\u306d", which is not what I want.

In other words, instead of "Unicode character 307b, unicode 308b", etc., I
get "slash u three..."

Could you post a short but complete program which demonstrates the
problem?

See http://www.pobox.com/~skeet/csharp/complete.html for details of
what I mean by that.

Here's a sample program which seems to go against your post:

using System.Windows.Forms;
using System.Drawing;
using System;

public class Test
{
    static void Main()
    {
        Form f = new Form();
        f.Size = new Size(200, 200);
        TextBox tb = new TextBox();
        tb.Text = "\u307b,\u308b,\u305a,\u3042,\u306d";
        f.Controls.Add(tb);
        Application.Run(f);
    }
}

While that only displays boxes and commas on my box, it makes the point
that it's *not* displaying "\u307b (etc)".
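To make the distinction concrete, here is a minimal sketch (the class and member names are purely illustrative). In a regular C# literal the compiler turns \u307b into a single character, while a verbatim @ literal, like text typed into a TextBox, keeps all six characters.

```csharp
using System;

// Hypothetical helper class, for illustration only.
public static class EscapeDemo
{
    // The compiler replaces \u307b with the single character U+307B.
    public const string Escaped = "\u307b";

    // A verbatim literal keeps the backslash, so this is six characters:
    // '\', 'u', '3', '0', '7', 'b'. Text typed into a TextBox looks like this.
    public const string Verbatim = @"\u307b";

    // Lists each UTF-16 code unit of s as U+XXXX.
    public static string Describe(string s)
    {
        string[] parts = new string[s.Length];
        for (int i = 0; i < s.Length; i++)
            parts[i] = "U+" + ((int)s[i]).ToString("X4");
        return s.Length + " char(s): " + string.Join(" ", parts);
    }
}
```

EscapeDemo.Describe(EscapeDemo.Escaped) gives "1 char(s): U+307B", whereas EscapeDemo.Describe(EscapeDemo.Verbatim) gives "6 char(s): U+005C U+0075 U+0033 U+0030 U+0037 U+0062".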
 
Hi Jon,

Thanks for the reply, but you have my problem in reverse. Here is a short but
complete program (SBCP) that demonstrates what I'm trying to achieve:

Code:
using System.Windows.Forms;
using System.Drawing;
using System.Text;

public class frmTest : Form
{
    private Button   btn;
    private TextBox  tb;
    private ListView lv;
    private ColumnHeader colByte;
    private ColumnHeader colChar;

    public frmTest()
    {
        InitializeComponent();
    }

    private void InitializeComponent()
    {
        btn = new Button();
        lv = new ListView();
        colByte = new ColumnHeader();
        colChar = new ColumnHeader();
        tb = new TextBox();

        // btn
        btn.Location = new Point(424, 8);
        btn.Size = new Size(64, 24);
        btn.Text = "Encode";
        btn.Click += new System.EventHandler(btn_Click);

        // lv
        lv.Columns.AddRange(new ColumnHeader[] { colByte, colChar });
        lv.Location = new Point(0, 80);
        lv.Size = new Size(496, 240);
        lv.View = System.Windows.Forms.View.Details;

        // colByte, colChar
        colByte.Text = "Byte";
        colByte.Width = 33;
        colChar.Text = "Character";
        colChar.Width = 58;

        // tb
        tb.Location = new Point(8, 8);
        tb.Size = new Size(408, 21);
        tb.Text = "This is the text that will be encoded.";

        // frmTest
        ClientSize = new Size(496, 318);
        Controls.Add(tb);
        Controls.Add(lv);
        Controls.Add(btn);
    }

    private void PrintCPBytes(string str, int codePage)
    {
        Encoding targetEncoding;
        byte[] encodedChars;

        targetEncoding = Encoding.GetEncoding(codePage);

        // Gets the byte representation of the specified string.
        encodedChars = targetEncoding.GetBytes(str);

        // One row per byte: the byte's index, then its value.
        for (int i = 0; i < encodedChars.Length; i++)
        {
            ListViewItem lItem = new ListViewItem(i.ToString());
            lItem.SubItems.Add(encodedChars[i].ToString());
            lv.Items.Add(lItem);
        }
    }

    private void btn_Click(object sender, System.EventArgs e)
    {
        lv.Items.Clear();
        PrintCPBytes(tb.Text, 1252);   // 1252: Western European (Latin)
        PrintCPBytes(tb.Text, 932);    // 932: Japanese (Shift-JIS)
    }

    static void Main()
    {
        Application.Run(new frmTest());
    }
}

I'd like to type "\u307b" (etc.) into the TextBox and have PrintCPBytes
receive the character U+307B, not the six characters \, u, 3, 0, 7, b.
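For comparison, here is a sketch of what the encoder actually receives in each case. It uses Encoding.UTF8 instead of code page 932 only so that it runs without extra setup (on .NET Core and .NET 5+, Encoding.GetEncoding(932) requires CodePagesEncodingProvider to be registered first). The class and method names are hypothetical.

```csharp
using System;
using System.Text;

public static class BytesDemo
{
    // Hypothetical helper: encodes s as UTF-8 and returns the bytes as
    // comma-separated decimal values, similar to what PrintCPBytes lists.
    public static string ByteList(string s)
    {
        byte[] bytes = Encoding.UTF8.GetBytes(s);
        string[] parts = new string[bytes.Length];
        for (int i = 0; i < bytes.Length; i++)
            parts[i] = bytes[i].ToString();
        return string.Join(",", parts);
    }
}
```

BytesDemo.ByteList("\u307b") yields the three UTF-8 bytes of U+307B ("227,129,187"). BytesDemo.ByteList(@"\u307b") yields six ASCII bytes ("92,117,51,48,55,98"), which is what happens with raw TextBox text.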

Thanks,
pagates
 
pagates said:
Anyway, I want to pass the contents of a textbox "as is", but always get a
literal string. Using the example from MSDN, say I want to pass in this
string:
"\u307b,\u308b,\u305a,\u3042,\u306d"

If I pass in Textbox.Text, I get
@"\u307b,\u308b,\u305a,\u3042,\u306d", which is not what I want.

In other words, instead of "Unicode character 307b, unicode 308b", etc., I
get "slash u three..."

So you basically want to apply escape-character parsing to the TextBox
string. I don't know whether the framework has such a function; someone
else may answer that. Otherwise, here is a method that implements the
basics, including the Unicode escape parsing you want. If you need to
handle other escape codes, you'll have to add them manually:


// Needs: using System; using System.Text;
static string ParseBackSlashString(string s)
{
    StringBuilder sb = new StringBuilder(s.Length);
    for (int i = 0; i < s.Length; i++)
    {
        if (s[i] == '\\')
        {
            i++;
            if (i >= s.Length) // There must be a character after the backslash
                throw new ApplicationException("String may not end with a \\.");
            switch (s[i])
            {
                case '\\':
                    sb.Append('\\');
                    break;
                case 'u':
                    // Four hex digits must follow, at s[i+1]..s[i+4]
                    if (i + 4 >= s.Length)
                        throw new ApplicationException("Unrecognized escape sequence.");
                    else
                    {
                        int value = 0;
                        for (int j = 1; j <= 4; j++)
                        {
                            char c = s[i + j];
                            // Weight the j-th digit by 16^(4-j)
                            if (c >= '0' && c <= '9')
                                value += (int)Math.Pow(16, 4 - j) * (c - '0');
                            else if (c >= 'a' && c <= 'f')
                                value += (int)Math.Pow(16, 4 - j) * (c - 'a' + 10);
                            else if (c >= 'A' && c <= 'F')
                                value += (int)Math.Pow(16, 4 - j) * (c - 'A' + 10);
                            else
                                throw new ApplicationException("Unrecognized escape sequence.");
                        }
                        sb.Append((char)value);
                        i += 4; // Skip the four hex digits
                    }
                    break;
                default:
                    throw new ApplicationException("Unrecognized escape sequence.");
            }
        }
        else // This is the default when there isn't a backslash
        {
            sb.Append(s[i]);
        }
    }
    return sb.ToString();
}
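The inner loop builds the code point one hex digit at a time, weighting the j-th digit by 16^(4-j). For the first character of the example string, \u307b, where 'b' maps to 'b' - 'a' + 10 = 11, the arithmetic works out as:

```latex
\mathrm{0x307B} = 3 \cdot 16^{3} + 0 \cdot 16^{2} + 7 \cdot 16^{1} + 11 \cdot 16^{0} = 12288 + 0 + 112 + 11 = 12411
```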
 
pagates said:
I'd like to put "\u307b" (etc) into the TextBox, and apply that to the
PrintCPBytes function.

In that case, you'll have to parse the text. The TextBox itself doesn't
(and shouldn't!) care about C# escaping rules.

You'll need to look for \u in the string, and then parse the next 4
characters as a hex number (e.g. using Convert.ToInt32(string, int),
specifying 16 as the base).
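A minimal sketch of that approach, assuming only \uXXXX escapes need handling: a backslash not followed by u and four more characters is copied through unchanged, and \\ is not treated specially. The class and method names are hypothetical. Invalid hex digits make Convert.ToInt32 throw a FormatException, which this sketch simply lets propagate.

```csharp
using System;
using System.Text;

// Hypothetical helper: replaces each \uXXXX sequence with the character
// it names and copies every other character through unchanged.
public static class UnicodeEscapes
{
    public static string Decode(string s)
    {
        StringBuilder sb = new StringBuilder(s.Length);
        int i = 0;
        while (i < s.Length)
        {
            // Need a backslash, a 'u' and four more characters (6 in total).
            if (s[i] == '\\' && i + 6 <= s.Length && s[i + 1] == 'u')
            {
                // Parse the four characters after "\u" as a base-16 number.
                string hex = s.Substring(i + 2, 4);
                sb.Append((char)Convert.ToInt32(hex, 16));
                i += 6;
            }
            else
            {
                sb.Append(s[i]);
                i++;
            }
        }
        return sb.ToString();
    }
}
```

With this, UnicodeEscapes.Decode(tb.Text) could be passed to PrintCPBytes, so that typing \u307b,\u308b produces the same string as the C# literal "\u307b,\u308b".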
 
if (i + 4 >= s.Length)
    throw new ApplicationException("Unrecognized escape sequence.");
else
{
    int value = 0;
    for (int j = 1; j <= 4; j++)
    {
        char c = s[i + j];
        if (c >= '0' && c <= '9')
            value += (int)Math.Pow(16, 4 - j) * (c - '0');
        else if (c >= 'a' && c <= 'f')
            value += (int)Math.Pow(16, 4 - j) * (c - 'a' + 10);
        else if (c >= 'A' && c <= 'F')
            value += (int)Math.Pow(16, 4 - j) * (c - 'A' + 10);
        else
            throw new ApplicationException("Unrecognized escape sequence.");
    }
    sb.Append((char)value);
    i += 4;
}
break;

After reading Jon Skeet's reply, I'd just like to mention that the code
quoted above would be more readable using the Convert.ToInt32 method he
mentioned:

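try
{
    // Parse the four characters after 'u' as a base-16 number.
    int value = Convert.ToInt32(s.Substring(i + 1, 4), 16);
    sb.Append((char)value);
    i += 4;
    break;
}
catch (System.FormatException)
{
    // The four characters are not valid hex digits.
    throw new ApplicationException("Unrecognized escape sequence.");
}
catch (System.ArgumentException)
{
    // Fewer than four characters left (ArgumentOutOfRangeException derives
    // from ArgumentException), or a leading '-', which base 16 rejects.
    throw new ApplicationException("Unrecognized escape sequence.");
}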
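One behavioural difference worth knowing: the Convert.ToInt32(string, int) documentation states that base-16 input may carry a "0x" or "0X" prefix, so the Convert-based version would accept \u0x7b (decoding it as '{') where the hand-written loop rejects it. This small sketch, with hypothetical names, wraps the same parse in a Try-style method so that the behaviour is easy to check.

```csharp
using System;

// Hypothetical helper mirroring the try/catch above, but returning false
// instead of throwing when the four characters at 'start' are not accepted.
public static class HexCheck
{
    public static bool TryParseHex4(string s, int start, out char result)
    {
        result = '\0';
        try
        {
            result = (char)Convert.ToInt32(s.Substring(start, 4), 16);
            return true;
        }
        catch (FormatException)
        {
            return false; // Not hex digits, e.g. "zzzz"
        }
        catch (ArgumentException)
        {
            return false; // Fewer than four characters left, or a leading '-'
        }
    }
}
```

For example, HexCheck.TryParseHex4("0x7b", 0, out c) returns true with c == '{', which shows the "0x" leniency.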
 
Jon,

The C# compiler already does this. In the interest of "reusable code",
don't you think, since the code is already written, it would have been nice
to have this method available/exposed? --I don't even know where to look
(using Roeder's Reflector) to see if this is possible or not...

More generally, I sometimes find myself writing code that reproduces
something the framework already does. (Of course, I can't remember exactly
what it was I was trying to do.)

Scott
 
Scott Coonce said:
The C# compiler already does this. In the interest of "reusable code",
don't you think, since the code is already written, it would have been nice
to have this method available/exposed? --I don't even know where to look
(using Roeder's Reflector) to see if this is possible or not...

Hmmm... I'm not at all sure. It's quite possible that it does this
escaping within an internal structure which shouldn't be exposed, or at
the same time as maintaining other state. There may be some way of
doing it at the moment using the compiler services, but it's likely to
be much more tortuous than just writing the code by hand.
 
Thanks, all. I was afraid that I was going to have to parse it, but I wanted
to make sure I wasn't reinventing the .NET wheel.

Thanks again,
pagates
 
