Unicode values

billsahiker · May 13, 2008

Where do I find the Unicode values for math operators like the equal,
minus and plus signs, and how do I check whether a value in a byte array
is one of these operators? I populate the byte array from a FileStream
object using the Read method. So far I have been working with UTF-8
files and I just use

if (b == 61) // 0x3D also works; b is one byte from the array

It returns true if it is the equal sign. But how do I do this if I
work with a Unicode/UTF-16 encoded file? I figure I need to compare two
bytes for Unicode, right? But where do I get those values? I googled
for Unicode code charts and the like, but after a couple of hours I
cannot find this.

BTW, the files I am reading are all text. My test files are created
with StreamWriter using the desired Encoding object.

Bill

billsahiker · May 13, 2008

There are a variety of sources. Windows has the Character Map utility
that allows you to browse characters on a per-font basis, and will tell
you the Unicode value for a character.

However, you may be going about whatever you're trying to do the wrong
way. You should read your text in using an Encoding class appropriate to
the format, converting to the char type in C#. Then you can just use the
literal '=' (for example) to compare for the equals character, without
ever needing to know the actual Unicode value.
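
A minimal sketch of that approach, assuming the file is UTF-16 (Encoding.Unicode in .NET). The class name CharCompareSketch, the method CountChar and the sample text are made up for illustration; StreamReader and Encoding are the real framework types.

```csharp
using System;
using System.IO;
using System.Text;

static class CharCompareSketch
{
    // Hypothetical helper: counts how often 'target' occurs in a text file
    // that is decoded with the given encoding.
    public static int CountChar(string path, Encoding encoding, char target)
    {
        int count = 0;
        using (var reader = new StreamReader(path, encoding))
        {
            int c;
            // Read() returns the next decoded char as an int, or -1 at end of file.
            while ((c = reader.Read()) != -1)
            {
                if ((char)c == target) count++;
            }
        }
        return count;
    }

    static void Main()
    {
        // Write a small UTF-16 test file, then count the '=' characters in it.
        string path = Path.GetTempFileName();
        File.WriteAllText(path, "a = b + c - d = e", Encoding.Unicode);
        Console.WriteLine(CountChar(path, Encoding.Unicode, '='));
        File.Delete(path);
    }
}
```

The comparison is written against the char literal, so the same code works for a UTF-8 file by passing Encoding.UTF8 instead.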

Pete

I am looking for maximum performance. I originally read the file with
StreamReader and did the parsing with strings, but it was way too
slow. I am thinking there should be two byte values for a specific
character in a given language. Do the bytes vary by font as well?
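
On the font question: the code point of a character and the bytes an encoding produces for it do not depend on the font. One way to see the actual byte values is to ask the Encoding classes directly. This is only a sketch; OperatorBytesSketch and HexBytes are made-up names. Note that the '-' on the keyboard is U+002D (hyphen-minus), which is a different character from the typographic minus sign U+2212.

```csharp
using System;
using System.Text;

static class OperatorBytesSketch
{
    // Hypothetical helper: encodes one char and returns its bytes as hex, e.g. "3D-00".
    public static string HexBytes(char c, Encoding encoding)
    {
        return BitConverter.ToString(encoding.GetBytes(new[] { c }));
    }

    static void Main()
    {
        foreach (char op in "=+-")
        {
            // Encoding.Unicode is UTF-16 little-endian,
            // Encoding.BigEndianUnicode is UTF-16 big-endian.
            Console.WriteLine("{0} U+{1:X4} utf8={2} utf16le={3} utf16be={4}",
                op, (int)op,
                HexBytes(op, Encoding.UTF8),
                HexBytes(op, Encoding.Unicode),
                HexBytes(op, Encoding.BigEndianUnicode));
        }
    }
}
```

For '=' this shows 3D in UTF-8, 3D-00 in little-endian UTF-16 and 00-3D in big-endian UTF-16, so which of the two bytes carries the familiar value depends on the byte order of the file.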

billsahiker · May 13, 2008

Pete,

The performance issue was parsing strings vs. a byte array. Since I
already have a working routine that searches a byte array for UTF-8
files, I wanted to modify it for Unicode. It turns out I can still search
for the same byte values, e.g., 0x3D for the equal sign, because the
first byte is the same in Unicode and UTF-8 for the math symbols I need.
Once I discovered that, all I needed to do was increment the pointer
variable in the buffer by two instead of one. With that minor adjustment
the routine now works for both UTF-8 and Unicode.
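
A rough sketch of the two-byte stepping described above, assuming little-endian UTF-16 (what Encoding.Unicode writes). It is not the original routine; Utf16ScanSketch and IndexOfAsciiUtf16LE are made-up names. The one addition is a check that the second byte is zero, since other characters share the same first byte: U+043D (Cyrillic 'н') is encoded as 3D 04 and would otherwise be mistaken for '='.

```csharp
using System;
using System.Text;

static class Utf16ScanSketch
{
    // Hypothetical helper: returns the byte offset of the first occurrence of an
    // ASCII-range character in a UTF-16 little-endian buffer, or -1 if not found.
    // 'start' must sit on a code-unit boundary (an even offset from the file start).
    public static int IndexOfAsciiUtf16LE(byte[] buffer, int start, int length, byte ascii)
    {
        int end = start + length - 1; // each code unit needs two bytes
        for (int i = start; i < end; i += 2)
        {
            // Low byte carries the ASCII value, high byte must be zero.
            if (buffer[i] == ascii && buffer[i + 1] == 0)
                return i;
        }
        return -1;
    }

    static void Main()
    {
        // "н=1": the first character has low byte 0x3D but is not the equal sign.
        byte[] data = Encoding.Unicode.GetBytes("\u043D=1");
        Console.WriteLine(IndexOfAsciiUtf16LE(data, 0, data.Length, 0x3D));
    }
}
```

A file written by StreamWriter with Encoding.Unicode normally starts with the two-byte mark FF FE, which keeps the stepping aligned. For a big-endian file the two bytes in the test would be swapped.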

Thanks for your help.

Bill
