Unicode values


billsahiker

Where do I find the Unicode values for math operators like the equal,
minus and plus signs, and how do I check if a value in a byte array is
one of these operators? I populate the byte array from a FileStream
object using the Read method. So far I have been working with UTF-8
files and I just use

if (b == 61) // 0x3D also works

It returns true if the byte is the equal sign. But how do I do this if I
work with a Unicode/UTF-16 encoded file? I figure I need to compare two
bytes for Unicode, right? But where do I get those values? I googled
for a Unicode code chart and the like, but after a couple of hours I
still cannot find this.
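The single-byte check above is safe for UTF-8 because every byte of a multi-byte UTF-8 sequence is 0x80 or higher, so a byte equal to 0x3D, 0x2B or 0x2D can only be the ASCII character itself. A minimal C# sketch of that scan; the class and method names (`Utf8Scan`, `CountOperators`) are hypothetical, and `count` is assumed to be the byte count returned by `Read`.

```csharp
// Hypothetical helper: counts '=', '+' and '-' in a UTF-8 buffer by raw byte comparison.
public static class Utf8Scan
{
    public static int CountOperators(byte[] buffer, int count)
    {
        int found = 0;
        for (int i = 0; i < count; i++)
        {
            // In UTF-8, lead and continuation bytes of multi-byte characters are all >= 0x80,
            // so these ASCII values can never match part of another character.
            byte b = buffer[i];
            if (b == 0x3D || b == 0x2B || b == 0x2D) // '=', '+', '-'
                found++;
        }
        return found;
    }
}
```

For example, the UTF-8 bytes of "é=1" (C3 A9 3D 31) yield a count of 1: the two bytes of "é" are both above 0x7F.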

BTW, the files I am reading are all text. My test files are created
with StreamWriter using the desired Encoding object.
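Because the test files are written by StreamWriter with an explicit encoding object, they will usually begin with a byte order mark (BOM): EF BB BF for `Encoding.UTF8` and FF FE for `Encoding.Unicode` (UTF-16 little-endian). A sketch of sniffing that mark to decide how to scan the rest of the buffer; the enum and class names are hypothetical. A file without a BOM (for instance one written with StreamWriter's default encoding, which is UTF-8 without a BOM) falls through to `Unknown`.

```csharp
// Hypothetical result of sniffing a byte order mark.
public enum DetectedEncoding { Unknown, Utf8, Utf16LE, Utf16BE }

public static class BomSniffer
{
    // Reports the encoding implied by a leading BOM, plus the BOM length
    // so the caller can skip it before scanning.
    public static DetectedEncoding Detect(byte[] buffer, int count, out int bomLength)
    {
        if (count >= 3 && buffer[0] == 0xEF && buffer[1] == 0xBB && buffer[2] == 0xBF)
        {
            bomLength = 3;
            return DetectedEncoding.Utf8;
        }
        if (count >= 2 && buffer[0] == 0xFF && buffer[1] == 0xFE)
        {
            // FF FE 00 00 would be a UTF-32LE BOM; this sketch does not distinguish it.
            bomLength = 2;
            return DetectedEncoding.Utf16LE;
        }
        if (count >= 2 && buffer[0] == 0xFE && buffer[1] == 0xFF)
        {
            bomLength = 2;
            return DetectedEncoding.Utf16BE;
        }
        bomLength = 0;
        return DetectedEncoding.Unknown;
    }
}
```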

Bill
 
Pete

There are a variety of sources.  Windows has the Character Map utility  
that allows you to browse characters on a per-font basis, and will tell  
you the Unicode value for a character.
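The values can also be read directly in C#: a `char` is a UTF-16 code unit, so for characters in the Basic Multilingual Plane (which includes these operators) casting it to `int` gives the code point. A short sketch; the helper name is hypothetical. Note that the keyboard '-' is U+002D HYPHEN-MINUS; Unicode also defines a separate U+2212 MINUS SIGN, which is a different character with different bytes.

```csharp
public static class CodePoints
{
    // Formats a BMP character in the conventional U+XXXX notation.
    // Characters outside the BMP would need char.ConvertToUtf32 on a surrogate pair instead.
    public static string Describe(char c)
    {
        return "U+" + ((int)c).ToString("X4");
    }
}
```

With this, `CodePoints.Describe('=')` returns "U+003D", '+' gives "U+002B" and '-' gives "U+002D".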

However, you may be going about whatever you're trying to do the wrong
way.  You should read your text in using an Encoding (or its Decoder)
appropriate to the format, converting to the char type in C#.  Then you
can just use the literal '=' (for example) to compare for the equals
character, without ever needing to know the actual Unicode value.
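A minimal sketch of that approach: a StreamReader constructed with the file's encoding does the decoding, and the loop compares plain `char` values against '='. Reading into a reused `char[]` rather than building strings per line is one way to keep allocations down, though whether it is fast enough depends on the data. The class and method names are hypothetical.

```csharp
using System.IO;
using System.Text;

public static class CharScan
{
    // Counts '=' after decoding, so the same code works for UTF-8 and UTF-16 input.
    public static int CountEquals(Stream input, Encoding encoding)
    {
        int found = 0;
        // 'true' = detectEncodingFromByteOrderMarks: a BOM in the stream overrides 'encoding'.
        using (var reader = new StreamReader(input, encoding, true))
        {
            var buffer = new char[4096]; // reused for every read; no per-line strings
            int read;
            while ((read = reader.Read(buffer, 0, buffer.Length)) > 0)
            {
                for (int i = 0; i < read; i++)
                {
                    if (buffer[i] == '=')
                        found++;
                }
            }
        }
        return found;
    }
}
```

Note that disposing the reader also disposes the underlying stream.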

Pete

billsahiker

I am looking for maximum performance. I originally read the file with
StreamReader and did the parsing with strings, but it was far too
slow. I am thinking there should be two byte values for a specific
character in a given language. Do the bytes vary by font as well?
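Fonts do not affect the bytes, and neither does the language: a character has one Unicode code point, and only the encoding decides how that code point is stored. The sketch below (helper name hypothetical) shows the bytes .NET produces for the same character under different encodings; `GetBytes` does not include a BOM.

```csharp
using System;
using System.Text;

public static class EncodedBytes
{
    // Returns the bytes an encoding produces for a string, as space-separated hex.
    // A font only changes how a character is drawn, never how it is stored.
    public static string Hex(Encoding encoding, string text)
    {
        return BitConverter.ToString(encoding.GetBytes(text)).Replace("-", " ");
    }
}
```

For '=' this gives "3D" with `Encoding.UTF8`, "3D 00" with `Encoding.Unicode` (UTF-16 little-endian) and "00 3D" with `Encoding.BigEndianUnicode`. U+2212 MINUS SIGN, by contrast, is "E2 88 92" in UTF-8.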
 

billsahiker

Pete,

The performance issue was parsing strings vs. a byte array. Since I
already have a working routine that searches a byte array for UTF-8
files, I wanted to modify it for Unicode. It turns out I can still
search for the same byte values, e.g., 0x3D for the equal sign, because
for the math symbols I need, the first byte of each two-byte Unicode
(UTF-16 little-endian) character is the same as the UTF-8 byte. Once I
discovered that, all I needed to do was increment the pointer variable
in the buffer by two instead of one. With that minor adjustment the
routine now works for both UTF-8 and Unicode.
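This holds for UTF-16 little-endian, which is what `Encoding.Unicode` writes: any character below U+0100 is stored as its one-byte value followed by 0x00. Checking only the first byte of each pair can misfire, though. U+203D INTERROBANG is stored as 3D 20, and a surrogate code unit such as 0xDC3D is stored as 3D DC, so both start with 0x3D. A minimal sketch that also requires the high byte to be zero; it assumes the scan starts at an even offset within the text (for example right after the 2-byte FF FE BOM). An odd byte count is possible in general from a Stream `Read` call, so a chunked reader would need to carry a trailing odd byte into the next buffer. The names are hypothetical.

```csharp
public static class Utf16LEScan
{
    // Counts '=', '+' and '-' in buffer[offset .. offset + count), stepping 2 bytes
    // (one UTF-16 code unit) at a time. 'offset' must be aligned to a code unit.
    public static int CountOperators(byte[] buffer, int offset, int count)
    {
        int found = 0;
        int end = offset + count;
        // 'i + 1 < end' skips a dangling odd byte at the end of the range.
        for (int i = offset; i + 1 < end; i += 2)
        {
            byte low = buffer[i];      // little-endian: low-order byte comes first
            byte high = buffer[i + 1];
            if (high == 0x00 && (low == 0x3D || low == 0x2B || low == 0x2D))
                found++;
        }
        return found;
    }
}
```

For the bytes of "\u203D=" (3D 20 3D 00) this counts 1, whereas a first-byte-only check would count 2.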

Thanks for your help.

Bill
 
