How to tell a text file from a binary file

Guest

How can a binary file be distinguished from a text file on Windows?

Obviously I want a way that is more sophisticated than just looking at the
dot extension in the filename.

I want to write code that processes all text files in a directory but leaves
binary files alone.
 
giddy

Florence said:
How can a binary file be distinguished from a text file on Windows?


Well, ALL files, including text files, are stored as binary on the
computer, and you can read both types both ways. If you read a text
file as binary, you'll get the ASCII values of its characters; similarly,
you'll get some insane-looking text if you read a binary file in text mode.
E.g. this is what I get if I open a .ICO (binary) file in Notepad (which
obviously reads in "text mode"):

ÿØÿà JFIF    ÿÛ C 
 %# , #&')*)-0-(0%()(ÿÛ C

If you check the ASCII values of these characters, many of them will be
above 130 or below 32 (space), because of course this is NOT text! So
make a little function that reads a file in text mode and checks the
ASCII values of the first 150 - 200 characters. If most, or even some,
of them have weird values like 1 - 30 or 130+, then it's a binary file,
because text files are MOSTLY not going to have these values. Look up an
ASCII chart for all the values.
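
A rough Python sketch of that check, under some assumptions that are not in the post: the file is opened in binary mode so the raw byte values can be inspected without decoding or newline translation, the sample is the first 200 bytes, tab, line feed and carriage return are not counted as weird because they appear in ordinary text, and the 10% threshold is an arbitrary pick. The name `looks_binary` is made up for this illustration. Text stored in a non-ASCII encoding will probably be misjudged by a check this simple.

```python
def looks_binary(path, sample_size=200, threshold=0.10):
    """Guess whether a file is binary from the byte values at its start."""
    # Binary mode: we want raw byte values, not decoded characters.
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if not sample:
        return False  # assumption: an empty file counts as text

    # Assumption: tab (9), line feed (10) and carriage return (13)
    # are normal in text, so they are not counted as weird.
    ordinary_control = {9, 10, 13}
    weird = sum(
        1 for b in sample
        if (b < 32 and b not in ordinary_control) or b >= 130
    )
    # The threshold is arbitrary; tune it against your own files.
    return weird / len(sample) > threshold
```

Calling `looks_binary(some_path)` returns `True` when more than a tenth of the sampled bytes fall in the weird ranges.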

That was the best I could come up with =)

Good Luck!

Gideon
 
Marc Gravell

You may wish to anticipate chars 9, 10 and 13 in the text (tab, line
feed and carriage return). ASCII doesn't define chars over 127, but
they do occur in many code pages/encodings, and such a file is still
"text". Ditto Unicode etc.

Marc
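
One way these caveats might be folded into a check, sketched in Python. Everything here is an assumption made for illustration rather than an established method: the names `is_probably_text` and `text_files_in`, the 4096-byte sample, the 5% limit, and the choice to treat a NUL byte as a sign of binary data. Bytes over 127 are deliberately not held against the file, since they may be code-page or UTF-8 text. UTF-16 text without a byte order mark contains NUL bytes, so this sketch would misclassify it as binary.

```python
import codecs
import os

TEXT_CONTROL = {9, 10, 13}  # tab, line feed, carriage return
BOMS = (codecs.BOM_UTF8, codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)


def is_probably_text(path, sample_size=4096, limit=0.05):
    """Heuristic guess: does the start of the file look like text?"""
    with open(path, "rb") as f:
        sample = f.read(sample_size)
    if not sample:
        return True  # assumption: an empty file counts as text
    if sample.startswith(BOMS):
        return True  # a Unicode byte order mark suggests text
    if b"\x00" in sample:
        return False  # NUL bytes are rare in 8-bit text
    # Count control characters other than tab, LF and CR.
    # Bytes over 127 are ignored: they may be valid in the file's encoding.
    control = sum(1 for b in sample if b < 32 and b not in TEXT_CONTROL)
    return control / len(sample) < limit


def text_files_in(directory):
    """Yield paths of files in a directory that look like text."""
    for entry in os.scandir(directory):
        if entry.is_file() and is_probably_text(entry.path):
            yield entry.path
```

Looping over `text_files_in(some_directory)` then gives the files to process while leaving the probable binaries alone. It is still only a guess, so it is worth checking against a sample of your real files first.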
 
