System.IO.File.IsBinary() ?

Ympostor

Hello.

Is there a method in the .NET class libraries to know if a given file is
binary or just plain text (ASCII)?

Thanks in advance.

garyusenet

If you are dealing with just ASCII text and not Unicode, you could open
the file for binary access and read it inside a using block. If any of
the bytes are zero, return false (the file is not plain text); if none
are, return true.

This isn't 100% reliable, but it would be very unlikely for you to find
a zero byte in an 'ordinary' text file.
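
A minimal sketch of that check, assuming the whole file is scanned. The class and method names (NulByteCheck, LooksLikeAsciiText) are made up for illustration and are not part of the .NET class libraries:

```csharp
using System.IO;

static class NulByteCheck
{
    // Hypothetical helper: returns false as soon as a zero byte is found,
    // true if the end of the file is reached without seeing one.
    public static bool LooksLikeAsciiText(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            int b;
            // ReadByte returns -1 at end of file.
            while ((b = stream.ReadByte()) != -1)
            {
                if (b == 0)
                {
                    return false;
                }
            }
        }
        return true;
    }
}
```

Note that a UTF-16 text file would fail this test, since most of its characters are stored with a zero byte, which is why it only applies to ASCII.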

I'm sure there are better ways of doing this, but this is the only
logical way I can think of.

I can't find any predefined methods for determining this.

HTH Gary.
 
Jon Skeet [C# MVP]

Ympostor said:
Is there a method in the .NET class libraries to know if a given file is
binary or just plain text (ASCII)?

"Plain text" can mean a lot more than just ASCII - and files which are
composed entirely of ASCII characters can be binary files too. For
instance, the eicar test virus is entirely represented as plain text,
but it is also executable.
 
Stephany Young

You're damn right it's not 100%. The first character in the ASCII character
set has a code of 0 (zero) and is named NUL.

Using NUL as a delimiter in a data file that otherwise contains 'text' is a
very valid practice that is widely used.

Unless a given file has a special feature, such as a preamble that provides
such information (some UTF encoded files contain such a preamble to indicate
the specific UTF encoding used), there is no way of determining (from the
content) if a file should be treated as binary or text.
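
As an illustration of the preamble case, here is a rough sketch that looks for the standard Unicode byte order marks at the start of a buffer. The names BomSniffer and DetectBom are hypothetical, and the returned labels are just descriptive strings:

```csharp
static class BomSniffer
{
    // Hypothetical helper: returns a label for the byte order mark found at
    // the start of the data, or null when no known mark is present.
    // The UTF-32 checks come first because the UTF-32 LE mark begins with
    // the same two bytes as the UTF-16 LE mark.
    public static string DetectBom(byte[] head)
    {
        if (head.Length >= 4 && head[0] == 0xFF && head[1] == 0xFE && head[2] == 0x00 && head[3] == 0x00)
            return "UTF-32 LE";
        if (head.Length >= 4 && head[0] == 0x00 && head[1] == 0x00 && head[2] == 0xFE && head[3] == 0xFF)
            return "UTF-32 BE";
        if (head.Length >= 3 && head[0] == 0xEF && head[1] == 0xBB && head[2] == 0xBF)
            return "UTF-8";
        if (head.Length >= 2 && head[0] == 0xFF && head[1] == 0xFE)
            return "UTF-16 LE";
        if (head.Length >= 2 && head[0] == 0xFE && head[1] == 0xFF)
            return "UTF-16 BE";
        return null;
    }
}
```

A null result proves nothing either way: plenty of text files, including most UTF-8 ones, carry no preamble at all.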

Passing data by way of a file is a form of a contract in which the provider
usually says 'I will produce files in such-and-such a way and here is the
documentation to enable you to interpret the content'.

Way too many people seem to think that they can grab any old data file and
that there will be a magic bullet so that they can read the file without
having to think about how they should be doing it.

One sure way is to open the file in question in your favourite editor
and eyeball the content. If it looks like readable text then it probably is.
If it doesn't look like readable text then reading it as binary is probably
the way to go.

If in doubt ask the producer of the file!!!!!!!!!!!!
 
Lucian Wischik

Ympostor said:
Is there a method in the .NET class libraries to know if a given file is
binary or just plain text (ASCII)?

"FindMimeFromData". But it's a win32 function, and I couldn't find it
in .net, so you'll have to pinvoke it.
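
For reference, a sketch of what the P/Invoke might look like. The declaration below is one common way to marshal the urlmon.dll export and should be checked against the Win32 documentation; the wrapper name GetMimeFromFile, the 256-byte sample size, and freeing the result with Marshal.FreeCoTaskMem are assumptions on my part. It only works on Windows:

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

static class MimeSniffer
{
    // Assumed marshalling for the Win32 FindMimeFromData export in urlmon.dll.
    [DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true)]
    private static extern int FindMimeFromData(
        IntPtr pBC,
        string pwzUrl,
        byte[] pBuffer,
        int cbSize,
        string pwzMimeProposed,
        int dwMimeFlags,
        out IntPtr ppwzMimeOut,
        int dwReserved);

    // Hypothetical wrapper: sniffs the MIME type from the start of the file.
    public static string GetMimeFromFile(string path)
    {
        byte[] buffer = new byte[256];
        int length;
        using (FileStream stream = File.OpenRead(path))
        {
            length = stream.Read(buffer, 0, buffer.Length);
        }

        IntPtr mimePtr;
        int hr = FindMimeFromData(IntPtr.Zero, null, buffer, length, null, 0, out mimePtr, 0);
        Marshal.ThrowExceptionForHR(hr); // throws only for failure HRESULTs

        try
        {
            return Marshal.PtrToStringUni(mimePtr);
        }
        finally
        {
            // Assumption: the returned string is COM task memory.
            Marshal.FreeCoTaskMem(mimePtr);
        }
    }
}
```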
 
Ympostor

Thanks for all replies!

Lucian Wischik said:
"FindMimeFromData". But it's a win32 function, and I couldn't find it
in .net, so you'll have to pinvoke it.

But if I manage to get the MIME type, how do I then know whether that
specific MIME type is binary or not?

Regards.

Guest

Ciaran said:
All files are technically binary. There is no difference to the file system.
You could check the first 100 characters to see if there are non-printable
characters in there.

That depends on the file system.

Some file systems actually carry meta data about how the
content bytes should be interpreted.

Most common Windows and Unix/Linux file systems do not, though.

Arne
 
Arne Vajhøj

Ympostor said:
Is there a method in the .NET class libraries to know if a given file is
binary or just plain text (ASCII)?

As others have already stated, there is neither a clear definition
of binary nor a safe programmatic way of checking.

From a practical point of view, you can make a heuristic
test based on the file's content.

If the file is in a western language, then checking whether X %
of the bytes are in the 32-126 range should work pretty well.

I have used X = 80 previously.
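
A minimal sketch of that heuristic, assuming the bytes to inspect have already been read into an array. The names PrintableRatio and LooksLikeText are made up for illustration, and treating an empty file as text is an arbitrary choice:

```csharp
static class PrintableRatio
{
    // Hypothetical helper: true when at least `threshold` of the bytes fall
    // in the printable ASCII range 32-126. Default threshold is 0.80 (X = 80).
    public static bool LooksLikeText(byte[] data, double threshold = 0.80)
    {
        if (data.Length == 0)
        {
            return true; // assumption: an empty file counts as text
        }

        int printable = 0;
        foreach (byte b in data)
        {
            if (b >= 32 && b <= 126)
            {
                printable++;
            }
        }

        return (double)printable / data.Length >= threshold;
    }
}
```

Tabs and line breaks fall outside the 32-126 range, so they count against the file here; a threshold below 100 leaves room for them.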

Arne
 
