How do I know whether the file is text or binary?

  • Thread starter Alexander Vasilevsky

zacks

Alexander Vasilevsky said:
> How do I know whether the file is text or binary?

The "brute force" method would be to open the file with a
BinaryReader, read every byte in the file and if any byte is outside
the range of standard ASCII characters, then it is a binary file. You
may have to take into account possible international characters.

I suppose it could be possible that a true binary file may not contain
any non-standard ASCII characters, but that, IMHO, would be rare.
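A minimal C# sketch of the brute-force check, under stated assumptions: "standard ASCII" is taken here to mean the printable range 0x20-0x7E plus tab, LF and CR (the post does not define the exact set), and the file is read in 4096-byte chunks with `BinaryReader.ReadBytes`. The class name `BruteForceCheck` is hypothetical.

```csharp
using System.IO;
using System.Text;

public static class BruteForceCheck
{
    // Assumption: "standard ASCII" = printable 0x20-0x7E plus tab, LF and CR.
    static bool IsStandardAscii(byte b) =>
        (b >= 0x20 && b <= 0x7E) || b == 0x09 || b == 0x0A || b == 0x0D;

    // Returns false (binary) as soon as one byte falls outside the range,
    // true (text) if the whole stream passes.
    public static bool LooksLikeText(Stream stream)
    {
        // The encoding argument is irrelevant for ReadBytes; leaveOpen keeps
        // the caller's stream usable afterwards.
        using (var reader = new BinaryReader(stream, Encoding.ASCII, true))
        {
            byte[] chunk;
            // ReadBytes returns an empty array once the end of the stream is reached.
            while ((chunk = reader.ReadBytes(4096)).Length > 0)
            {
                foreach (byte b in chunk)
                {
                    if (!IsStandardAscii(b))
                        return false;
                }
            }
        }
        return true;
    }

    public static bool LooksLikeText(string path)
    {
        using (FileStream fs = File.OpenRead(path))
            return LooksLikeText(fs);
    }
}
```

Under this rule an empty file counts as text, and any UTF-8 file containing accented letters counts as binary, which is the "international characters" caveat above.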
 

Jon Skeet [C# MVP]

zacks said:
> The "brute force" method would be to open the file with a
> BinaryReader, read every byte in the file and if any byte is outside
> the range of standard ASCII characters, then it is a binary file. You
> may have to take into account possible international characters.
>
> I suppose it could be possible that a true binary file may not contain
> any non-standard ASCII characters, but that, IMHO, would be rare.

More to the point, it's possible for a text file to contain non-ASCII
characters. If it's encoded in UTF-8, for instance, it may well start
with a BOM (byte order mark), which is non-ASCII, as well as containing
non-ASCII encoded characters.
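To make the point concrete, here is a small C# sketch (class and method names are hypothetical) that encodes a short string as UTF-8 with a BOM and checks the result. Every byte of the BOM is above 0x7F, and so is each byte of an encoded accented character.

```csharp
using System.Linq;
using System.Text;

public static class Utf8Demo
{
    // The UTF-8 byte order mark is the three bytes EF BB BF.
    static readonly byte[] Utf8Bom = { 0xEF, 0xBB, 0xBF };

    public static bool StartsWithUtf8Bom(byte[] data) =>
        data.Length >= 3 && data.Take(3).SequenceEqual(Utf8Bom);

    // Produces the bytes an editor might save as "UTF-8 with BOM".
    public static byte[] EncodeWithBom(string text)
    {
        var encoding = new UTF8Encoding(true); // true = emit the BOM preamble
        return encoding.GetPreamble().Concat(encoding.GetBytes(text)).ToArray();
    }

    // True if any byte lies outside 7-bit ASCII.
    public static bool ContainsNonAscii(byte[] data) => data.Any(b => b > 0x7F);
}
```

For "café", `EncodeWithBom` yields EF BB BF 63 61 66 C3 A9: the first three bytes are the BOM and the last two encode "é", so a pure-ASCII test would reject this perfectly ordinary text file.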

For the OP: there's no such thing as a "text file" or a "binary file"
really - it's all a question of interpretation. A file is (at least in
the simple case - there may be alternate streams etc) just a sequence
of bytes. Any particular sequence of bytes can be treated as binary,
or perhaps treated as text depending on which encoding is chosen.
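A short C# illustration of the "question of interpretation": the same two bytes, 0xC3 0xA9, read as UTF-8 text, as ISO-8859-1 text, and as a little-endian 16-bit integer. The class name and the choice of bytes are illustrative only.

```csharp
using System.Text;

public static class Interpretations
{
    public static readonly byte[] Data = { 0xC3, 0xA9 };

    // As UTF-8 text: one character, U+00E9 ('é').
    public static string AsUtf8() => Encoding.UTF8.GetString(Data);

    // As ISO-8859-1 (Latin-1) text: two characters, U+00C3 ('Ã') and U+00A9 ('©').
    public static string AsLatin1() => Encoding.GetEncoding("iso-8859-1").GetString(Data);

    // As a little-endian unsigned 16-bit integer: 0xA9C3.
    public static ushort AsUInt16LittleEndian() => (ushort)(Data[0] | (Data[1] << 8));
}
```

UTF-8 gives "é", ISO-8859-1 gives "Ã©", and the integer reading gives 43459 (0xA9C3). None of them is wrong; they are simply different interpretations of the same bytes.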

Jon
 

Michael A. Covington

Jon Skeet [C# MVP] said:
> For the OP: there's no such thing as a "text file" or a "binary file"
> really - it's all a question of interpretation. A file is (at least in
> the simple case - there may be alternate streams etc) just a sequence
> of bytes. Any particular sequence of bytes can be treated as binary,
> or perhaps treated as text depending on which encoding is chosen.

Well said. I recall several mainframe operating systems that make the
distinction, but in Windows, a file does not have a type. Every file is a
stream of bytes, and some streams of bytes can be interpreted as text.

Of course files have names, and the names end with extensions, and *by
convention* many pieces of software assume that the extension gives
information about the file. But the OS does not keep track of file types.
 

Ben Voigt [C++ MVP]

Michael A. Covington said:
> Well said. I recall several mainframe operating systems that make the
> distinction, but in Windows, a file does not have a type. Every file is a
> stream of bytes, and some streams of bytes can be interpreted as text.
>
> Of course files have names, and the names end with extensions, and *by
> convention* many pieces of software assume that the extension gives
> information about the file. But the OS does not keep track of file types.

The OS doesn't define any mandatory file type metadata. But if an
application adds metadata to the file, the OS most certainly keeps track of
it. Of course, it would still not be meaningful as a file type to the OS;
it would be just as much a convention as the file extension.
 

Arne Vajhøj

Alexander said:
> How do I know whether the file is text or binary?

As many others have stated, there is really no way to tell for certain.

If you implement the following logic:
    read the first 1000 bytes of the file
    if number of <LF> > 5 and
       number of <NUL> == 0 and
       frequency of ' '..'~' > 90%
    then
        return text
    else
        return binary
    end if

You will be correct in 95+% of cases for Western-language text.
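Arne's pseudocode translates fairly directly into C#. This is a sketch, assuming the thresholds from the post (first 1000 bytes, more than 5 LFs, no NULs, more than 90% in ' '..'~') and assuming the percentage is taken over the bytes actually read when the file is shorter than 1000 bytes. The class name `TextHeuristic` is hypothetical.

```csharp
using System.IO;

public static class TextHeuristic
{
    public static bool LooksLikeText(Stream stream, int sampleSize = 1000)
    {
        // Read up to sampleSize bytes; Stream.Read may return fewer than asked, so loop.
        var buffer = new byte[sampleSize];
        int read = 0, n;
        while (read < sampleSize && (n = stream.Read(buffer, read, sampleSize - read)) > 0)
            read += n;

        int lineFeeds = 0, nuls = 0, printable = 0;
        for (int i = 0; i < read; i++)
        {
            byte b = buffer[i];
            if (b == (byte)'\n') lineFeeds++;
            else if (b == 0) nuls++;
            else if (b >= (byte)' ' && b <= (byte)'~') printable++;
        }

        // Assumption: the 90% is measured against the bytes actually read.
        // An empty or very short sample fails the line-feed test and is reported as binary.
        return lineFeeds > 5
            && nuls == 0
            && printable > 0.9 * read;
    }

    public static bool LooksLikeText(string path)
    {
        using (FileStream fs = File.OpenRead(path))
            return LooksLikeText(fs);
    }
}
```

As written, the rule reports short text files with five or fewer lines as binary, and it only counts ASCII as printable, which is consistent with the "Western-language text" qualification.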

Arne
 

Tom

I'm just now transitioning from structured C/C++ to C# and am working
with various file types in the learning process. My work in C involved
a lot of binary file I/O as well as a lot of text data file input and
conversion. Your question is one I have worked on recently too and have
not fully resolved yet ... but here are some comments:

As others have pointed out ... files are just streams of bytes, and
the key is how you are able to interpret them.

If the file is binary, you'd need to know the specific algorithm that
wrote it in order to use it ... or be a very good code breaker. WinHex
is a good first-look tool for seeing whether you have any interest in
that area. It takes a little digging to get proficient with it ... but
you can certainly see every bit and its various interpretations as you
explore the file.
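For a quick look without a hex editor, a bare-bones hex dump in C# gives the same kind of view: offset, bytes in hex, and the printable bytes as characters. This is a sketch; the 16-bytes-per-line layout is a common convention, not WinHex's exact format, and `HexDump.Format` is a hypothetical name.

```csharp
using System;
using System.Text;

public static class HexDump
{
    // One line per bytesPerLine bytes: 8-digit hex offset, hex bytes,
    // then the same bytes as ASCII with non-printable bytes shown as '.'.
    public static string Format(byte[] data, int bytesPerLine = 16)
    {
        var sb = new StringBuilder();
        for (int offset = 0; offset < data.Length; offset += bytesPerLine)
        {
            int count = Math.Min(bytesPerLine, data.Length - offset);
            sb.Append(offset.ToString("X8")).Append("  ");
            for (int i = 0; i < bytesPerLine; i++)
            {
                // Pad a short final line so the ASCII column stays aligned.
                sb.Append(i < count ? data[offset + i].ToString("X2") + " " : "   ");
            }
            sb.Append(' ');
            for (int i = 0; i < count; i++)
            {
                byte b = data[offset + i];
                sb.Append(b >= 0x20 && b <= 0x7E ? (char)b : '.');
            }
            sb.Append('\n');
        }
        return sb.ToString();
    }
}
```

For example, `Console.Write(HexDump.Format(File.ReadAllBytes(path)));`, where `path` is whatever file you want to inspect. For large files, dump only a slice rather than reading everything.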

If the file is text, the cultural and encoding attributes need to be
dealt with. If you are only working in English ... that simplifies
things greatly. My approach is to open the file in a RichTextBox and
view the first 2000 bytes. In a blink you'll know if it is readable or
gibberish.
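The "first 2000 bytes" preview can be sketched as follows. Assumptions: the bytes are decoded as UTF-8, control characters other than tab, CR and LF are shown as '.', and the returned string is what you would assign to the RichTextBox's `Text` property (the WinForms part is omitted). The names are hypothetical.

```csharp
using System.IO;
using System.Text;

public static class Preview
{
    public static string FirstBytesAsText(Stream stream, int maxBytes = 2000)
    {
        // Read at most maxBytes; Stream.Read may return fewer than requested.
        var buffer = new byte[maxBytes];
        int read = 0, n;
        while (read < maxBytes && (n = stream.Read(buffer, read, maxBytes - read)) > 0)
            read += n;

        // Assumption: decode as UTF-8.
        string text = Encoding.UTF8.GetString(buffer, 0, read);
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            // Keep ordinary whitespace; mask other control characters so
            // binary content shows up as obvious gibberish.
            bool keep = !char.IsControl(c) || c == '\t' || c == '\r' || c == '\n';
            sb.Append(keep ? c : '.');
        }
        return sb.ToString();
    }
}
```

A multibyte character cut off at the 2000-byte boundary decodes to U+FFFD, which is harmless for a preview.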

As a learning project I am still polishing an enhanced FilePicker
that began with Petzold's Directory TreeView and File ListView. I have
added a RichTextBox in a splitter panel below the tree and list
panels. I set it to read-only to ensure no accidental editing
occurs. Select a file in the list and you get a fast preview on the
same form. It makes text easy to recognize and usually shows enough to
determine whether it is the file I am targeting.

I'm still trying to get a working understanding of binding and
DataGridView to replace the list with a more feature-packed class;
however, all that comes at the price of speed and footprint. I'm not sure
it is worth it ... but it sure provides a good focus for learning.
I am also working on a buffering algorithm to allow fast viewing of
huge data files without loading the entire file. Endless enhancements
are possible, and you might find such a project fun.
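One simple way to view huge files without loading them is to read a fixed-size window at a given offset each time the view scrolls. A C# sketch of that idea; the method name and windowing scheme are assumptions, not Tom's actual algorithm.

```csharp
using System;
using System.IO;

public static class WindowReader
{
    // Hypothetical helper: returns up to `count` bytes starting at `offset`,
    // without reading the rest of the file.
    public static byte[] ReadWindow(Stream stream, long offset, int count)
    {
        if (offset >= stream.Length)
            return new byte[0];

        stream.Seek(offset, SeekOrigin.Begin);
        // Never read past the end of the file.
        int toRead = (int)Math.Min(count, stream.Length - offset);
        var buffer = new byte[toRead];
        int read = 0, n;
        // Stream.Read can return fewer bytes than asked for; keep going.
        while (read < toRead && (n = stream.Read(buffer, read, toRead - read)) > 0)
            read += n;

        if (read < toRead)
            Array.Resize(ref buffer, read);
        return buffer;
    }
}
```

A viewer might keep the current window in memory and call `ReadWindow` again only when the user scrolls past it.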

If you decide to explore WinHex ... make a binary file with some known
values of doubles and various-sized ints. Then when examining it you
will know what you are looking for, and it becomes a lot easier. Also
open a short little Notepad .txt file and give it a view. It's really
pretty interesting.
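The known-values exercise takes only a few lines of C#: `BinaryWriter` writes values in little-endian order, so the byte patterns are predictable. The particular values (int 1, short 0x1234, double 1.0) and the class name are arbitrary choices for this sketch.

```csharp
using System.IO;

public static class KnownValues
{
    public static byte[] Build()
    {
        using (var ms = new MemoryStream())
        {
            using (var writer = new BinaryWriter(ms))
            {
                writer.Write(1);             // int:    01 00 00 00
                writer.Write((short)0x1234); // short:  34 12
                writer.Write(1.0);           // double: 00 00 00 00 00 00 F0 3F
            }
            // MemoryStream.ToArray still works after the stream has been closed.
            return ms.ToArray();
        }
    }
}
```

`BitConverter.ToString(KnownValues.Build())` gives `01-00-00-00-34-12-00-00-00-00-00-00-F0-3F`; write the array to disk with `File.WriteAllBytes` and open it in WinHex to see the same bytes.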

Best of luck.

-- Tom
 
