How to identify file formats?

Guest

Hi,

I need to find a way to identify between a few different file formats
WITHOUT looking at the file extension. Very often our customers name
files incorrectly. For example, they'll send us a file that's named
'filename.xls', but it's actually a tab-delimited or comma-delimited file.

The possible formats that I need to identify are: HTML, tab-delimited,
comma-delimited, or Excel.
 
Eric,

To me this sounds as if extensions serve no purpose and the format can
always be found from the content. Isn't that a strange question, then?

I think that if it were that easy, there would never have been extensions.
It has always been hard to get that discipline from users (and even you did
not manage it with yours).

Another case is a file that does not have the format it is supposed to have;
however, if you have done it right, that will probably fail right away in a
try/catch.

Just my thought,

Cor
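Cor's try/catch idea can be sketched as: attempt to open the file as the format its name claims, and treat an exception as a mismatch. This minimal Python sketch assumes the newer .xlsx flavor of Excel (a ZIP package) and checks for the conventional part name xl/workbook.xml; `looks_like_xlsx` is a hypothetical helper name, and legacy .xls files would need a different parser.

```python
import io
import zipfile


def looks_like_xlsx(data: bytes) -> bool:
    """Return True if `data` opens as a ZIP package containing xl/workbook.xml."""
    try:
        with zipfile.ZipFile(io.BytesIO(data)) as archive:
            # Assumption: the workbook part uses its conventional name. The
            # Open XML spec locates it via _rels/.rels, so this is a shortcut.
            return "xl/workbook.xml" in archive.namelist()
    except zipfile.BadZipFile:
        # CSV, TSV, HTML and legacy .xls bytes are not ZIP archives, so the
        # open fails and we report "not xlsx" instead of crashing.
        return False


# Usage: a file named "report.xlsx" that is really tab-delimited text.
print(looks_like_xlsx(b"name\tqty\nwidget\t3\n"))  # False
```

The exception is what carries the information here: you don't need to know in advance what the file really is, only that it isn't what the extension says.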
 
You can guess well-known formats like Excel or HTML by looking at the
header. But it would be next to impossible to distinguish tab-delimited from
comma-separated files unless you have stringent rules, such as enclosing
string values in quotes.

Rgds,
Anand M
http://www.dotnetindia.com
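A minimal sketch of the header check Anand describes, assuming the two common Excel containers: legacy .xls files are OLE2 compound documents that start with the bytes D0 CF 11 E0 A1 B1 1A E1, and .xlsx files are ZIP archives that start with "PK\x03\x04". HTML has no fixed signature, so the sketch just looks for a doctype or <html> tag near the start. `sniff_header` is a hypothetical name; note that any OLE2 file (a Word .doc, for example) or any ZIP file would also be reported as "excel".

```python
# Legacy .xls workbooks are OLE2 compound documents with this 8-byte signature.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")
# .xlsx workbooks are ZIP packages; a ZIP local file header starts with PK\x03\x04.
ZIP_SIGNATURE = b"PK\x03\x04"
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_header(data: bytes) -> str:
    """Classify by leading bytes: 'excel', 'html' or 'unknown'."""
    # Caveat: any OLE2 file (e.g. a Word .doc) or any ZIP file also matches here.
    if data.startswith(OLE2_SIGNATURE) or data.startswith(ZIP_SIGNATURE):
        return "excel"
    # HTML has no magic number; look for a doctype or <html> tag near the start.
    head = data[:1024]
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    head = head.lstrip().lower()
    if head.startswith(b"<!doctype html") or b"<html" in head:
        return "html"
    # Plain text: tab-delimited, comma-delimited, or something else entirely.
    return "unknown"


print(sniff_header(b"<HTML><BODY><TABLE>"))  # html
print(sniff_header(b"a,b\n1,2\n"))           # unknown
```

Anything that comes back "unknown" still has to be split into tab-delimited vs comma-delimited by some other means.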
 
You'll have to analyze the file's content/header.

Some file format specifications can be found here:

<URL:http://www.wotsit.org/>
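For the tab versus comma case, one common content heuristic (not something proposed in this thread) is to pick the delimiter that occurs the same number of times on nearly every line. Python's standard `csv.Sniffer` does this (it also looks at quoted fields first); the sketch below restricts it to tab and comma. It is still a guess: a tiny sample, ragged rows, or a file where both characters occur consistently can produce the wrong answer. `guess_delimiter` is a hypothetical helper.

```python
import csv
from typing import Optional


def guess_delimiter(text: str) -> Optional[str]:
    """Return '\\t' or ',' if one of them splits the sample consistently, else None."""
    try:
        # Sniffer looks at quoting patterns first, then at which candidate
        # character occurs the same number of times on (nearly) every line.
        dialect = csv.Sniffer().sniff(text, delimiters=",\t")
    except csv.Error:
        # Raised when neither tab nor comma fits the sample well enough.
        return None
    return dialect.delimiter


sample = "name\tnote\nwidget\tred, small\ngadget\tblue\n"
print(repr(guess_delimiter(sample)))  # '\t' (the stray comma is not consistent)
```

In practice you would pass only the first few kilobytes of the file, decoded to text, rather than the whole thing.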
 
Can't be done programmatically. There is nothing particularly unique about tab-delimited or comma-delimited
files (other than the delimiter).

Unless you add a custom header or identifier of some type, you likely won't be able to tell them
apart without looking at them.


Paul ~~~ (e-mail address removed)
Microsoft MVP (Visual Basic)
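Paul's custom-identifier suggestion only works if you control the program that produces the files. As an illustration, assuming a hypothetical convention where the export writes a first line like "#format=csv" or "#format=tsv" (the marker text is invented here, not a standard), the check becomes trivial. `add_marker` and `read_marker` are hypothetical helpers.

```python
from typing import Optional

# Hypothetical convention (not a standard): your own export tool writes a
# first line such as "#format=csv" or "#format=tsv" before the data.
MARKER_PREFIX = "#format="
DELIMITERS = {"csv": ",", "tsv": "\t"}


def add_marker(body: str, kind: str) -> str:
    """Prepend the identifier line for `kind` ('csv' or 'tsv')."""
    if kind not in DELIMITERS:
        raise ValueError(f"unknown kind: {kind!r}")
    return f"{MARKER_PREFIX}{kind}\n{body}"


def read_marker(text: str) -> Optional[str]:
    """Return 'csv' or 'tsv' from the identifier line, or None if it is absent."""
    # strip() also drops a trailing \r left by Windows line endings.
    first_line = text.split("\n", 1)[0].strip().lower()
    if first_line.startswith(MARKER_PREFIX):
        kind = first_line[len(MARKER_PREFIX):]
        if kind in DELIMITERS:
            return kind
    return None


tagged = add_marker("name\tqty\nwidget\t3\n", "tsv")
kind = read_marker(tagged)
print(kind, repr(DELIMITERS[kind]))  # tsv '\t'
print(read_marker("name\tqty\n"))    # None
```

Files without the marker come back as None, so files from customers who don't use your exporter still need one of the guessing approaches.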
 