How to identify file formats?

Guest

Hi,

I need to find a way to distinguish between a few different file formats
WITHOUT looking at the file extension. Very often our customers name
files incorrectly. For example, they'll send us a file that's named
'filename.xls', but it's actually a tab-delimited or comma-delimited file.

The possible formats that I need to identify are: HTML, tab-delimited,
comma-delimited, or Excel.
 
Cor Ligthert

Eric,

This sounds to me as if extensions exist for nothing and the format can
always be found from the content. Isn't that a strange question,
then?

If that were so, I think there would never have been extensions. It was hard
to get users into that discipline (and you have not even managed it with yours).

Another case is a file that does not have the format it is supposed to have;
however, that will probably fail immediately in a Try...Catch block if you
have done it right.

Just my thought,

Cor
 
Guest

You can guess well-known formats like Excel or HTML by looking at the
header. But it would be next to impossible to distinguish tab-delimited from
comma-separated, unless you have stringent rules such as enclosing string
values in quotes.

Rgds,
Anand M
http://www.dotnetindia.com
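
A minimal sketch of the header check described above, in Python for illustration. The function name `sniff_header` and the return labels are made up for this example. It relies on two well-known signatures: legacy .xls files are OLE2 compound files starting with the bytes D0 CF 11 E0 A1 B1 1A E1, and .xlsx files are ZIP containers starting with "PK\x03\x04". Neither signature is exclusive to Excel (Word documents and ordinary ZIP archives share them), so treat the result as a guess rather than proof.

```python
# Hypothetical helper: classify a file from its first bytes only.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # OLE2 compound file (legacy .xls, also .doc etc.)
ZIP_MAGIC = b"PK\x03\x04"                        # ZIP container (.xlsx, also any other ZIP)
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_header(data: bytes) -> str:
    """Return 'excel', 'html', or 'unknown' for the leading bytes of a file."""
    if data.startswith(OLE2_MAGIC) or data.startswith(ZIP_MAGIC):
        return "excel"  # assumption: in this workflow such containers are Excel files
    # Skip a UTF-8 byte-order mark and leading whitespace before looking for markup.
    if data.startswith(UTF8_BOM):
        data = data[len(UTF8_BOM):]
    head = data.lstrip().lower()
    if head.startswith((b"<!doctype html", b"<html")):
        return "html"
    return "unknown"
```

In practice you would read only the first few hundred bytes, e.g. `open(path, "rb").read(512)`, and pass them in. Anything that comes back as "unknown" is a candidate for the delimited-text case.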
 
Herfried K. Wagner [MVP]

Eric said:
I need to find a way to identify between a few different file formats
WITHOUT looking at the file extension. Very often our customers will name
file incorrectly. For example, they'll send us a file that's named
'filename.xls', but it's actually a tab delimited or comma delimited file.

The possible formats that I need to identify are: HTML, tab delimited, comma
delimited or Excel.

You'll have to analyze the file's content/header.

Some file format specifications can be found here:

<URL:http://www.wotsit.org/>
 
Paul Clement

¤ Hi,
¤
¤ I need to find a way to identify between a few different file formats
¤ WITHOUT looking at the file extension. Very often our customers will name
¤ file incorrectly. For example, they'll send us a file that's named
¤ 'filename.xls', but it's actually a tab delimited or comma delimited file.
¤
¤ The possible formats that I need to identify are: HTML, tab delimited, comma
¤ delimited or Excel.

Can't be done programmatically. There is nothing particularly unique about tab-delimited or
comma-delimited files (other than the delimiter).

Unless you add a customized header or identifier of some type, you likely won't be able to tell
them apart without looking at them.


Paul
Microsoft MVP (Visual Basic)
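
Since the delimiter is the only distinguishing feature, the most one can do is a heuristic on it. The sketch below, in Python for illustration, counts tabs and commas per line and accepts a delimiter only if it appears the same number of times on every sampled line. The name `guess_delimiter` and the `max_lines` parameter are invented for this example. It assumes fields do not contain the delimiter or line breaks themselves, so quoted values such as "Smith, John" can defeat it, and a `None` result means the file still has to be inspected by hand.

```python
from typing import Optional


def guess_delimiter(text: str, max_lines: int = 20) -> Optional[str]:
    """Return '\t', ',' or None based on per-line delimiter counts (heuristic only)."""
    # Sample the first few non-blank lines.
    lines = [ln for ln in text.splitlines() if ln.strip()][:max_lines]
    if not lines:
        return None
    candidates = []
    for delim in ("\t", ","):
        counts = {ln.count(delim) for ln in lines}
        # A plausible delimiter occurs on every line, the same number of times.
        if len(counts) == 1 and counts != {0}:
            candidates.append(delim)
    # Tab is tried first: commas inside tab-separated values are common,
    # while literal tabs inside comma-separated values are rare.
    return candidates[0] if candidates else None
```

If your data does follow strict quoting rules, Python's standard `csv.Sniffer` class is a more thorough alternative to hand-rolled counting.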
 
