PC Review



calculate a unique id of a file

 
 
timor.super@gmail.com
 
      17th Oct 2008
Hi group,

I need a way of calculating a unique ID for a file.

I've seen things like CRC32, CRC64, checksums, and so on. There's a list here:
http://en.wikipedia.org/wiki/List_of_hash_functions

What is the best option for me? I have to identify both large files and
small files. I need something fast if possible. How can I be sure that
it is unique?

Thanks for any advice
 
 
 
 
 
Hans Kesting
 
      17th Oct 2008
(E-Mail Removed) pretended :
> Hi group,
>
> I need a way of calculating an unique id for a file.
>
> I've seen things like Crc32, 64, checksum .... there's a list here :
> http://en.wikipedia.org/wiki/List_of_hash_functions
>
> What is the best option for me ? I have to identify larges files and
> small files. I need something fast if possible. How can I be sure that
> it is unique ?
>
> Thanks for any advice


I don't think you can guarantee that you can identify a file uniquely by
some hash code.
Say you calculate a hash of a single byte. This can hold 256 distinct
values, so by the time you encode file #257 you *will* have found a
duplicate hashcode. For a hash of a 32-bit integer, that amount is in
the billions while larger sizes go to astronomical numbers, but still
you cannot guarantee that there will be no duplicates.
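
To put rough numbers on this, here is a minimal Python sketch of the usual birthday approximation. It assumes hash values are spread uniformly, which is an idealisation, and the function name `collision_probability` is made up for illustration. A duplicate is only guaranteed once you have more files than hash values, but the chance of one becomes substantial much earlier: for a 32-bit hash it reaches about 50% at roughly 77,000 files.

```python
import math

def collision_probability(num_files: int, hash_bits: int) -> float:
    """Approximate chance that at least two of num_files share a hash value,
    assuming hash values are uniformly distributed (an idealisation)."""
    space = 2 ** hash_bits
    if num_files > space:
        return 1.0  # pigeonhole: more files than possible hash values
    # Birthday approximation: p ~= 1 - exp(-n(n-1) / (2 * space))
    return -math.expm1(-num_files * (num_files - 1) / (2 * space))

# Hypothetical file counts, chosen only to show the scale of the effect.
for bits, n in [(8, 257), (32, 77_163), (64, 1_000_000)]:
    print(f"{bits}-bit hash, {n} files: p = {collision_probability(n, bits):.3g}")
```

So widening the hash does not remove collisions, it only makes them less likely for a given number of files.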

As for speed, large files will require more time, as every byte has to
be read and processed.

Hans Kesting


 
 
Peter Morris
 
      17th Oct 2008
Based only on the data or the filename too?

If you have the same filename, same size, and same hashcode it is likely it
is the same file. However, you don't say whether or not the filename is
important. If it isn't a factor then I would probably make the ID based on

FileSize/Hash1/Hash2

Where Hash1 and Hash2 are the results of two different hash algorithms.
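
A minimal Python sketch of such an ID might look like the following. The choice of CRC-32 and SHA-256 as the two algorithms, the `file_id` name, and the 1 MiB chunk size are all assumptions made for illustration; the post does not say which hashes to use.

```python
import hashlib
import os
import zlib

def file_id(path: str, chunk_size: int = 1 << 20) -> str:
    """Return 'size/crc32/sha256' for the file at path, reading it once in chunks."""
    crc = 0
    sha = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            crc = zlib.crc32(chunk, crc)  # running CRC-32 over the chunks seen so far
            sha.update(chunk)
    size = os.path.getsize(path)
    return f"{size}/{crc:08x}/{sha.hexdigest()}"
```

Both hashes are updated in the same pass, so the file is read only once. The result is still not guaranteed unique; it just makes an accidental match less likely.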


Pete
====
http://mrpmorris.blogspot.com
http://www.capableobjects.com

 
 
timor.super@gmail.com
 
      17th Oct 2008
On 17 oct, 19:18, rossum <rossu...@coldmail.com> wrote:
> On Fri, 17 Oct 2008 05:23:05 -0700 (PDT), timor.su...@gmail.com wrote:
> >Hi group,
> >
> >I need a way of calculating an unique id for a file.
> >
> >I've seen things like Crc32, 64, checksum .... there's a list here :
> >http://en.wikipedia.org/wiki/List_of_hash_functions
> >
> >What is the best option for me ? I have to identify larges files and
> >small files. I need something fast if possible. How can I be sure that
> >it is unique ?
> >
> >Thanks for any advice
>
> No hash function can guarantee uniqueness; a CRC32 will have a
> collision probability of between 1 and 1 in 2^32. The 1 is for cases
> where you should have used a cryptographically secure hash function,
> i.e. where there is someone deliberately trying to break your system
> or the Data Protection law requires a reasonable level of security.
>
> For non-cryptographic use go for a CRC with sufficient size to reduce
> the collision probability to an acceptably low level.
>
> For cryptographic purposes use SHA-256 or SHA-512. The MD series are
> broken and SHA-384 just calculates a SHA-512 result and truncates it
> so you might as well go for SHA-512.
>
> rossum


Thanks for your answer,
I think I don't need SHA.

In fact, I have to know whether a file has been modified between two accesses.
So if I use a CRC64, that should be enough to tell whether the file has been
modified, shouldn't it?
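
As a rough sketch of that check in Python: store a checksum at the first access and compare it at the next one. Python's standard library has no CRC-64, so this example swaps in BLAKE2b truncated to 64 bits as a stand-in; that substitution, and the names `checksum64` and `has_changed`, are assumptions for illustration only.

```python
import hashlib

def checksum64(path: str, chunk_size: int = 1 << 20) -> str:
    """64-bit digest of a file's contents (BLAKE2b stand-in, not CRC-64)."""
    h = hashlib.blake2b(digest_size=8)
    with open(path, "rb") as f:
        # Read in chunks so large files are never loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def has_changed(path: str, previous_checksum: str) -> bool:
    """True if the file's current checksum differs from the one stored earlier."""
    return checksum64(path) != previous_checksum
```

A different checksum proves the content changed. An equal checksum only means the file is very probably unchanged, since two different contents can share a 64-bit value.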

Do you think this class is good ? http://damieng.com/blog/2007/11/19/c...4-in-c-and-net

Best regards
 
 
Arne Vajhøj
 
      18th Oct 2008
(E-Mail Removed) wrote:
> I need a way of calculating an unique id for a file.
>
> I've seen things like Crc32, 64, checksum .... there's a list here :
> http://en.wikipedia.org/wiki/List_of_hash_functions
>
> What is the best option for me ? I have to identify larges files and
> small files. I need something fast if possible. How can I be sure that
> it is unique ?


Any CRC or hash should do.

A hash will have a lower probability of collisions than a CRC.

If you need to guarantee no collisions, then you cannot use
any of them.

If you can live with a small probability of collisions, then
any of them will work.

The important question is then whether you need to worry about
files with an identical start but a different end. If not,
you could speed up the process a lot by checksumming only
the first 100 KB or so of data.
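
The prefix idea could be sketched in Python as below. The use of SHA-256, the `prefix_hash` name, and the exact 100 KiB limit are assumptions chosen for illustration.

```python
import hashlib

def prefix_hash(path: str, limit: int = 100 * 1024) -> str:
    """SHA-256 of at most the first `limit` bytes of the file."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read(limit)).hexdigest()
```

The cost no longer grows with file size, but two files that differ only after the first `limit` bytes get the same value, which is exactly the trade-off described above.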

Arne
 