Object Hash of Contents

  • Thread starter: Evan Camilleri

Evan Camilleri

What is the fastest way to get the 'hash' (or CRC32 or whatever) of the
contents of an object?

I don't care what's inside, I just want the 'hash' of its contents.

(Not to be confused with object.GetHashCode(), which gives a hash code of an
instance; I want the hash of the data contents.)

Evan
 
Evan said:
What is the fastest way to get the 'hash' (or CRC32 or whatever) of the
contents of an object

In both of your questions, you are overlooking a critical detail: there
is no consistent definition of an object's "contents".

You can come the closest, IMHO, for objects that can be serialized.
Then you can serialize them and use the resulting stream of data
(whether it's XML or binary) for your purposes.
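To make the "serialize, then hash the bytes" idea concrete, here is a minimal sketch. It is in Python rather than .NET, purely as an illustration; the helper name content_digests and the choice of canonical JSON as the serialization format are assumptions for this example, not anything from the thread. The same pattern applies to any serializer that produces a deterministic byte stream.

```python
import hashlib
import json
import zlib


def content_digests(obj):
    """Return (md5_hex, crc32) of a canonical JSON serialization of obj.

    Hypothetical helper: "contents" is defined here as whatever the
    serializer emits, which is exactly the definition problem raised above.
    """
    # sort_keys and fixed separators make equal dicts serialize to identical bytes.
    payload = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")
    md5_hex = hashlib.md5(payload).hexdigest()
    crc = zlib.crc32(payload)  # unsigned 32-bit integer in Python 3
    return md5_hex, crc


a = {"name": "Evan", "scores": [1, 2, 3]}
b = {"scores": [1, 2, 3], "name": "Evan"}  # same data, different key order
c = {"name": "Evan", "scores": [1, 2, 4]}  # one value changed

print("a vs b equal:", content_digests(a) == content_digests(b))
print("a vs c equal:", content_digests(a) == content_digests(c))
```

Note that this only works for data the serializer accepts, and the digest depends on the serialization format as much as on the object.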

But not all objects can be serialized, and unless you're talking about a
struct or class that only has value-type fields as data, there's not a
single block of data that represents the object.

If you can define "contents" in a way that allows you to address your
first question ("memory stream"), then the answer to the second ("hash")
can be discussed (though I think that answering "fastest" is
problematic, since any hash is going to have tradeoffs with respect to
speed versus how you intend to use it...the method that's truly
"fastest" may not work for your purposes).

Without a better definition and criteria for how you want to do this, I
don't think it's really possible to answer your question in a reasonable
way.

Pete
 
AFAIK:

MD5 hash is fast and pretty reliable for its purpose. CRC is more reliable
but also much slower.

MD5 hash is often used to quickly compare two binaries (are they the same or
not; think of update or file synchronization programs).
CRC is often used by compression programs (WinZip, WinRAR, etc.) to
check that files are exactly the same at the byte level (to check that the file
has not been corrupted during deflation).

HTH

Michel
 
Michel Posseth said:
MD5 hash is fast and pretty reliable for its purpose. CRC is more reliable
but also much slower.

I'm pretty sure that's the wrong way round. In particular, CRCs don't
attempt to foil deliberate attempts to circumvent them.

Michel Posseth said:
MD5 hash is often used to quickly compare two binaries (are they the same or
not; think of update or file synchronization programs).
CRC is often used by compression programs (WinZip, WinRAR, etc.) to
check that files are exactly the same at the byte level (to check that the file
has not been corrupted during deflation).

Well, MD5 can be used for the latter as well, but is harder to
deliberately fool.

MD5 isn't the safest hash algorithm around - there are ways to break it
in certain circumstances - but it's a lot safer from a tampering point
of view than CRC.
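One way to see why a CRC offers no protection against deliberate tampering: a CRC is linear (strictly, affine) over XOR, so the checksum of a combination of messages can be predicted from the checksums of the parts. The sketch below is a Python illustration of that property using zlib.crc32; the messages and helper names are made up for the example. MD5 has no such structure.

```python
import hashlib
import zlib


def xor_bytes(*chunks):
    """XOR equal-length byte strings together."""
    out = bytearray(len(chunks[0]))
    for chunk in chunks:
        for i, byte in enumerate(chunk):
            out[i] ^= byte
    return bytes(out)


def md5_int(data):
    """MD5 digest as an integer, so digests can be XORed for comparison."""
    return int.from_bytes(hashlib.md5(data).digest(), "big")


# Three arbitrary messages of the same length (17 bytes each).
a = b"pay alice 100 eur"
b = b"pay bob 99999 eur"
c = b"some filler text!"

mixed = xor_bytes(a, b, c)

# For an odd number of equal-length inputs, CRC-32 of the XOR equals
# the XOR of the individual CRC-32 values.
crc_is_predictable = zlib.crc32(mixed) == (
    zlib.crc32(a) ^ zlib.crc32(b) ^ zlib.crc32(c)
)

# The same relation does not hold for MD5.
md5_is_predictable = md5_int(mixed) == (md5_int(a) ^ md5_int(b) ^ md5_int(c))

print("CRC-32 predictable from parts:", crc_is_predictable)
print("MD5 predictable from parts:", md5_is_predictable)
```

This is fine for catching accidental corruption, which is what CRCs are designed for, but it means anyone can construct data with a chosen CRC.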
 
Jon said:
I'm pretty sure that's the wrong way round. In particular, CRCs don't
attempt to foil deliberate attempts to circumvent them.

No doubt about the reliable part.

Your assumption about the speed part is the same one I had. CRC ought
to be much faster than MD5.

But apparently it is not.

#ZipLib CRC-32 is only about 5% faster than .NET MD5.

Maybe that implementation is not very good: Java CRC-32 is
40% faster than Java MD5, but that is still nowhere near the expected
difference.

I guess CRCs are really intended for hardware, not for
software.

BTW, according to Wikipedia the CRC-32s I used are not a
real CRC, but are supposed to be faster than real CRCs, so ...
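Numbers like these depend heavily on the implementation and the machine, so it is worth measuring rather than assuming. A rough timing sketch follows, in Python with zlib.crc32 and hashlib.md5 standing in for the .NET and Java libraries discussed here; the buffer size, run count, and helper names are arbitrary choices for the example, and the results say nothing about #ZipLib or .NET MD5 themselves.

```python
import hashlib
import time
import zlib


def time_once(func, data):
    """Wall-clock seconds for a single call of func(data)."""
    start = time.perf_counter()
    func(data)
    return time.perf_counter() - start


def best_of(func, data, runs=3):
    """Best of several runs, to reduce noise from other processes."""
    return min(time_once(func, data) for _ in range(runs))


# 8 MiB of zero bytes held in memory, so no file I/O is being measured.
data = bytes(8 * 1024 * 1024)

crc_seconds = best_of(zlib.crc32, data)
md5_seconds = best_of(lambda d: hashlib.md5(d).digest(), data)

print(f"CRC-32: {crc_seconds:.4f} s")
print(f"MD5:    {md5_seconds:.4f} s")
```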

Arne
 
Strange...

Just one week ago, I was asked to create a network file synchronization
mechanism which did not care about file versions:
"if the remote file differs from the local version, copy it locally". As we are
talking about hundreds of files and a total size of 100+ MB,
I needed a fast way to check these files.

So I went digging on the web for which algorithm would be the fastest and found
my previous conclusion on various websites.

My project is finished and performs superbly, but you are telling me now that
CRC ought to be faster but less reliable?



Michel
 
Michel Posseth said:
Strange...

Just one week ago, I was asked to create a network file synchronization
mechanism which did not care about file versions:
"if the remote file differs from the local version, copy it locally". As we are
talking about hundreds of files and a total size of 100+ MB,
I needed a fast way to check these files.

So I went digging on the web for which algorithm would be the fastest and found
my previous conclusion on various websites.

My project is finished and performs superbly, but you are telling me now that
CRC ought to be faster but less reliable?

It *may* be faster, depending on the exact implementation. However,
it's unlikely that the hash performance is going to be significant
compared with the IO cost. Hashing 100MB of data is likely to be very
quick with either algorithm.
 
However,
it's unlikely that the hash performance is going to be significant
compared with the IO cost.

Yes... that is a good one. In my situation, it is the cost of copying a file
that did not need replacement.

However, it seems that I got my implementation right, as the MD5 hash would be more
reliable but probably a bit slower than a CRC,
and in my situation it turned out that this is exactly what I need, because it
is more costly to copy the file over the intranet to the client.

So I guess I had a lucky day when I wrote it :-)

regards

And thanks for sharing

Michel
 
Michel Posseth said:
Yes... that is a good one. In my situation, it is the cost of copying a file
that did not need replacement.

What I meant is that the cost of reading a file in order to calculate
the hash is probably bigger than the computational cost of the hash.
Both MD5 and CRC will require the whole file to be read, so there's no
benefit there either.
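To illustrate the point about reading: either checksum has to see every byte, so a single pass over the data can feed both. The following is a minimal sketch in Python; stream_digests and the 64 KiB chunk size are assumptions for the example, and an in-memory stream stands in for a real file. Note that the stream is consumed by the time the digests are known.

```python
import hashlib
import io
import zlib


def stream_digests(stream, chunk_size=64 * 1024):
    """Read a binary stream to the end once, returning (md5_hex, crc32)."""
    md5 = hashlib.md5()
    crc = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break  # end of stream
        md5.update(chunk)
        crc = zlib.crc32(chunk, crc)  # continue the running CRC
    return md5.hexdigest(), crc


# Stand-in for a file opened with open(path, "rb").
data = b"example file contents " * 10000
stream = io.BytesIO(data)

md5_hex, crc = stream_digests(stream)
print("MD5:", md5_hex)
print("CRC-32:", format(crc, "08x"))
print("bytes left in stream:", len(stream.read()))
```

For a file synchronization check, the digests of the local and remote copies would be compared, but each side still pays the full read cost.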
 
I actually wanted to see some code since I cannot find how to get MD5 or CRC
for the object's data

thanks

Evan
 
I actually wanted to see some code since I cannot find how to get MD5 or CRC
for the object's data

As Peter said, there's no such concept as "the object's data" that
makes taking an MD5 hash sensible in all cases.

What would the MD5 of a NetworkStream be? Would you have to read all
its contents to find out?

Jon
 
