Fedora Linux Support Community & Resources Center

Go Back   FedoraForum.org > Fedora 17/18 > Using Fedora
FedoraForum Search

Forgot Password? Join Us!

Using Fedora General support for current versions. Ask questions about Fedora and it's software that do not belong in any other forum.

Reply
 
Thread Tools Search this Thread Display Modes
  #1  
Old 11th June 2005, 08:30 PM
tejas's Avatar
tejas Offline
Registered User
 
Join Date: Feb 2005
Location: Bangalore
Age: 25
Posts: 1,574
md5sum theory

Hello, I had some doubts about how md5sum works, which was hard to find answers in google for because everywhere I look is only a link to download an md5sum checker.

I got to thinking. I'f I'm not wrong, I think it must be possible for 2 files to have the same md5sum. Each md5sum has 32 letter/numbers, resulting in a total no of combinations of (26 + 26 + 10) ^ 32 = 2.27 * 10^57 combinations (2272657884496751345355241563627544170162852933518 655225856) to be exact. (yes, I used bc, and cut and pasted). Can the md5sum exceed 32 digits?

As there are probably more files then this, there must be at least two files with the same md5sum.

If not: (md5sum is unique)
and, every file has a unique one, why isn't it possible to reverse the md5sum algorithm, and generate a complete file from the md5sum?

If true: (two files have same md5sum)
what are the conditions for two files to have the same md5sum? Does this mean it is not really as accurate as I believed? Would it be possible, in theory, to generate a file from the md5sum and another parameter, say, the size, and the first n bytes of the file?

Please point me in the wrong direction.

PS: No, I'm not drunk

-Tejas Dinkar
__________________
I Was Counted! Registered Linux User #384707

BASH && Beer && Bass! Now THAT makes sense!!
Reply With Quote
  #2  
Old 11th June 2005, 08:38 PM
jtang613's Avatar
jtang613 Offline
Registered User
 
Join Date: Apr 2004
Location: Ottawa, Canada
Posts: 1,932
Here's the RFC that describes The MD5 Message Digest Algorithm
http://www.faqs.org/rfcs/rfc1321.html

Enjoy,
Jason
__________________
There is no 'CTRL' button on Chuck Norris's computer. Chuck Norris is always in control.
Reply With Quote
  #3  
Old 11th June 2005, 08:42 PM
tejas's Avatar
tejas Offline
Registered User
 
Join Date: Feb 2005
Location: Bangalore
Age: 25
Posts: 1,574
thanks jtang613

I was hoping (but not really expecting) for a shorter answer.

Anyway, I'll have to read this when it is not 1:30am (ie. NOW!)

Looks complicated
__________________
I Was Counted! Registered Linux User #384707

BASH && Beer && Bass! Now THAT makes sense!!
Reply With Quote
  #4  
Old 11th June 2005, 08:47 PM
Shakes's Avatar
Shakes Offline
Registered User
 
Join Date: Feb 2005
Location: Scotland
Posts: 445
From the link given

Quote:
4. Summary

The MD5 message-digest algorithm is simple to implement, and provides
a "fingerprint" or message digest of a message of arbitrary length.
It is conjectured that the difficulty of coming up with two messages
having the same message digest is on the order of 2^64 operations,
and that the difficulty of coming up with any message having a given
message digest is on the order of 2^128 operations. The MD5 algorithm
has been carefully scrutinized for weaknesses. It is, however, a
relatively new algorithm and further security analysis is of course
justified, as is the case with any new proposal of this sort.
Reply With Quote
  #5  
Old 11th June 2005, 09:11 PM
keithmo Offline
Registered User
 
Join Date: Apr 2005
Location: Wrocław, Poland
Age: 49
Posts: 38
There has been a lot work recently on MD5 collisions. See http://www.cits.rub.de/MD5Collisions/ for an easy read on the subject.
Reply With Quote
  #6  
Old 12th June 2005, 04:46 AM
tejas's Avatar
tejas Offline
Registered User
 
Join Date: Feb 2005
Location: Bangalore
Age: 25
Posts: 1,574
Very easy to understand keithmo.

They actually let you get a copy of two files with the same md5sum

Well, I guess the only way to be 100% accurate algorithm is to download the file again.

Just out of curiosity, has there ever been a documented case where the incorrect file downloaded has the same md5sum as the original. I know the chances are a:2.27*10^57, but still?
__________________
I Was Counted! Registered Linux User #384707

BASH && Beer && Bass! Now THAT makes sense!!
Reply With Quote
  #7  
Old 12th June 2005, 11:15 AM
keithmo Offline
Registered User
 
Join Date: Apr 2005
Location: Wrocław, Poland
Age: 49
Posts: 38
The number of possible MD5 hashes is actually 2^128, which is about 3.4*10^38.

(Each MD5 sum has 32 characters, but they're really hexadecimal digits. The string is a hex representation of the 128-bit hash. Since the hash is 128 bits, the number of hash values is 2^128.)

I've never heard of a case of an "accidental" MD5 collision. It's not likely, but it's not impossible either.
Reply With Quote
  #8  
Old 12th June 2005, 12:15 PM
markkuk Offline
Registered User
 
Join Date: Apr 2005
Location: Finland
Posts: 5,076
Quote:
Originally Posted by tejas
I got to thinking. I'f I'm not wrong, I think it must be possible for 2 files to have the same md5sum. Each md5sum has 32 letter/numbers, resulting in a total no of combinations of (26 + 26 + 10) ^ 32 = 2.27 * 10^57 combinations (2272657884496751345355241563627544170162852933518 655225856) to be exact. (yes, I used bc, and cut and pasted). Can the md5sum exceed 32 digits?

As there are probably more files then this, there must be at least two files with the same md5sum.
Looks like you don't understand how large that number (2.27E57) really is. Earth has around 10^49 - 10^50 atoms according to Fermilab and other sources. Current technology doesn't allow storing 10 million files on a single atom

Even with the correct number of possible MD5 values (3.4E38) is much larger than any reasonable number of files in existence, it would mean about 5*10^28 files for each person on Earth.
Reply With Quote
  #9  
Old 12th June 2005, 12:49 PM
keithmo Offline
Registered User
 
Join Date: Apr 2005
Location: Wrocław, Poland
Age: 49
Posts: 38
Accidental collisions are thankfully quite rare.

The more interesting question is: how difficult is it to create a "malicious collision"? This is the subject of the document I linked to earlier. The authors provide two postscript files. They are the same length. They have identical MD5 hashes. However, they print differently (one is a letter of recommendation for Alice, the other grants her a higher security clearance).

The authors claim it took "just a few hours on a customary PC" to find the collision.

MD5 is starting to crack under combined pressure of Moore's Law and modern cryptoanalysis techniques. Other available algorithms (SHA-256, for example) appear to be more resistant to these attacks. Only time will tell.
Reply With Quote
  #10  
Old 12th June 2005, 12:57 PM
bitrain's Avatar
bitrain Offline
Registered User
 
Join Date: Nov 2004
Location: Netherlands
Age: 25
Posts: 1,426
When md5 isn't good enough any more, they just make the 32 bits something like 1024 (or more) and then it will take even on the most modern computers a long time.
__________________
Registered Linux user number 389291

Laptop: Nec Versa p550, Pentium M 1.86GHz, 1024MB ram, x300, 80 GB HD, bluetooth, 2915BG Wlan card
Desktop: Amd Athlon x2 4200+, 2GB ram, Geforce 7300GT 512MB silent, 160GB HD in a nice centurion 534 case :cool:
Reply With Quote
  #11  
Old 12th June 2005, 01:29 PM
tejas's Avatar
tejas Offline
Registered User
 
Join Date: Feb 2005
Location: Bangalore
Age: 25
Posts: 1,574
Quote:
Originally Posted by keithmo
but they're really hexadecimal digits. The string is a hex representation of the 128-bit hash.
I just learnt something new. I did an md5sum. As I saw a couple of letters, I assumed that the md5sum could contain A-Z, a-z and 0-9, hense 26 + 26 + 10. I see the error now. I never realised that none of the letters exceeded 'f', corresponding to 15 in hex. Thanks for the clarification.
__________________
I Was Counted! Registered Linux User #384707

BASH && Beer && Bass! Now THAT makes sense!!
Reply With Quote
Reply

Tags
md5sum, theory

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is On
HTML code is Off

Forum Jump

Similar Threads
Thread Thread Starter Forum Replies Last Post
Book for theory of CS lawson Programming & Packaging 27 25th June 2009 07:57 PM
VPN theory: where to set-up Mat Servers & Networking 3 7th September 2007 05:06 PM
WEP Cracking Theory Slider Servers & Networking 2 26th July 2005 01:03 PM
DNS Setup & Theory pzeigler Servers & Networking 0 9th May 2005 04:34 AM
My theory on upgrading; what's yours? Psquared Fedora Focus 17 27th October 2004 03:00 AM


Current GMT-time: 15:42 (Saturday, 25-05-2013)

TopSubscribe to XML RSS for all Threads in all ForumsFedoraForumDotOrg Archive
logo

All trademarks, and forum posts in this site are property of their respective owner(s).
FedoraForum.org is privately owned and is not directly sponsored by the Fedora Project or Red Hat, Inc.

Privacy Policy | Term of Use | Posting Guidelines | Archive | Contact Us | Founding Members

Powered by vBulletin® Copyright ©2000 - 2012, vBulletin Solutions, Inc.

FedoraForum is Powered by RedHat