 |
 |
 |
 |
| Using Fedora General support for current versions. Ask questions about Fedora and it's software that do not belong in any other forum. |

11th June 2005, 08:30 PM
|
 |
Registered User
|
|
Join Date: Feb 2005
Location: Bangalore
Age: 25
Posts: 1,574

|
|
|
md5sum theory
Hello, I had some doubts about how md5sum works, which was hard to find answers in google for because everywhere I look is only a link to download an md5sum checker.
I got to thinking. I'f I'm not wrong, I think it must be possible for 2 files to have the same md5sum. Each md5sum has 32 letter/numbers, resulting in a total no of combinations of (26 + 26 + 10) ^ 32 = 2.27 * 10^57 combinations (2272657884496751345355241563627544170162852933518 655225856) to be exact. (yes, I used bc, and cut and pasted). Can the md5sum exceed 32 digits?
As there are probably more files then this, there must be at least two files with the same md5sum.
If not: (md5sum is unique)
and, every file has a unique one, why isn't it possible to reverse the md5sum algorithm, and generate a complete file from the md5sum?
If true: (two files have same md5sum)
what are the conditions for two files to have the same md5sum? Does this mean it is not really as accurate as I believed? Would it be possible, in theory, to generate a file from the md5sum and another parameter, say, the size, and the first n bytes of the file?
Please point me in the wrong direction.
PS: No, I'm not drunk
-Tejas Dinkar
|

11th June 2005, 08:38 PM
|
 |
Registered User
|
|
Join Date: Apr 2004
Location: Ottawa, Canada
Posts: 1,932

|
|
Here's the RFC that describes The MD5 Message Digest Algorithm
http://www.faqs.org/rfcs/rfc1321.html
Enjoy,
Jason
__________________
There is no 'CTRL' button on Chuck Norris's computer. Chuck Norris is always in control.
|

11th June 2005, 08:42 PM
|
 |
Registered User
|
|
Join Date: Feb 2005
Location: Bangalore
Age: 25
Posts: 1,574

|
|
|
thanks jtang613
I was hoping (but not really expecting) for a shorter answer.
Anyway, I'll have to read this when it is not 1:30am (ie. NOW!)
Looks complicated
|

11th June 2005, 08:47 PM
|
 |
Registered User
|
|
Join Date: Feb 2005
Location: Scotland
Posts: 445

|
|
From the link given
Quote:
4. Summary
The MD5 message-digest algorithm is simple to implement, and provides
a "fingerprint" or message digest of a message of arbitrary length.
It is conjectured that the difficulty of coming up with two messages
having the same message digest is on the order of 2^64 operations,
and that the difficulty of coming up with any message having a given
message digest is on the order of 2^128 operations. The MD5 algorithm
has been carefully scrutinized for weaknesses. It is, however, a
relatively new algorithm and further security analysis is of course
justified, as is the case with any new proposal of this sort.
|
|

11th June 2005, 09:11 PM
|
|
Registered User
|
|
Join Date: Apr 2005
Location: Wrocław, Poland
Age: 49
Posts: 38

|
|
There has been a lot work recently on MD5 collisions. See http://www.cits.rub.de/MD5Collisions/ for an easy read on the subject.
|

12th June 2005, 04:46 AM
|
 |
Registered User
|
|
Join Date: Feb 2005
Location: Bangalore
Age: 25
Posts: 1,574

|
|
Very easy to understand keithmo.
They actually let you get a copy of two files with the same md5sum
Well, I guess the only way to be 100% accurate algorithm is to download the file again.
Just out of curiosity, has there ever been a documented case where the incorrect file downloaded has the same md5sum as the original. I know the chances are a:2.27*10^57, but still?
|

12th June 2005, 11:15 AM
|
|
Registered User
|
|
Join Date: Apr 2005
Location: Wrocław, Poland
Age: 49
Posts: 38

|
|
|
The number of possible MD5 hashes is actually 2^128, which is about 3.4*10^38.
(Each MD5 sum has 32 characters, but they're really hexadecimal digits. The string is a hex representation of the 128-bit hash. Since the hash is 128 bits, the number of hash values is 2^128.)
I've never heard of a case of an "accidental" MD5 collision. It's not likely, but it's not impossible either.
|

12th June 2005, 12:15 PM
|
|
Registered User
|
|
Join Date: Apr 2005
Location: Finland
Posts: 5,076

|
|
Quote:
|
Originally Posted by tejas
I got to thinking. I'f I'm not wrong, I think it must be possible for 2 files to have the same md5sum. Each md5sum has 32 letter/numbers, resulting in a total no of combinations of (26 + 26 + 10) ^ 32 = 2.27 * 10^57 combinations (2272657884496751345355241563627544170162852933518 655225856) to be exact. (yes, I used bc, and cut and pasted). Can the md5sum exceed 32 digits?
As there are probably more files then this, there must be at least two files with the same md5sum.
|
Looks like you don't understand how large that number (2.27E57) really is. Earth has around 10^49 - 10^50 atoms according to Fermilab and other sources. Current technology doesn't allow storing 10 million files on a single atom
Even with the correct number of possible MD5 values (3.4E38) is much larger than any reasonable number of files in existence, it would mean about 5*10^28 files for each person on Earth.
|

12th June 2005, 12:49 PM
|
|
Registered User
|
|
Join Date: Apr 2005
Location: Wrocław, Poland
Age: 49
Posts: 38

|
|
|
Accidental collisions are thankfully quite rare.
The more interesting question is: how difficult is it to create a "malicious collision"? This is the subject of the document I linked to earlier. The authors provide two postscript files. They are the same length. They have identical MD5 hashes. However, they print differently (one is a letter of recommendation for Alice, the other grants her a higher security clearance).
The authors claim it took "just a few hours on a customary PC" to find the collision.
MD5 is starting to crack under combined pressure of Moore's Law and modern cryptoanalysis techniques. Other available algorithms (SHA-256, for example) appear to be more resistant to these attacks. Only time will tell.
|

12th June 2005, 12:57 PM
|
 |
Registered User
|
|
Join Date: Nov 2004
Location: Netherlands
Age: 25
Posts: 1,426

|
|
|
When md5 isn't good enough any more, they just make the 32 bits something like 1024 (or more) and then it will take even on the most modern computers a long time.
__________________
Registered Linux user number 389291
Laptop: Nec Versa p550, Pentium M 1.86GHz, 1024MB ram, x300, 80 GB HD, bluetooth, 2915BG Wlan card
Desktop: Amd Athlon x2 4200+, 2GB ram, Geforce 7300GT 512MB silent, 160GB HD in a nice centurion 534 case :cool:
|

12th June 2005, 01:29 PM
|
 |
Registered User
|
|
Join Date: Feb 2005
Location: Bangalore
Age: 25
Posts: 1,574

|
|
Quote:
|
Originally Posted by keithmo
but they're really hexadecimal digits. The string is a hex representation of the 128-bit hash.
|
I just learnt something new. I did an md5sum. As I saw a couple of letters, I assumed that the md5sum could contain A-Z, a-z and 0-9, hense 26 + 26 + 10. I see the error now. I never realised that none of the letters exceeded 'f', corresponding to 15 in hex. Thanks for the clarification.
|
| Thread Tools |
Search this Thread |
|
|
|
| Display Modes |
Linear Mode
|
Posting Rules
|
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts
HTML code is Off
|
|
|
Current GMT-time: 15:42 (Saturday, 25-05-2013)
|
|
 |
 |
 |
 |
|
|