• @squaresinger@lemmy.world
    link
    fedilink
    214 hours ago

    This is about cryptographic hashing functions (to be fair, could have spelled that out in my prior comment, but in general when someone talks about anything security relevant in conjunction with hashing, they always mean cryptographic hashing functions).

    MD5 is not a cryptographic hashing function for exactly these reasons.

    Also, the example you gave in your original comment wasn’t actually about distribution but about symbol space.

    By multiplying by four (and I guess you implicitly meant that the bit length of the hash stays the same thus dropping two bits of information) you are reducing the number of possible hashes by a factor of four, because now 3/4 of all potential hashes can’t happen any more. So sure, if your 64bit hash is actually only a 62bit hash that just includes two constant 0 bits, then of course you have to calculate the collision chance for 62bits and not 64bits.

    But if all hashes are still possible, and only the distribution isn’t perfectly even (like is the case with MD5), then the average chance for collisions doesn’t change at all. You have some hashes where collisions are more likely, but they are perfectly balanced with hashes where collisions are less likely.

    • @tyler@programming.dev
      link
      fedilink
      19 hours ago

      The article calls out hash functions and links to the relevant Wikipedia page, so I don’t think this is solely about cryptographic hash functions, though that seems to be what you were talking to the other user about.

      • @squaresinger@lemmy.world
        link
        fedilink
        16 hours ago

        I see what you are saying. But if you aren’t using a cryptographic hash function then collisions don’t matter in your use case anyway, otherwise you’d be using a cryptographic hash function.

        For example, you’d use a non-cryptographic hash function for a hashmap. While collisions aren’t exactly desireable in that use case, they also aren’t bad and in fact, the whole process is designed with them in mind. And it doesn’t matter at all that the distribution might not be perfect.

        So when we are talking about a context where collisions matter, there’s no question whether you should use a cryptographic hash or not.