dcsimg
CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 2 of 2 FirstFirst 12
Results 16 to 23 of 23

Thread: How to design unordered_map's hash function for this case and another one?

  1. #16
    Join Date
    Oct 2008
    Posts
    1,456

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by tiliavirga View Post
    I've pointed at the relevant section in the C++ 11 standard
    no you didn't

    firstly, the excerpt about the probability of collisions is in a note and notes are non normative ( they are never requirements )

    second, even if we take that note literally, given a a pair of uniformly picked long long's, the probability that their first 32 bits to be equal is EXACTLY 1 / (1+max of size_t).

    >> But you are suggesting that std::hash<long long> could produce equal hash values with probability 1 for certain numbers in the long long range

    they're called collisions and are unavoidable when you work with more than sizeof(std::size_t), and certainly the standard doesn't forbid them ...

    third, not only gcc but also (AFAIR)clang and boost hash have implementations like this.

    the moral of the story is monarch's comment in post #3, people should not write hash by themselves unless they really know what they're doing ( there's a reason why even hash_combine was not included in the final standard ... ).

  2. #17
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by laserlight View Post
    Agreed: unless it can be shown to be a mistake in the standard or is an acceptable language/library extension, where a compiler differs from the standard, it is a bug in the compiler.
    I would agree too, were the compiler to differ from the standard. But truncating a long long to size_t *is* a standard compliant hash. Take any 2 64 bit numbers. Probability of collision when truncated to 32 bits:
    2^^32 / 2^^64 == 1 / 2^^32 == 1.0 / numeric_limits<size_t>::max()

    Quote Originally Posted by laserlight View Post
    I tried monarch_dodra's test program from post #13 with g++ 4.8.4 with -std=c++11 and received as the program's output:
    4294967297
    8589934593
    53021371269121
    53021371269127
    The issue (I think) is that you did not pass the -m32 option to compile in 32 bits?

    In any case, there is something off, as 53021371269127 does not fit in a 32 bit size_t. (or any of them as a matter of fact).

    Quote Originally Posted by laserlight View Post
    It seems that there was a bug here in the standard library implementation, but it has since been fixed, though monarch_dodra is presumably still using an older version. This can hardly be included in the "experimental" disclaimer of g++ that is meant to excuse implementations of components following drafts of C++11 before publication: a bad implementation is a bad implementation.
    I'm using 4.8.4. There is not a bug (again, the hash is standard compliant), and nothing has been fixed. The relevant source code for 4.8.4 is here:
    https://github.com/gcc-mirror/gcc/blob/gcc-4_8-branch/libstdc++-v3/include/bits/functional_hash.h

    Code:
      // Explicit specializations for integer types.
    #define _Cxx_hashtable_define_trivial_hash(_Tp) 	\
      template<>						\
        struct hash<_Tp> : public __hash_base<size_t, _Tp>  \
        {                                                   \
          size_t                                            \
          operator()(_Tp __val) const noexcept              \
          { return static_cast<size_t>(__val); }            \
        };
    There have been changes to the file since, but not relative to integral hashing.
    Last edited by monarch_dodra; October 1st, 2015 at 03:09 AM.
    Is your question related to IO?
    Read this C++ FAQ article at parashift by Marshall Cline. In particular points 1-6.
    It will explain how to correctly deal with IO, how to validate input, and why you shouldn't count on "while(!in.eof())". And it always makes for excellent reading.

  3. #18
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,768

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by monarch_dodra
    The issue (I think) is that you did not pass the -m32 option to compile in 32 bits?

    In any case, there is something off, as 53021371269127 does not fit in a 32 bit size_t. (or any of them as a matter of fact).
    Ah yes, my bad.
    C + C++ Compiler: MinGW port of GCC
    Build + Version Control System: SCons + Bazaar

    Look up a C/C++ Reference and learn How To Ask Questions The Smart Way
    Kindly rate my posts if you found them useful

  4. #19
    Join Date
    Jun 2015
    Posts
    208

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by superbonzo View Post
    no you didn't
    Yes I did. 17.6.3.4 describes the requirements on hash function objects. No one including you have pointed at anything with higher relevance.

    firstly, the excerpt about the probability of collisions is in a note and notes are non normative ( they are never requirements )
    Both normative and informative sections are part of the standard. The normative sections alone constitute a pretty useless standard. Both are required for compliancy.

    In this case for example all hash function objects could always return 0 and still be normatively compliant. The informative note prevents this and prescribes what kind of value should be returned for compliancy.

    second, even if we take that note literally, given a a pair of uniformly picked long long's, the probability that their first 32 bits to be equal is EXACTLY 1 / (1+max of size_t).
    But you have not taken the note literarly. It is stated that h(t1) and h(t2) should be equal with a very small probability only. Then h(t1) and h(t1+k) should be that too but with the suggested truncation of long long they won't. On the contrary, for a specific k they will be equal with the highest possible probability namely for certain.

    So just truncating the upper part of a long long won't produce a compliant hash code.

    the moral of the story is monarch's comment in post #3, people should not write hash by themselves unless they really know what they're doing
    That also applies to people who think they really know what they're doing.

    No one knows every corner of C++ (except maybe Stroustrup). This means you must trust your compiler to implement C++ according to the principle of least surprise. In this case it looks like VS 2015 is the winner and if you find attention to detail in one corner it's likely to be present in the next.
    Last edited by tiliavirga; October 1st, 2015 at 08:09 AM.

  5. #20
    Join Date
    Oct 2008
    Posts
    1,456

    Re: How to design unordered_map's hash function for this case and another one?

    ... please refer to the ISO documentation for what 'normative' means
    ... please refer to any elementary probability book for what 'probability' means

    just joking, I know there's no hope, as always ...

  6. #21
    Join Date
    Jun 2015
    Posts
    208

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by superbonzo View Post
    ... please refer to the ISO documentation for what 'normative' means
    ... please refer to any elementary probability book for what 'probability' means

    just joking, I know there's no hope, as always ...
    Well, it's never fun to watch someone cave in like this. You are insinuating that the reason would be that I don't know what I'm talking about but it may also be that the impossible has happened - it may have dawned upon you that you may actually be wrong! And indeed you are. To really rub it in I'm going to summarise my points and then leave you in peace to mourn your lost besserwisser innocence.

    It's true that C++ compiler compliancy means that the normative part of the C++ standard is met. But if the informative part is disregarded the normative part may be misunderstood and the compiler still won't be compliant. For every hash function object to always return 0 would be fine normatively but then there's an informative note in effect stating that doing so would go against the purpose of hashing and thus (now that you know better) would not be compliant with the C++ standard.

    You may have missed that the note in question,

    "For two different values t1 and t2, the probability that h(t1) and h(t2) compare equal should be very small, approaching 1.0 / numeric_limits<size_t>::max()."

    is in fact making two propositions,

    P1: h(t1) and h(t2) should very seldom be equal.
    P2: P1 should hold for every different t1 and t2.

    In our case t1 and t2 are long longs and h is std::hash<long long> returning size_t.

    Now implementing h by just truncating the long long parameter to a size_t return value won't violate P1. That's why this implementation will work fine in practice with say std::unordered_set etetera. But P2 is another story. There is a known set of different h1 and h2 for which P1 doesn't hold and for which h(t1) and h(t2) are in fact always equal (it is when h1 = h2 + a specific constant). This violates P2 making the discussed implementation of h non-compliant with the C++ standard.

    Does it matter? Well at least Microsoft though so and acted accordingly when VC++ 2015 was implemented.
    Last edited by tiliavirga; October 2nd, 2015 at 04:05 AM.

  7. #22
    Join Date
    Oct 2008
    Posts
    1,456

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by tiliavirga View Post
    But P2 is another story. There is a known set of different h1 and h2 for which P1 doesn't hold and for which h(t1) and h(t2) are in fact always equal (it is when h1 = h2 + a specific constant). This violates P2 and makes the discussed implementation of h non-compliant with the C++ standard.
    oh dear, ignoring your imaginative interpretation of the wording,

    given ANY surjective hash function h:[0,M]->[0,N] with M > N there ALWAYS exist "known" x != y such as h(x)=h(y), just take ANY y in the preimage of some h(x). So, if your funny requirement was true NO compliant hash function for long long would ever exist on 32 bit systems.

    So, along with elementary probability you may also take a look at elementary set theory as well ...

    regarding what 'normative' means, I could quote ISO wording but sadly I suppose it wouldn't help much.
    regarding why an implementation may choose to implement more "random" looking hashes, there can be many reasons, including things like security ( simple hashes can be attacked ) or implementation internals or quality of implementation reasons ... nothing to do with compliancy anyway
    Last edited by superbonzo; October 2nd, 2015 at 04:11 AM. Reason: typos

  8. #23
    Join Date
    Jun 2015
    Posts
    208

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by superbonzo View Post
    oh dear
    Reiterating definitions, suggesting literature to read-up on, and holding lectures in besserwisser-ish make you look more like a schmuck than the genius you probably have in mind. And they are not arguments. Still I'm sure we agree on the basics. A big peg doesn't fit a small hole so hashing involves collisions. And hashing is fully deterministic so everything is known beforehand.

    What we don't agree on is the general significance of notes in the standard and the significance of a specific note concerning hash function objects. Without going into details again in my view the standard requires a compliant compiler to implement "good hashing" in the consensus sense. Always returning 0 as you claim would be fine won't do.

    We don't seem to be getting anywhere so I quit here, thank you.

Page 2 of 2 FirstFirst 12

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Windows Mobile Development Center


Click Here to Expand Forum to Full Width




On-Demand Webinars (sponsored)