dcsimg
CodeGuru Home VC++ / MFC / C++ .NET / C# Visual Basic VB Forums Developer.com
Page 1 of 2 12 LastLast
Results 1 to 15 of 23

Thread: How to design unordered_map's hash function for this case and another one?

  1. #1
    Join Date
    Dec 2010
    Posts
    907

    How to design unordered_map's hash function for this case and another one?

    1)
    Code:
    class AStarNodePair {
    public:	 
    	int from;
    	int to;
    
    public:	 
    	AStarNodePair(int f, int t);
    
    	void init(int f, int t) { from = f; to = t; }	
    };
    
    std::unordered_map<AStarNodePair, float, ???> m_actualCosts;
    2)
    Code:
    class GridNode : public AStarNode
    {
    public: 
    	GridNode(int id, const quad::vec2* bb, const std::vector<int>& neis);
      
    	bool operator==(const GridNode& other) {
    		if (m_time_independent_id == other.m_time_independent_id)
     		{
      			return true;
    		 }
     		return false;
    	}	 
    
    	std::vector<int>& neigbours() {
    		return neis;
    	 }
     
    	bool Intersect(const D3DXVECTOR3 &point) const; 
    	float distance(const GridNode* other) const;	 
    	D3DXVECTOR3 GetPos() { return m_vPos; }
    
    protected:
     	void SetPos() { m_vPos = center();  }
    	D3DXVECTOR3 center(); 
    
    private:
    	quad::vec2	m_boundingBox[2];
    	Vector2D           m_pivot;	 
    	CComPtr<ID3DXMesh> m_pMesh;
    	 
    };
    In this case, I want to write a hash function to find a particular element of GridNode in constant time
    Currently, I am storing these GridNodes in a std::vector, and when I find the element,
    I iterate thru the container like this
    Code:
    bool Coordinater::Convert_D3D_To_ID(D3DXVECTOR3 vec3, int& id) 
    {
     	for (auto& g : NavmeshSingleton::GetNavMesher()->GetWalkables())
     	{
    
     		if (g->Intersect(vec3))
     		{
     			id = g->m_time_independent_id;
    			return true;
    		}
    	}
    	return false;
    }
    Now I want to replace this loop with a hash function.
    Rather than begging for code, I want to know what is the method to generate a good hash function,
    Just want to learn how to do it.

    Update1:
    is this a good thing/ to know at least?
    https://www.gnu.org/software/gperf/manual/gperf.html

    Just wondering whether VS 2013 has this sort of goodie?



    Thank you
    Jack
    Last edited by lucky6969b; September 29th, 2015 at 01:14 AM.

  2. #2
    Join Date
    Aug 2000
    Location
    West Virginia
    Posts
    7,716

    Re: How to design unordered_map's hash function for this case and another one?

    There are standard has functions for primitive types, you can just use them.
    Something like the following:

    Code:
    struct AStarNodePair_hash
    {
        std::size_t operator () (const AStarNodePair & nodepair) const
        {
            return std::hash<int>()(nodepair.from);
    
            // or, if you want to use both variables:
            // return std::hash<int>()(nodepair.from) + std::hash<int>()(nodepair.to);
        }
    };
    
    
    std::unordered_map<AStarNodePair, float, AStarNodePair_hash> m_actualCosts;
    Note: you should also define operator == for your class.
    Last edited by Philip Nicoletti; September 29th, 2015 at 06:24 AM.

  3. #3
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: How to design unordered_map's hash function for this case and another one?

    Honestly, hashing things is more of an art than an exact science. Unless you have a PhD in numeric analysis, it's best to try to not do it on your own. I understand the "I want to know what is the method to generate a good hash function", but in this case, you want to use stuff that already exists. In particular, I'd go for boost::hash's hash_combine:

    http://www.boost.org/doc/libs/1_37_0...h/combine.html
    http://www.boost.org/doc/libs/1_37_0...t.hash_combine

    Myself, I have never used nor needed a perfect hash*. AFAIK, they only make sense if you are hashing data that never changes, such as some sort of compile time constant container. And even then, you'd have to justify that a "normal" hash+hash_table doesn't fit your performance needs.

    *Unless you consider indexing a array by object ID, where the ID is a naturally increasing size_t.
    Is your question related to IO?
    Read this C++ FAQ article at parashift by Marshall Cline. In particular points 1-6.
    It will explain how to correctly deal with IO, how to validate input, and why you shouldn't count on "while(!in.eof())". And it always makes for excellent reading.

  4. #4
    Join Date
    Oct 2008
    Posts
    1,456

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by lucky6969b View Post
    Now I want to replace this loop with a hash function.
    associative containers ( hash based or not ) look up elements based on some equivalence relation; generally speaking, you can't use them to look up elements that satisfy an arbitrary property as you're trying to do. So, hashing your elements won't help you in this use case.

    instead, what you want are so called spatial trees/containers. There are many of them ( AFAIR boost implements the R-tree variant ) both tree and hash based, offering log and constant lookup, respectively. Google 'em for further details...

  5. #5
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by Philip Nicoletti View Post
    There are standard has functions for primitive types, you can just use them.
    Something like the following:

    Code:
            // or, if you want to use both variables:
            // return std::hash<int>()(nodepair.from) + std::hash<int>()(nodepair.to);
    .
    I highly suggest you don't hash a pair of integers like this. If you do, all your symmetric pairs will collide (eg [1, 2] and [2, 1]). Any symmetric operator is subject to the same issue, such as * or ^.

    If you don't use boost, you can still copy paste its (trivial) implementation. In particular, the hash function for an int is the identity function, so a good hash is:
    size_t hash = lhs + 0x9e3779b9 + (rhs << 6) + (rhs >> 2);
    Is your question related to IO?
    Read this C++ FAQ article at parashift by Marshall Cline. In particular points 1-6.
    It will explain how to correctly deal with IO, how to validate input, and why you shouldn't count on "while(!in.eof())". And it always makes for excellent reading.

  6. #6
    Join Date
    Dec 2010
    Posts
    907

    Re: How to design unordered_map's hash function for this case and another one?

    Hello guys,
    Thanks for helping me with this, really appreciated.
    I've got a related question to this one.
    When I got a std::unordered_map container which contains the some data like what I described above.

    And the data comes from a sqlite database.
    The sqlite database returns data in void* format,
    How can I quickly convert this void* into the unordered_map, I understand that casting from void* to unordered_map is unsafe. But are there any alternatives. My goal is to have it done in a short period of time.
    Thanks
    Jack
    Last edited by lucky6969b; September 30th, 2015 at 12:25 AM.

  7. #7
    Join Date
    Jun 2015
    Posts
    208

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by Philip Nicoletti View Post
    Something like the following:

    return std::hash<int>()(nodepair.from) + std::hash<int>()(nodepair.to);
    This can be improved to avoid the symmetry of adding two ints. Instead the two ints are used to produce a number over the wider long long range which is then hashed using the standard hash function like,

    Code:
    return std::hash<long long>()(static_cast<long long>(nodepair.from)<<32 | static_cast<long long>(nodepair.to));

    Also if the range of the ints is known to be say [0 .. N-1] then this would be a "perfect" hash function,

    Code:
    return (nodepair.from * N) + nodepair.to; // in principle the the index function of a 2D array
    It produces a unique int for any combination of from,to within the [0 .. N*N-1] range which allows the objects to be stored in an ordinary array rather than a hash table. For practical purposes N cannot be too big of course. On the other hand the only reason to use hashing is when N is deemed too big. Hashing primarily is a range reducing measure, a mapping from a wider range onto a narrower.
    Last edited by tiliavirga; September 30th, 2015 at 03:02 AM.

  8. #8
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by tiliavirga View Post
    This can be improved to avoid the symmetry of adding two ints. Instead the two ints are used to produce a number over the wider long long range which is then hashed using the standard hash function like,

    Code:
    return std::hash<long long>()(static_cast<long long>(nodepair.from)<<32 | static_cast<long long>(nodepair.to));
    One issue you might face with this code, is that the default implementation of hashing integrals for most major compiler vendors (GCC for a fact. MSVC too if memory serves) is simply the identity function (truncated to size_t). So in essence, that code snipet would be equivalent to "return nodepair.to".

    You could argue that the long long hash implementation is not ideal to begin with, but that's what it is.

    Quote Originally Posted by tiliavirga View Post
    Also if the range of the ints is known to be say [0 .. N-1] then this would be a "perfect" hash function,

    Code:
    return (nodepair.from * N) + nodepair.to; // in principle the the index function of a 2D array
    It produces a unique int for any combination of from,to within the [0 .. N*N-1] range which allows the objects to be stored in an ordinary array rather than a hash table. For practical purposes N cannot be too big of course. On the other hand the only reason to use hashing is when N is deemed too big. Hashing primarily is a range reducing measure, a mapping from a wider range onto a narrower.
    That would indeed be an example of a perfect hash function (or more generally, of a "Pairing Function"). It's issue though is that since most has table implementation use modulo to map the hash into a bucket, you may encounter certain patterns which will give you high chances of repeated collisions. For example, if you "paint" an entire column in your space, and N happens to be a multiple of the current hash table's size.

    Cantor's pairing function (https://en.wikipedia.org/wiki/Pairin...iring_function) works on the same idea as yours, but reduces the odds of collisions due to simple patterns. There are still patterns that could cause high amounts of collisions, but these would arguably be less frequent.
    Is your question related to IO?
    Read this C++ FAQ article at parashift by Marshall Cline. In particular points 1-6.
    It will explain how to correctly deal with IO, how to validate input, and why you shouldn't count on "while(!in.eof())". And it always makes for excellent reading.

  9. #9
    Join Date
    Jul 2005
    Location
    Netherlands
    Posts
    2,042

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by lucky6969b View Post
    I've got a related question to this one.
    I don't see the relation between your new question and the OP. Please start a new thread.
    Cheers, D Drmmr

    Please put [code][/code] tags around your code to preserve indentation and make it more readable.

    As long as man ascribes to himself what is merely a posibility, he will not work for the attainment of it. - P. D. Ouspensky

  10. #10
    Join Date
    Jun 2015
    Posts
    208

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by monarch_dodra View Post
    One issue you might face with this code, is that the default implementation of hashing integrals for most major compiler vendors (GCC for a fact. MSVC too if memory serves) is simply the identity function (truncated to size_t). So in essence, that code snipet would be equivalent to "return nodepair.to".
    Are you sure? I've tested this with VS 2015 Community Edition,

    Code:
    size_t hash(int from, int to) {
    	return std::hash<long long>()(static_cast<long long>(from)<<32 | static_cast<long long>(to));
    }
    //
    std::cout << hash(1,1) << std::endl; 
    std::cout << hash(2,1) << std::endl;
    and it gives two different hash values indeed as expected. They aren't equal and certainly not 1 as you implied.

    In the C++ 11 standard, requirements on hash function objects are described in 17.6.3.4. It essentially states that if a hash function object is called with different numbers the probability that the hash values are equal should be very small (in fact 1 / (max of size_t)).

    But you are suggesting that std::hash<long long> could produce equal hash values with probability 1 for certain numbers in the long long range. That would go against the standard and the test example shows otherwise too.

    Given that I assume you are wrong.

    That would indeed be an example of a perfect hash function (or more generally, of a "Pairing Function"). It's issue though is that since most has table implementation use modulo to map the hash into a bucket, you may encounter certain patterns which will give you high chances of repeated collisions. For example, if you "paint" an entire column in your space, and N happens to be a multiple of the current hash table's size.
    You must have misunderstood something because a perfect hash function is one that has no collisions. And that's the case with the function I suggested (which essentially is just the index function of a 2D array).
    Last edited by tiliavirga; September 30th, 2015 at 11:28 AM.

  11. #11
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by tiliavirga View Post
    Are you sure?
    Yes.

    Quote Originally Posted by tiliavirga View Post
    Given that I assume you are wrong.
    Asterix. That node snippet *could* be equivalent to "return nodepair.to" on a 32 bit system.

    But you are suggesting that std::hash<long long> could produce equal hash values with probability 1 for certain numbers in the long long range. That would go against the standard and the test example shows otherwise too.
    I think you should study up on probabilities, because what I suggested fills the requirement, even on 32 bit systems. Tests agree with me.

    You must have misunderstod something because a perfect hash function is one that has no collisions. And that's the case with the function I suggested (which essentially is just the index function of a 2D array).
    You have misunderstood the difference between a hash table and a hashing function.
    Is your question related to IO?
    Read this C++ FAQ article at parashift by Marshall Cline. In particular points 1-6.
    It will explain how to correctly deal with IO, how to validate input, and why you shouldn't count on "while(!in.eof())". And it always makes for excellent reading.

  12. #12
    Join Date
    Jun 2015
    Posts
    208

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by monarch_dodra View Post
    Tests agree with me.
    Rorschach possibly.

    You've been proved wrong. Get it.

  13. #13
    Join Date
    Jun 2009
    Location
    France
    Posts
    2,513

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by tiliavirga View Post
    Rorschach possibly.

    You've been proved wrong. Get it.
    Code:
    g++ -std=c++11 -m32 -x c++ - << EOM && ./a.out      
    #include <iostream>
    #include <utility>
    
    size_t hash(int from, int to) {
    return std::hash<long long>()(static_cast<long long>(from)<<32 | static_cast<long long>(to));
    }
    
    int main()
    {//
        std::cout << hash(1,1) << std::endl;
        std::cout << hash(2,1) << std::endl;
        std::cout << hash(12345,1) << std::endl;
        std::cout << hash(12345,7) << std::endl;
    }
    EOM
    
    1
    1
    1
    7
    Oops. So much for being proven wrong :/
    Is your question related to IO?
    Read this C++ FAQ article at parashift by Marshall Cline. In particular points 1-6.
    It will explain how to correctly deal with IO, how to validate input, and why you shouldn't count on "while(!in.eof())". And it always makes for excellent reading.

  14. #14
    Join Date
    Jun 2015
    Posts
    208

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by monarch_dodra View Post
    Oops. So much for being proven wrong
    C++ is defined by a standard, not by compilers. Compiler output proves nothing about C++.

    I've pointed at the relevant section in the C++ 11 standard and the behaviour of the compiler you've been using is not compliant with that standard. The same test with VS 2015 Community Edition gives,

    2393657685
    3738734694
    3818634013
    499726491

    which is more in line with the standard.

    I suggest you send your findings to the g++ community where it belongs. From what I can see, the version 11 support for g++ is still experimental. I assume people here want to learn C++ without having to worry about the short-comings of experimental compilers. But if you do use one you should be very worried. Your programs will be virtual minefields.

    You have misunderstood the difference between a hash table and a hashing function.
    I don't think so but you certainly have got the notion of a perfect hash function wrong. It's one that has no collisions. See the first paragraph here,

    https://en.wikipedia.org/wiki/Perfect_hash_function
    Last edited by tiliavirga; October 1st, 2015 at 01:58 AM.

  15. #15
    Join Date
    Jan 2006
    Location
    Singapore
    Posts
    6,762

    Re: How to design unordered_map's hash function for this case and another one?

    Quote Originally Posted by tiliavirga
    C++ is defined by a standard, not by compilers. Compiler output proves nothing about C++.

    I've pointed at the relevant section in the C++ 11 standard and the behaviour of the compiler you've been using is not compliant with that standard.
    Agreed: unless it can be shown to be a mistake in the standard or is an acceptable language/library extension, where a compiler differs from the standard, it is a bug in the compiler.

    Quote Originally Posted by tiliavirga
    I suggest you send your findings to the g++ community where it belongs. From what I can see, the version 11 support for g++ is still experimental. I assume people here want to learn C++ without having to worry about the short-comings of experimental compilers. But if you do use one you should be very worried. Your programs will be virtual minefields.
    I tried monarch_dodra's test program from post #13 with g++ 4.8.4 with -std=c++11 and received as the program's output:
    4294967297
    8589934593
    53021371269121
    53021371269127

    It seems that there was a bug here in the standard library implementation, but it has since been fixed, though monarch_dodra is presumably still using an older version. This can hardly be included in the "experimental" disclaimer of g++ that is meant to excuse implementations of components following drafts of C++11 before publication: a bad implementation is a bad implementation.

    Quote Originally Posted by tiliavirga
    I don't think so but you certainly have got the notion of a perfect hash function wrong. It's one that has no collisions.
    I think you missed the additional part that monarch_dodra wrote in post #8: "since most has table implementation use modulo to map the hash into a bucket, you may encounter certain patterns which will give you high chances of repeated collisions". That is, even though the hash function may be a perfect hash function, if the hash table implementation does not use it as-is, there may still be collisions.
    Last edited by laserlight; October 1st, 2015 at 02:21 AM.
    C + C++ Compiler: MinGW port of GCC
    Build + Version Control System: SCons + Bazaar

    Look up a C/C++ Reference and learn How To Ask Questions The Smart Way
    Kindly rate my posts if you found them useful

Page 1 of 2 12 LastLast

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •  


Windows Mobile Development Center


Click Here to Expand Forum to Full Width




On-Demand Webinars (sponsored)