How to design unordered_map's hash function for this case and another one?

Printable View

Show 50 post(s) from this thread on one page

September 28th, 2015, 11:25 PM
lucky6969b

How to design unordered_map's hash function for this case and another one?

1)

Code:

class AStarNodePair { public: int from; int to; public: AStarNodePair(int f, int t); void init(int f, int t) { from = f; to = t; } }; std::unordered_map<AStarNodePair, float, ???> m_actualCosts;

2)

Code:

class GridNode : public AStarNode { public: GridNode(int id, const quad::vec2* bb, const std::vector<int>& neis); bool operator==(const GridNode& other) { if (m_time_independent_id == other.m_time_independent_id) { return true; } return false; } std::vector<int>& neigbours() { return neis; } bool Intersect(const D3DXVECTOR3 &point) const; float distance(const GridNode* other) const; D3DXVECTOR3 GetPos() { return m_vPos; } protected: void SetPos() { m_vPos = center(); } D3DXVECTOR3 center(); private: quad::vec2 m_boundingBox[2]; Vector2D m_pivot; CComPtr<ID3DXMesh> m_pMesh; };

In this case, I want to write a hash function to find a particular element of GridNode in constant time
Currently, I am storing these GridNodes in a std::vector, and when I find the element,
I iterate thru the container like this

Code:

bool Coordinater::Convert_D3D_To_ID(D3DXVECTOR3 vec3, int& id) { for (auto& g : NavmeshSingleton::GetNavMesher()->GetWalkables()) { if (g->Intersect(vec3)) { id = g->m_time_independent_id; return true; } } return false; }

Now I want to replace this loop with a hash function.
Rather than begging for code, I want to know what is the method to generate a good hash function,
Just want to learn how to do it.

Update1:
is this a good thing/ to know at least?
https://www.gnu.org/software/gperf/manual/gperf.html

Just wondering whether VS 2013 has this sort of goodie?

Thank you
Jack
September 29th, 2015, 05:42 AM
Philip Nicoletti

Re: How to design unordered_map's hash function for this case and another one?

There are standard has functions for primitive types, you can just use them.
Something like the following:

Code:

struct AStarNodePair_hash { std::size_t operator () (const AStarNodePair & nodepair) const { return std::hash<int>()(nodepair.from); // or, if you want to use both variables: // return std::hash<int>()(nodepair.from) + std::hash<int>()(nodepair.to); } }; std::unordered_map<AStarNodePair, float, AStarNodePair_hash> m_actualCosts;

Note: you should also define operator == for your class.
September 29th, 2015, 05:45 AM
monarch_dodra

Re: How to design unordered_map's hash function for this case and another one?

Honestly, hashing things is more of an art than an exact science. Unless you have a PhD in numeric analysis, it's best to try to not do it on your own. I understand the "I want to know what is the method to generate a good hash function", but in this case, you want to use stuff that already exists. In particular, I'd go for boost::hash's hash_combine:

http://www.boost.org/doc/libs/1_37_0...h/combine.html
http://www.boost.org/doc/libs/1_37_0...t.hash_combine

Myself, I have never used nor needed a perfect hash*. AFAIK, they only make sense if you are hashing data that never changes, such as some sort of compile time constant container. And even then, you'd have to justify that a "normal" hash+hash_table doesn't fit your performance needs.

*Unless you consider indexing a array by object ID, where the ID is a naturally increasing size_t.
September 29th, 2015, 07:44 AM
superbonzo

Re: How to design unordered_map's hash function for this case and another one?

Quote:

Originally Posted by lucky6969b

Now I want to replace this loop with a hash function.

associative containers ( hash based or not ) look up elements based on some equivalence relation; generally speaking, you can't use them to look up elements that satisfy an arbitrary property as you're trying to do. So, hashing your elements won't help you in this use case.

instead, what you want are so called spatial trees/containers. There are many of them ( AFAIR boost implements the R-tree variant ) both tree and hash based, offering log and constant lookup, respectively. Google 'em for further details...
September 29th, 2015, 08:46 AM
monarch_dodra

Re: How to design unordered_map's hash function for this case and another one?

Quote:

Originally Posted by Philip Nicoletti

There are standard has functions for primitive types, you can just use them.
Something like the following:

Code:

// or, if you want to use both variables: // return std::hash<int>()(nodepair.from) + std::hash<int>()(nodepair.to);

.

I highly suggest you don't hash a pair of integers like this. If you do, all your symmetric pairs will collide (eg [1, 2] and [2, 1]). Any symmetric operator is subject to the same issue, such as * or ^.

If you don't use boost, you can still copy paste its (trivial) implementation. In particular, the hash function for an int is the identity function, so a good hash is:
size_t hash = lhs + 0x9e3779b9 + (rhs << 6) + (rhs >> 2);
September 29th, 2015, 11:07 PM
lucky6969b

Re: How to design unordered_map's hash function for this case and another one?

Hello guys,
Thanks for helping me with this, really appreciated.
I've got a related question to this one.
When I got a std::unordered_map container which contains the some data like what I described above.

And the data comes from a sqlite database.
The sqlite database returns data in void* format,
How can I quickly convert this void* into the unordered_map, I understand that casting from void* to unordered_map is unsafe. But are there any alternatives. My goal is to have it done in a short period of time.
Thanks
Jack
September 30th, 2015, 01:08 AM
tiliavirga

Re: How to design unordered_map's hash function for this case and another one?

Quote:

Originally Posted by Philip Nicoletti

Something like the following:

return std::hash<int>()(nodepair.from) + std::hash<int>()(nodepair.to);

This can be improved to avoid the symmetry of adding two ints. Instead the two ints are used to produce a number over the wider long long range which is then hashed using the standard hash function like,

Code:

return std::hash<long long>()(static_cast<long long>(nodepair.from)<<32 | static_cast<long long>(nodepair.to));

Also if the range of the ints is known to be say [0 .. N-1] then this would be a "perfect" hash function,

Code:

return (nodepair.from * N) + nodepair.to; // in principle the the index function of a 2D array

It produces a unique int for any combination of from,to within the [0 .. N*N-1] range which allows the objects to be stored in an ordinary array rather than a hash table. For practical purposes N cannot be too big of course. On the other hand the only reason to use hashing is when N is deemed too big. Hashing primarily is a range reducing measure, a mapping from a wider range onto a narrower.
September 30th, 2015, 03:24 AM
monarch_dodra

Re: How to design unordered_map's hash function for this case and another one?

Quote:

Originally Posted by tiliavirga

This can be improved to avoid the symmetry of adding two ints. Instead the two ints are used to produce a number over the wider long long range which is then hashed using the standard hash function like,

Code:

return std::hash<long long>()(static_cast<long long>(nodepair.from)<<32 | static_cast<long long>(nodepair.to));

One issue you might face with this code, is that the default implementation of hashing integrals for most major compiler vendors (GCC for a fact. MSVC too if memory serves) is simply the identity function (truncated to size_t). So in essence, that code snipet would be equivalent to "return nodepair.to".

You could argue that the long long hash implementation is not ideal to begin with, but that's what it is.

Quote:

Originally Posted by tiliavirga

Also if the range of the ints is known to be say [0 .. N-1] then this would be a "perfect" hash function,

Code:

return (nodepair.from * N) + nodepair.to; // in principle the the index function of a 2D array

It produces a unique int for any combination of from,to within the [0 .. N*N-1] range which allows the objects to be stored in an ordinary array rather than a hash table. For practical purposes N cannot be too big of course. On the other hand the only reason to use hashing is when N is deemed too big. Hashing primarily is a range reducing measure, a mapping from a wider range onto a narrower.

That would indeed be an example of a perfect hash function (or more generally, of a "Pairing Function"). It's issue though is that since most has table implementation use modulo to map the hash into a bucket, you may encounter certain patterns which will give you high chances of repeated collisions. For example, if you "paint" an entire column in your space, and N happens to be a multiple of the current hash table's size.

Cantor's pairing function (https://en.wikipedia.org/wiki/Pairin...iring_function) works on the same idea as yours, but reduces the odds of collisions due to simple patterns. There are still patterns that could cause high amounts of collisions, but these would arguably be less frequent.
September 30th, 2015, 05:00 AM
D_Drmmr

Re: How to design unordered_map's hash function for this case and another one?

Quote:

Originally Posted by lucky6969b

I've got a related question to this one.

I don't see the relation between your new question and the OP. Please start a new thread.
September 30th, 2015, 10:54 AM
tiliavirga

Re: How to design unordered_map's hash function for this case and another one?

Quote:

Originally Posted by monarch_dodra

One issue you might face with this code, is that the default implementation of hashing integrals for most major compiler vendors (GCC for a fact. MSVC too if memory serves) is simply the identity function (truncated to size_t). So in essence, that code snipet would be equivalent to "return nodepair.to".

Are you sure? I've tested this with VS 2015 Community Edition,

Code:

size_t hash(int from, int to) { return std::hash<long long>()(static_cast<long long>(from)<<32 | static_cast<long long>(to)); } // std::cout << hash(1,1) << std::endl; std::cout << hash(2,1) << std::endl;

and it gives two different hash values indeed as expected. They aren't equal and certainly not 1 as you implied.

In the C++ 11 standard, requirements on hash function objects are described in 17.6.3.4. It essentially states that if a hash function object is called with different numbers the probability that the hash values are equal should be very small (in fact 1 / (max of size_t)).

But you are suggesting that std::hash<long long> could produce equal hash values with probability 1 for certain numbers in the long long range. That would go against the standard and the test example shows otherwise too.

Given that I assume you are wrong.

Quote:

That would indeed be an example of a perfect hash function (or more generally, of a "Pairing Function"). It's issue though is that since most has table implementation use modulo to map the hash into a bucket, you may encounter certain patterns which will give you high chances of repeated collisions. For example, if you "paint" an entire column in your space, and N happens to be a multiple of the current hash table's size.

You must have misunderstood something because a perfect hash function is one that has no collisions. And that's the case with the function I suggested (which essentially is just the index function of a 2D array).
September 30th, 2015, 11:29 AM
monarch_dodra

Re: How to design unordered_map's hash function for this case and another one?

Quote:

Originally Posted by tiliavirga

Are you sure?

Yes.

Quote:

Originally Posted by tiliavirga

Given that I assume you are wrong.

Asterix. That node snippet *could* be equivalent to "return nodepair.to" on a 32 bit system.

Quote:

But you are suggesting that std::hash<long long> could produce equal hash values with probability 1 for certain numbers in the long long range. That would go against the standard and the test example shows otherwise too.

I think you should study up on probabilities, because what I suggested fills the requirement, even on 32 bit systems. Tests agree with me.

Quote:

You must have misunderstod something because a perfect hash function is one that has no collisions. And that's the case with the function I suggested (which essentially is just the index function of a 2D array).

You have misunderstood the difference between a hash table and a hashing function.
September 30th, 2015, 11:58 AM
tiliavirga

Re: How to design unordered_map's hash function for this case and another one?

Quote:

Originally Posted by monarch_dodra

Tests agree with me.

Rorschach possibly.

You've been proved wrong. Get it.
September 30th, 2015, 12:32 PM
monarch_dodra

Re: How to design unordered_map's hash function for this case and another one?

Quote:

Originally Posted by tiliavirga

Rorschach possibly.

You've been proved wrong. Get it.

Code:

g++ -std=c++11 -m32 -x c++ - << EOM && ./a.out #include <iostream> #include <utility> size_t hash(int from, int to) { return std::hash<long long>()(static_cast<long long>(from)<<32 | static_cast<long long>(to)); } int main() {// std::cout << hash(1,1) << std::endl; std::cout << hash(2,1) << std::endl; std::cout << hash(12345,1) << std::endl; std::cout << hash(12345,7) << std::endl; } EOM 1 1 1 7

Oops. So much for being proven wrong :/
October 1st, 2015, 12:31 AM
tiliavirga

Re: How to design unordered_map's hash function for this case and another one?

Quote:

Originally Posted by monarch_dodra

Oops. So much for being proven wrong

C++ is defined by a standard, not by compilers. Compiler output proves nothing about C++.

I've pointed at the relevant section in the C++ 11 standard and the behaviour of the compiler you've been using is not compliant with that standard. The same test with VS 2015 Community Edition gives,

2393657685
3738734694
3818634013
499726491

which is more in line with the standard.

I suggest you send your findings to the g++ community where it belongs. From what I can see, the version 11 support for g++ is still experimental. I assume people here want to learn C++ without having to worry about the short-comings of experimental compilers. But if you do use one you should be very worried. Your programs will be virtual minefields.

Quote:

You have misunderstood the difference between a hash table and a hashing function.

I don't think so but you certainly have got the notion of a perfect hash function wrong. It's one that has no collisions. See the first paragraph here,

https://en.wikipedia.org/wiki/Perfect_hash_function
October 1st, 2015, 02:18 AM
laserlight

Re: How to design unordered_map's hash function for this case and another one?

Quote:

Originally Posted by tiliavirga

C++ is defined by a standard, not by compilers. Compiler output proves nothing about C++.

I've pointed at the relevant section in the C++ 11 standard and the behaviour of the compiler you've been using is not compliant with that standard.

Agreed: unless it can be shown to be a mistake in the standard or is an acceptable language/library extension, where a compiler differs from the standard, it is a bug in the compiler.

Quote:

Originally Posted by tiliavirga

I suggest you send your findings to the g++ community where it belongs. From what I can see, the version 11 support for g++ is still experimental. I assume people here want to learn C++ without having to worry about the short-comings of experimental compilers. But if you do use one you should be very worried. Your programs will be virtual minefields.

I tried monarch_dodra's test program from post #13 with g++ 4.8.4 with -std=c++11 and received as the program's output:
4294967297
8589934593
53021371269121
53021371269127

It seems that there was a bug here in the standard library implementation, but it has since been fixed, though monarch_dodra is presumably still using an older version. This can hardly be included in the "experimental" disclaimer of g++ that is meant to excuse implementations of components following drafts of C++11 before publication: a bad implementation is a bad implementation.

Quote:

Originally Posted by tiliavirga

I don't think so but you certainly have got the notion of a perfect hash function wrong. It's one that has no collisions.

I think you missed the additional part that monarch_dodra wrote in post #8: "since most has table implementation use modulo to map the hash into a bucket, you may encounter certain patterns which will give you high chances of repeated collisions". That is, even though the hash function may be a perfect hash function, if the hash table implementation does not use it as-is, there may still be collisions.

Show 50 post(s) from this thread on one page