Click to See Complete Forum and Search --> : How to choose the best hash function?
Jasmine St clair
November 13th, 2001, 03:32 AM
Hi All
suppose i choose the function
h(k) = k mod m
where k is the key and m is the number of elements in table
i want to hash an ip address which is a long type value.
what will be a "good" m
thanx.
Jasmine St clair
November 13th, 2001, 03:32 AM
Hi All
suppose i choose the function
h(k) = k mod m
where k is the key and m is the number of elements in table
i want to hash an ip address which is a long type value.
what will be a "good" m
thanx.
James Curran
November 13th, 2001, 04:10 AM
Well, if we can assume that the IP addresses are evenly distributed over the range (which I'm going to assume, since you didn't say otherwise), you're probably best off with the modulo function (remainder after a divide). If your number of buckets is a power of 2, then it becomes just a simple AND operation.
Truth,
James
http://www.NJTheater.com
http://www.NovelTheory.com
I don't do it for the points (OK, maybe I do), but rating a post is a good way for me to know if I helped.
Jasmine St clair
November 13th, 2001, 06:35 AM
Hi
what you mean by:
"If your number of buckets is a power of 2, then it becomes just a simple AND operation."
Jasmine St clair
November 13th, 2001, 06:35 AM
Hi
what you mean by:
"If your number of buckets is a power of 2, then it becomes just a simple AND operation."
James Curran
November 13th, 2001, 08:44 AM
"Buckets" is the common term for the elements in the array used in a hash table.
If you have 4 buckets, you need to reduce the IP to a number between 0-3.
If you have 16 buckets, you need to reduce the IP to a number between 0-15.
// To reduce a number to the range of 0-3
n = IP & 3;
// To reduce a number to the range of 0-15
n = IP & 15;
That only works if the number of buckets is a power of 2 (2, 4, 8, 16, 32, 64 etc)
Truth,
James
http://www.NJTheater.com
http://www.NovelTheory.com
I don't do it for the points (OK, maybe I do), but rating a post is a good way for me to know if I helped.
Arild Fines
November 14th, 2001, 03:24 AM
Having a number of buckets that is divisible by 2 leads to clustering. The number of buckets should ideally be a prime number.
"My own view on religion is that of Lucretius. I regard it as a disease born of fear and as a source of untold misery to the human race. I cannot, however, deny that it has made some contributions to civilization. It helped in early days to fix the calendar, and it caused Egyptian priests to chronicle eclipses with such care that in time they became able to predict them. These two services I am prepared to acknowledge, but I do not know of any others." -- Bertrand Russell (From his essay Has Religion Made Useful Contributions to Civilization?, first published in 1930)
James Curran
November 14th, 2001, 07:07 AM
That depends on the nature of the range of items being hashed. IF the numbers are evenly distributed (which I listed as a presumption), then the buckets would be evenly filled.
Truth,
James
http://www.NJTheater.com
http://www.NovelTheory.com
I don't do it for the points (OK, maybe I do), but rating a post is a good way for me to know if I helped.
lslx
November 14th, 2001, 10:12 AM
I dont know how many adress you want to stock in your table. With Ip adresses, you can have a lot of 132.211.105.x for what Mod may create "grapes" you want to avoid for your collision management.
Here a cute method: the "Mid square"
h(k)= Extract4MiddleNumber(k^2);
h(23456)= Extract4MiddleNumber(550183936) = 1836
h(23457)= Extract4MiddleNumber(550230849) = 2308
h(23458)= Extract4MiddleNumber(550277764) = 2777
h(23459)= Extract4MiddleNumber(550324681) = 3246
For a larger range than [1,9999], do k^3 :)
lslx
Arild Fines
November 14th, 2001, 01:05 PM
A good hash function shouldnt have to rely on the keys being evenly distributed. The rule of thumb is to either choose a prime for the number of buckets or a number with no small divisors.
Choosing an even number of buckets is extremely bad.
"My own view on religion is that of Lucretius. I regard it as a disease born of fear and as a source of untold misery to the human race. I cannot, however, deny that it has made some contributions to civilization. It helped in early days to fix the calendar, and it caused Egyptian priests to chronicle eclipses with such care that in time they became able to predict them. These two services I am prepared to acknowledge, but I do not know of any others." -- Bertrand Russell (From his essay Has Religion Made Useful Contributions to Civilization?, first published in 1930)
James Curran
November 14th, 2001, 01:25 PM
It shouldn't rely on that fact, but, if a detail about the range is known to be true, then the function shouldn't ignore that. The original poster stated that he's hashing IP addresses. If the address come from visitors to an open WWW site there is every reason to believe that the lower order bits will be evenly distributed.
And remember, the whole point of a hash table is speed. The "mod to a power of 2" function is probably the fastest possible hash code. The "Middle four of the square" hash presented elsewhere in this thread would be orders of magnitude slower -- for every insert and seek! So, even if the even number of buckets leads to a occasional extra collusion (which I do not grant), the amortized time would still be less.
Truth,
James
http://www.NJTheater.com
http://www.NovelTheory.com
I don't do it for the points (OK, maybe I do), but rating a post is a good way for me to know if I helped.
codeguru.com
Copyright Internet.com Inc., All Rights Reserved.