Given the following class
public class Foo
{
public int FooId { get; set; }
public string FooName { get; set; }
public override bool Equals(object obj)
{
Foo fooItem = obj as Foo;
return fooItem.FooId == this.FooId;
}
public override int GetHashCode()
{
// Which is preferred?
return base.GetHashCode();
//return this.FooId.GetHashCode();
}
}
I have overridden the Equals method because Foo represent a row for the Foos table. Which is the preferred method for overriding the GetHashCode?
Why is it important to override GetHashCode?
Source: Tips4all
Yes, it is important if your item will be used as a key in a dictionary, or HashSet<T>, etc - since this is used (in the absense of a custom IEqualityComparer<T>) to group items into buckets. If the hash-code for two items does not match, they may never be considered equal (Equals will simply never be called).
ReplyDeleteThe GetHashCode() method should reflect the Equals logic; the rules are:
if two things are equal (Equals(...) == true) then they must return the same value for GetHashCode()
if the GetHashCode() is equal, it is not necessary for them to be the same; this is a collision, and Equals will be called to see if it is a real equality or not.
In this case, it looks like "return FooId;" is a suitable GetHashCode() implementation. If you are testing multiple properties, it is common to combine them using code like below, to reduce diagonal collisions (i.e. so that new Foo(3,5) has a different hash-code to new Foo(5,3)):
int hash = 13;
hash = (hash * 7) + field1.GetHashCode();
hash = (hash * 7) + field2.GethashCode();
...
return hash;
Oh - for convenience, you might also consider providing == and != operators when overriding Equals and GethashCode.
A demonstration of what happens when you get this wrong is here.
It's actually very hard to implement GetHashCode() correctly because, in addition to the rules Marc already mentioned, the hash code should not change during the lifetime of an object. Therefore the fields which are used to calculate the hash code must be immutable.
ReplyDeleteI finally found a solution to this problem when I was working with NHibernate.
My approach is to calculate the hash code from the ID of the object. The ID can only be set though the constructor so if you want to change the ID, which is very unlikely, you have to create a new object which has a new ID and therefore a new hash code. This approach works best with GUIDs because you can provide a parameterless constructor which randomly generates an ID.
By overriding Equals you're basically stating that you are the one who knows better how to compare two instances of a given type, so you're likely to be the best candidate to provide the best hash code.
ReplyDeleteThis is an example of how ReSharper writes a GetHashCode() function for you:
public override int GetHashCode()
{
unchecked
{
var result = 0;
result = (result * 397) ^ m_someVar1
result = (result * 397) ^ m_someVar2
result = (result * 397) ^ m_someVar3
result = (result * 397) ^ m_someVar4
return result;
}
}
As you can see it just tries to guess a good hash code based on all the fields in the class, but since you know your object's domain or value ranges you could still provide a better one.
It is because the framework requires that two objects that are the same must have the same hashcode. If you override the equals method to do a special comparison of two objects and the two objects are considered the same by the method, then the hash code of the two objects must also be the same. (Dictionaries and Hashtables rely on this principle).
ReplyDeleteHow about
ReplyDeletepublic override int GetHashCode()
{
return string.Format("{0}_{1}_{2}", prop1, prop2, prop3).GetHashCode();
}
Assuming performance is not an issue :)
Please don´t forget to check the obj paramter against null when overriding Equals().
ReplyDeleteAnd also compare the type.
public override bool Equals(object obj)
{
if (obj == null || GetType() != obj.GetType())
return false;
Foo fooItem = obj as Foo;
return fooItem.FooId == this.FooId;
}
The reason for this is: Equals must return false on compares to null. See also http://msdn.microsoft.com/en-us/library/bsc2ak47.aspx
It's not necessarily important; it depends on the size of your collections and your performance requirements and whether your class will be used in a library where you may not know the performance requirements. I frequently know my collection sizes are not very large and my time is more valuable than a few microseconds of performance gained by creating a perfect hash code; so (to get rid of the annoying warning by the compiler) I simply use:
ReplyDeletepublic override int GetHashCode()
{
return base.GetHashCode();
}
(Of course I could use a #pragma to turn off the warning as well but I prefer this way.)
When you are in the position that you do need the performance than all of the issues mentioned by others here apply, of course. Most important - otherwise you will get wrong results when retrieving items from a hash set or dictionary: the hash code must not vary with the life time of an object (more accurately, during the time whenever the hash code is needed, such as while being a key in a dictionary): for example, the following is wrong as Value is public and so can be changed externally to the class during the life time of the instance, so you must not use it as the basis for the hash code:
class A
{
public int Value;
public override int GetHashCode()
{
return Value.GetHashCode(); //WRONG! Value is not constant during the instance's life time
}
}
On the other hand, if Value can't be changed it's ok to use:
class A
{
public readonly int Value;
public override int GetHashCode()
{
return Value.GetHashCode(); //OK Value is read-only and can't be changed during the instance's life time
}
}
Hash code is used for hash-based collections like Dictionary, Hashtable, HashSet etc. The purpose of this code is to very quickly pre-sort specific object by putting it into specific group (bucket). This pre-sorting helps tremendously in finding this object when you need to retrieve it back from hash-collection because code has to search for your object in just one bucket instead of in all objects it contains. The better distribution of hash codes (better uniqueness) the faster retrieval. In ideal situation where each object has a unique hash code, finding it is an O(1) operation. In most cases it approaches O(1).
ReplyDelete