2.9k
views5
comments

Why do equivalent tuples return different hash codes?
For example:

1
2

Console.WriteLine((1, 2, 3).GetHashCode())
Console.WriteLine((1, 2, 3).GetHashCode())

returns two distinct values.

This can be very confusing when using tuples as key in System.Collection.Generic collections:

1
2
3

let dict = new Dictionary<int * int * int, string>()
dict.Add((1,2,3), "foo")
Console.WriteLine(dict.ContainsKey((1,2,3)))

prints out False.

Is this by design (some awkward compatability with OCaml) or is it a bug in the current implementation?

This is by design. F# distinguishes between "structural identity" and "object identity". For several reasons it is not particularly a good idea to rely on overriding Object.GetHashCode to implement structural identity. One reason is performance: hashing on structural data includes tradeoffs between depth of hashing v. accuracy of hashing, and to naively hash all components of a tuple would be wrong. Another reason is that F# structured data can include certain kinds of loops (i.e. the structured terms can be graphs - not in the case of tuples, but in the case of some other data types), so hashing on structured data in a naive way could lead to unexpected infinite loops.

As such F# does not override GetHashCode for structural types such as lists, options and user type definitions. This means you should provide an appropriate instance of IEqualityComparer when using structural keys with an instance of a generic .NET collection.

For example:

#light
open System
open System.Collections.Generic
let dict = new Dictionary<int * int * int, string>(Collections.HashIdentity.Structural())
dict.Add((1,2,3), "foo")
Console.WriteLine(dict.ContainsKey((1,2,3)))

You can find implementations of IEqualityComparer at Microsoft.FSharp.Collections.HashIdentity, in particular you will want to use Microsoft.FSharp.Collections.HashIdentity.Structural. There is also Microsoft.FSharp.Collections.HashIdentity.Reference and Microsoft.FSharp.Collections.HashIdentity.Object if you wish to be explicit about which form of hash identity you are using.

On the whole this issue stems from there not being particularly clear language-neutral techniques for structured hashing and comparison on .NET. One contribution of F# is to make distinctions between these kinds of identities and provide techniques that allow them to be mixed.

I agree more detail is needed in the books and in the manual about this important practical issue.

By dsyme on 9/16/2006 9:13 AM (permalink)

So a=b is structural identiry, but GetHashCode() reflects object identity.
And when a=b, a.GetHashCode() = b.GetHashCode() might not hold. Right?
What is the performance penalty of using Dictionary<int * int * int, string>(Structural ()) instead of Dictionary<int * int * int, string>()?

By Slavomir Kaslev on 9/16/2006 9:48 AM (permalink)

Yes, "a = b" is structural equality. The "hash" operation is the structural hash operation, so "hash a = hash b" will hold.

Dictionaries base on structural hashing on tuple keys are fast, since structural comparison and hashing on tuples is treated specially in the compiler by generting efficient code for these operations. But you should _absolutley_ do performance tests in any particular app, or at least minimal profiling runs - for example, nothing will beat implementation dictionaries where the key type has been hand-specialized to a particular known type and hash function, or there may be special properties of your keys that you can exploit by providing your own implementation of IEqualityComparer. For example, you may know that you only really have to hash/compare two of the integers.

Dictionaries based on reference identity are very fast (hashing and equality are essentially free), but this is only appropriate when you are certain that two keys are the same if the are precisely the same unique allocated object. In F# you should only rely on this when you have used "new" on a class type, or have deliberately constructed record values in a very controlled way. You can't rely on reference identity for tuple values, so it doesn't make sense to compare those two types you mention.

Don

By dsyme on 9/16/2006 5:03 PM (permalink)

Thanks, Don. That cleared things up.

So why isn't GetHashCode structural hash by default on tuples. Python (and IronPython for .net) makes this guarantee. Since there is no performance penalty involved I don't see any reasons for the extra Collections.HashIdentity.Structural() typing. This can be very confusing for newcomers from other languages and also can lead to subtle bugs in a larger code base.

By Slavomir Kaslev on 9/17/2006 3:17 PM (permalink)

Thanks for this feedback. For tuples this would be fine. We'll add it to our list of design topics to look at. For now you should either use the F# collection types (where structural comparison is the default) or be explicit about the use of structural hashing when using reference-typed keys in conjunction with the .NET collections.

One design change in .NET 2.0 (or was it 1.1?) helps this a lot: previously, Object.GetHashCode() was used by some critical parts of the system such as serialization. It essentially _had_ to implement reference identity if there was any chance of terms having loops, or else you simply couldn't serialize terms without additionally customizing serialization behaviour. This led to real problems. Gladly this has since been fixed.

Don

By dsyme on 9/17/2006 4:14 PM (permalink)

Topic tags

Built with WebSharper

Home

Answers

Events

Courses

Groups and Conferences

Blogs

Jobs

Developers

Topic tags