Hi David,

If you want complete representation independence and complete extensibility then what you have is correct. However, I personally prefer designs that "close-the-world" of possible storage types, and hide the representations away. This is because there are significant interactions when you do binary operations on different representations, e.g. adding sparse + dense. In my mind it's better to control the complexity of these interactions by having a limited set of storage types known to the library designer. It's possible to still permit extensibility through some "generic" storage view like yours, but the major storage types are typically known and fixed and if you internalize them then the complexities of their interactions can be handled and isolated. In other words, it is often better to sacrifice complete extensibility in the name of performance and localization of interactions between extensions.

Here's how that would apply to your situation. The set of possible representations is internalized, usually through a discriminated union whose constructors are hidden by a signature:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
 

type FloatVector =
  // these data constructors can be hidden by a signature, since we may want to later add to the
  // representation choices.
    | Basic of BasicFloatVector
    | Strided of StridedFloatVector
    with 
        static member CreateBasic(arr) = Basic(BasicFloatVector.Create(arr))
        static member CreateStrided(offset,stride,arr) = Strided(StridedFloatVector.Create(offset,stride,arr))

        member x.Sum() =
            match x with 
            | Basic(bfv) -> bfv.Sum()
            | Strided(sfv) -> sfv.Sum()
    end      

The representations are then implemented just as you would expect:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
 

/// A basic vector type using float[] for storage
type BasicFloatVector =
  { data : float[] }
  with 
     static member Create(arr) =
        { data = arr }
        
     member x.Sum()  =
            let mutable res = 0.0 in
            for i = 0 to x.data.Length do
                res <- res + x.data.[ i ]
            done
            res
  end

/// A strided vector type using float[] for storage
type StridedFloatVector =
  { data : float[] 
    offset: int;
    stride: int; }
  with 
     static member Create(offset,stride,arr) = failwith "fill this in"
     member x.Sum() = failwith "fill this in"
  end

You will probably eventually want to parameterize by element type too. You of course need a dictionary of operations for the element type to do this, best represented through a value of type Microsoft.FSharp.Math.INumeric<'elem> and its extensions. You can make users pass in values of this explicitly, or you can look up one through the table of numeric associations at Microsoft.FSharp.Math.GlobalAssociations. The F# Matrix library uses the latter technique.

Of course, invoking dictionaries of element operations come with costs you will absolutely have to avoid when the element type is "float". For this reason you may still want to have (internal) representations "BasicFloatVector" and "StridedFloatVector". Alternatively you can use the trick that the F# Matrix library uses and use a (cheap) runtime type test to detect when the element type is "float", and when it is switch over to an optimized implementation. Here's the outline of how this might look:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
 

/// A generic basic vector type using 'elem[] for storage with optimized implementations of operations
/// when 'elem is float
type BasicVector<'elem> =
  { ops: Math.INumeric<'elem>;
    data : 'elem [] }
  with 
     static member Create(arr) =
        // Note: in the next version of F# you will be able to use simply
        // let ops = Math.GlobalAssociations.GetNumericAssociation<'elem>() in
        let ops = Math.GlobalAssociations.getNumericAssociation(typeof() : ReifiedType<'elem>) |> Option.get in
        { ops = ops; data = arr }
        
     /// You now implement the basic functions here.
     /// Making the element type generic has non-trivial costs when the element type
     /// is "float", which is the common case, so we give optimized implementations 
     /// for floats by doing a type test. The cost of this type test is in practice very small.
     member x.Sum() : 'elem =
         match box(x.data) with 
         | :? (float []) as arr -> 
            let mutable res = 0.0 in
            for i = 0 to arr.Length do
                res <- res + arr.[ i ]
            done
            /// return the result - we know it has type "float" but now "prove" it has type "'elem".
            unbox(box(res))
         | _ -> 
            /// Here's the generic version of the algorithm
            let mutable res = x.ops.Zero in
            for i = 0 to x.data.Length do
                res <- x.ops.Add(res, x.data.[ i ])
            done
            res
     
  end

type StridedVector<'elem> =
    { ops: Math.INumeric<'elem>;
      offset: int;
      stride: int;
      data : 'elem array }
    with 
       static member Create(offset,stride,arr) = failwith "fill this in"
       member x.Sum() = failwith "fill this in"

    end
      
type Vector<'elem> =
  // these data constructors should be hidden by a signature
  | Basic of BasicVector<'elem>
  | Strided of StridedVector<'elem>
    with
        static member CreateBasic(arr) = 
            Basic(BasicVector.Create(arr))
        static member CreateStrided(offset,stride,arr) = 
            Strided(StridedVector.Create(offset,stride,arr))

        member x.Sum() =
            match x with 
            | Basic(bfv) -> bfv.Sum()
            | Strided(sfv) -> sfv.Sum()
    end

By on 2/24/2007 3:28 AM ()

Hi Don,

Thanks for your help. I agree it makes sense to limit the number of representations in this way - there are only a small number of choices after all - and pattern matching will help when adding a new one. My first attempt was actually similar to the one you listed. However it seemed that the performance profile for the 'generic' Sum only using Vector.Item() was slightly worse than the version I gave above. My guess is this is solely due to the overhead of the 'match x with' expression for each index. This is fine in the case of Sum because as you point out it's quite easy to implement it for each type. For more involved numerical codes it would be nice if these came for free, but this may now becoming a DSL rather than a F# library. It may be interesting to know that something similar is done in OOQP, a quadratic programming library, though their linear algebra layer is quite complex to support all the functionality that BLAS offers.

Cheers,

Dave

P.S. Great work on F# ! Being a fan of OCaml, it's great to see these kind of language features become 'mainstream' - and the .NET and VS integration is awesome...

By on 2/25/2007 3:05 AM ()
IntelliFactory Offices Copyright (c) 2011-2012 IntelliFactory. All rights reserved.
Home | Products | Consulting | Trainings | Blogs | Jobs | Contact Us | Terms of Use | Privacy Policy | Cookie Policy
Built with WebSharper