## Mapping SelectMany to Monads

This is just a quick post to write down one of my findings while I’m currently doing a bit of F# hacking.

The bind operation is the backbone of Monads. For the collection Monad (M<T> where M is IEnumerable) this is Enumerable.SelectMany for C# and Seq.concat or Seq.collect respectively. The last detail is important because in C# we have two overloads for these different operations (there are actually two more passing the index of the current element but I’ll ignore that for now).

In general, a bind operation looks like this (F# syntax):

*('T -> M<'U>) -> M<'T> -> M<'U>*

The bin operation takes a function that maps a value of type T into the monadic type M<U>. As a second argument it is passed a monadic type M<T> and it returns a monadic type M<U>. Given this signature, we can easily infer what the operation does: The value of type T is un-wrapped from its monadic type M<T>. The unwrapped value is passed as an argument to the function specified as a first argument of the bind operation. The result of calling the specified function with the value of type T is the monadic type M wrapping a value of type U, which is also the return value of the bind operation.

The SelectMany function looks accordingly (think M=IEnumerable):

IEnumerable<TResult> SelectMany<TSource, TResult>(IEnumerable<TSource> source , Func<TSource, IEnumerable<TResult>> selector)

Ignoring that the order of parameters is different, we can see that the SelectMany function corresponds to the bind operation. It is noteworthy however, that different semantics for this operation are possible given the same signature. Un-wrapping the value of type T from our IEnumerable monad corresponds to enumerating over the sequence. Therefore we invoke the selector on each element and it returns our value mapped to TResult wrapped inside an IEnumerable monad (M<U>). Now, the problem is that, in order to statisfy the method signature it would be possible for the SelectMany implementation to simply return the result of the first invocation of the selector. Nonetheless, this is not what we expect the bind operation for IEnumerables to do, we expect it to flatten the returned sequence. So we have a bunch of IEnumerbale<U> that we need to bring into a single IEnumerable<U>. To do this, we simply concat all the sequences together.

So there are two distinct steps happening here: Mapping each value T into a new M<U> and then perform a selection on all returned M<U>’s to return a single M<U>.

This is why there’s also a second overload of SelectMany:

public static IEnumerable<TResult> SelectMany<TSource, TCollection, TResult>(this IEnumerable<TSource> source, Func<TSource, IEnumerable<TCollection>> collectionSelector, Func<TSource, TCollection, TResult> resultSelector)

This second overload allows you to customize the last step of the operation. From a theoretical point of view, this second overload is not necessary but it certainly makes things easier in C#.

Consider the following example, we have two lists of numbers from 1 to 3 and want to select pairs from it where the numbers match (using F# with LINQ):

open System.Linq let numbers = [1..3]; numbers.SelectMany(fun n -> numbers.SelectMany( fun m -> Seq.ofList( if (n = m) then [(m, n)] // sequence with a single tuple (m,n) else [] // empty sequence ))) // Prints (F# Interactive): // val it : seq<int * int> = seq [(1, 1); (2, 2); (3, 3)]

A similar implementation in C# would look like this:

var numbers = Enumerable.Range(1, 3); var q = numbers.SelectMany(n => numbers.SelectMany(m => { if (n == m) return new { n, m }.Return(); else return new { n, m }.Ignore(); }));

In order to get the same expressiveness as in F# we needed to add to extension methods. Return is the second important operation on monads, it simply takes a value and wraps it inside the monad. In this case it returns an IEnumerable with a single element. Returning an empty sequence is not so easy because we do not have the same type inference capabilities as in F#. The Ignore operation returns an empty sequence (or empty monad) and is only invoked on an instance of the anonymous type so we know the type of sequence to return.

Doing it this way is a pretty cumbersome, so when the C# team decided to implement LINQ they used a different approach to translate a query as our example above. Section 7.15.2. of the C# Spec specifies this.

var numbers = Enumerable.Range(1, 3); var q = from n in numbers from m in numbers where n==m select new {n, m};

This translates to using the second overload of SelectMany to produce a cross join of both sequences and then filtering the result:

var numbers = Enumerable.Range(1, 3); var q5 = numbers .SelectMany(n => numbers, (n, m) => new { n, m }) .Where(x => x.n == x.m);

I might not be 100% correct on this, but I think this the only reason the second SelectMany overload exists is to compensate for the missing type inference and make query expression translation easier. From the monad point of view, it’s not necessary, I believe.

Hi,

Actually what really defines a monad is the join operation. You go from one layer to an outer one.

My problem with C# is its type inference is not so good as Haskell’s one.