
.NET Generics Implementation

An interesting question on Stack Overflow prompted me to research how exactly the Common Language Runtime (CLR) implements parametric polymorphism, better known as generics. My initial thought was that specialized code must be generated somewhere to support each concrete generic instance, and that this code would need to be different for each generic type. Let’s look at two common approaches to implementing generics:

The Java Way : Type Erasure

Java generics are compile-time-only generics. In essence, they are syntactic sugar on top of a runtime environment that has no notion of generic types. Therefore, no runtime operations are possible that depend on the type of a generic argument. For every generic type, the Java compiler generates a single raw type. Inside a generic class or method, variables of the generic type are replaced by the compiler with the closest matching type: if an extends bound is specified, the bounding type is used; otherwise, Object. This process is called type erasure, because no information about the generic type argument is preserved in the raw type. As a result, List<String> and List<Object> both map to the same raw type, List, since erasure yields an identical representation for both. The compiler inserts runtime casts at call sites to ensure the runtime types are correct.

The advantage of this approach is binary compatibility with previous versions of the JVM. As always, the price for backwards compatibility is high. A serious disadvantage is that introspecting the type of a generic argument at runtime is impossible. It’s also easy to break the type system with a few casts. Performance for primitive values (Java’s counterpart to value types) is poor because boxing operations are needed everywhere. And since Java generics don’t remove the burden of runtime casting, there are no performance gains over traditional Object-based collections.

The .NET Way : Reified Generic Arguments

.NET is said to reify parametrized types, which means it has a notion of generic types at the IL level. When a generic type is declared, the compiler generates IL that contains placeholders for the generic type parameters, along with metadata about their constraints. When the generic class is used, the compiler substitutes the concrete generic type arguments for the placeholders at each use site and uses the supplied metadata to enforce the type system. It is important to note that the compiler only requests an instantiation of the generic class or method, that is, the actual code to be executed. It does not generate that code. The compiler is only responsible for passing the right generic type arguments at each use site. Let me explain what the somewhat ambiguous term “instantiation” means here.
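The placeholders and constraint metadata mentioned above can be observed through reflection. The following is a minimal sketch, not part of the original example: Box<T> and GenericMetadataDemo are hypothetical names, and the snippet only assumes the standard System.Reflection API.

```csharp
using System;
using System.Reflection;

// Hypothetical generic type, introduced only for this illustration.
public class Box<T> where T : new()
{
    public T Value = new T();
}

public static class GenericMetadataDemo
{
    // Returns true when the first type parameter of an open generic type
    // carries the new() constraint in its metadata.
    public static bool HasDefaultCtorConstraint(Type openGeneric)
    {
        Type typeParameter = openGeneric.GetGenericArguments()[0];
        GenericParameterAttributes flags = typeParameter.GenericParameterAttributes;
        return (flags & GenericParameterAttributes.DefaultConstructorConstraint) != 0;
    }

    public static void Main()
    {
        // typeof(Box<>) is the open generic type with its placeholder T still unbound.
        Type open = typeof(Box<>);
        Console.WriteLine(open.IsGenericTypeDefinition);   // True
        Console.WriteLine(HasDefaultCtorConstraint(open)); // True

        // Binding the placeholder at runtime yields the same type the compiler refers to.
        Type closed = open.MakeGenericType(typeof(int));
        Console.WriteLine(closed == typeof(Box<int>));     // True
    }
}
```

The point of the sketch is that both the open definition and each closed type are real runtime entities, which would not be the case under type erasure.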

A given generic type might require a different instantiation for each generic type argument it is used with. You can think of it as “code instantiation”, not “object instantiation”: here, “instantiation” refers to the native image, that is, the executable code.

When a parametrized class or method is invoked, the corresponding instantiation is produced on demand by the JIT compiler at runtime. Looking at the generic type argument, the loader checks whether a compatible instantiation has already been generated; if so, it returns that one, otherwise it generates a new one. Compatible means that the code executed for a given parametrized class or method can be shared among different generic arguments. This is the case for all reference types, because the pointers used to store them have a fixed size regardless of the concrete reference type. For value types, the JIT compiler generates a specialized instantiation for each generic type argument used. Let’s look at a short example:

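    using System;

    public class SomeType<T> where T : new()
    {
        T field;

        public SomeType()
        {
            // The new() constraint guarantees T has a public parameterless constructor.
            field = new T();
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            new SomeType<int>();       // value type: specialized instantiation
            new SomeType<float>();     // value type: another specialized instantiation
            new SomeType<object>();    // reference type: shared instantiation
            new SomeType<Exception>(); // reference type: reuses the shared instantiation
        }
    }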

The first and second lines in Main() cause the JIT compiler to generate two different, specialized instantiations of SomeType, because int and float are value types. For the third and fourth lines, only a single instantiation is generated, because the representation of reference types is binary compatible; that instantiation is shared. The details that make this sharing possible while still preserving the runtime type are very low level. Briefly: while the executable code (the instantiation, the native image) is shared among all reference types, the vtable associated with an instance of the instantiation is unique to the concrete type argument. The details can be found in a paper by Andrew Kennedy and Don Syme from Microsoft Research titled “Design and Implementation of Generics for the .NET Common Language Runtime”, which is also the source of most of the information in this blog post.
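That the runtime type survives code sharing can be checked from C# itself. The sketch below uses a hypothetical Holder<T> class (not from the original example) and shows only the observable behavior; whether the native code is actually shared is a JIT detail that cannot be seen at this level.

```csharp
using System;

// Hypothetical type for illustration, similar in shape to SomeType<T>.
public class Holder<T> where T : new()
{
    public T Value = new T();

    // typeof(T) is answered at runtime from the exact type argument.
    public Type ArgumentType
    {
        get { return typeof(T); }
    }
}

public static class ReificationDemo
{
    public static void Main()
    {
        object a = new Holder<object>();
        object b = new Holder<Exception>();

        // Two reference-type arguments still yield two distinct runtime types.
        Console.WriteLine(a.GetType() == b.GetType()); // False
        Console.WriteLine(a is Holder<Exception>);     // False
        Console.WriteLine(b is Holder<Exception>);     // True

        // The type argument itself can be inspected at runtime.
        Console.WriteLine(new Holder<Exception>().ArgumentType == typeof(Exception)); // True
        Console.WriteLine(new Holder<int>().ArgumentType == typeof(int));             // True
    }
}
```

Under type erasure, the first two checks could not distinguish the two objects, since both would be instances of the same raw type.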

What are the advantages of this approach? Instantiation sharing reduces code size and the amount of just-in-time compilation, which is usually expensive. Not sharing instantiations for value types means that boxing operations are unnecessary. The performance of a true generic collection of value types will therefore be superior to a collection implementation that uses the “object idiom”.
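To make the boxing difference concrete, here is a small sketch that contrasts List<int> with the non-generic ArrayList, which stands in for the “object idiom”. The class and method names are made up for this illustration, and the sketch shows where boxing and unboxing occur rather than measuring their cost.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

public static class BoxingDemo
{
    // Generic collection: elements are stored as plain ints, no casts needed.
    public static int SumGeneric(List<int> values)
    {
        int sum = 0;
        foreach (int v in values)
        {
            sum += v;
        }
        return sum;
    }

    // Object idiom: every element was boxed on Add and must be unboxed here.
    public static int SumObjectIdiom(ArrayList values)
    {
        int sum = 0;
        foreach (object o in values)
        {
            sum += (int)o; // unboxing cast with a runtime type check
        }
        return sum;
    }

    public static void Main()
    {
        var generic = new List<int> { 1, 2, 3 };
        var boxed = new ArrayList { 1, 2, 3 }; // each int is boxed into an object

        Console.WriteLine(SumGeneric(generic));   // 6
        Console.WriteLine(SumObjectIdiom(boxed)); // 6
    }
}
```

Both methods compute the same result; the difference is that the ArrayList version allocates one heap object per element and pays for a checked cast on every read.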

Compared to Java’s generics implementation, the .NET implementation has many advantages in terms of performance and in enforcing a consistent type system, because the complete type information is available at runtime. I won’t repeat the arguments for why this is advantageous; instead, I’ll leave you with the recommendation to read Jonathan Pryor’s excellent blog post on this topic.

Here’s another good link: an interview with Anders Hejlsberg, in which he talks about the C#, Java and C++ implementations of generics (or templates, in the case of C++).

Categories: .NET, CLR
