Archive for the ‘CLR’ Category

.NET Metadata Tokens

August 3, 2010

Have you ever wondered why System.Type derives from System.Reflection.MemberInfo? Why does its inheritance hierarchy look this way?

System.Object
  System.Reflection.MemberInfo
    System.Reflection.EventInfo
    System.Reflection.FieldInfo
    System.Reflection.MethodBase
    System.Reflection.PropertyInfo
    System.Type

We can find the answer by looking at how MSIL represents members. When we write MSIL by hand and want to reference a member (that is a type, a field, a property, an event…) we do so by providing its fully qualified name as a string. But this is not how those references are actually stored inside the assembly once we put it through ILAsm (just imagine how inefficient that would be). Under the hood, ILAsm generates a token value for each unique reference we make and adds a corresponding entry to one of the numerous metadata tables.

Tools like Reflector or ILDASM resolve these tokens for the reader’s convenience, so you usually never get to see them.

Here’s an example C# code snippet:

static void Main(string[] args)
{
	var x = new DateTime();
	var y = new StringBuilder();
}

When we look at it in ILDASM (or Reflector, for that matter) we get the following output:

.method private hidebysig static void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       16 (0x10)
  .maxstack  1
  .locals init ([0] valuetype [mscorlib]System.DateTime x,
           [1] class [mscorlib]System.Text.StringBuilder y)
  IL_0000:  nop
  IL_0001:  ldloca.s   x
  IL_0003:  initobj    [mscorlib]System.DateTime
  IL_0009:  newobj     instance void [mscorlib]System.Text.StringBuilder::.ctor()
  IL_000e:  stloc.1
  IL_000f:  ret
} // end of method Program::Main

But that’s only the default configuration. You can make ILDASM output the token values using the View->Show Token Values option.

.method /*06000001*/ private hidebysig static
        void  Main(string[] args) cil managed
{
  .entrypoint
  // Code size       16 (0x10)
  .maxstack  1
  .locals /*11000001*/ init ([0] valuetype [mscorlib/*23000001*/]System.DateTime/*01000013*/ x,
           [1] class [mscorlib/*23000001*/]System.Text.StringBuilder/*01000014*/ y)
  IL_0000:  nop
  IL_0001:  ldloca.s   x
  IL_0003:  initobj    [mscorlib/*23000001*/]System.DateTime/*01000013*/
  IL_0009:  newobj     instance void [mscorlib/*23000001*/]System.Text.StringBuilder/*01000014*/::.ctor() /* 0A000011 */
  IL_000e:  stloc.1
  IL_000f:  ret
} // end of method Program::Main

As we can see, our method is identified by the token 06000001, its local variable signature by 11000001, the assembly mscorlib by 23000001, System.DateTime by 01000013 and System.Text.StringBuilder by 01000014.
Each token consists of 4 bytes, where the most significant byte identifies the metadata table in which the reference is stored. In the case of System.DateTime this is 0x01, the TypeRef table. The other three bytes store a RID, a record identifier in that table. The RID is a simple one-based row number (0 is reserved for the nil token) and is used like a primary key in a database table. The RID for System.DateTime is 0x000013, that is, row 19.
We can confirm that ILDASM did its job of showing us human-friendly names by looking at the metadata tables that carry the reference information (View->MetaInfo->Show, or simply Ctrl+M).

TypeRef #19 (01000013)
-------------------------------------------------------
Token: 0x01000013
ResolutionScope: 0x23000001
TypeRefName: System.DateTime

The Resolution scope is mscorlib, as we can easily infer from the entry at 0x23000001.
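
The split of a token into table index and RID can be sketched in a few lines of C#. This is only an illustration: the names TokenDemo and Split are made up for this example and are not part of any .NET API.

```csharp
using System;

static class TokenDemo
{
    // Splits a 4-byte metadata token into its table index (most significant byte)
    // and its RID (the remaining three bytes).
    public static (int Table, int Rid) Split(int token)
    {
        int table = (int)((uint)token >> 24);
        int rid = token & 0x00FFFFFF;
        return (table, rid);
    }

    static void Main()
    {
        // The TypeRef token ILDASM showed for System.DateTime.
        var (table, rid) = Split(0x01000013);
        Console.WriteLine($"table=0x{table:X2}, rid={rid}"); // table=0x01, rid=19
    }
}
```

The RID of 19 matches the "TypeRef #19" header in the metadata dump.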

OK, now that we understand how member references are stored, it is time to return to the inheritance hierarchy of the abstract class System.Reflection.MemberInfo. Since tokens represent member references in a uniform manner, it makes sense to build the reflection APIs around the notion of an abstract MemberInfo that can carry arbitrary, you guessed it, member information. Similar to the first byte of the token, MemberInfo has a property called MemberType that indicates which kind of member this MemberInfo actually is. MemberInfo therefore streamlines working with binary MSIL, for example when directly manipulating the MSIL byte array returned by MethodBase.GetMethodBody().GetILAsByteArray().

The APIs to resolve a token we encounter in the MSIL stream are provided on the Module class, e.g. Module.ResolveMember(). ResolveMember() is useful to resolve a token regardless of its kind, or when you do not know the kind of the token in advance. More specific variants such as ResolveType(), ResolveMethod() and ResolveField() exist as well.
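
Here is a rough sketch of resolving a token taken straight from an IL byte array. ResolveDemo, Sample and FindNewobjToken are hypothetical names for this example. Searching for the first 0x73 byte (the newobj opcode) is a shortcut that only works because the sample method is trivial; a real IL reader has to decode every opcode and its operand size.

```csharp
using System;
using System.Reflection;
using System.Text;

static class ResolveDemo
{
    // Hypothetical sample method whose body contains a single newobj instruction.
    public static void Sample()
    {
        var sb = new StringBuilder();
    }

    // Returns the 4-byte token that follows the first newobj opcode (0x73).
    public static int FindNewobjToken(MethodBase method)
    {
        byte[] il = method.GetMethodBody().GetILAsByteArray();
        int index = Array.IndexOf(il, (byte)0x73);
        // Tokens are stored little-endian in IL; BitConverter reads them
        // correctly on a little-endian machine (assumed here).
        return BitConverter.ToInt32(il, index + 1);
    }

    static void Main()
    {
        MethodInfo method = typeof(ResolveDemo).GetMethod(nameof(Sample));
        int token = FindNewobjToken(method);
        int table = (int)((uint)token >> 24);

        // ResolveMember works without knowing the kind of token in advance.
        MemberInfo member = method.Module.ResolveMember(token);
        Console.WriteLine($"table 0x{table:X2}: {member.MemberType} {member.DeclaringType.FullName}::{member.Name}");
    }
}
```

The resolved member should be the StringBuilder constructor, with MemberType reporting Constructor, just as the table byte 0x0A (MemberRef) in the ILDASM listing suggests.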

More information can be found on MSDN and in the ECMA-335 standard.

Categories: .NET, CLR

.NET Generics Implementation

April 21, 2010

A really interesting question on Stack Overflow caused me to do some research on how exactly the Common Language Runtime (CLR) implements parametric polymorphism, that is, generics. My initial thought was that specialized code must be generated somewhere to support a concrete generic instance, and that this code would need to be different for each generic type. Let’s look at two common approaches to implementing generics:

The Java Way : Type Erasure

Java generics are compile-time-only generics. In essence they are just syntactic sugar on top of a run-time environment that has no notion of a generic type. Therefore, no runtime operations are possible that depend on the type of a generic argument. For every generic type, the Java compiler generates a raw type. Inside a generic class or method, variables of the generic type are replaced by the compiler with the closest matching type: if an extends constraint is specified, the type named in that bound is used, otherwise Object. This process is called Type Erasure, because no information about the generic type argument is preserved in the raw type. This means that List<String> and List<Object> both map to the same raw type, since Type Erasure yields an identical representation for both generic arguments. The compiler inserts runtime casts at call sites to ensure the runtime types are correct.

The advantage of this approach is binary compatibility with previous versions of the JVM. As always, the price for backwards compatibility is high. A serious disadvantage is that introspecting the type of a generic argument at runtime is impossible. It’s easy to break the type system with a few casts. Performance for value types is bad because boxing operations are needed everywhere. Since Java generics don’t avoid the burden of runtime casting, there are no performance gains over traditional Object collections.

The .NET Way : Reified Generic Arguments

.NET is said to reify parametrized types, which means it has a notion of generic types at the IL level. When a generic type is declared, the compiler generates IL that contains placeholders for the generic type and metadata about the constraints. When the generic class is used, the compiler replaces the placeholders at each use with the concrete generic type argument and uses the supplied metadata to enforce the type system. It is important to note that the compiler only requests an instantiation of the generic class or method, that is, the actual code to be executed. It does not generate that code. The compiler is only responsible for passing the right generic type arguments at each use. Let me explain what the somewhat ambiguous term “instantiation” means here.

A given generic type might require different instantiations for each different generic type argument it is used with. You can think of it as “code instantiation”, not “object instantiation”. “Instantiation” refers to the native image, the executable code here.

Whenever a parametrized class or method is invoked, the corresponding instantiation is provided at runtime, generated by the JIT compiler if necessary. Looking at the generic type argument, the loader checks whether an instantiation for a compatible type has already been generated and either returns that compatible instantiation or generates a new one. Compatible means that the code executed for a given parametrized class or method can be shared among different generic arguments. This is the case for all reference types, because the pointers used for their storage have a fixed size regardless of the concrete reference type. For value types, the JIT compiler generates a specialized instantiation for each generic type argument used. Let’s look at a short example:

    public class SomeType<T> where T : new()
    {
        T field;

        public SomeType()
        {
            field = new T();
        }
    }
    class Program
    {
        static void Main(string[] args)
        {
            new SomeType<int>();
            new SomeType<float>();
            new SomeType<object>();
            new SomeType<Exception>();
        }
    }

Due to the first and second line in Main(), two different, specialized instantiations of SomeType will be generated by the JIT compiler. This is because int and float are value types. For the third and fourth line, only a single instantiation will be generated because the representation of reference types is binary compatible. The instantiation will be shared. The details that make this, and the preservation of the runtime type, possible are very low level. In short: while the executable code (the instantiation, the native image) is shared among all reference types, the vtable associated with an instance of the instantiation is unique to the concrete parameter type. Information on the details can be found in a paper by Andrew Kennedy and Don Syme from Microsoft Research titled “Design and Implementation of Generics for the .NET Common Language Runtime”, which is also where most of the information in this blog post comes from.
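
The fact that the runtime type is preserved even when code is shared can be observed from C# itself. The following is a small sketch; ReifiedDemo and Describe are names invented for this example.

```csharp
using System;
using System.Collections.Generic;

static class ReifiedDemo
{
    // typeof(T) is evaluated at run time from the actual generic type argument.
    public static string Describe<T>() => typeof(T).FullName;

    static void Main()
    {
        Console.WriteLine(Describe<int>());       // System.Int32
        Console.WriteLine(Describe<Exception>()); // System.Exception

        // Both are reference-type instantiations, yet they remain distinct runtime types.
        Console.WriteLine(typeof(List<object>) == typeof(List<Exception>)); // False

        // Both still originate from the same generic type definition.
        Console.WriteLine(typeof(List<object>).GetGenericTypeDefinition()
                          == typeof(List<Exception>).GetGenericTypeDefinition()); // True
    }
}
```

Nothing here reveals whether the native code is shared; it only shows that the type argument is never erased.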

What are the advantages of this approach? Well, instantiation sharing is clever because it reduces code size and the amount of just-in-time compilation, which is usually very expensive. Not sharing instantiations for value types means that boxing operations are unnecessary. The performance of a true generic collection of value types will therefore be superior to a collection implementation that uses the “object-idiom”.
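
To make the difference to the “object-idiom” concrete, here is a minimal sketch that sums the same numbers through a non-generic ArrayList and through a List<int>. BoxingDemo, SumBoxed and SumGeneric are hypothetical names, and the example shows where boxing happens rather than measuring its cost.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

static class BoxingDemo
{
    // Object idiom: every int was boxed when added and is unboxed by the cast here.
    public static int SumBoxed(ArrayList items)
    {
        int sum = 0;
        foreach (object item in items)
            sum += (int)item; // unboxing cast, checked at run time
        return sum;
    }

    // Generic collection: List<int> stores the ints directly, no boxing and no casts.
    public static int SumGeneric(List<int> items)
    {
        int sum = 0;
        foreach (int item in items)
            sum += item;
        return sum;
    }

    static void Main()
    {
        var boxed = new ArrayList { 1, 2, 3 };
        var generic = new List<int> { 1, 2, 3 };
        Console.WriteLine(SumBoxed(boxed));     // 6
        Console.WriteLine(SumGeneric(generic)); // 6
    }
}
```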

Compared to Java’s generics implementation, the .NET implementation has many advantages in terms of performance and of enforcing a consistent type system, because the complete type information is available at runtime. I won’t repeat the arguments for why this is advantageous; instead I’ll leave you with the recommendation to read Jonathan Pryor’s excellent blog post on this topic.

Here’s another good link: an interview with Anders Hejlsberg, who talks about the C#, Java and C++ implementations of generics (or templates, respectively).

Categories: .NET, CLR

Solution to Ayende’s Challenge

March 30, 2010

I have always wanted to solve this puzzle Ayende posted on his blog, but hadn’t got around to it until today. It was a really hard puzzle, but with a little hint from Marc Gravell in the comments I was able to find a solution strategy based on .NET Remoting. Because Ayende’s original challenge was quite a bit harder (and actually impossible to solve, which I wish I had known before), I tried messing with the typeof/new operators first. Sadly this isn’t possible, and modifying any IL wasn’t allowed. On the other hand, I learned a lot about the CLR’s type system.

.NET Remoting is really interesting stuff, but to be honest I wouldn’t want to deal with its huge complexity in my everyday job. I am not really sure which gap it fills, besides interacting with ugly COM servers. Comparing all this to Objective-C, I now appreciate the benefits of late-bound, message-passing style function calls. Solving this challenge in Objective-C would have been substantially easier (a simple NSProxy plus Method Swizzling, just what OCMock does).

Anyway, you can find my solution on github.

Categories: .NET, CLR