Sunday, May 11, 2008

C♯ - Fast Array Initialization

I've been researching the .NET Common Intermediate Language (CIL) lately, and I noticed a few interesting things in how the C♯ compiler emits array initializers.

If the element type is a primitive type, such as Int32, Int64, Byte, SByte, and so on, the compiler doesn't initialize each element individually. Instead, it creates the new array, pushes the RuntimeFieldHandle of a private static field that holds the raw data, and calls RuntimeHelpers.InitializeArray to copy that data into the array in one step.

To help illustrate, let's use an example, starting with a sample C♯ array:

byte[,,] o0 = new byte[,,] {
    {
        { 0x01, 0x02, 0x03, 0x1C },
        { 0x04, 0x05, 0x06, 0x1D },
        { 0x07, 0x08, 0x09, 0x1E }
    },
    {
        { 0x0A, 0x0B, 0x0C, 0x1F },
        { 0x0D, 0x0E, 0x0F, 0x20 },
        { 0x10, 0x11, 0x12, 0x21 }
    },
    {
        { 0x13, 0x14, 0x15, 0x22 },
        { 0x16, 0x17, 0x18, 0x23 },
        { 0x19, 0x1A, 0x1B, 0x24 }
    }
};

The C♯ compiler translates this into a compiler-generated private class that contains two things:
  1. A struct with a StructLayout attribute applied. The size is the length of the overall set of data in bytes, the packing is 1, and the LayoutKind is Explicit. This tells the CLR that the struct occupies exactly that many bytes. No members are defined in this 'filler' type, since it's merely used as a placeholder for the data.
  2. A static field whose type is the struct from above. Instead of being initialized in a static constructor like most C♯ static fields, it uses a CIL feature that C♯ doesn't expose: the field's data is placed 'at' a named data label in the image.
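To make item 1 concrete, here is a hand-written C♯ approximation of the filler type. It's only a sketch: the name StaticArrayInitTypeSize36 is hypothetical (the compiler's own name, '__StaticArrayInitTypeSize=36', isn't a legal C♯ identifier), and the FillerInfo helper just confirms that the runtime treats the empty struct as 36 bytes wide.

```csharp
using System.Runtime.InteropServices;

// Hypothetical stand-in for the compiler-generated '__StaticArrayInitTypeSize=36'.
// Explicit layout, 1-byte packing, a fixed size of 36 bytes, and no members:
// the struct exists only to reserve 36 bytes of storage for the data blob.
[StructLayout(LayoutKind.Explicit, Size = 36, Pack = 1)]
internal struct StaticArrayInitTypeSize36
{
}

// Hypothetical helper used only to inspect the layout above.
internal static class FillerInfo
{
    // Returns the size the runtime computes for the filler struct (36 here).
    public static int FillerSize() => Marshal.SizeOf<StaticArrayInitTypeSize36>();
}
```

The Size value is what lets one empty type stand in for "36 bytes of anything"; a different initializer length simply gets a different filler type.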

From the example above, it would define the field like so:
.field assembly static valuetype  '<PrivateImplementationDetails>{17D2CA44-BFFB-4117-B3DF-49EC5806703D}'/'__StaticArrayInitTypeSize=36' '$$method0x6000001-1' at I_000020D0
And the data itself would be defined like so:
.data cil I_000020D0 = bytearray (01 02 03 1C 04 05 06 1D 07 08 09 1E 0A 0B 0C 1F 0D 0E 0F 20 10 11 12 21 13 14 15 22 16 17 18 23 19 1A 1B 24)
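Notice that the bytearray is just the elements of o0 in row-major order, with the rightmost index varying fastest. A quick way to check that is to flatten the array with foreach, which visits a rectangular array in the same order, and print it the way ILDASM does. The class and method names below are hypothetical, chosen for illustration.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for comparing an array against its '.data' bytearray.
internal static class BlobOrder
{
    // Flattens any rectangular byte array in the order foreach visits it:
    // row-major, with the rightmost index varying fastest.
    public static byte[] Flatten(Array array)
    {
        var bytes = new List<byte>(array.Length);
        foreach (byte b in array)
        {
            bytes.Add(b);
        }
        return bytes.ToArray();
    }

    // Formats bytes like an ILDASM bytearray, e.g. "01 02 03 1C".
    public static string ToHex(byte[] bytes) =>
        BitConverter.ToString(bytes).Replace("-", " ");
}
```

Calling BlobOrder.ToHex(BlobOrder.Flatten(o0)) on the sample array produces the same 36 hex pairs as the bytearray above.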

When you want to load the data and initialize the array, you would do something like this:
.method public static hidebysig void main() cil managed
{
  .entrypoint
  .maxstack 3
  .locals init (
    [0] uint8[0...,0...,0...] o0)

  //Push the three dimension lengths (3 x 3 x 4 = 36 elements) onto the stack.
  ldc.i4.3
  ldc.i4.3
  ldc.i4.4
  //Create a new instance of the three-dimensional array and push it onto the stack.
  newobj instance void uint8[0...,0...,0...]::.ctor(int32, int32, int32)
  //Duplicate the reference on the stack: InitializeArray consumes one copy
  //and doesn't return a value, and the other copy is stored below.
  dup
  //Load the RuntimeFieldHandle of the data blob onto the stack.
  ldtoken field valuetype '<PrivateImplementationDetails>{17D2CA44-BFFB-4117-B3DF-49EC5806703D}'/'__StaticArrayInitTypeSize=36' '<PrivateImplementationDetails>{17D2CA44-BFFB-4117-B3DF-49EC5806703D}'::'$$method0x6000001-1'
  //Copy the blob into the array.
  call void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array, valuetype [mscorlib]System.RuntimeFieldHandle)
  //Store the array in local 0.
  stloc.0
  ret
}
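The effect of that InitializeArray call is essentially a raw copy of the blob's bytes into the new array's storage. The sketch below imitates that effect with public APIs so it can be tried from C♯; it's an analogy, not how the runtime implements InitializeArray, and the hypothetical Blob array stands in for the field stored 'at' I_000020D0.

```csharp
using System;

// Hypothetical class illustrating what InitializeArray achieves.
internal static class InitializeArrayAnalogy
{
    // The same 36 bytes as the .data declaration, in row-major order.
    private static readonly byte[] Blob =
    {
        0x01, 0x02, 0x03, 0x1C, 0x04, 0x05, 0x06, 0x1D, 0x07, 0x08, 0x09, 0x1E,
        0x0A, 0x0B, 0x0C, 0x1F, 0x0D, 0x0E, 0x0F, 0x20, 0x10, 0x11, 0x12, 0x21,
        0x13, 0x14, 0x15, 0x22, 0x16, 0x17, 0x18, 0x23, 0x19, 0x1A, 0x1B, 0x24
    };

    public static byte[,,] Create()
    {
        // Stands in for the newobj that creates the 3 x 3 x 4 array.
        var array = new byte[3, 3, 4];
        // Stands in for dup / ldtoken / call RuntimeHelpers::InitializeArray.
        // Buffer.BlockCopy accepts any array of primitives, rectangular ones
        // included, and copies raw bytes, so the blob lands in row-major order.
        Buffer.BlockCopy(Blob, 0, array, 0, Blob.Length);
        return array;
    }
}
```

Somewhat ironically, the Blob initializer itself is compiled into exactly the ldtoken/InitializeArray pattern described here, since byte is a primitive type.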


So, to do the same in the Abstraction Project when translating into CIL, all that needs to be done is:
  1. Check the element type of the array; once it's confirmed to be a primitive type...
  2. Enumerate the elements of the array in row-major order using the standard enumeration methods, regardless of the number of dimensions.
  3. Encode the data according to the bit-size of each element, then emit the filler struct, the field, and the data blob and tie them together.
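Here is a rough sketch of those three steps in C♯. It isn't the Abstraction Project's actual code: ArrayBlobEncoder, Encode, and FillerTypeName are hypothetical names, the primitive-type check is an assumption about which element types qualify, and BinaryWriter is used because it always writes little-endian, which matches the layout of the bytearray above.

```csharp
using System;
using System.IO;

// Hypothetical encoder producing the bytes for a '.data ... = bytearray (...)' blob.
internal static class ArrayBlobEncoder
{
    public static byte[] Encode(Array array)
    {
        // Step 1: only arrays of fixed-size primitive element types qualify.
        Type element = array.GetType().GetElementType();
        if (element == null || !element.IsPrimitive ||
            element == typeof(IntPtr) || element == typeof(UIntPtr))
        {
            throw new ArgumentException("Element type is not a fixed-size primitive.", nameof(array));
        }

        using (var stream = new MemoryStream())
        using (var writer = new BinaryWriter(stream)) // BinaryWriter always writes little-endian.
        {
            // Step 2: foreach visits every element in row-major order, whatever the rank.
            foreach (object item in array)
            {
                // Step 3: write each element using its own size in bytes.
                switch (item)
                {
                    case bool v: writer.Write(v); break;          // 1 byte (0 or 1)
                    case byte v: writer.Write(v); break;          // 1 byte
                    case sbyte v: writer.Write(v); break;         // 1 byte
                    case char v: writer.Write((ushort)v); break;  // 2 bytes: the raw UTF-16 code unit
                    case short v: writer.Write(v); break;         // 2 bytes
                    case ushort v: writer.Write(v); break;        // 2 bytes
                    case int v: writer.Write(v); break;           // 4 bytes
                    case uint v: writer.Write(v); break;          // 4 bytes
                    case long v: writer.Write(v); break;          // 8 bytes
                    case ulong v: writer.Write(v); break;         // 8 bytes
                    case float v: writer.Write(v); break;         // 4 bytes, IEEE 754
                    case double v: writer.Write(v); break;        // 8 bytes, IEEE 754
                }
            }
            writer.Flush();
            return stream.ToArray();
        }
    }

    // Name of the filler struct the C♯ compiler pairs with a blob of this length.
    public static string FillerTypeName(byte[] blob) => "__StaticArrayInitTypeSize=" + blob.Length;
}
```

For example, Encode(new int[] { 1, 0x01020304 }) yields 01 00 00 00 04 03 02 01, and encoding o0 yields the same 36 bytes as the bytearray with a filler name of '__StaticArrayInitTypeSize=36'. The char case writes the code unit directly, because BinaryWriter.Write(char) would run it through a text encoding instead.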

It's a fairly simple setup, but it took a while to figure out: .NET Reflector doesn't expose all the details, and a few were left out. I contacted Lutz Roeder, and he said he was busy at the moment and asked me to re-send the message in two weeks.
