Objectified Intermediate Language: May 2008

Wednesday, May 14, 2008

Constructor Arguments on Generic Type Parameters

This is something I've wanted from generics since they were first introduced. It didn't arrive in version 3.0 or 3.5; however, because I'm writing my own Objectified Intermediate Language with the goal of translating its objects into Common Intermediate Language, I can add the feature myself, much as C#'s compiler adds support for lambda expressions, anonymous methods, closures, and so on.

I decided to write this post after talking to Peli about this and confusing the hell out of him by not defining the problem/solution in full. I even provided a bad example of how such functionality would benefit a coder.

Let's start with what this would mean. The primary reason you would want this is to create instances of some arbitrary type not explicitly named in your code. That's what the original new() constraint was for. The problem is that not every type can be properly initialized through a zero-parameter constructor, and such types don't expose one for exactly that reason.
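To make the limitation concrete, here is a minimal sketch. The Gadget type and the Factory helper are hypothetical names made up for illustration; they are not part of the WidgetWorks download. It shows that C# only lets a type parameter promise a parameterless constructor, and that the usual workaround is to hand the generic code a delegate that knows how to build T.

```csharp
using System;

// Hypothetical type for illustration: it exposes no zero-parameter constructor,
// because a Gadget without a name would not be properly initialized.
public class Gadget
{
    public string Name { get; private set; }
    public Gadget(string name) { Name = name; }
}

public static class Factory
{
    // What C# allows: only the parameterless form of the constructor constraint.
    public static T CreateDefault<T>() where T : new()
    {
        return new T();
    }

    // What this post is after is not legal C#:
    //     where T : new(string)   ...   return new T(name);
    // A common workaround is to pass in a delegate that constructs T.
    public static T Create<T>(Func<string, T> ctor, string name)
    {
        return ctor(name);
    }
}

public static class ConstraintDemo
{
    public static string Run()
    {
        // Factory.CreateDefault<Gadget>() would not compile:
        // Gadget has no public parameterless constructor.
        Gadget g = Factory.Create(n => new Gadget(n), "Test Gadget 1");
        return g.Name;
    }
}
```

The delegate works, but every caller has to supply it by hand, which is the boilerplate a new(string) style constraint would remove.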

I've written a small usage example, with background code, to illustrate how the functionality will work. Using the standard 'Widget' sample, it builds a list that contains multiple widget providers. This lets you define a list that is restricted to a base type and still instantiate a widget arbitrarily from its System.Type, using a much quicker instantiation method than a ConstructorInfo instance's Invoke method.

This would be useful where you have an extensibility model that handles things through similar providers and needs type-parameter constructors with arguments. If, for instance, you had a project infrastructure that allowed add-ins to define projects, and the elements that make them up, through attributes or other means, you'd need a model that enables construction of the project and its elements.

Sure, you could argue that you could hard-code the specifics based upon the type requested on the provider, but that misses the point entirely.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace WidgetWorks
{
    class Program
    {
        internal static void Main()
        {
            Type[] widgetTypes = new Type[] { typeof(MyWidget), typeof(YourWidget), typeof(TheirWidget) };
            WidgetList k = new WidgetList(new WidgetProvider<MyWidget>(), new WidgetProvider<YourWidget>(), new WidgetProvider<TheirWidget>());
            Random u = new Random();
            for (int i = 0; i < 6; i++)
            {
                int l = u.Next(1, 4);
                k.Add(string.Format("Test Widget {0}", i + 1), widgetTypes[l - 1]);
            }
            foreach (var w in k)
            {
                Console.Write("{0}\n\t", w.Name);
                w.DoSomething();
                Console.WriteLine();
            }

            Console.ReadKey(true);
        }
    }
}

As you can see, it's pretty straightforward; as mentioned before, it doesn't rely on the instance method ConstructorInfo.Invoke. The idea is that a new constraint with parameters would expand like so:

    public class WidgetProvider<[GenericParamCtorSignatures(typeof(WidgetProvider<>.TCtorData))] T> :
        IWidgetProvider
        where T :
            Widget/*,
            new(string)
            */
    {

        #region Generic Type Constructor Code

        #region T

        private delegate T CreateWidget1(string name);
        private static RuntimeMethodHandle _widgetCtor1Ref;
        private static CreateWidget1 _widgetCtor1;

        private static CreateWidget1 WidgetCtor1
        {
            get
            {
                if (_widgetCtor1 == null)
                {
                    _widgetCtor1 = ((ConstructorInfo)(MethodBase.GetMethodFromHandle(_widgetCtor1Ref, typeof(T).TypeHandle))).BuildOptimizedConstructorDelegateEx<CreateWidget1>(typeof(CreateWidget1).Module);
                    _widgetCtor1Ref = default(RuntimeMethodHandle);
                }
                return _widgetCtor1;
            }
        }

        [StructLayout(LayoutKind.Sequential, Size = 0)]
        private struct TCtorData
        {
            public TCtorData(string name) { }
        }

        #endregion

        private static void VerifyGenericParamConstructors()
        {
            Type t = typeof(T);
            ConstructorInfo[] tCtors = t.GetConstructors();
            ConstructorInfo widgetCtor1 = tCtors.FindConstructor(typeof(string));
            if (widgetCtor1 == null)
                throw new GenericParamCtorFailureException("T", t, typeof(string));
            _widgetCtor1Ref = widgetCtor1.MethodHandle;
        }
        #endregion

        static WidgetProvider()
        {
            VerifyGenericParamConstructors();
        }

        #region IWidgetProvider Members

        public Widget GetWidget(string name)
        {
            return WidgetCtor1(name);
        }

        public Type WidgetType
        {
            get { return typeof(T); }
        }

        #endregion

    }

If the type has no static constructor, the translation adds a simplistic one that invokes VerifyGenericParamConstructors. Since this isn't implemented by the CLR, libraries that expressly want this functionality will have to rely on a small library that emits quick constructor-invoke methods. It handles the quick invoke through the following code:

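using System;
using System.Collections.Generic;
using System.Linq;
using System.Reflection;
using System.Reflection.Emit;
using System.Text;

public static partial class LanguageMetaHelper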
{
    #region Type Conversion Info

    private static Dictionary<TypeCode, Dictionary<TypeCode, bool>> conversionInfo = GetConversionInfo();
    private static Dictionary<TypeCode, Dictionary<TypeCode, bool>> GetConversionInfo()
    {
        Dictionary<TypeCode, Dictionary<TypeCode, bool>> conversionInfo = new Dictionary<TypeCode, Dictionary<TypeCode, bool>>();
        TypeCode[] supportedTypeCodes = new TypeCode[] { TypeCode.Byte, TypeCode.SByte, TypeCode.Single, TypeCode.Double, TypeCode.Char, TypeCode.UInt16, TypeCode.UInt32, TypeCode.UInt64, TypeCode.Int16, TypeCode.Int32, TypeCode.Int64 };
        foreach (TypeCode tc in supportedTypeCodes)
        {
            Dictionary<TypeCode, bool> current = new Dictionary<TypeCode, bool>();
            switch (tc)
            {
                case TypeCode.Char:
                    current[TypeCode.UInt16] = true;
                    current[TypeCode.UInt32] = true;
                    current[TypeCode.UInt64] = true;
                    current[TypeCode.Int32] = true;
                    current[TypeCode.Int64] = true;
                    current[TypeCode.Single] = true;
                    current[TypeCode.Double] = true;
                    break;
                case TypeCode.Byte:
                    // Note: C# defines no implicit conversion from byte to char.
                    current[TypeCode.UInt16] = true;
                    current[TypeCode.UInt32] = true;
                    current[TypeCode.UInt64] = true;
                    goto case TypeCode.SByte;
                case TypeCode.SByte:
                    current[TypeCode.Int16] = true;
                    current[TypeCode.Int32] = true;
                    current[TypeCode.Int64] = true;
                    current[TypeCode.Single] = true;
                    current[TypeCode.Double] = true;
                    break;
                case TypeCode.UInt16:
                    current[TypeCode.UInt32] = true;
                    current[TypeCode.UInt64] = true;
                    goto case TypeCode.Int16;
                case TypeCode.Int16:
                    current[TypeCode.Int32] = true;
                    current[TypeCode.Int64] = true;
                    current[TypeCode.Single] = true;
                    current[TypeCode.Double] = true;
                    break;
                case TypeCode.UInt32:
                    current[TypeCode.UInt64] = true;
                    goto case TypeCode.Int32;
                case TypeCode.Int32:
                    current[TypeCode.Int64] = true;
                    current[TypeCode.Single] = true;
                    current[TypeCode.Double] = true;
                    break;
                case TypeCode.UInt64:
                case TypeCode.Int64:
                    current[TypeCode.Single] = true;
                    current[TypeCode.Double] = true;
                    break;
                case TypeCode.Single:
                    current[TypeCode.Double] = true;
                    break;
            }
            conversionInfo[tc] = current;
        }
        return conversionInfo;
    }
    /// <summary>
    /// Checks to see if you can go <paramref name="from"/> one type <paramref name="to"/> another.
    /// </summary>
    /// <param name="from">The type to check conversion of.</param>
    /// <param name="to">The type to see if <paramref name="from"/> can go to.</param>
    /// <returns>True if <paramref name="from"/> can be cast/converted <paramref name="to"/>; otherwise false.</returns>
    public static bool CanConvertFrom(this Type from, Type to)
    {
        TypeCode fromTC = Type.GetTypeCode(from);
        TypeCode toTC = Type.GetTypeCode(to);
        try
        {
            if (fromTC != toTC)
                return conversionInfo[fromTC][toTC];
            else if (fromTC == TypeCode.Object)
                return (to.IsAssignableFrom(from));
            else
                return true;
        }
        catch
        {
            return false;
        }
    }

    #endregion

    #region CtorBinding
    public static ConstructorInfo FindConstructor(this ConstructorInfo[] list, params Type[] binding)
    {
        if (list == null)
            throw new ArgumentNullException("list");
        if (binding == null)
            throw new ArgumentNullException("binding");
        var match = new bool[list.Length];
        var deviations = new Dictionary<ConstructorInfo, int>();
        for (int i = 0; i < list.Length; i++)
        {
            var current = list[i];
            var paramsInfo = current.GetParameters();
            if (paramsInfo.Length != binding.Length)
                continue;
            match[i] = true;
            for (int j = 0; j < paramsInfo.Length; j++)
            {
                if (paramsInfo[j].ParameterType != binding[j])
                {
                    // The supplied (binding) type must convert to the parameter type.
                    if (binding[j].CanConvertFrom(paramsInfo[j].ParameterType))
                    {
                        if (deviations.ContainsKey(list[i]))
                            deviations[list[i]]++;
                        else
                            deviations.Add(list[i], 1);
                    }
                    else
                    {
                        match[i] = false;
                        break;
                    }
                }
            }
        }
        // Pick the matching constructor with the fewest implicit conversions;
        // FirstOrDefault yields null when nothing matched.
        return list
            .Where((constructor, index) => match[index])
            .OrderBy(constructor => deviations.ContainsKey(constructor) ? deviations[constructor] : 0)
            .FirstOrDefault();
    }
    #endregion

    #region BuildOptimizedConstructorDelegate
    #region Helpers
    private delegate void FuncV<T>(T arg);

    private static IEnumerable<TCallResult> OnAll<TItem, TCallResult>(this IEnumerable<TItem> e, Func<TItem, TCallResult> f)
    {
        foreach (TItem t in e)
            yield return f(t);
        yield break;
    }

    private static void OnAll<TItem>(this IEnumerable<TItem> e, FuncV<TItem> f)
    {
        foreach (TItem t in e)
            f(t);
    }

    private static string GetStringFormSignature(ParameterInfo[] parameters)
    {
        bool firstMember = true;
        StringBuilder sb = new StringBuilder();
        foreach (ParameterInfo paramInfo in parameters)
        {
            if (firstMember)
                firstMember = false;
            else
                sb.Append(", ");
            sb.Append(paramInfo.ParameterType.FullName == null ? paramInfo.ParameterType.Name : paramInfo.ParameterType.FullName);
            sb.Append(" ");
            sb.Append(paramInfo.Name);
        }
        return sb.ToString();
    }
    #endregion

    /// <summary>
    /// Returns an optimized delegate which invokes a constructor described through <paramref name="ctor"/>.
    /// </summary>
    /// <typeparam name="T">The delegate type to build; its parameters must line up with those of the constructor.</typeparam>
    /// <param name="ctor">The constructor to build the dynamic delegate off of.</param>
    /// <param name="m">The module with which the dynamic method is associated.</param>
    /// <returns>A new delegate of type <typeparamref name="T"/> which wraps around the 
    /// <paramref name="ctor"/> provided.</returns>
    public static T BuildOptimizedConstructorDelegateEx<T>(this ConstructorInfo ctor, Module m)
    {
        Type u = typeof(T);
        if (!u.IsSubclassOf(typeof(Delegate)))
            throw new ArgumentException("T");
        MethodInfo delegateInvoke = u.GetMethod("Invoke");
        Type[] delegateTypes = delegateInvoke.GetParameters().OnAll(param => param.ParameterType).ToArray();
        ParameterInfo[] ctorParameters = ctor.GetParameters();
        ILGenerator interLangGenerator = null;
        if (delegateTypes.Length != ctorParameters.Length)
            throw new ArgumentException("ctor");
        DynamicMethod optimizedCtor = new DynamicMethod(string.Format(".ctor@{0}({1})", ctor.DeclaringType.Name, GetStringFormSignature(ctorParameters)), delegateInvoke.ReturnType, delegateTypes, m);
        interLangGenerator = optimizedCtor.GetILGenerator();
        List<LocalBuilder> paramLocals = new List<LocalBuilder>();

        int argIndex = 0;
        ctorParameters.OnAll(parameter =>
        {
            if (parameter.IsOut || parameter.ParameterType.IsByRef)
            {
                // The short form takes a one-byte operand (indices 0-255).
                if (argIndex < 256)
                    interLangGenerator.Emit(OpCodes.Ldarga_S, (byte)argIndex);
                else
                    interLangGenerator.Emit(OpCodes.Ldarga, (short)argIndex);
            }
            else
            {
                switch (argIndex)
                {
                    // ldarg.0 through ldarg.3 encode the index in the opcode
                    // itself, so they must be emitted without an operand.
                    case 0:
                        interLangGenerator.Emit(OpCodes.Ldarg_0);
                        break;
                    case 1:
                        interLangGenerator.Emit(OpCodes.Ldarg_1);
                        break;
                    case 2:
                        interLangGenerator.Emit(OpCodes.Ldarg_2);
                        break;
                    case 3:
                        interLangGenerator.Emit(OpCodes.Ldarg_3);
                        break;
                    default:
                        if (argIndex < 256)
                            interLangGenerator.Emit(OpCodes.Ldarg_S, (byte)argIndex);
                        else
                            interLangGenerator.Emit(OpCodes.Ldarg, (short)argIndex);
                        break;
                }
            }
            argIndex++;
        });

        interLangGenerator.Emit(OpCodes.Newobj, ctor);
        if (ctor.DeclaringType.IsValueType)
            interLangGenerator.Emit(OpCodes.Box, ctor.DeclaringType);

        interLangGenerator.Emit(OpCodes.Ret);
        return (T)(object)optimizedCtor.CreateDelegate(typeof(T));
    }
    #endregion

    internal static void Init()
    {
    }
}

Simply speaking, it merely alters the static constructor. This works because each closed generic type is a new type, so the static constructor runs once for each unique set of type arguments. Later on I'll extend the functionality to obtain the constraint information, for parameterized constructors, on type arguments. That's what the placeholder struct TCtorData is for, and why there's an attribute on the generic parameter pointing to it. Basically, it'll be used as a data source for the constraint information, since there is no built-in functionality for it.
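The claim about static constructors is easy to check in isolation. The following is a small sketch with hypothetical names (Provider<T>, CtorLog) that are not part of the WidgetWorks code: the static constructor logs the type argument, and each closed type logs exactly once.

```csharp
using System;
using System.Collections.Generic;

public static class CtorLog
{
    public static readonly List<string> Entries = new List<string>();
}

public class Provider<T>
{
    // Runs once for every closed type (Provider<int>, Provider<string>, ...),
    // which is the hook a generated verification routine can rely on.
    static Provider()
    {
        CtorLog.Entries.Add(typeof(T).Name);
    }

    public Type ProvidedType
    {
        get { return typeof(T); }
    }
}

public static class StaticCtorDemo
{
    public static List<string> Run()
    {
        new Provider<int>();
        new Provider<string>();
        new Provider<int>(); // Provider<int> is already initialized; nothing is logged.
        return CtorLog.Entries;
    }
}
```

After Run, the log holds two entries, "Int32" and "String", in that order. A failed verification inside such a static constructor would surface as a TypeInitializationException the first time the closed type is used.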

Download the WidgetWorks code

Monday, May 12, 2008

Type-Inference - Part 1

Well, I finally got to a point where I can start on the fun, challenging aspects of the new languages: smart type inference. I'm just posting a few notes about it for others to either laugh about or reply to. First I'll start with lambda expressions and how to properly discern a good match from a not-so-good match.

Figure 1

Basically, a lambda signature type, if encountered as a parameter of a call, shall yield explicit or implicit T_n elements and a T_r element for the result. Each signature will only match a relative RoughSignatureMatch (see Fig. 1) series if the assigned delegate type has the same parameter count. Some or all of the type parameters will be filled in initially from other known data, such as the source of the call (if referring to an extension method) and the data types of the non-lambda call parameters. When no match can be found, you arrive at the conclusion: "Type parameters cannot be inferred, try defining them explicitly."

I'll try to make the constructors of the lambda expressions as straightforward as possible. The specifics of the type inference here will require some analysis of the statement/expression associated with the lambda expression, based upon the yielded result.

There's a lot of work to be done, and this might require a later revisit once the statement/expression model is further along. I still have to rewrite a large portion of the internal auto-linking system to enable a more realistic view of how it should be done, especially since the contextual information needed for some of the inference won't be immediately available.

The other reason for rewriting the linking system is that, with a compiler context provided, I can emit error messages and warnings as needed. Presently it throws errors, which halts the current linker state and makes things generally difficult to handle, especially since I didn't add the appropriate code to handle the errors when they are thrown.

A good example of this is how it relates to the namespace includes (using) of the current context. Since this project is aimed at hobbyist lingual theorists, they'll need something more than a single using reference, so the linking methods will need to provide a compiler context that can give them this contextual information.

A backwards look of the current expression tree might also provide more information that might not be immediately obvious, but this is something I'll investigate as I go further into the type inference. The extra data mostly applies to closures and lifting them into appropriate generated display classes on translation into CIL.

Sunday, May 11, 2008

C♯ - Fast Array Initialization

I've been researching the .NET Common Intermediate Language (CIL) lately, and I noticed a few funny things in how C♯ exports array initializers.

If the element type is primitive, such as Int32, Int64, Byte, SByte, and so on, the compiler doesn't create a new array and initialize each element individually. Instead it creates the new array and pushes the RuntimeFieldHandle of a privately stored static field, which is then used to initialize the array in one call.

Now to help illustrate, let's use an example, starting with a sample C♯ array:

byte[, ,] o0 = new byte[,,] {
   {
       { 0x01, 0x02, 0x03, 0x1C },
       { 0x04, 0x05, 0x06, 0x1D },
       { 0x07, 0x08, 0x09, 0x1E }
   },
   {
       { 0x0A, 0x0B, 0x0C, 0x1F },
       { 0x0D, 0x0E, 0x0F, 0x20 },
       { 0x10, 0x11, 0x12, 0x21 }
   },
   {
       { 0x13, 0x14, 0x15, 0x22 },
       { 0x16, 0x17, 0x18, 0x23 },
       { 0x19, 0x1A, 0x1B, 0x24 }
   }
};

The C# Compiler translates this into a class that contains two things:

A struct with a StructLayout attribute applied. The size is the length of the overall set of data, the packing is 1, and the LayoutKind is Explicit. This instructs the CLR to manage the struct's data at exactly that size. No members are defined in this 'filler' type, since it's merely used as a placeholder for data.
A field whose type is the struct from above. Instead of being initialized like most C# fields, it uses a CIL feature that C# doesn't explicitly expose: data 'at' a named location.

From the example above, it would define the field like so:

.field assembly static valuetype  '<PrivateImplementationDetails>{17D2CA44-BFFB-4117-B3DF-49EC5806703D}'/'__StaticArrayInitTypeSize=36' '$$method0x6000001-1' at I_000020D0

The data would be defined:

.data cil I_000020D0 = bytearray (01 02 03 1C 04 05 06 1D 07 08 09 1E 0A 0B 0C 1F 0D 0E 0F 20 10 11 12 21 13 14 15 22 16 17 18 23 19 1A 1B 24)

When you want to load the information and initialize the array, you would do something like so:

.method public static hidebysig void main() cil managed
{
   .entrypoint
   .maxstack 3
   .locals init (
       [0] uint8[0...] ja)

   //Load the constant 4-byte integer '36' onto the stack
   ldc.i4.s 36
   //Create a new instance of an array and push it onto the stack.
   newobj instance void uint8[0...]::.ctor(int32)
   //Duplicate the reference on the stack, since we'll be passing
   //it to a method that doesn't return a value.
   dup
   //Load the RuntimeFieldHandle of the data blob onto the stack.
   ldtoken field valuetype '<PrivateImplementationDetails>{17D2CA44-BFFB-4117-B3DF-49EC5806703D}'/'__StaticArrayInitTypeSize=36' '<PrivateImplementationDetails>{17D2CA44-BFFB-4117-B3DF-49EC5806703D}'::'$$method0x6000001-1'
   //Initialize the array.
   call void [mscorlib]System.Runtime.CompilerServices.RuntimeHelpers::InitializeArray(class [mscorlib]System.Array, valuetype [mscorlib]System.RuntimeFieldHandle)
   //Store the array
   stloc.0
   //Return to the caller; a CIL method body cannot simply fall off the end.
   ret
}

So, effectively, to do the same in the Abstraction Project on translation into CIL, all that needs to be done is:

Check the data type of the array; once confirmed...
Enumerate the elements of the array using enumeration methods, regardless of the number of dimensions.
Encode the data relative to the bit size per element and make the relationships accordingly.

Fairly simple setup, but it took a while to figure out: .NET Reflector doesn't expose all the details, and a few were left out. I contacted Lutz Roeder and he said he was busy at the moment and to re-send the message in two weeks.
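The enumerate-and-encode steps above can be sketched in a few lines. This is only an illustration under stated assumptions: ArrayBlob, ToBlob and ToHex are hypothetical names, and it uses Buffer.BlockCopy as a shortcut instead of walking the elements one by one. Multi-byte elements come out in the byte order of the machine running the code.

```csharp
using System;
using System.Text;

public static class ArrayBlob
{
    // Copies the raw element data of a primitive array, of any rank, into a byte[].
    // Multi-dimensional arrays are stored in row-major order, so the result
    // follows the order in which the initializer was written.
    public static byte[] ToBlob(Array source)
    {
        if (source == null)
            throw new ArgumentNullException("source");
        if (!source.GetType().GetElementType().IsPrimitive)
            throw new ArgumentException("Only arrays of primitive types are supported.", "source");
        byte[] blob = new byte[Buffer.ByteLength(source)];
        Buffer.BlockCopy(source, 0, blob, 0, blob.Length);
        return blob;
    }

    // Formats the blob as space-separated hex pairs, like a CIL bytearray.
    public static string ToHex(byte[] blob)
    {
        StringBuilder sb = new StringBuilder();
        foreach (byte b in blob)
        {
            if (sb.Length > 0)
                sb.Append(' ');
            sb.Append(b.ToString("X2"));
        }
        return sb.ToString();
    }
}
```

For example, ArrayBlob.ToHex(ArrayBlob.ToBlob(new byte[,] { { 1, 2 }, { 3, 4 } })) yields "01 02 03 04", and an int[2, 3] produces a 24-byte blob.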

Compiled Types - Part II

Finished the limited-load aspect of the compiled types. Now even if you access a property like so:
console.Members["WindowHeight"]
the 'Members' dictionary will only encapsulate the WindowHeight property. The difference is that the indexer on Members returns a MasterDictionaryEntry<IMember> instance instead of a property. The reason is that, to make the members dictionary work, I used a master/subordinate system where the Members dictionary is the master and the sub-type dictionaries (method, property, indexer, field, constructor, et al.) are its subordinates. When the Members property is requested, it is populated with the reflection instances, each paired with the subordinate that contains it so it can link back to the kind of element (property, field, and so on). When a specific instance is requested, the master connects with the appropriate subordinate and requests the fully qualified IMember, giving the subordinate the appropriate reflection object.

The dictionary helper properties, Keys and Values, are also in on the game and only request the necessary information when needed. This ensures that the least load is placed at any given time. There are only two drawbacks. One is that subsequent calls are minutely slower, but that's a decent trade-off compared to creating a wrapper object for every reflection element just by requesting the .Members of a type. The other is that you can't use the debugger visualizers to drop down and view the properties of a type and expect to be given a list of its members. All you'll get is a series of reflection objects, and the Keys/Values will present a series of translated nulls, except where you've actually requested a specific member.

Expressions - Primitive Reduction

Just thought I'd post on the latest changes to the OIL framework, part of the Abstraction framework.

Recently I wrote a mini evaluator for the primitive expressions inside the system. This way, if you had a statement that, for whatever reason, utilized literal mathematical expressions, it would reduce them so that the resulting code contains the least possible execution to get the same result.

I'll probably have to go and rewrite it, but here's an example:

-(((float)8) / 9 * (33 + 4) * 9 / 99 * -(3 + 9))

The code above is pretty trivial. Naturally someone would be really silly to do that in code, and even the C# compiler reduces it to the following: 35.87879F

The process to reduce the expression is simple, since I've structured OIL to mimic the order of operations found in most languages. To handle the operations, it merely goes through and checks whether each expression and sub-expression can be reduced. If it can, it reduces.

There's a loop to handle multiple passes that pegs out at 100 runs, in case there's a bug in the code. Once I've verified that it works solidly, I'll remove the cap.
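A multi-pass reduction with a safety cap can be sketched like this. It is a simplified stand-in rather than OIL's evaluator: the Expr, Literal, Negate, Binary and Reducer types are hypothetical, and everything is evaluated as double instead of tracking the float type from the cast.

```csharp
using System;

public abstract class Expr { }

public sealed class Literal : Expr
{
    public readonly double Value;
    public Literal(double value) { Value = value; }
}

public sealed class Negate : Expr
{
    public readonly Expr Operand;
    public Negate(Expr operand) { Operand = operand; }
}

public sealed class Binary : Expr
{
    public readonly char Op;
    public readonly Expr Left, Right;
    public Binary(char op, Expr left, Expr right) { Op = op; Left = left; Right = right; }
}

public static class Reducer
{
    private const int MaxPasses = 100; // safety cap, as in the post

    private static double Apply(char op, double l, double r)
    {
        switch (op)
        {
            case '+': return l + r;
            case '-': return l - r;
            case '*': return l * r;
            case '/': return l / r;
            default: throw new ArgumentException("Unknown operator: " + op);
        }
    }

    // One pass: fold every node whose operands are already literals.
    private static Expr ReduceOnce(Expr e, ref bool changed)
    {
        Negate n = e as Negate;
        if (n != null)
        {
            Literal inner = n.Operand as Literal;
            if (inner != null) { changed = true; return new Literal(-inner.Value); }
            return new Negate(ReduceOnce(n.Operand, ref changed));
        }
        Binary b = e as Binary;
        if (b != null)
        {
            Literal l = b.Left as Literal, r = b.Right as Literal;
            if (l != null && r != null) { changed = true; return new Literal(Apply(b.Op, l.Value, r.Value)); }
            return new Binary(b.Op, ReduceOnce(b.Left, ref changed), ReduceOnce(b.Right, ref changed));
        }
        return e; // literals are already fully reduced
    }

    // Repeats passes until nothing changes or the cap is hit.
    public static Expr Reduce(Expr e)
    {
        for (int pass = 0; pass < MaxPasses; pass++)
        {
            bool changed = false;
            e = ReduceOnce(e, ref changed);
            if (!changed) break;
        }
        return e;
    }
}

public static class ReducerDemo
{
    // Builds -(8 / 9 * (33 + 4) * 9 / 99 * -(3 + 9)) and reduces it.
    public static double Run()
    {
        Expr e = new Negate(
            new Binary('*',
                new Binary('/',
                    new Binary('*',
                        new Binary('*',
                            new Binary('/', new Literal(8), new Literal(9)),
                            new Binary('+', new Literal(33), new Literal(4))),
                        new Literal(9)),
                    new Literal(99)),
                new Negate(new Binary('+', new Literal(3), new Literal(9)))));
        return ((Literal)Reducer.Reduce(e)).Value;
    }
}
```

ReducerDemo.Run() collapses the whole tree to a single literal of roughly 35.8788 in a handful of passes, well under the cap.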

Look familiar? 35.87879.

Compiled Type Optimization

I was going through the Compiled Type system and was irritated because the system auto-loaded every member, always, whenever you merely accessed the property associated to them.

For example, if I had a reference to Console and accessed its 'Properties', it would previously load all the properties (which is fine) and then encapsulate every member with the framework's equivalent. It makes more sense to load the MemberInfo relative to the properties and only instantiate the wrapper classes when they're called for. So that's exactly what I did.

The example below:

ICompiledClassType console = typeof(Console).GetTypeReference<ICompiledClassType>();
ITypeReferenceExpression itre = console.GetTypeExpression();
IMethodInvokeExpression imie = itre.GetMethod("WriteLine").Invoke((ExpressionBase)"Console Dimensions: ({0}, {1})", itre.GetProperty("WindowWidth"), itre.GetProperty("WindowHeight"));
IType readKeyResult = imie.ForwardType;
IMethodInvokeExpression imie2 = itre.GetMethod("ReadKey").Invoke((ExpressionBase)true);
IType readKeyResult2 = imie2.ForwardType;

When 'ForwardType' on a given expression is accessed, it auto-links the expression and any sub-expressions that are dependent upon it.

In the case above, IMethodInvokeExpression imie = . . . properly resolves its arguments to 'String', 'Int32', and 'Int32', and accordingly links to the associated member:

void Console.WriteLine(System.String, System.Object, System.Object)

The reason such indirect and lengthy optimizations are important is very large-scale code generation: if the system instantiated a property instance for every one of Console's properties just to access WindowWidth, things would suddenly start to take a long time.

As for the actual linking of methods, there wasn't really that much that could be done. In order to properly perform the lookups I went ahead and used the framework I put in place. It still does the on-access instantiation, but the iterative nature of the filtering methods ends up loading all of the elements anyway, to ensure the most accurate match is provided.
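The on-access instantiation described in this post can be sketched as follows. This is not the framework's code: PropertyWrapper and LazyPropertyDictionary are hypothetical stand-ins, and the sketch covers properties only, whereas the real system spans every member kind.

```csharp
using System;
using System.Collections.Generic;
using System.Reflection;

// Hypothetical stand-in for the framework's property wrapper.
public sealed class PropertyWrapper
{
    public readonly PropertyInfo Info;
    public PropertyWrapper(PropertyInfo info) { Info = info; }
}

public sealed class LazyPropertyDictionary
{
    private readonly Dictionary<string, PropertyInfo> infos = new Dictionary<string, PropertyInfo>();
    private readonly Dictionary<string, PropertyWrapper> wrappers = new Dictionary<string, PropertyWrapper>();

    public LazyPropertyDictionary(Type type)
    {
        // Cheap step: remember the reflection objects only.
        // Overloaded indexers share a name; the last one wins in this sketch.
        foreach (PropertyInfo p in type.GetProperties())
            infos[p.Name] = p;
    }

    public int Count { get { return infos.Count; } }

    // Number of wrappers actually created so far.
    public int WrappedCount { get { return wrappers.Count; } }

    public PropertyWrapper this[string name]
    {
        get
        {
            PropertyWrapper w;
            if (!wrappers.TryGetValue(name, out w))
            {
                // Deferred step: wrap only the member that was asked for.
                w = new PropertyWrapper(infos[name]);
                wrappers.Add(name, w);
            }
            return w;
        }
    }
}
```

With new LazyPropertyDictionary(typeof(Console)), asking for "WindowWidth" creates exactly one wrapper, and repeated requests return the same instance.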

Introduction

The Objectified Intermediate Language is a framework I'm writing to help people generate code in three of .NET's primary target languages. That is:

Visual Basic
Visual C♯
Common Intermediate Language (CIL)

To do this, there are a few things I must do, such as provide functionality for the transitory steps necessary to generate CIL for operator overloads, short circuiting, extension methods, lambda expressions, anonymous types, literal arrays, yield enumerators, and so on. I think the primary way I'll enable such functionality is through an abstract framework for defining extended functionality. This way I can define new functionality and merely query the translator on support; if none exists, the 'feature' is responsible for creating the necessary code to manage the information relative to the limitations of that specific language. This will, of course, require a certain subset of functionality, in the event that minimal support for translation is given.

There are other concerns with the project. One such concern is the amount of code needed to do all of this. Presently the framework merely defines what is what, and doesn't yet do anything beyond simple expression translation with minimal short-circuiting support. Even so, the framework builds to around 600KB and the .xml doc comments are around 1.52MB, and I've only just begun.

Once the OIL Framework is completed, I'll move onto other things, such as the Scripting Language Foundation (SLF) that will utilize the OIL framework for working its magic.

If anyone has any suggestions, questions or comments about this project, post a reply.
