Monday, September 29, 2008

CompiledQuery and Enumerating Query Results

More fun with compiled queries, this time when processing the results.  Firstly, the simple non-compiled query:

using (DataClasses1DataContext context = new DataClasses1DataContext())
{
   var results = context.Employees.Where(e => e.EmployeeID == 1);

   Console.WriteLine("Number of employees: {0}", results.Count());
   Console.WriteLine("First ID: {0}", results.First().EmployeeID);
}




Simple stuff, and it works as you'd expect.  We get both the number of employees and the ID of the first one.  Note that under the covers, SQL gets executed twice, once for each call.  Perhaps not quite what you'd expect ;)



Here's the compiled version:





var compiledQuery = CompiledQuery.Compile((DataClasses1DataContext context) => context.Employees.Where(e => e.EmployeeID == 1));

using (DataClasses1DataContext context = new DataClasses1DataContext())
{
   var results = compiledQuery(context);

   Console.WriteLine("Number of employees: {0}", results.Count());
   Console.WriteLine("First ID: {0}", results.First().EmployeeID);
}




 



This one doesn't do what you'd expect. On the second Console.WriteLine(), instead of executing a suitable query, you get an InvalidOperationException saying that "The query results cannot be enumerated more than once".  It's pretty clear what it means, and the fix is simple - make sure you only enumerate the results once, using something like the ToList() method:





var results = compiledQuery(context).ToList();




With this, you only hit the DB once, and you can then look at the list of results as much as you like. 
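The behaviour is easy to reproduce without a database.  The following is only a sketch: OneShot<T> is a hypothetical stand-in I've made up for the compiled query's result, not the real LINQ to SQL type, but it shows the same pattern of a second enumeration failing and ToList() fixing it:

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a compiled query result: it refuses a second enumeration.
class OneShot<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _source;
    private bool _enumerated;

    public OneShot(IEnumerable<T> source) { _source = source; }

    public IEnumerator<T> GetEnumerator()
    {
        if (_enumerated)
            throw new InvalidOperationException("The query results cannot be enumerated more than once.");
        _enumerated = true;
        return _source.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

static class OneShotDemo
{
    static void Main()
    {
        var results = new OneShot<int>(new[] { 1, 2, 3 });
        Console.WriteLine(results.Count());        // first enumeration is fine
        try
        {
            Console.WriteLine(results.First());    // second enumeration throws
        }
        catch (InvalidOperationException ex)
        {
            Console.WriteLine(ex.Message);
        }

        // Materialise once, then use the in-memory list as often as needed
        List<int> list = new OneShot<int>(new[] { 1, 2, 3 }).ToList();
        Console.WriteLine(list.Count);
        Console.WriteLine(list.First());
    }
}
```

Count() and First() on the list are plain in-memory operations, so neither triggers another enumeration of the underlying source.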



The difference between the non-compiled and the compiled query is down to how the two different queries are processed.  In the non-compiled version, there's an expression tree floating around, which is lazily evaluated when the results are enumerated.  Because LINQ to SQL has the expression tree available, it can generate different SQL on each hit, so results.Count() generates a "select count(*) ..." statement and results.First() generates a "select top 1 ..." statement.



For compiled queries, the SQL is determined solely by the expression you pass to CompiledQuery.Compile().  When the results are enumerated the first time, this pre-prepared SQL is executed and the results processed.  Note that since the SQL is already fixed, the thing that you are doing with the results doesn't influence the SQL.  So the call to "results.Count()" will execute a select of the entire dataset, which will then get enumerated and counted in the client.



Since repeatedly issuing the same SQL is unlikely to be what you want your app to be doing, the designers of LINQ to SQL quite wisely throw an exception if you try to do so.  Instead, you need to stick in the explicit ToList() to make it clear that you understand the behaviour.



At first glance it seems a shame that you can't just swap normal queries & compiled queries but hopefully you can see that the semantics are quite different between the two, at which point having different client code which respects these differences is more acceptable.



Last point - before you start getting excited about the number of SQL queries that the non-compiled version may be executing, and start scattering ToList() calls everywhere, make sure that you understand the consequences.  For the code above, ToList() is going to be a good thing, but if you consider a scenario where the Where() clause doesn't identify a single entry but instead hits perhaps many thousands of rows, then it may not be so good.  The first time we look at the results, we just want the count, which gets mapped to count() in SQL, and the second time the First() method gets mapped to a TOP 1 clause in the SQL.  Although it would mean two hits on the database, it would likely be far better to do that than to bring back thousands of records for processing in the client.



As with most abstraction layers, LINQ offers a lot of benefits but it cannot be used without considered thought as to what is happening underneath.  I thoroughly recommend that when testing your LINQ queries you have the SQL profiler running so that you can see what's going on.  Also, don't forget the DataContext.Log property which lets you dump the SQL out to a TextWriter.  Using this, it would be quite possible to check within your unit tests that the DB interaction is running the way you expect, and also to spot when changes cause unexpected interactions.

Passing Predicates into Compiled Queries

I've recently been looking at generating LINQ predicates on the fly in a mapping layer between a set of business domain entities and a set of related, but different, database entities.  One of the problems that I've encountered is to do with the way in which predicates are handled when using CompiledQueries.

To start off, let's consider the easy non-compiled version.  Here's a method:

// Get an employee by a predicate
static void GetEmployee(Expression<Func<Employee, bool>> predicate)
{
   // Just perform the select, and output the results
   using (DataClasses1DataContext context = new DataClasses1DataContext())
   {
      var results = context.Employees.Where(predicate);

      Console.WriteLine("Number of employees: {0}", results.Count());
   }
}




The usage of this is nice and simple, and the sort of thing in LINQ examples all over the web:





GetEmployee(e => e.EmployeeID == 1);




Calling this does exactly what you'd expect.  My next step was to look at how this approach could be used with compiled queries.  I started with a simple method:





// Get an employee by a predicate, using a compiled expression
static void GetEmployeeCompiled(Expression<Func<Employee, bool>> predicate)
{
   // Compile the query
   var compiledQuery =
      CompiledQuery.Compile((DataClasses1DataContext context) => context.Employees.Where(predicate));

   // and using the compiled query, output the results.  This crashes :(
   using (DataClasses1DataContext context = new DataClasses1DataContext())
   {
      var results = compiledQuery(context);

      Console.WriteLine("Number of employees: {0}", results.Count());
   }
}




Obviously, this is pointless since it just recompiles the query every time.  But let's ignore that small fact - it should, after all, still work.  Alas, it doesn't. 



At the point where the results are enumerated, it explodes with a "NotSupportedException".  Specifically, it fails due to an "Unsupported overload used for query operator 'Where'.".  Looking at the expression tree that is being compiled, with some help from Reflector to look at what LINQ is doing under the covers, it can be seen that the issue is down to how the predicate is included in the final query expression.



Remember, the compiler is not generating executable code here; it is just building a lambda expression.  When it sees the parameter to the Where() method, it has little choice but to "lift" this variable into its own compiler-generated class, and it is a field on this lifted class that is passed as a parameter to the Where() method.  Although the non-compiled version handles this just fine, it causes the CompiledQuery object to barf.  The lifting itself is just the same as for any other query that uses a local variable or parameter.
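You can see the lifting for yourself without going anywhere near LINQ to SQL.  This is a minimal sketch against IQueryable<int> (the LiftingDemo class and its method names are mine, purely for illustration).  It compares the node that ends up as the second argument to Where() when the predicate is a captured variable against the node you get when the lambda is written inline:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

static class LiftingDemo
{
    // Predicate held in a local variable: the compiler captures it in a closure class
    public static ExpressionType CapturedKind()
    {
        Expression<Func<int, bool>> predicate = n => n > 1;
        Expression<Func<IQueryable<int>, IQueryable<int>>> query =
            source => source.Where(predicate);
        return ((MethodCallExpression)query.Body).Arguments[1].NodeType;
    }

    // Predicate written inline: the nested lambda is embedded in the tree as a quote
    public static ExpressionType InlineKind()
    {
        Expression<Func<IQueryable<int>, IQueryable<int>>> query =
            source => source.Where(n => n > 1);
        return ((MethodCallExpression)query.Body).Arguments[1].NodeType;
    }

    static void Main()
    {
        Console.WriteLine(CapturedKind()); // MemberAccess
        Console.WriteLine(InlineKind());   // Quote
    }
}
```

The inline version gives Where() a quoted lambda, whereas the captured version gives it a member access on the closure object, so a consumer of the tree has to evaluate that member before it can even see the predicate.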



I've experimented with a number of ways of constructing the query that I'm trying to compile, but all ultimately end up with the same problem.  The solution I've found is a little nasty, but it does work.  If I've missed a cleaner way, then I'd love to hear about it!



Anyhow, the solution.  It is based on the fact that it is the act of "passing" the predicate into the Where() method that is the problem.  So the solution is to not pass in the predicate, but instead pass in some dummy predicate.  Then do some expression tree walking to swap out the dummy predicate for the real one.  The code looks like this:





// Get an employee by a predicate, using a compiled expression
static void GetEmployeeCompiled2(Expression<Func<Employee, bool>> predicate)
{
   // Setup the required query, using a dummy predicate (c => true)
   Expression<Func<DataClasses1DataContext, IEnumerable<Employee>>> compiledExpression =
      context => context.Employees.Where(c => true);

   // Dig out the dummy predicate from the expression tree created above
   Expression template = ((UnaryExpression)((MethodCallExpression)(compiledExpression.Body)).Arguments[1]).Operand;

   // Swap out the template for the predicate
   compiledExpression = (Expression<Func<DataClasses1DataContext, IEnumerable<Employee>>>)
                                 ExpressionRewriter.Replace(compiledExpression, template, predicate);

   // Compile the query
   var compiledQuery = CompiledQuery.Compile(compiledExpression);

   // and using the compiled query, output the results.  This works :)
   using (DataClasses1DataContext context = new DataClasses1DataContext())
   {
      var results = compiledQuery(context);

      Console.WriteLine("Number of employees: {0}", results.Count());
   }
}




So the required query is itself stored as an expression, with a dummy predicate (c => true) used to get the correct "shape" of tree.  This predicate is then located and the expression tree is rewritten, swapping out the dummy predicate for the real one.  This new query expression then compiles and executes just fine.



For completeness, the ExpressionRewriter class is defined as:





class ExpressionRewriter : ExpressionVisitor
{
   static public Expression Replace(Expression tree, Expression toReplace, Expression replaceWith)
   {
      ExpressionRewriter rewriter = new ExpressionRewriter(toReplace, replaceWith);

      return rewriter.Visit(tree);
   }

   private readonly Expression _toReplace;
   private readonly Expression _replaceWith;

   private ExpressionRewriter(Expression toReplace, Expression replaceWith)
   {
      _toReplace = toReplace;
      _replaceWith = replaceWith;
   }

   protected override Expression Visit(Expression exp)
   {
      if (exp == _toReplace)
      {
         return _replaceWith;
      }
      return base.Visit(exp);
   }
}




where the ExpressionVisitor base class can be found on MSDN.
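If you are on .NET 4 or later, System.Linq.Expressions.ExpressionVisitor ships with the framework and its Visit() method is public, so the same trick can be tried without the MSDN sample class.  The sketch below is not the code from this post: it assumes an in-memory IQueryable<int> in place of LINQ to SQL, uses Expression.Compile() rather than CompiledQuery.Compile(), and the SwapVisitor and SwapDemo names are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Replaces one specific node (matched by reference) with another
class SwapVisitor : ExpressionVisitor
{
    private readonly Expression _from;
    private readonly Expression _to;

    public SwapVisitor(Expression from, Expression to) { _from = from; _to = to; }

    public override Expression Visit(Expression node)
    {
        return node == _from ? _to : base.Visit(node);
    }
}

static class SwapDemo
{
    public static List<int> Filter(Expression<Func<int, bool>> predicate, IEnumerable<int> data)
    {
        // Query with a dummy predicate, just to get the right shape of tree
        Expression<Func<IQueryable<int>, IQueryable<int>>> template =
            source => source.Where(n => true);

        // Dig the dummy lambda out from under the quote node
        var call = (MethodCallExpression)template.Body;
        Expression dummy = ((UnaryExpression)call.Arguments[1]).Operand;

        // Swap in the real predicate and rebuild the tree
        var rewritten = (Expression<Func<IQueryable<int>, IQueryable<int>>>)
            new SwapVisitor(dummy, predicate).Visit(template);

        return rewritten.Compile()(data.AsQueryable()).ToList();
    }

    static void Main()
    {
        List<int> evens = Filter(n => n % 2 == 0, new[] { 1, 2, 3, 4 });
        Console.WriteLine(string.Join(",", evens)); // 2,4
    }
}
```

Because the real predicate is spliced into the tree as a quoted lambda, the rewritten expression has the same shape as if the predicate had been written inline.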

Tuesday, September 23, 2008

Unsafe code without the Unsafe keyword

I've been playing around with some code lately that uses dynamic method generation fairly extensively.  In the course of doing so, I've written the odd dodgy bit of IL.  Interestingly, a couple of times I got some very strange results when assigning fields from one object to another - specifically, if I got the types mismatched I just got garbage in the destination rather than some form of Cast exception (which I'd expect the runtime to generate during execution) or Verification exception (which I'd expect when I finally surface my generated method through a call to DynamicMethod.CreateDelegate()).

Finally had some time today to take a closer look, and the results are very interesting and not at all clear from the documentation.  Specifically, if you create a dynamic method using the following constructor:

public DynamicMethod(
    string name,
    Type returnType,
    Type[] parameterTypes,
    Module m
)




and pass in "Assembly.GetExecutingAssembly().ManifestModule" for the module, then it appears that all type safety within the generated code is turned off.  i.e., you can pretty much assign anything to anything.  The following code, for example, enables you to dump the memory address of any reference type:





/// <summary>
/// Return a method that gives the memory address of any object
/// </summary>
static Func<object, int> Get_GetAddress_Method()
{
   DynamicMethod d = new DynamicMethod("", typeof (int), new Type[] {typeof (Object)},
                                       Assembly.GetExecutingAssembly().ManifestModule);

   ILGenerator ilGen = d.GetILGenerator();

   ilGen.Emit(OpCodes.Ldarg_0); // Load arg_0 onto the stack (of type object)
   ilGen.Emit(OpCodes.Ret);     // And return - note that the return type is an int...

   return (Func<object, int>)d.CreateDelegate(typeof(Func<object, int>));
}




You can use this in the following way:





Func<object, int> getAddress = Get_GetAddress_Method();
const string greeting = "Hello";

// Get the address of the "Hello" string
int x = getAddress(greeting);




x now contains the memory address of the string "Hello".  So what?  Well, you can also write a method like this:





/// <summary>
/// Return a method that "maps" any type to a particular memory location
/// </summary>
static Func<int, T> Get_ObjectAtAddress_Method<T>()
{
   DynamicMethod d = new DynamicMethod("", typeof (T), new Type[] {typeof (int)},
                                       Assembly.GetExecutingAssembly().ManifestModule);

   ILGenerator ilGen = d.GetILGenerator();

   ilGen.Emit(OpCodes.Ldarg_0);  // Load arg_0 onto the stack (of type int)
   ilGen.Emit(OpCodes.Ret);      // And return - note that the return type is T

   return (Func<int, T>)d.CreateDelegate(typeof(Func<int, T>));
}




This chap lets you take any memory address, and "pretend" that an object of type T resides there.  So you can do something like this:





Func<int, byte[]> getData = Get_ObjectAtAddress_Method<byte[]>();

// Get a byte array on the same location
byte[] data = getData(x);




where x is a memory location that you've acquired previously.  It doesn't matter if the type that really resides at address x is a byte[] or not.  This basically lets you get access to the whole address space within your AppDomain (and possibly the whole Win32 process) and write whatever you like into it. 



This seems plain wrong to me - I haven't specified the "unsafe" keyword anywhere, nor is this code built with the "Allow unsafe code" box checked.  Without jumping through those hoops, I should not be able to write code like this.  I'll concede that this only works in a full trust environment, but it still smells like a very serious hole in the type safety of .NET.  Interestingly, if you use the DynamicMethod constructor that doesn't take a Module parameter, then everything works as you'd expect - you are politely served a VerificationException when you try to compile the method.  According to the docs, the constructor overload that takes a module is only supposed to allow access to internals of the specified module, not to skip type safety.  I wonder if the implementation of DynamicMethod in that scenario is flawed.



Below is a big lump of code - it compiles and shows the issue quite clearly.  I'd be interested in your views on whether this is a bug or "by design". If the latter, what exactly was the scenario that they were designing for?





using System;
using System.Reflection;
using System.Reflection.Emit;
using System.Text;

namespace ConsoleApplication1
{
   class Program
   {
      static void Main()
      {
         // Get some methods generated...
         Func<object, int> getAddress = Get_GetAddress_Method();
         Func<int, byte[]> getData = Get_ObjectAtAddress_Method<byte[]>();

         const string greeting = "Hello";

         // Print the greeting
         Console.WriteLine(greeting);

         // Get the address of the "Hello" string
         int x = getAddress(greeting);

         // Get a byte array on the same location
         byte[] data = getData(x);

         // Change some data...
         SetString("Bye!!", data);

         // And display the greeting again (remember, strings are immutable...)
         Console.WriteLine(greeting);

         // And just to show it against other bits of the framework...
         Console.WriteLine(Assembly.GetExecutingAssembly().FullName);

         SetString("Hacked!", getData(getAddress(Assembly.GetExecutingAssembly().FullName)));

         Console.WriteLine(Assembly.GetExecutingAssembly().FullName);
      }

      /// <summary>
      /// Return a method that gives the memory address of any object
      /// </summary>
      static Func<object, int> Get_GetAddress_Method()
      {
         DynamicMethod d = new DynamicMethod("", typeof (int), new Type[] {typeof (Object)},
                                             Assembly.GetExecutingAssembly().ManifestModule);

         ILGenerator ilGen = d.GetILGenerator();

         ilGen.Emit(OpCodes.Ldarg_0); // Load arg_0 onto the stack (of type object)
         ilGen.Emit(OpCodes.Ret);     // And return - note that the return type is an int...

         return (Func<object, int>)d.CreateDelegate(typeof(Func<object, int>));
      }

      /// <summary>
      /// Return a method that "maps" any type to a particular memory location
      /// </summary>
      static Func<int, T> Get_ObjectAtAddress_Method<T>()
      {
         DynamicMethod d = new DynamicMethod("", typeof (T), new Type[] {typeof (int)},
                                             Assembly.GetExecutingAssembly().ManifestModule);

         ILGenerator ilGen = d.GetILGenerator();

         ilGen.Emit(OpCodes.Ldarg_0);  // Load arg_0 onto the stack (of type int)
         ilGen.Emit(OpCodes.Ret);      // And return - note that the return type is T

         return (Func<int, T>)d.CreateDelegate(typeof(Func<int, T>));
      }

      /// <summary>
      /// Little helper method to copy a string into a byte[]
      /// </summary>
      static void SetString(string requiredString, byte[] dest)
      {
         UnicodeEncoding encoder = new UnicodeEncoding();
         byte[] requiredBytes = encoder.GetBytes(requiredString);

         // Need to do the copy by hand, since Array.Copy bleats
         // about the dimensions of the destination.  No surprise really,
         // since the destination isn't really an array...
         for (int i = 0; i < requiredBytes.Length; i++)
         {
            dest[i] = requiredBytes[i];
         }
      }
   }
}