Monday, September 29, 2008

CompiledQuery and Enumerating Query Results

More fun with compiled queries, this time when processing the results.  Firstly, the simple non-compiled query:

using (DataClasses1DataContext context = new DataClasses1DataContext())


{


   var results = context.Employees.Where(e => e.EmployeeID == 1);


 


   Console.WriteLine("Number of employees: {0}", results.Count());


   Console.WriteLine("First ID: {0}", results.First().EmployeeID);


}




Simple stuff, and it works as you'd expect.  We get both the number of employees and the Id of the first one.  Note that under the covers, SQL gets executed twice.  Perhaps not quite what you'd expect ;)



Here's the compiled version:





var compiledQuery = CompiledQuery.Compile((DataClasses1DataContext context) => context.Employees.Where(e => e.EmployeeID == 1));


 


using (DataClasses1DataContext context = new DataClasses1DataContext())


{


   var results = compiledQuery(context);


 


   Console.WriteLine("Number of employees: {0}", results.Count());


   Console.WriteLine("First ID: {0}", results.First().EmployeeID);


}




 



This one doesn't do what you'd expect. On the second Console.WriteLine(), instead of executing a suitable query, you get an InvalidOperationException saying that "The query results cannot be enumerated more than once".  It's pretty clear what it means, and the fix is simple - make sure you only enumerate the results once, using something like the ToList() method:





var results = compiledQuery(context).ToList();




With this, you only hit the DB once, and you can then look at the list of results as much as you like. 



The difference between the non-compiled and the compiled query is down to how the two different queries are processed.  In the non-compiled version, there's an expression tree floating around, which is lazily evaluated when the results are enumerated.  Because LINQ to SQL has the expression tree available, it can generate different SQL on each hit, so results.Count() generates a "select count(*) ..." statement and results.First() generates a "select top 1 ..." statement.



For compiled queries, the SQL is determined at the point you call CompiledQuery.Compile().  When the results are enumerated the first time, this pre-prepared SQL is executed and the results processed.  Note that since the SQL is already built, the thing that you are doing with the results doesn't influence the SQL.  So the call to "results.Count()" will execute a select of the entire dataset, which will then get enumerated and counted in the client.



Since repeatedly issuing the same SQL is unlikely to be what you want your app to be doing, the designers of LINQ to SQL quite wisely throw an exception if you try to do so.  Instead, you need to stick in the explicit ToList() to make it clear that you understand the behaviour.



At first glance it seems a shame that you can't just swap normal queries & compiled queries but hopefully you can see that the semantics are quite different between the two, at which point having different client code which respects these differences is more acceptable.



Last point - before you start getting excited about the number of SQL queries that the non-compiled version may be executing, and start scattering ToList() calls everywhere, make sure that you understand the consequences.  For the code above, ToList() is going to be a good thing, but if you consider a scenario where the Where() clause doesn't identity a single entry but instead hits perhaps many thousands of rows, then it may not be so good.  The first time we look at the results, we just want the count which gets mapped to count() in SQL, and on the second time the First() method gets mapped to a TOP 1 clause in the SQL.  Although it would mean two hits on the database, it would likely be far better to do that than to bring back thousands of records for processing in the client.



As with most abstraction layers, LINQ offers a lot of benefits but it cannot be used without considered thought as to what is happening underneath.  I thoroughly recommend that when testing your LINQ queries you have the SQL profiler running so that you can see what's going on.  Also, don't forget the DataContext.Log property which lets you dump the SQL out to a TextWriter.  Using this, it would be quite possible to check within your unit tests that the DB interaction is running the way you expect, and also to spot when changes cause unexpected interactions.

2 comments:

sql2008 said...

Thanks Steve,

You saved my time.

I had the enumerating error and solved it by using the ToList() extension.

Thanks,
Eralper
ASP.NET Programming

Dave said...

Thanks. good explaining.