Wednesday, December 03, 2008

On Agile

Following a couple of other posts relating to agile (here & here), I thought I'd give my thoughts on the subject. 

There are numerous books and posts on the subject of Agile - searching for "Agile Development" on Google gives well over 1 million hits.  As someone who is hugely interested in how to improve software development, I've read a good deal of material on the subject (although, being fair, it would only be a fraction of a percent of those Google hits!).  What I've found very interesting is the number of different opinions as to what Agile means - it's pretty hard to find two books on the subject that agree on a definition of the term.

One thing that is for sure - Agile (whatever definition you may have) is no silver bullet.  Just being "Agile" does not guarantee success.  It does not guarantee the early delivery of rock-solid software to delirious customers. It does not guarantee that your staff will become highly motivated software ninjas overnight. Given that, what's all the fuss about?  The fuss is that previous software methodologies have a woeful delivery record.  I'm not going to quote some specific number here, since there are a ton of research articles available which give differing statistics depending on exactly what is defined as failure.  However, the general range is anything from 30% - 85%.  Even if you take the best number there, it's clearly an unacceptable state for a multi-billion dollar industry.

Given the lack of consistency for what Agile means, and given the proliferation of processes that claim to be Agile (XP, SCRUM, Crystal etc.), what is the best way for you to move forward?  Alas, I don't have a silver bullet here either.  The best I can do is describe the process that I have gone through to get this stuff clear in my head.

The first step is to clearly understand what the key requirements are for your business.  Ignore software, ignore the process, just focus on what matters to the business.  For me, I came up with a single overriding principle:

To make a decent, long term, sustainable profit

I'm guessing that's not particularly contentious - any business for which that isn't a key principle is unlikely to last long.  However, on its own that's not really enough.  So I then pondered what aspects of the business would best support the primary principle.  This gave the following, in no particular order:

  • To deliver the solution that the customer needs, when he needs it, and at a cost he is happy with
    • Happy customers mean repeat business and referrals, which are the best and most profitable form of sales. Plus I'd much rather be working with a customer who's happy rather than one who's not.

  • To have control over the costs and timescales for each customer project
    • Not having control means that the customer doesn't know how much he is paying or when he's going to receive the goods.  Given that the software is probably only a part of a bigger project, the timescales in particular are critical - without confidence on delivery dates, how is the customer meant to schedule other aspects of the wider project such as training, marketing, manufacturing etc?  In addition, if we don't know when projects are due to complete it makes it very hard to commit to start dates for new customers.

  • To have a motivated, skilled team that share the same values
    • Motivated staff take care and pride in their work, which gives rich rewards in terms of the quality of the product.  They also tend to enjoy their work which reduces expensive staff turnover, and just makes the workplace a more enjoyable place to spend what amounts to a significant percentage of your life.


From this, I could then extract the aspects of Agile from the mass of books that I've read to provide, in effect, the essence.  These are the areas that I consider important, and why:

  • Accept and embrace change.
    • Anyone who thinks that they can prevent change from occurring during a project of any size is, quite frankly, living on another planet.  Accept that change is essential to enable the software to meet the needs of the customer, and adopt a process that makes change as painless as possible.  The change may be requirements, it may be technical, it may be staffing, but whatever it is, it's going to happen.
  • Develop in short iterations.
    • There's a load of important things about the iterative approach, so I'll expand on this below.
  • Don't attempt up-front detailed design
    • Again, there's lots of empirical evidence that this doesn't provide any benefit.  By all means, do up-front high-level architectural design - indeed, for the key structural aspects of the project (scalability, security, disaster recovery etc.), this is pretty much essential; getting those wrong or trying to retrofit them into an existing code base can be very expensive, and not something that "right click / refactor" is going to help with.  But the low level stuff is best done with the compiler.
  • Make sure that testing is a first class citizen
    • Testing should begin as soon as coding begins (indeed, if you want to do TDD, then it starts before the coding).  As far as possible, make the tests automated so that you can frequently run the full test suite.  It's inevitable (regardless of whether you attempt up-front detailed design or not) that, at times, you'll need to refactor parts of the code base to support new functionality.  At times like this, a large test suite gives a great safety net.  In addition, the tests (if well written) also act as a form of executable (and hence up to date) documentation.  Finally, and most importantly, testing as early as possible tends to promote a testable code base and gets quality in there from day one. 
  • Empower your team to use their brains.
    • If you've any sense, you spend a lot of time recruiting the very best staff.  Recognize that, and let them shine.  If they are committed to the business, then trust them to make sane choices and don't try to micro-manage them.  If they're not committed to the business, politely but firmly point them in the direction of the door.

Expanding on iterations, I think the following aspects are essential:

  • At the start of an iteration, plan in detail what you are going to achieve.  For that plan to have any teeth, it is essential that changes are not allowed during an iteration.  For this reason, iterations should also not be too long - my experience suggests that between 2 and 4 weeks works well.
  • During the iteration planning, ensure that the tasks being worked on are the most important to the project at this moment in time.  Don't do the trivial stuff whilst there are important things to be done (important can either be those items that give most business value, or those items that present the most technical risk)
  • At the end of each iteration, deliver demonstrable, working software.  This keeps the team focused and gives a clear view of progress to date.  In addition if, god forbid, you fail to complete all the development tasks you will at least have a system that the customer could take.  And, since you worked in priority order, it should include most of the stuff that the customer considers important.  Telling the customer that you're not finished is never an easy conversation, but "We're not done, we estimate that we're about 80% of the way there, but here's a system in which all of the following functionality is complete and ready to go" is a much better chat than "We're not done, we estimate that we're about 80% of the way there.  Sorry, but there's nothing you can take yet because until we've done the other 20% nothing will work"
  • At the end of each iteration, tasks are either done or not done.  It's notoriously hard to determine how much work is left on a task when it's not yet complete (how often have you heard the phrase "it's 80% done", only to then find it takes another 100% of the elapsed time so far to finish?).  In addition, done needs to be done.  Code written, all functionality complete, all tests written and passing.  Anything less is not done, and should hence be deferred to a future iteration.
  • At the end of each iteration, evaluate what has gone well and what has gone badly.   Do more of the good things, and make changes to prevent the bad things from happening again.  I have seen a number of teams running iterations who recognize that they are not getting things done as quickly as they need, and whose response is "ok, we recognize that things aren't going well.  We'll try harder in the next iteration".  Trying harder at something that isn't working is unlikely to yield the results that you want.
  • Don't queue up bugs.  The tasks that you've worked on to date are, by definition, the most important ones.  Bugs mean that they are not finished.  Fix the bugs.  If you don't, then at the end of the project you'll have a pile of important stuff that's not done.  The customer is not going to like that.

The key thing that I've observed with well-run iterations is that they tend to surface problems early in the project.  Pain is going to happen (what - you really think that nothing will go wrong?), and Early Pain is considerably more desirable than Late Pain.  Early Pain means that there's time to take corrective action. Late Pain is what kills projects.

Those are the things that I see as the essence of agile - it's not rocket science, it's just working smarter.  It's understanding that change happens, and making sure that you can handle it.  It's understanding that things go wrong, and making sure that they can be spotted and fixed as early as possible.  It's understanding that you have a hugely talented team, and using them.  It's no silver bullet though.

What else can I add?  Well, there are a few things that spring to mind:

  • Requirements - up-front or iteratively?
  • Contracts - fixed price or T&M?
  • What methodology?  SCRUM?  XP?  Lean?
  • We're not Agile, but want to be - how do we change?
  • My customer doesn't want Agile - what are my options?

This has been a pretty long post, so although I've got things to say on those, I'll leave them for another day.

As a final remark, all of the above is just my opinion.  I've intentionally not put in references to books etc. - for each reference I find that says one thing, I've no doubt you can find a reference that says the opposite.  Such is the nature of our imprecise world.  I hope, however, that this does give some food for thought, and perhaps helps you through your own thought processes around how (or indeed if) to adopt Agile.

I would certainly appreciate any comments that folk might have...

Monday, November 10, 2008

This should have been sent last Friday, but other stuff got the better of me. Anyhow, here's last week's "interesting stuff":

  • The reason why we have iterations: Early Pain
  • An example of a simple TODO language implemented in Mg. For those used to parsers, BNFs etc., this will be an easy read. For those that haven't come across such things, it's a nice introduction.
  • How SanDisk's ExtremeFFS works. Bets on how long it is before Joe gets one :)
  • A nice summary of PDC. Thanks, John - saves me writing it!
  • Access to the ETW APIs from .Net using NTrace. Not used it yet, but suspect this is worth keeping an eye on.
  • A new TFS Power Tools release. Surprised that Obama got quite so much press coverage considering that this came out the same week.
  • Code Contracts in C# (well, any .Net language in fact). It's due out with .Net 4.0, but it also looks like it works with VS2008. Getting into the habit of declaring the intent of your code in this way can only be a good thing - it helps out with documentation, static analysis, runtime checks and gives Pex something to get its teeth into.
  • A good reality check on some of the hype going on about Oslo right now.
  • Hadi is twittering. Hell has also frozen over.
  • For those of you with a media center and an iPhone / iPod touch, this is cool. I've been running it for a week or so (before this review came out), and although at $6 it's one of the most expensive iPhone apps I've bought, it's a load cheaper than the wireless mouse / keyboard that I would have bought otherwise.
  • Should we put comments in our code? Neal Ford shares his views here - not sure I entirely agree with him, but I do respect his opinion and it's hard to refute his reasoning. Some of the comments are pretty decent, so it's worth reading the whole page.

Friday, October 17, 2008

Weekly digest of interesting stuff

  • News that the D language that is part of the upcoming Oslo release has been renamed to M.  Wonder what that stands for?
  • Silverlight 2 goes RTM.  Various useful links: 
  • Changing extensions in IIS 7 - the post is about removing the .svc extension, but the method used works for pretty much any type of URL renaming that you want to perform
  • A Channel9 video discussing the provider model in LINQ.  I've not watched it yet, but Erik Meijer is a smart guy and normally worth watching.  Beware of his shirts though, they are normally pretty shocking!
  • Apple released updates to their MacBook & MacBook Pro lines.  These are nice looking machines, and with bootcamp they will run Vista just fine.  They aren't the cheapest laptops around, but they do compare reasonably well on price with other manufacturers for similar specs / designs.

Visual Studio Tests and "The location of the file or directory xxx is not trusted"

If you've ever downloaded stuff from the internet (from XP SP2 onwards), you'll know that the OS marks the file as un-trusted and that to use it you need to right-click, go to properties and click "Unblock".  If you download a zip and extract its contents without having unblocked the zip, then you probably also know that every extracted file will also be marked as un-trusted. 

Referencing a file from an un-trusted source within Visual Studio (for example, adding a reference to the latest library that you've just downloaded) will work just fine, but when you try to run your unit tests they will barf with a "location of the file or directory is not trusted" error.

The top tip is to remember to unblock the zip file before doing the extract; that way, all the files that you extract will also be unblocked and the world will be a happy place (assuming that none of them have viruses, trojans etc!)

If you forget, and if you've now got files scattered all over your project such that deleting them and re-extracting is a pain, then here's another option - ZoneStripper is a handy command-line tool that can run recursively over a directory and get rid of the blocks.  Works like a treat.
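For the curious, the "block" is nothing magical - it's just an NTFS alternate data stream called Zone.Identifier attached to the file. If you fancied rolling your own stripper, something along these lines should do it (a sketch only: the class name is made up, it assumes NTFS, and it needs P/Invoke because File.Delete won't accept a stream path):

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

class ZoneStripperSketch
{
    // The Win32 DeleteFile API (unlike File.Delete) accepts an alternate-stream
    // path, so we can remove just the Zone.Identifier stream and leave the file intact.
    [DllImport("kernel32.dll", CharSet = CharSet.Unicode, SetLastError = true)]
    static extern bool DeleteFile(string name);

    static void Main(string[] args)
    {
        foreach (string file in Directory.GetFiles(args[0], "*", SearchOption.AllDirectories))
        {
            // Returns true if the stream existed and was removed;
            // fails harmlessly for files that were never blocked.
            DeleteFile(file + ":Zone.Identifier");
        }
    }
}
```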

Tuesday, October 14, 2008

Running 32 bit .Net applications on a 64 bit machine

Just been having a problem trying to get a .Net application running on my 64bit Vista machine.  The app was compiled using the "Any CPU" flag, which means that it can target both 32 or 64 bit machines.  With this flag set, the .Net bootstrap process loads up the appropriate CLR based upon your machine architecture - if you're running a 32 bit OS, then the 32 bit CLR is loaded and the JIT compiler generates 32 bit code.  If you're running a 64 bit OS, then you can guess what happens - 64 bit CLR and 64 bit code.

For most apps, this is probably exactly the behaviour that you want (although do remember that you need to test on both environments).  If there is a reason why you need to target a specific architecture, then you can change the build settings to force either 32 or 64 bit.  Why would you want to force things?  A good reason would be if you are loading up 32 bit native DLLs for some reason (perhaps your database vendor only ships 32 bit client DLLs, for example).  In this case, quite clearly you need to make sure that your app is also running 32 bit - if not, then when launched on a 64 bit OS, it'll throw a BadImageFormatException at the point that it tries to load the native DLL. 

That's all fine, but what if you've been given an app that needs to run in 32 bit mode, but was compiled with the "Any CPU" flag?  If you've got the code then you could recompile, but what if that's not an option?  Turns out that there's a handy tool called corflags.exe which comes with the SDK.  Using this, you can flip the 32 bit flag in the application without requiring access to the source.  For example:

corflags /32BIT+ /Force TheApplication.exe

The /Force flag is needed if the application is strong name signed - if you omit that flag, then it will fail when run against such assemblies.  Obviously, once the bit has been flipped the strong name is no longer valid.  If you've got access to the private key then you can re-sign.  If you don't have the private key and you need to keep the strong name then I'm afraid you're out of luck.
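If you're ever unsure which CLR an "Any CPU" app actually ended up in, a one-liner from inside the app settles it:

```csharp
using System;

class BitnessCheck
{
    static void Main()
    {
        // IntPtr.Size is 4 under the 32 bit CLR and 8 under the 64 bit CLR,
        // so this reports the bitness of the running process, not of the OS.
        Console.WriteLine("Running as a {0}-bit process", IntPtr.Size * 8);
    }
}
```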

Friday, October 10, 2008

Weekly digest of interesting stuff

I read quite a few blogs and web sites over the course of each week, and thought it might be useful to do a weekly blog on the things that I've found interesting.  If nothing else, it gives me a way of keeping track of things - but hopefully some of you will find it useful as well...


    • The Austrians now have the first computer network protected by quantum encryption. This stuff has been in the labs for a while, but this is the first (reported) implementation "in the wild".
    • A largely speculative post, but it's one of the first to discuss the new name from Microsoft: Windows Strata.
    • Eager loading in the Entity Framework, and how it differs from LINQ to SQL.
    • New MSBuild extensions library released.
    • "Library Oriented Programming" - is this what we're all doing now?
    • A list of links that details the new stuff, and the broken stuff, in .Net 3.5 SP1. I know it's been out a while, but it's nice to have one place to refer to :)
    • A list of the talks that Don Box wants to see at PDC.  And if Don's interested, then you probably should be too.
    • Some VSTS 2010 features.  It's a while until we'll see this, so it's nice to see what's coming our way.
    • For anyone into encryption, I think you'll find the Silverlight Enigma Machine kind of cute.
    • I've not watched it myself yet, but any video of Anders talking about language design is likely to be worth watching.  If you're into that sort of thing, of course!
    • A really good article on what makes a good software architect.  If you think you are, or think you'd like to be, or work in a team with one, or have the misfortune to be married to one (or, worse, to be married to someone who thinks they are!), then it's worth a read.

Friday, October 03, 2008

WPF Localization, part II

Following on from the previous blog, I'll briefly look here at how to localise strings that you build within your application as opposed to resources that are within the XAML.  There are a variety of options here, but this approach seems pretty simple and plays nicely with the locbaml / csv approach described earlier.

Step 1:

Add a new item to your project, of type Resource Dictionary.  By default, VS will name it Dictionary1.xaml.

Step 2:

Edit the App.xaml file to reference the new resource dictionary.  The markup you'll need is:

<Application.Resources>
   <ResourceDictionary x:Uid="ResourceDictionary_1">
      <ResourceDictionary.MergedDictionaries>
         <ResourceDictionary x:Uid="ResourceDictionary_2" Source="Dictionary1.xaml" />
      </ResourceDictionary.MergedDictionaries>
   </ResourceDictionary>
</Application.Resources>

Step 3:

Let's now add some code and content.  Assuming that you are using the "app" we built in the previous blog, double click on the button in the designer to add a Click event and the corresponding method in the code behind file.  Within the method, add the following code:

MessageBox.Show((string)Application.Current.FindResource("buttonMessage"));

"buttonMessage" is the name of the resource that we want to display - edit the Dictionary1.xaml file to include it:

<ResourceDictionary
    xmlns="http://schemas.microsoft.com/winfx/2006/xaml/presentation"
    xmlns:x="http://schemas.microsoft.com/winfx/2006/xaml"
    xmlns:system="clr-namespace:System;assembly=mscorlib">

   <system:String x:Key="buttonMessage">Hello World!</system:String>
</ResourceDictionary>

If you are running in the default locale ("en-GB" in our previous example), the app should run and you should get a message box with the text "Hello World!" when you click the button.

Step 4:

Back to our friends from before.  Do the following:

  • run "msbuild /t:updateuid" from your project directory
  • build the project
  • run "locbaml /parse" from your bin\debug directory
  • translate the corresponding csv file
  • run "locbaml /generate" to build the new resource dll

(I've not put in the full command lines here, since they are exactly as in the previous blog).

Once you've done this, if you switch culture to your new culture, you should find that the message box now displays the translated text. 

As before, this isn't too invasive on the project - for each string that you handle in code, add a resource to the resource dictionary and use the FindResource() method to retrieve the string.  That's pretty much it.
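If you end up with a lot of these lookups, it may be worth wrapping them in a tiny helper so that the casts and magic strings live in one place. The name Localized.Get below is made up purely for illustration:

```csharp
using System.Windows;

static class Localized
{
    // TryFindResource returns null (rather than throwing) when the key is
    // missing, so an untranslated key shows up visibly instead of crashing.
    public static string Get(string key)
    {
        return Application.Current.TryFindResource(key) as string ?? "[" + key + "]";
    }
}
```

Usage then becomes MessageBox.Show(Localized.Get("buttonMessage"));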

WPF Localization

There are a good number of articles out there on how to localize WPF resources, including the main one on MSDN.  However, they are all quite long-winded and having now gone through all the necessary steps I thought I'd just document exactly what you have to do.  Hopefully this will be of some help to others.

The support for localization within WPF is actually pretty good, but unfortunately it is not quite as integrated with the IDE as one might like, and does involve some work on the command line.  For all you "point'n'click" warriors, I think the best I can do is to send you here.

Step 1:

Download and compile locbaml.  You can find it here.  Once you've downloaded and "installed" it, head to the directory you chose and go into the "csharp" folder.  In there, you'll see a csproj file.  Either open that in Visual Studio and build, or run msbuild from the command prompt.  Either way, you'll end up with a nice new locbaml.exe just a couple of seconds later.

Step 2:

Create yourself a new WPF project within Visual Studio, and then close the IDE straight away.  Open up the new .csproj file in your favorite text editor, and add the following property to the first PropertyGroup section:

<UICulture>en-GB</UICulture>

where "en-GB" is the culture that you are developing in.  For a full list of culture codes, head here.

Open up the project again, and open the AssemblyInfo.cs file.  Add the following line to the bottom:

[assembly: NeutralResourcesLanguage("en-GB", UltimateResourceFallbackLocation.Satellite)]

If you build & run now, your app should work just fine.  If you take a peek in the bin\debug directory, you'll notice that VS has created an en-GB directory and popped a new dll in there.  This dll is a satellite assembly containing just the resources for that culture. 

Step 3:

Before localizing, let's add some content to our application. Open up Window1.xaml, and add the following as a child of the Grid element:

<Button>Hello</Button>

You can look at all of Fran's blogs for details on how to make it prettier :)

Step 4:

Here's where we drop down to the command line.  Spin up a VS Command Prompt, and cd to the directory that contains your project.  Run the following:

msbuild /t:updateuid

This will walk all the XAML in your project and add Uid attributes to pretty much everything.   It does mess the XAML up a smidge, but that's just how it goes.
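To give a feel for the mess, the Button from step 3 ends up looking something like this after the run (the exact Uid value may differ):

```xml
<Button x:Uid="Button_1">Hello</Button>
```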

Step 5:

Copy the locbaml.exe that you created in step 1 to the bin\debug directory.  Then run:

LocBaml.exe /parse en-GB\WpfLocalization.resources.dll

If all goes well, you should now have a .CSV file containing all the resource information which locbaml has extracted from the default resources dll.  Mine looks something like this:

WpfLocalization1.g.en-GB.resources:window1.baml,Window_1:WpfLocalization.Window1.$Content,None,True,True,,#Grid_1;
WpfLocalization1.g.en-GB.resources:window1.baml,Window_1:System.Windows.Window.Title,Title,True,True,,Window1
WpfLocalization1.g.en-GB.resources:window1.baml,Window_1:System.Windows.FrameworkElement.Height,None,False,True,,300
WpfLocalization1.g.en-GB.resources:window1.baml,Window_1:System.Windows.FrameworkElement.Width,None,False,True,,300
WpfLocalization1.g.en-GB.resources:window1.baml,Button_1:System.Windows.Controls.Button.$Content,Button,True,True,,Hello

The columns in this file are:

  • Baml Name - the name of the compiled XAML stream that is held in the resource dll
  • Resource Key - maps back to the Uid attributes that we created in Step 4.  Actually, it's even better than that, since it doesn't just have the content of the element, it also allows for the modification of some of the other properties.  So in the example above, on the element with the Uid of Window_1, I can change the $Content, Title, Height and Width properties.
  • Localization Category - a value from the LocalizationCategory enumeration
  • Readable - is the value visible for translation?
  • Modifiable - is the value modifiable during the translation?
  • Comments - any comments; more on this later
  • Value - the value of the property.  This is the chap that needs translating

Step 6:

Translate the file.  Can't help you there :)

Step 7:

So you've now got a new version of the CSV file in the new culture (let's say it's es-ES).  From this, we need to build a new satellite assembly.  Drop the file into the bin\debug directory alongside the original, and then run this:

locbaml /generate en-GB\WpfLocalization.resources.dll /trans:WpfLocalization.resources.CSV /out:es-ES /cul:es-ES

It should be pretty obvious what's happening here - from the information present in the original resource dll, and with the updated values in the CSV, generate a new resource dll in the es-ES directory, for the culture es-ES.  Note that you need to create the output directory first.

Step 8:

Test it.  Either change your locale in the control panel, or add some code to the constructor of your App class (in App.xaml.cs):

public App()
{
   CultureInfo ci = new CultureInfo("es-ES");
   Thread.CurrentThread.CurrentCulture = ci;
   Thread.CurrentThread.CurrentUICulture = ci;
}

Note that you do need to do this pretty early on in the application lifecycle; once resources start being read, you are fixed in whatever locale you were in at the time.  You can't use this approach to switch languages dynamically once the app is loaded - if that's a requirement, then I'm afraid you need to head back to Google.

Conclusion

That's it.  It's a bit of an effort first time round, but once you get used to things it's pretty simple.  To summarise:

  • You can pretty much develop your WPF app as normal (just add the UICulture to the .csproj and the NeutralResourcesLanguage attribute to AssemblyInfo.cs)
  • At suitable points, run "msbuild /t:updateuid" to refresh your Uids, and run "locbaml /parse" to get your latest CSV
  • Once translated, run "locbaml /generate" to build a new resource DLL.

What I like is that it doesn't really impact day-to-day development, and it doesn't require a recompile of the main application to add additional languages.

I'll do another blog shortly that extends this to include the localization of strings that are used within code (e.g., exception messages etc.)

Wednesday, October 01, 2008

Apple drop iPhone NDA

It appears that Cupertino does listen to what's going on in the wild...

http://www.appleinsider.com/articles/08/10/01/apple_drops_iphone_nda_for_released_software.html

It's only for released software, but considerably better than the previous position.

There's a new Process Monitor

It's even had a major version number change - you can get Process Monitor V2.0 here.

Monday, September 29, 2008

CompiledQuery and Enumerating Query Results

More fun with compiled queries, this time when processing the results.  Firstly, the simple non-compiled query:

using (DataClasses1DataContext context = new DataClasses1DataContext())
{
   var results = context.Employees.Where(e => e.EmployeeID == 1);

   Console.WriteLine("Number of employees: {0}", results.Count());
   Console.WriteLine("First ID: {0}", results.First().EmployeeID);
}

Simple stuff, and it works as you'd expect.  We get both the number of employees and the Id of the first one.  Note that under the covers, SQL gets executed twice.  Perhaps not quite what you'd expect ;)

Here's the compiled version:

var compiledQuery = CompiledQuery.Compile((DataClasses1DataContext context) => context.Employees.Where(e => e.EmployeeID == 1));

using (DataClasses1DataContext context = new DataClasses1DataContext())
{
   var results = compiledQuery(context);

   Console.WriteLine("Number of employees: {0}", results.Count());
   Console.WriteLine("First ID: {0}", results.First().EmployeeID);
}

This one doesn't do what you'd expect. On the second Console.WriteLine(), instead of executing a suitable query, you get an InvalidOperationException saying that "The query results cannot be enumerated more than once".  It's pretty clear what it means, and the fix is simple - make sure you only enumerate the results once, using something like the ToList() method:

var results = compiledQuery(context).ToList();

With this, you only hit the DB once, and you can then look at the list of results as much as you like. 

The difference between the non-compiled and the compiled query is down to how the two different queries are processed.  In the non-compiled version, there's an expression tree floating around, which is lazily evaluated when the results are enumerated.  Because LINQ to SQL has the expression tree available, it can generate different SQL on each hit, so results.Count() generates a "select count(*) ..." statement and results.First() generates a "select top 1 ..." statement.

For compiled queries, the SQL is determined at the point you call CompiledQuery.Compile().  When the results are enumerated the first time, this pre-prepared SQL is executed and the results processed.  Note that since the SQL is already built, the thing that you are doing with the results doesn't influence the SQL.  So the call to "results.Count()" will execute a select of the entire dataset, which will then get enumerated and counted in the client.

Since repeatedly issuing the same SQL is unlikely to be what you want your app to be doing, the designers of LINQ to SQL quite wisely throw an exception if you try to do so.  Instead, you need to stick in the explicit ToList() to make it clear that you understand the behaviour.

At first glance it seems a shame that you can't just swap between normal and compiled queries, but hopefully you can see that the semantics of the two are quite different, at which point having client code that respects those differences is more acceptable.

Last point - before you start getting excited about the number of SQL queries that the non-compiled version may be executing, and start scattering ToList() calls everywhere, make sure that you understand the consequences.  For the code above, ToList() is going to be a good thing, but if you consider a scenario where the Where() clause doesn't identify a single entry but instead hits perhaps many thousands of rows, then it may not be so good.  The first time we look at the results, we just want the count which gets mapped to count() in SQL, and on the second time the First() method gets mapped to a TOP 1 clause in the SQL.  Although it would mean two hits on the database, it would likely be far better to do that than to bring back thousands of records for processing in the client.

As with most abstraction layers, LINQ offers a lot of benefits but it cannot be used without considered thought as to what is happening underneath.  I thoroughly recommend that when testing your LINQ queries you have the SQL profiler running so that you can see what's going on.  Also, don't forget the DataContext.Log property which lets you dump the SQL out to a TextWriter.  Using this, it would be quite possible to check within your unit tests that the DB interaction is running the way you expect, and also to spot when changes cause unexpected interactions.

Passing Predicates into Compiled Queries

I've recently been looking at generating LINQ predicates on the fly in a mapping layer between a set of business domain entities and a set of related, but different, database entities.  One of the problems that I've encountered is to do with the way in which predicates are handled when using CompiledQueries.

To start off, let's consider the easy non-compiled version.  Here's a method:

// Get an employee by a predicate
static void GetEmployee(Expression<Func<Employee, bool>> predicate)
{
   // Just perform the select, and output the results
   using (DataClasses1DataContext context = new DataClasses1DataContext())
   {
      var results = context.Employees.Where(predicate);

      Console.WriteLine("Number of employees: {0}", results.Count());
   }
}




The usage of this is nice and simple, and the sort of thing seen in LINQ examples all over the web:

GetEmployee(e => e.EmployeeID == 1);




Calling this does exactly what you'd expect.  My next step was to look at how this approach could be used with compiled queries.  I started with a simple method:





// Get an employee by a predicate, using a compiled expression
static void GetEmployeeCompiled(Expression<Func<Employee, bool>> predicate)
{
   // Compile the query
   var compiledQuery =
      CompiledQuery.Compile((DataClasses1DataContext context) => context.Employees.Where(predicate));

   // and using the compiled query, output the results.  This crashes :(
   using (DataClasses1DataContext context = new DataClasses1DataContext())
   {
      var results = compiledQuery(context);

      Console.WriteLine("Number of employees: {0}", results.Count());
   }
}




Obviously, this is pointless since it just recompiles the query every time.  But let's ignore that small fact - it should, after all, still work.  Alas, it doesn't. 



At the point where the results are enumerated, it explodes with a NotSupportedException: "Unsupported overload used for query operator 'Where'."  Looking at the expression tree that is being compiled, in conjunction with some help from Reflector to see what LINQ is doing under the covers, the issue comes down to how the predicate is included in the final query expression. 



Remember, the compiler is not generating executable code here, it is just building a lambda expression.  When it sees the parameter to the Where() method, it has little choice but to "lift" this variable onto a class of its own, and it is a member of this lifted class that is passed as the parameter to the Where() method.  Although the non-compiled version handles this just fine, it causes the CompiledQuery object to barf.  This is just the same as any other query that uses a local variable or parameter.
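The lifting is easy to see by inspecting two expression trees side by side.  This is a minimal sketch using IQueryable&lt;int&gt; in place of a real table, but the compiler transformation is the same: an inline lambda is quoted directly into the tree, while a captured predicate variable surfaces as a member access on the compiler-generated closure class:

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

Expression<Func<int, bool>> predicate = x => x > 1;

// Inline lambda: the predicate is quoted straight into the tree.
Expression<Func<IQueryable<int>, IQueryable<int>>> inline =
   source => source.Where(x => x > 1);

// Captured variable: the compiler lifts 'predicate' onto a closure class,
// so the tree contains an access on that closure instead of a quote.
Expression<Func<IQueryable<int>, IQueryable<int>>> captured =
   source => source.Where(predicate);

Console.WriteLine(((MethodCallExpression)inline.Body).Arguments[1].NodeType);   // Quote
Console.WriteLine(((MethodCallExpression)captured.Body).Arguments[1].NodeType); // MemberAccess
```

It is that MemberAccess node, where a quoted lambda is expected, that CompiledQuery rejects.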



I've experimented with a number of ways of constructing the query that I'm trying to compile, but all ultimately end up with the same problem.  The solution I've found is a little nasty, but it does work.  If I've missed a cleaner way, then I'd love to hear about it!



Anyhow, the solution.  It is based on the fact that it is the act of "passing" the predicate into the Where() method that is the problem.  So the solution is to not pass in the predicate, but instead pass in some dummy predicate.  Then do some expression tree walking to swap out the dummy predicate for the real one.  The code looks like this:





// Get an employee by a predicate, using a compiled expression
static void GetEmployeeCompiled2(Expression<Func<Employee, bool>> predicate)
{
   // Setup the required query, using a dummy predicate (c => true)
   Expression<Func<DataClasses1DataContext, IEnumerable<Employee>>> compiledExpression =
      context => context.Employees.Where(c => true);

   // Dig out the dummy predicate from the expression tree created above
   Expression template = ((UnaryExpression)((MethodCallExpression)(compiledExpression.Body)).Arguments[1]).Operand;

   // Swap out the template for the predicate
   compiledExpression = (Expression<Func<DataClasses1DataContext, IEnumerable<Employee>>>)
                                 ExpressionRewriter.Replace(compiledExpression, template, predicate);

   // Compile the query
   var compiledQuery = CompiledQuery.Compile(compiledExpression);

   // and using the compiled query, output the results.  This works :)
   using (DataClasses1DataContext context = new DataClasses1DataContext())
   {
      var results = compiledQuery(context);

      Console.WriteLine("Number of employees: {0}", results.Count());
   }
}




So the required query is itself stored as an expression, with a dummy predicate (c => true) used to get the correct "shape" of tree.  This predicate is then located and the expression tree is rewritten, swapping out the dummy predicate for the real one.  This new query expression then compiles and executes just fine.



For completeness, the ExpressionRewriter class is defined as:





class ExpressionRewriter : ExpressionVisitor
{
   static public Expression Replace(Expression tree, Expression toReplace, Expression replaceWith)
   {
      ExpressionRewriter rewriter = new ExpressionRewriter(toReplace, replaceWith);

      return rewriter.Visit(tree);
   }

   private readonly Expression _toReplace;
   private readonly Expression _replaceWith;

   private ExpressionRewriter(Expression toReplace, Expression replaceWith)
   {
      _toReplace = toReplace;
      _replaceWith = replaceWith;
   }

   protected override Expression Visit(Expression exp)
   {
      if (exp == _toReplace)
      {
         return _replaceWith;
      }
      return base.Visit(exp);
   }
}




where the ExpressionVisitor base class can be found on MSDN.

Tuesday, September 23, 2008

Unsafe code without the Unsafe keyword

I've been playing around with some code lately that uses dynamic method generation fairly extensively.  In the course of doing so, I've written the odd dodgy bit of IL.  Interestingly, a couple of times I got some very strange results when assigning fields from one object to another - specifically, if I got the types mismatched I just got garbage in the destination, rather than some form of cast exception (which I'd expect the runtime to generate during execution) or verification exception (which I'd expect when I finally surface my generated method through a call to DynamicMethod.CreateDelegate()).

I finally had some time today to take a closer look, and the results are very interesting and not at all clear from the documentation.  Specifically, if you create a dynamic method using the following constructor:

public DynamicMethod(
    string name,
    Type returnType,
    Type[] parameterTypes,
    Module m
)




and pass in "Assembly.GetExecutingAssembly().ManifestModule" for the module, then it appears that all type safety within the generated code is turned off.  i.e., you can pretty much assign anything to anything.  The following code, for example, enables you to dump the memory address of any reference type:





/// <summary>
/// Return a method that gives the memory address of any object
/// </summary>
static Func<object, int> Get_GetAddress_Method()
{
   DynamicMethod d = new DynamicMethod("", typeof (int), new Type[] {typeof (Object)},
                                       Assembly.GetExecutingAssembly().ManifestModule);

   ILGenerator ilGen = d.GetILGenerator();

   ilGen.Emit(OpCodes.Ldarg_0); // Load arg_0 onto the stack (of type object)
   ilGen.Emit(OpCodes.Ret);     // And return - note that the return type is an int...

   return (Func<object, int>)d.CreateDelegate(typeof(Func<object, int>));
}




You can use this in the following way:





Func<object, int> getAddress = Get_GetAddress_Method();

const string greeting = "Hello";

// Get the address of the "Hello" string
int x = getAddress(greeting);




x now contains the memory address of the string "Hello".  So what?  Well, you can also write a method like this:





/// <summary>
/// Return a method that "maps" any type to a particular memory location
/// </summary>
static Func<int, T> Get_ObjectAtAddress_Method<T>()
{
   DynamicMethod d = new DynamicMethod("", typeof (T), new Type[] {typeof (int)},
                                       Assembly.GetExecutingAssembly().ManifestModule);

   ILGenerator ilGen = d.GetILGenerator();

   ilGen.Emit(OpCodes.Ldarg_0);  // Load arg_0 onto the stack (of type int)
   ilGen.Emit(OpCodes.Ret);      // And return - note that the return type is T

   return (Func<int, T>)d.CreateDelegate(typeof(Func<int, T>));
}




This chap lets you take any memory address, and "pretend" that an object of type T resides there.  So you can do something like this:





Func<int, byte[]> getData = Get_ObjectAtAddress_Method<byte[]>();

// Get a byte array on the same location
byte[] data = getData(x);




where x is a memory location that you've acquired previously.  It doesn't matter if the type that really resides at address x is a byte[] or not.  This basically lets you get access to the whole address space within your AppDomain (and possibly the whole Win32 process) and write whatever you like into it. 



This seems plain wrong to me - I haven't specified the "unsafe" keyword anywhere, nor is this code built with the "Allow unsafe code" box checked.  Without jumping through those hoops, I should not be able to write code like this.  I'll concede that this only works in a full trust environment, but it still smells like a very serious hole in the type safety of .Net.  Interestingly, if you use the DynamicMethod constructor that doesn't take a Module parameter, then everything works as you'd expect - you are politely served a VerificationException when the method is compiled.  According to the docs, the constructor overload that takes a module is only supposed to allow access to the internals of the specified module, not to skip type safety altogether.  I wonder if the implementation of DynamicMethod in that scenario is flawed.



Below is a big lump of code - it compiles and shows the issue quite clearly.  I'd be interested in your views on whether this is a bug or "by design". If the latter, what exactly was the scenario that they were designing for?





using System;
using System.Reflection;
using System.Reflection.Emit;
using System.Text;

namespace ConsoleApplication1
{
   class Program
   {
      static void Main()
      {
         // Get some methods generated...
         Func<object, int> getAddress = Get_GetAddress_Method();
         Func<int, byte[]> getData = Get_ObjectAtAddress_Method<byte[]>();

         const string greeting = "Hello";

         // Print the greeting
         Console.WriteLine(greeting);

         // Get the address of the "Hello" string
         int x = getAddress(greeting);

         // Get a byte array on the same location
         byte[] data = getData(x);

         // Change some data...
         SetString("Bye!!", data);

         // And display the greeting again (remember, strings are immutable...)
         Console.WriteLine(greeting);

         // And just to show it against other bits of the framework...
         Console.WriteLine(Assembly.GetExecutingAssembly().FullName);

         SetString("Hacked!", getData(getAddress(Assembly.GetExecutingAssembly().FullName)));

         Console.WriteLine(Assembly.GetExecutingAssembly().FullName);
      }

      /// <summary>
      /// Return a method that gives the memory address of any object
      /// </summary>
      static Func<object, int> Get_GetAddress_Method()
      {
         DynamicMethod d = new DynamicMethod("", typeof (int), new Type[] {typeof (Object)},
                                             Assembly.GetExecutingAssembly().ManifestModule);

         ILGenerator ilGen = d.GetILGenerator();

         ilGen.Emit(OpCodes.Ldarg_0); // Load arg_0 onto the stack (of type object)
         ilGen.Emit(OpCodes.Ret);     // And return - note that the return type is an int...

         return (Func<object, int>)d.CreateDelegate(typeof(Func<object, int>));
      }

      /// <summary>
      /// Return a method that "maps" any type to a particular memory location
      /// </summary>
      static Func<int, T> Get_ObjectAtAddress_Method<T>()
      {
         DynamicMethod d = new DynamicMethod("", typeof (T), new Type[] {typeof (int)},
                                             Assembly.GetExecutingAssembly().ManifestModule);

         ILGenerator ilGen = d.GetILGenerator();

         ilGen.Emit(OpCodes.Ldarg_0);  // Load arg_0 onto the stack (of type int)
         ilGen.Emit(OpCodes.Ret);      // And return - note that the return type is T

         return (Func<int, T>)d.CreateDelegate(typeof(Func<int, T>));
      }

      /// <summary>
      /// Little helper method to copy a string into a byte[]
      /// </summary>
      static void SetString(string requiredString, byte[] dest)
      {
         UnicodeEncoding encoder = new UnicodeEncoding();
         byte[] requiredBytes = encoder.GetBytes(requiredString);

         // Need to do the copy by hand, since Array.Copy bleats
         // about the dimensions of the destination.  No surprise really,
         // since the destination isn't really an array...
         for (int i = 0; i < requiredBytes.Length; i++)
         {
            dest[i] = requiredBytes[i];
         }
      }
   }
}