An elegant LINQ to db4o provider, and a few LINQ tricks 06 Feb 2008


Statue Last year, when I was working at db4objects on db4o, I insisted on the need for db4o to act as a LINQ provider, and with that in mind, prototyped one. I'am happy today to see a LINQ provider making it to db4o's SVN, especially as I had the chance to influence its design, as I was also researching LINQ for Mono. For that matters, I find LINQ to db4o's design particularly elegant. I'll use that occasion to describe a few LINQ tricks. Hopefully, the folks at db4o won't mind if I steal the show. Actually I think it will even unburden my good friend Rodrigo, who's apparently tired of blogging. h3. Making a type queryable with LINQ Let's take db4o's example. What you do first is getting an IObjectContainer, with the idea to store and retrieve objects.
using (var container = Db4oFactory.OpenFile ("data.db4o")) {
    // ...
}
So the goal was to be able to write somethings either like:
var peoples = from container.Query ()
             where p.Name == "jb" && p.Age == 25
             select p
or like:
var peoples = from Person p in container
             where p.Name == "jb" && p.Age == 25
             select p
There's a few things that we didn't like with the first method. First an IObjectContainer already have a Query<T> method, which is used for native queries, and secondly, aesthetically, it doesn't look as good as the second one. So we went for the second possibility. The issue is that IObjectContainer doesn't implement any interface that the compiler could use to perform LINQ queries on it. The trick here is that when the compiler encounters the form:
from Person p in container
It will try to compile it as:
container.Cast ()
And expect that this Cast<T> returns an IEnumerable<T>, on which he can perform LINQ query. The good news is that this Cast method can very well be an extension method. With that in mind, the entry point of LINQ to db4o is really simple:
public static IDb4oLinqQuery Cast(this IObjectContainer container)
{
	return new Db4oQuery(container);
}
Now comes the question of what is this IDb4oLinqQuery<T>. h3. Implementing a lightweight LINQ provider LINQ is still a pretty new technology, so it's not always easy to find good advices from people that are already experienced in the fine art of writing LINQ providers. If I don't have the pretension to be one of those experienced guys, I know of at least one folk who used the following trick, and which is happy with it. So today, the standard way of writing a LINQ provider seems to be implementing the interface System.Linq.IQueryable, and to implement a full fledged LINQ provider. Matt Warren have written an excellent series of posts about it. LINQ to db4o uses some code from his posts. Bart de Smet also wrote a nice one. My very first prototype was also following this path, but at that time, we changed it quickly for an excellent reason. Thing is that the IQueryable path is an all or nothing solution. It keeps building an expression tree, until you decide to work on its result, either by calling GetEnumerator, or by calling AsEnumerable on it. Another thing is that db4o is only able to optimize a (comparatively with something like LINQ to SQL) very little subset of the possible LINQ operations. Actually, it can really optimize only Where, Count, OrderBy and ThenBy. And again, here comes the magic of the extension methods. All LINQ operations are actually extension methods on IEnumerable<T>, or on a child of it, like IQueryable<T>. The trick here, was simply to re-define the supported LINQ operations, on a custom interface. And that's what the IDb4oLinqQuery interface we talked before is. It's simply a marker interface:
public interface IDb4oLinqQuery : IEnumerable
{
}
And thanks to the C#3 magic, we can redefine the LINQ operations on it:
public static class Db4oLinqQueryExtensions
{
	public static IDb4oLinqQuery Where(this IDb4oLinqQuery self, Expression> expression)
	{
		// ..
	}

	public static int Count(this IDb4oLinqQuery self)
	{
		// ..
	}
}
The major advantage of this solution, is that it allows the LINQ query to fallback to LINQ to Objects, anytime a LINQ operations is not supported. So by default, is something cannot be optimized, the LINQ to db4o will return all the types for a certain type, and as we fall back to LINQ to Objects, the user will at least get the result it wanted, even if it doesn't run at full speed. I think it's something that makes this solution really elegant. Let's take a detailed example:
var peoples = from Person p in container
             where p.Name == "jb"
             orderby p.Age
             select new { p.Age }
The compiler here sees four steps. * First it use the first Cast trick, that creates an IDb4oLinqQuery of T, * Secondly, it call the Where methods defined for IDb4oLinqQuery, * Then it does the same for OrderBy, which can be optimized, * And then only, as db4o cannot optimize the creation of an anonymous type, it will fall back to LINQ to Objects, but only for the select part, while the two first parts run completely optimized, and almost as fast as the db4o low level query mechanism. That's really nice if you ask me. h3. Using an expression tree as a dictionary key That one is simple. LINQ to db4o caches how a query is created, depending on the expression tree it gets. Thing is that System.Linq.Expression doesn't override Equals and GetHashCode, and as the compiler re-creates a whole new tree every time it sees one, hence Expressions are poor candidate for being keys in Dictionary. The trick here is simply to have an IEqualityComparer for Expressions. It's seems that it's not trivial to implement, but they managed to get a pretty nice one. h3. Analyzing properties body's This trick is also a simple one, and it's the same one that db4o uses for its Native Query functionality, LINQ to db4o uses the Cecil.FlowAnalysis library to read the body of the properties, so they know which fields are impacted by the LINQ query. h3. That's all folks Kuddos to db4o for such a nice addition, hopefully it'll run on Mono soon!